Version: Next

OssFile

OSS file source connector

Supported engines

Spark
Flink
SeaTunnel Zeta

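Dependencies

For the Spark/Flink engines

You must ensure that your Spark/Flink cluster already has Hadoop integrated. The tested Hadoop version is 2.x.
You must ensure that hadoop-aliyun-xx.jar, aliyun-sdk-oss-xx.jar, and jdom-xx.jar are in the ${SEATUNNEL_HOME}/plugins/ directory. The version of the hadoop-aliyun jar must match the Hadoop version used by your Spark/Flink cluster, and the versions of aliyun-sdk-oss-xx.jar and jdom-xx.jar must correspond to the hadoop-aliyun version. For example, hadoop-aliyun-3.1.4.jar depends on aliyun-sdk-oss-3.4.1.jar and jdom-1.1.jar.

For the SeaTunnel Zeta engine

You must ensure that seatunnel-hadoop3-3.1.4-uber.jar, aliyun-sdk-oss-3.4.1.jar, hadoop-aliyun-3.1.4.jar, and jdom-1.1.jar are in the ${SEATUNNEL_HOME}/lib/ directory.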

Key features

Data type mapping

The data type mapping depends on the type of file being read. The following file types are supported:

text csv parquet orc json excel xml markdown

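JSON file type

If you specify the file type as json, you should also specify the schema option to tell the connector how to parse the data into the rows you want.

For example:

The upstream data is as follows: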

{"code":  200, "data":  "get success", "success":  true}

You can also store multiple records in one file, separated by newlines:

{"code":  200, "data":  "get success", "success":  true}
{"code":  300, "data":  "get failed", "success":  false}

You should specify the schema as follows:

schema {
    fields {
        code = int
        data = string
        success = boolean
    }
}

The connector will generate the following data:

code	data	success
200	get success	true
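As a rough illustration of what the schema does for newline-delimited JSON, the following Python sketch parses each line and orders the values by the schema fields. It is not the connector's implementation; `SCHEMA` and `parse_json_lines` are hypothetical names introduced only for this example.

```python
import json

# Hypothetical stand-in for the schema block above: field name -> Python type.
SCHEMA = {"code": int, "data": str, "success": bool}


def parse_json_lines(text, schema):
    """Turn newline-delimited JSON into rows ordered like the schema fields."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between records
        record = json.loads(line)
        rows.append(tuple(cast(record[name]) for name, cast in schema.items()))
    return rows


UPSTREAM = (
    '{"code":  200, "data":  "get success", "success":  true}\n'
    '{"code":  300, "data":  "get failed", "success":  false}\n'
)

rows = parse_json_lines(UPSTREAM, SCHEMA)
for row in rows:
    print(row)
```

Each input line becomes one row, so a file with two records yields two rows.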

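Text or CSV file type

If you set file_format_type to text, excel, csv, or xml, you need to set the schema option to tell the connector how to parse the data into rows.

If you set the schema option, you should also set the field_delimiter option, unless file_format_type is csv, xml, or excel.

You can set the schema and the delimiter as follows: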

field_delimiter = "#"
schema {
    fields {
        name = string
        age = int
        gender = string 
    }
}

The connector will generate the following data:

name	age	gender
tyrantlucifer	26	male
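The effect of the delimiter and schema can be sketched in Python. This is an illustration only: the input line `tyrantlucifer#26#male` is an assumed upstream record consistent with the output above, and `parse_text_line` is a hypothetical helper, not part of SeaTunnel.

```python
# Hypothetical stand-in for the schema block above: field name -> Python type.
SCHEMA = {"name": str, "age": int, "gender": str}


def parse_text_line(line, field_delimiter, schema):
    """Split one text line on the delimiter and cast each value per the schema."""
    values = line.split(field_delimiter)
    if len(values) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(values)}")
    return {name: cast(value) for (name, cast), value in zip(schema.items(), values)}


# Assumed upstream line; "#" matches field_delimiter in the config above.
row = parse_text_line("tyrantlucifer#26#male", "#", SCHEMA)
print(row)
```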

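Orc file type

If you specify the file type as parquet or orc, the schema option is not required; the connector detects the schema of the upstream data automatically.

Orc data type	SeaTunnel data type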
BOOLEAN	BOOLEAN
INT	INT
BYTE	BYTE
SHORT	SHORT
LONG	LONG
FLOAT	FLOAT
DOUBLE	DOUBLE
BINARY	BINARY
STRING VARCHAR CHAR	STRING
DATE	LOCAL_DATE_TYPE
TIMESTAMP	LOCAL_DATE_TIME_TYPE
DECIMAL	DECIMAL
LIST(STRING)	STRING_ARRAY_TYPE
LIST(BOOLEAN)	BOOLEAN_ARRAY_TYPE
LIST(TINYINT)	BYTE_ARRAY_TYPE
LIST(SMALLINT)	SHORT_ARRAY_TYPE
LIST(INT)	INT_ARRAY_TYPE
LIST(BIGINT)	LONG_ARRAY_TYPE
LIST(FLOAT)	FLOAT_ARRAY_TYPE
LIST(DOUBLE)	DOUBLE_ARRAY_TYPE
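Map<K,V>	MapType; the types of K and V are converted to SeaTunnel types
STRUCT	SeaTunnelRowType

Parquet file type

If you specify the file type as parquet or orc, the schema option is not required; the connector detects the schema of the upstream data automatically.

Parquet data type	SeaTunnel data type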
INT_8	BYTE
INT_16	SHORT
DATE	DATE
TIMESTAMP_MILLIS	TIMESTAMP
INT64	LONG
INT96	TIMESTAMP
BINARY	BYTES
FLOAT	FLOAT
DOUBLE	DOUBLE
BOOLEAN	BOOLEAN
FIXED_LEN_BYTE_ARRAY	TIMESTAMP DECIMAL
DECIMAL	DECIMAL
LIST(STRING)	STRING_ARRAY_TYPE
LIST(BOOLEAN)	BOOLEAN_ARRAY_TYPE
LIST(TINYINT)	BYTE_ARRAY_TYPE
LIST(SMALLINT)	SHORT_ARRAY_TYPE
LIST(INT)	INT_ARRAY_TYPE
LIST(BIGINT)	LONG_ARRAY_TYPE
LIST(FLOAT)	FLOAT_ARRAY_TYPE
LIST(DOUBLE)	DOUBLE_ARRAY_TYPE
Map<K,V>	MapType; the types of K and V are converted to SeaTunnel types
STRUCT	SeaTunnelRowType

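Options

Name	Type	Required	Default	Description
path	string	yes	-	The OSS path to read. It may contain sub-paths, but they must follow a specific format; see the "parse_partition_from_path" option for details.
file_format_type	string	yes	-	File type. Supported file types: `text` `csv` `parquet` `orc` `json` `excel` `xml` `binary` `markdown`
bucket	string	yes	-	The bucket address of the OSS file system, for example: `oss://seatunnel-test`.
endpoint	string	yes	-	The endpoint of the OSS file system.
read_columns	list	no	-	The list of columns to read from the data source; use it for column projection. File types that support column projection: `text` `csv` `parquet` `orc` `json` `excel` `xml`. To use it when reading `text`, `json`, or `csv` files, the "schema" option must be configured.
access_key	string	no	-
access_secret	string	no	-
delimiter	string	no	\001	Field delimiter; tells the connector how to split fields when reading text files. Defaults to `\001`, the same as Hive's default delimiter.
row_delimiter	string	no	\n	Row delimiter; tells the connector how to split rows when reading text files. Defaults to `\n`.
parse_partition_from_path	boolean	no	true	Controls whether partition keys and values are parsed from the file path. For example, if you read a file from the path `oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`, every record in the file gets these two fields added: name="tyrantlucifer", age=26
date_format	string	no	yyyy-MM-dd	Date format; tells the connector how to convert a string to a date. Supported formats: `yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`. Defaults to `yyyy-MM-dd`.
datetime_format	string	no	yyyy-MM-dd HH:mm:ss	Datetime format; tells the connector how to convert a string to a datetime. Supported formats: `yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
time_format	string	no	HH:mm:ss	Time format; tells the connector how to convert a string to a time. Supported formats: `HH:mm:ss` `HH:mm:ss.SSS`
filename_extension	string	no	-	Filters files by file name extension, so that only files with the given extension are read. Examples: `csv` `.txt` `json` `.xml`.
skip_header_row_number	long	no	0	Number of leading rows to skip; applies only to txt and csv files. For example, with `skip_header_row_number = 2`, SeaTunnel skips the first 2 rows of the source file.
csv_use_header_line	boolean	no	false	Whether to use the header line to parse the file. Used only when file_format_type is `csv` and the file contains an RFC 4180 compliant header line.
schema	config	no	-	The schema of the upstream data.
sheet_name	string	no	-	The worksheet of the workbook to read. Used only when file_format_type is excel.
xml_row_tag	string	no	-	The tag name of the data rows in the XML file. Used only when file_format_type is xml.
xml_use_attr_format	boolean	no	-	Whether to process data using the tag attribute format. Used only when file_format_type is xml.
compress_codec	string	no	none	The compression codec used by the files.
encoding	string	no	UTF-8
null_format	string	no	-	Used only when file_format_type is text. Defines which strings are treated as null, for example: `\N`
binary_chunk_size	int	no	1024	Used only when file_format_type is binary. The chunk size (in bytes) for reading binary files. Defaults to 1024 bytes. Larger values may improve performance for large files but use more memory.
binary_complete_file_mode	boolean	no	false	Used only when file_format_type is binary. Whether to read the complete file as a single chunk instead of splitting it into chunks. When enabled, the entire file content is read into memory at once. Defaults to false.
file_filter_pattern	string	no	-	Filter pattern used to filter files.
common-options	config	no	-	Common source plugin parameters; see Source Common Options for details.
file_filter_modified_start	string	no	-	Filters files by last modified time. The start of the time range (inclusive); the time format is `yyyy-MM-dd HH:mm:ss`.
file_filter_modified_end	string	no	-	Filters files by last modified time. The end of the time range (exclusive); the time format is `yyyy-MM-dd HH:mm:ss`.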
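To make the parse_partition_from_path behaviour concrete, here is a minimal Python sketch that extracts `key=value` segments from the example path. It only illustrates the idea: `parse_partitions` is a hypothetical helper, and keeping the values as strings is an assumption rather than a statement about how the connector types them.

```python
from urllib.parse import urlparse


def parse_partitions(path):
    """Collect key=value directory segments from a path as partition fields."""
    partitions = {}
    for segment in urlparse(path).path.split("/"):
        key, separator, value = segment.partition("=")
        if separator and key:
            partitions[key] = value  # values are kept as strings in this sketch
    return partitions


fields = parse_partitions(
    "oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26"
)
print(fields)
```

Segments without an `=` sign, such as `tmp` or `parquet`, contribute no fields.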

compress_codec [string]

The compression codec of the files. The supported codecs per file type are:

txt: lzo none
json: lzo none
csv: lzo none
orc/parquet: the compression type is detected automatically; no extra setting is needed.

encoding [string]

Used only when file_format_type is json, text, csv, or xml. The encoding of the files to read. This parameter is parsed by Charset.forName(encoding).

binary_chunk_size [int]

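Used only when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Defaults to 1024 bytes. Larger values may improve performance for large files but use more memory.

binary_complete_file_mode [boolean]

Used only when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting it into chunks. When enabled, the entire file content is read into memory at once. Defaults to false.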
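The trade-off between binary_chunk_size and binary_complete_file_mode can be sketched with a small Python generator. This is an illustration under assumptions, not the connector's reader: `read_binary` is a hypothetical function whose parameters merely mirror the two options.

```python
import io


def read_binary(stream, chunk_size=1024, complete_file_mode=False):
    """Yield the stream's content either in fixed-size chunks or as one block."""
    if complete_file_mode:
        yield stream.read()  # whole file held in memory at once
        return
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


payload = bytes(2500)  # 2500 zero bytes standing in for a binary file
chunk_sizes = [len(c) for c in read_binary(io.BytesIO(payload))]
whole_sizes = [len(c) for c in read_binary(io.BytesIO(payload), complete_file_mode=True)]
print(chunk_sizes, whole_sizes)
```

With the default chunk size the 2500-byte payload is delivered in three pieces, while complete-file mode delivers it in one, at the cost of buffering the entire file.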

file_format_type [string]

File type. The following file types are supported:

text csv parquet orc json excel xml binary markdown

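If you specify the file type as markdown, SeaTunnel can parse markdown files and extract structured data. The markdown parser extracts various elements, including headings, paragraphs, lists, code blocks, tables, and more. Each element is converted to a row with the following schema:

element_id: unique identifier of the element
element_type: type of the element (Heading, Paragraph, ListItem, etc.)
heading_level: heading level (1-6; null for non-heading elements)
text: text content of the element
page_number: page number (default: 1)
position_index: position index within the document
parent_id: ID of the parent element
child_ids: comma-separated list of child element IDs

Note: the Markdown format supports reading only, not writing.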
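The row layout above can be pictured with a deliberately small Python sketch. It handles only headings and paragraphs and is not SeaTunnel's parser: the `el-N` ID scheme and the rule that a paragraph's parent is the nearest preceding heading are assumptions made for illustration, and `MarkdownRow` and `parse_markdown` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MarkdownRow:
    element_id: str
    element_type: str
    heading_level: Optional[int]
    text: str
    page_number: int
    position_index: int
    parent_id: Optional[str]
    child_ids: str  # comma-separated IDs, as in the schema above


def parse_markdown(source: str) -> List[MarkdownRow]:
    """Split on blank lines and emit one row per heading or paragraph."""
    rows: List[MarkdownRow] = []
    children = {}  # heading element_id -> IDs of the paragraphs below it
    current_heading = None
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    for index, block in enumerate(blocks):
        element_id = f"el-{index}"  # assumed ID scheme, for illustration only
        heading = re.fullmatch(r"(#{1,6})\s+(.+)", block)
        if heading:
            level = len(heading.group(1))
            rows.append(MarkdownRow(element_id, "Heading", level, heading.group(2), 1, index, None, ""))
            current_heading = element_id
            children[element_id] = []
        else:
            rows.append(MarkdownRow(element_id, "Paragraph", None, block, 1, index, current_heading, ""))
            if current_heading is not None:
                children[current_heading].append(element_id)
    for row in rows:
        row.child_ids = ",".join(children.get(row.element_id, []))
    return rows


rows = parse_markdown("# Title\n\nFirst paragraph.\n\nSecond paragraph.\n")
for row in rows:
    print(row)
```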

file_filter_pattern [string]

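File filter pattern used to filter files. To filter by file name only, write a regular expression for the file name; to filter by directory as well, start the expression with the path.

The pattern follows standard regular expression syntax; see https://en.wikipedia.org/wiki/Regular_expression for details. Some examples follow.

If path is /data/seatunnel and the files are laid out as follows: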

/data/seatunnel/20241001/report.txt
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
/data/seatunnel/20241012/logo.png

Matching rule examples:

Example 1: match all .txt files. Regular expression:

.*.txt

This example matches:

/data/seatunnel/20241001/report.txt

Example 2: match all files whose names start with abc. Regular expression:

abc.*

This example matches:

/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv

Example 3: match all files in the 20241007 folder whose names start with abc and whose fourth character is h or g. Regular expression:

/data/seatunnel/20241007/abc[h,g].*

This example matches:

/data/seatunnel/20241007/abch202410.csv

Example 4: match third-level folders whose names start with 202410 and files ending with .csv. Regular expression:

/data/seatunnel/202410\d*/.*.csv

This example matches:

/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
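The four examples can be checked with a short Python sketch. It rests on two assumptions that should be verified against the connector: a pattern starting with `/` is matched against the full path while any other pattern is matched against the file name, and a match must cover the whole string. `filter_files` is a hypothetical helper; Python's `re` syntax coincides with Java's for these particular patterns.

```python
import posixpath
import re

FILES = [
    "/data/seatunnel/20241001/report.txt",
    "/data/seatunnel/20241007/abch202410.csv",
    "/data/seatunnel/20241002/abcg202410.csv",
    "/data/seatunnel/20241005/old_data.csv",
    "/data/seatunnel/20241012/logo.png",
]


def filter_files(paths, pattern):
    """Keep the paths whose name (or full path) fully matches the pattern."""
    regex = re.compile(pattern)
    by_full_path = pattern.startswith("/")  # assumed rule for path-based patterns
    selected = []
    for path in paths:
        target = path if by_full_path else posixpath.basename(path)
        if regex.fullmatch(target):
            selected.append(path)
    return selected


for pattern in [
    r".*.txt",
    r"abc.*",
    r"/data/seatunnel/20241007/abc[h,g].*",
    r"/data/seatunnel/202410\d*/.*.csv",
]:
    print(pattern, filter_files(FILES, pattern))
```

Note that the unescaped `.` in `.*.txt` and `.*.csv` matches any character, and the comma inside `[h,g]` is a literal member of the character class; `.*\.txt` and `[hg]` would express the stated intent more strictly.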

schema [config]

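Needs to be configured only when file_format_type is text, json, excel, xml, or csv (or any other format whose schema cannot be read from metadata).

fields [Config]

The schema of the upstream data.

How to create an OSS data synchronization job

The following examples show how to create a data synchronization job that reads data from OSS and prints it on the local client: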

# Set the basic configuration of the job to run
env {
  parallelism = 1
  job.mode = "BATCH"
}

# Create a source that connects to OSS
source {
  OssFile {
    path = "/seatunnel/orc"
    bucket = "oss://tyrantlucifer-image-bed"
    access_key = "xxxxxxxxxxxxxxxxx"
    access_secret = "xxxxxxxxxxxxxxxxxxxxxx"
    endpoint = "oss-cn-beijing.aliyuncs.com"
    file_format_type = "orc"
  }
}

# Print the data read from OSS to the console
sink {
  Console {
  }
}

# Set the basic configuration of the job to run
env {
  parallelism = 1
  job.mode = "BATCH"
}

# Create a source that connects to OSS
source {
  OssFile {
    path = "/seatunnel/json"
    bucket = "oss://tyrantlucifer-image-bed"
    access_key = "xxxxxxxxxxxxxxxxx"
    access_secret = "xxxxxxxxxxxxxxxxxxxxxx"
    endpoint = "oss-cn-beijing.aliyuncs.com"
    file_format_type = "json"
    schema {
      fields {
        id = int
        name = string
      }
    }
  }
}

# Print the data read from OSS to the console
sink {
  Console {
  }
}

Multiple tables

File types that do not require a schema, for example orc:

env {
  parallelism = 1
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
  job.mode = "BATCH"
}

source {
  OssFile {
    tables_configs = [
      {
          schema = {
              table = "fake01"
          }
          bucket = "oss://whale-ops"
          access_key = "xxxxxxxxxxxxxxxxxxx"
          access_secret = "xxxxxxxxxxxxxxxxxxx"
          endpoint = "https://oss-accelerate.aliyuncs.com"
          path = "/test/seatunnel/read/orc"
          file_format_type = "orc"
      },
      {
          schema = {
              table = "fake02"
          }
          bucket = "oss://whale-ops"
          access_key = "xxxxxxxxxxxxxxxxxxx"
          access_secret = "xxxxxxxxxxxxxxxxxxx"
          endpoint = "https://oss-accelerate.aliyuncs.com"
          path = "/test/seatunnel/read/orc"
          file_format_type = "orc"
      }
    ]
    plugin_output = "fake"
  }
}

sink {
  Assert {
    rules {
        table-names = ["fake01", "fake02"]
    }
  }
}

File types that require a schema, for example json:

env {
  execution.parallelism = 1
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
  job.mode = "BATCH"
}

source {
  OssFile {
    tables_configs = [
      {
          bucket = "oss://whale-ops"
          access_key = "xxxxxxxxxxxxxxxxxxx"
          access_secret = "xxxxxxxxxxxxxxxxxxx"
          endpoint = "https://oss-accelerate.aliyuncs.com"
          path = "/test/seatunnel/read/json"
          file_format_type = "json"
          schema = {
            table = "fake01"
            fields {
              c_map = "map<string, string>"
              c_array = "array<int>"
              c_string = string
              c_boolean = boolean
              c_tinyint = tinyint
              c_smallint = smallint
              c_int = int
              c_bigint = bigint
              c_float = float
              c_double = double
              c_bytes = bytes
              c_date = date
              c_decimal = "decimal(38, 18)"
              c_timestamp = timestamp
              c_row = {
                C_MAP = "map<string, string>"
                C_ARRAY = "array<int>"
                C_STRING = string
                C_BOOLEAN = boolean
                C_TINYINT = tinyint
                C_SMALLINT = smallint
                C_INT = int
                C_BIGINT = bigint
                C_FLOAT = float
                C_DOUBLE = double
                C_BYTES = bytes
                C_DATE = date
                C_DECIMAL = "decimal(38, 18)"
                C_TIMESTAMP = timestamp
              }
            }
          }
      },
      {
          bucket = "oss://whale-ops"
          access_key = "xxxxxxxxxxxxxxxxxxx"
          access_secret = "xxxxxxxxxxxxxxxxxxx"
          endpoint = "https://oss-accelerate.aliyuncs.com"
          path = "/test/seatunnel/read/json"
          file_format_type = "json"
          schema = {
            table = "fake02"
            fields {
              c_map = "map<string, string>"
              c_array = "array<int>"
              c_string = string
              c_boolean = boolean
              c_tinyint = tinyint
              c_smallint = smallint
              c_int = int
              c_bigint = bigint
              c_float = float
              c_double = double
              c_bytes = bytes
              c_date = date
              c_decimal = "decimal(38, 18)"
              c_timestamp = timestamp
              c_row = {
                C_MAP = "map<string, string>"
                C_ARRAY = "array<int>"
                C_STRING = string
                C_BOOLEAN = boolean
                C_TINYINT = tinyint
                C_SMALLINT = smallint
                C_INT = int
                C_BIGINT = bigint
                C_FLOAT = float
                C_DOUBLE = double
                C_BYTES = bytes
                C_DATE = date
                C_DECIMAL = "decimal(38, 18)"
                C_TIMESTAMP = timestamp
              }
            }
          }
      }
    ]
    plugin_output = "fake"
  }
}

sink {
  Assert {
    rules {
      table-names = ["fake01", "fake02"]
    }
  }
}

Filtering files

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  OssFile {
    path = "/seatunnel/orc"
    bucket = "oss://tyrantlucifer-image-bed"
    access_key = "xxxxxxxxxxxxxxxxx"
    access_secret = "xxxxxxxxxxxxxxxxxxxxxx"
    endpoint = "oss-cn-beijing.aliyuncs.com"
    file_format_type = "orc"
    // Example file: abcD2024.csv
    file_filter_pattern = "abc[DX]*.*"
    // Select files last modified between 2024-01-01 (inclusive) and 2024-01-05 (exclusive)
    file_filter_modified_start = "2024-01-01 00:00:00"
    file_filter_modified_end = "2024-01-05 00:00:00"
  }
}

sink {
  Console {
  }
}
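The time window in the job above is half-open: the start is included and the end is excluded. A minimal Python sketch of that check follows; `in_modified_window` is a hypothetical helper, and `FORMAT` is the Python equivalent of the `yyyy-MM-dd HH:mm:ss` pattern.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss


def in_modified_window(modified, start, end):
    """Return True when start <= modified < end (start inclusive, end exclusive)."""
    start_time = datetime.strptime(start, FORMAT)
    end_time = datetime.strptime(end, FORMAT)
    return start_time <= modified < end_time


START = "2024-01-01 00:00:00"
END = "2024-01-05 00:00:00"

print(in_modified_window(datetime(2024, 1, 1, 0, 0, 0), START, END))
print(in_modified_window(datetime(2024, 1, 5, 0, 0, 0), START, END))
```

A file modified exactly at the start time is kept, while one modified exactly at the end time is filtered out.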

Change Log

Change	Commit	Version
[Feature][File] Add markdown parser #9714	https://github.com/apache/seatunnel/commit/8b3c07844	dev
[Improve][Connector-V2] Add customizable row delimiter support for text file processing (#9608)	https://github.com/apache/seatunnel/commit/7898e62e01	2.3.12
[Improve][Connector-V2] Support maxcompute sink writer with timestamp field type (#9234)	https://github.com/apache/seatunnel/commit/a513c495e3	2.3.12
[Doc][Connector-V2] Update save mode config for OssFileSink (#9303)	https://github.com/apache/seatunnel/commit/40097d7f3e	2.3.11
[improve] update file connectors config (#9034)	https://github.com/apache/seatunnel/commit/8041d59dc2	2.3.11
[Improve][File] Add row_delimiter options into text file sink (#9017)	https://github.com/apache/seatunnel/commit/92aa855a34	2.3.11
Revert " [improve] update localfile connector config" (#9018)	https://github.com/apache/seatunnel/commit/cdc79e13ad	2.3.10
[improve] update localfile connector config (#8765)	https://github.com/apache/seatunnel/commit/def369a85f	2.3.10
[Feature][Connector-V2] Add `filename_extension` parameter for read/write file (#8769)	https://github.com/apache/seatunnel/commit/78b23c0ef5	2.3.10
[Improve] restruct connector common options (#8634)	https://github.com/apache/seatunnel/commit/f3499a6eeb	2.3.10
[Feature][Connector-V2] Support create emtpy file when no data (#8543)	https://github.com/apache/seatunnel/commit/275db78918	2.3.10
[Feature][Connector-V2] Support single file mode in file sink (#8518)	https://github.com/apache/seatunnel/commit/e893deed50	2.3.10
[Feature][File] Support config null format for text file read (#8109)	https://github.com/apache/seatunnel/commit/2dbf02df47	2.3.9
[Improve][API] Unified tables_configs and table_list (#8100)	https://github.com/apache/seatunnel/commit/84c0b8d660	2.3.9
[Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786)	https://github.com/apache/seatunnel/commit/6b7c53d03c	2.3.9
[Improve][Connector-V2] Support read archive compress file (#7633)	https://github.com/apache/seatunnel/commit/3f98cd8a16	2.3.8
[Improve] Added OSSFileCatalog and it's factory (#7458)	https://github.com/apache/seatunnel/commit/9006a205db	2.3.8
[Improve][Connector] Add multi-table sink option check (#7360)	https://github.com/apache/seatunnel/commit/2489f6446b	2.3.7
[Feature][Core] Support using upstream table placeholders in sink options and auto replacement (#7131)	https://github.com/apache/seatunnel/commit/c4ca74122c	2.3.6
[Improve][Files] Support write fixed/timestamp as int96 of parquet (#6971)	https://github.com/apache/seatunnel/commit/1a48a9c493	2.3.6
[Chore] Fix `file` spell errors (#6606)	https://github.com/apache/seatunnel/commit/2599d3b736	2.3.5
[Fix][Connector-V2] Fix connector support SPI but without no args constructor (#6551)	https://github.com/apache/seatunnel/commit/5f3c9c36a5	2.3.5
Add support for XML file type to various file connectors such as SFTP, FTP, LocalFile, HdfsFile, and more. (#6327)	https://github.com/apache/seatunnel/commit/ec533ecd9a	2.3.5
[Feature][OssFile Connector] Make Oss implement source factory and sink factory (#6062)	https://github.com/apache/seatunnel/commit/1a8e9b4554	2.3.4
[Refactor][File Connector] Put Multiple Table File API to File Base Module (#6033)	https://github.com/apache/seatunnel/commit/c324d663b4	2.3.4
[Hotfix][Oss File Connector] fix oss connector can not run bug (#6010)	https://github.com/apache/seatunnel/commit/755bc2a730	2.3.4
Support using multiple hadoop account (#5903)	https://github.com/apache/seatunnel/commit/d69d88d1aa	2.3.4
[Improve][Common] Introduce new error define rule (#5793)	https://github.com/apache/seatunnel/commit/9d1b2582b2	2.3.4
[Improve][connector-file] unifiy option between file source/sink and update document (#5680)	https://github.com/apache/seatunnel/commit/8d87cf8fc4	2.3.4
[Feature] Support `LZO` compress on File Read (#5083)	https://github.com/apache/seatunnel/commit/a4a1901096	2.3.4
[Feature][Connector-V2][File] Support read empty directory (#5591)	https://github.com/apache/seatunnel/commit/1f58f224a0	2.3.4
Support config column/primaryKey/constraintKey in schema (#5564)	https://github.com/apache/seatunnel/commit/eac76b4e50	2.3.4
[Feature][File Connector]optionrule FILE_FORMAT_TYPE is text/csv ,add parameter BaseSinkConfig.ENABLE_HEADER_WRITE: #5566 (#5567)	https://github.com/apache/seatunnel/commit/0e02db768d	2.3.4
[Feature][Connector V2][File] Add config of 'file_filter_pattern', which used for filtering files. (#5153)	https://github.com/apache/seatunnel/commit/a3c13e59eb	2.3.3
[Fix][Connector-V2] Fix file-oss config check bug and amend file-oss-jindo factoryIdentifier (#4581)	https://github.com/apache/seatunnel/commit/5c4f17df20	2.3.2
[Feature][ConnectorV2]add file excel sink and source (#4164)	https://github.com/apache/seatunnel/commit/e3b97ae5d2	2.3.2
Change file type to file_format_type in file source/sink (#4249)	https://github.com/apache/seatunnel/commit/973a2fae3c	2.3.1
Merge branch 'dev' into merge/cdc	https://github.com/apache/seatunnel/commit/4324ee1912	2.3.1
[Improve][Project] Code format with spotless plugin.	https://github.com/apache/seatunnel/commit/423b583038	2.3.1
[improve][api] Refactoring schema parse (#4157)	https://github.com/apache/seatunnel/commit/b2f573a13e	2.3.1
[Improve][build] Give the maven module a human readable name (#4114)	https://github.com/apache/seatunnel/commit/d7cd601051	2.3.1
[Improve][Project] Code format with spotless plugin. (#4101)	https://github.com/apache/seatunnel/commit/a2ab166561	2.3.1
[Feature][Connector-V2][File] Support compress (#3899)	https://github.com/apache/seatunnel/commit/55602f6b1c	2.3.1
[Feature][Connector] add get source method to all source connector (#3846)	https://github.com/apache/seatunnel/commit/417178fb84	2.3.1
[Improve][Connector-V2][File] Improve file connector option rule and document (#3812)	https://github.com/apache/seatunnel/commit/bd76077669	2.3.1
[Hotfix][OptionRule] Fix option rule about all connectors (#3592)	https://github.com/apache/seatunnel/commit/226dc6a119	2.3.0
[Improve][Connector-V2][File] Unified excetion for file source & sink connectors (#3525)	https://github.com/apache/seatunnel/commit/031e8e263c	2.3.0
[Feature][Connector-V2][File] Add option and factory for file connectors (#3375)	https://github.com/apache/seatunnel/commit/db286e8631	2.3.0
[Improve][Connector-V2][File] Improve code structure (#3238)	https://github.com/apache/seatunnel/commit/dd5c353881	2.3.0
[Connector-V2][ElasticSearch] Add ElasticSearch Source/Sink Factory (#3325)	https://github.com/apache/seatunnel/commit/38254e3f26	2.3.0
[Improve][Connector-V2][File] Support parse field from file path (#2985)	https://github.com/apache/seatunnel/commit/0bc12085c2	2.3.0-beta
[Improve][connector][file] Support user-defined schema for reading text file (#2976)	https://github.com/apache/seatunnel/commit/1c05ee0d7e	2.3.0-beta
[Improve][Connector] Improve write parquet (#2943)	https://github.com/apache/seatunnel/commit/8fd966394b	2.3.0-beta
[Fix][Connector-V2] Fix HiveSource Connector read orc table error (#2845)	https://github.com/apache/seatunnel/commit/61720306e7	2.2.0-beta
[Improve][Connector-V2] Improve read parquet (#2841)	https://github.com/apache/seatunnel/commit/e19bc82f9b	2.2.0-beta
[Feature][Connector-V2] Add oss sink (#2629)	https://github.com/apache/seatunnel/commit/bb2ad40487	2.2.0-beta
[#2606]Dependency management split (#2630)	https://github.com/apache/seatunnel/commit/fc047be69b	2.2.0-beta
[chore][connector-common] Rename SeatunnelSchema to SeaTunnelSchema (#2538)	https://github.com/apache/seatunnel/commit/7dc2a27388	2.2.0-beta
[Feature][Connector-V2] Add oss source connector (#2467)	https://github.com/apache/seatunnel/commit/712b77744e	2.2.0-beta
