Version: Next

ObsFile

OBS file source connector

Supported engines

Spark
Flink
SeaTunnel Zeta

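Key features

Description

Reads data from the Huawei Cloud OBS file system.

If you use Spark or Flink, your Spark/Flink cluster must already have Hadoop integrated before you can use this connector. The tested Hadoop version is 2.x.

If you use the SeaTunnel engine, the Hadoop jar is integrated automatically when you download and install the engine. You can confirm this by checking the jars under ${SEATUNNEL_HOME}/lib.

To support more file types, we made a trade-off: the connector accesses OBS internally through the HDFS protocol, so it requires some Hadoop dependencies. It only supports Hadoop version 2.9.X+.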

Required jar list

Jar	Supported versions	Maven
hadoop-huaweicloud	>= 3.1.1.29	Download
esdk-obs-java	>= 3.19.7.3	Download
okhttp	>= 3.11.0	Download
okio	>= 1.14.0	Download

Download the jars listed in the 'Maven' column and copy them to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory.
Also copy all of the jars to $SEATUNNEL_HOME/lib/

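Options

Name	Type	Required	Default	Description
path	string	Yes	-	Target directory path
file_format_type	string	Yes	-	File type
bucket	string	Yes	-	Bucket address of the OBS file system, for example: `obs://obs-bucket-name`
access_key	string	Yes	-	Access key of the OBS file system
access_secret	string	Yes	-	Access secret of the OBS file system
endpoint	string	Yes	-	Endpoint of the OBS file system
read_columns	list	Yes	-	List of columns to read from the data source
delimiter	string	No	\001	Field delimiter
row_delimiter	string	No	\n	Row delimiter
parse_partition_from_path	boolean	No	true	Controls whether partition keys and values are parsed from the file path
skip_header_row_number	long	No	0	Number of leading rows to skip; applies only to txt and csv files
date_format	string	No	yyyy-MM-dd	Format for date values
datetime_format	string	No	yyyy-MM-dd HH:mm:ss	Format for datetime values
time_format	string	No	HH:mm:ss	Format for time values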
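
The options above can be combined into a job configuration along the following lines. This is a minimal sketch rather than a verified setup: the `env` block, the `Console` sink, the `parquet` file type, the `/seatunnel/parquet` path, and the column names `id` and `name` are assumptions made for illustration and do not come from the table. The credential and endpoint values are placeholders.

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  ObsFile {
    # Placeholder values: replace with your own bucket, credentials, and endpoint.
    path = "/seatunnel/parquet"            # hypothetical directory inside the bucket
    bucket = "obs://obs-bucket-name"
    access_key = "<your-access-key>"
    access_secret = "<your-access-secret>"
    endpoint = "<your-obs-endpoint>"
    file_format_type = "parquet"           # assumed to be a supported file type
    read_columns = ["id", "name"]          # hypothetical column names
    parse_partition_from_path = true       # same as the default, shown for clarity
  }
}

sink {
  # Console sink assumed here only to print the rows that were read.
  Console {}
}
```

For text or csv input, `delimiter`, `row_delimiter`, and `skip_header_row_number` would be set inside the same `ObsFile` block; they are left out here so that the defaults from the table apply.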

Change Log

Change	Commit	Version
[Feature][File] Add markdown parser #9714	https://github.com/apache/seatunnel/commit/8b3c07844	dev
[Improve][Connector-V2] Add customizable row delimiter support for text file processing (#9608)	https://github.com/apache/seatunnel/commit/7898e62e01	2.3.12
[Improve][Connector-V2] Support maxcompute sink writer with timestamp field type (#9234)	https://github.com/apache/seatunnel/commit/a513c495e3	2.3.12
[improve] update file connectors config (#9034)	https://github.com/apache/seatunnel/commit/8041d59dc2	2.3.11
[Improve][File] Add row_delimiter options into text file sink (#9017)	https://github.com/apache/seatunnel/commit/92aa855a34	2.3.11
Revert " [improve] update localfile connector config" (#9018)	https://github.com/apache/seatunnel/commit/cdc79e13ad	2.3.10
[improve] update localfile connector config (#8765)	https://github.com/apache/seatunnel/commit/def369a85f	2.3.10
[Feature][Connector-V2] Add `filename_extension` parameter for read/write file (#8769)	https://github.com/apache/seatunnel/commit/78b23c0ef5	2.3.10
[Improve] restruct connector common options (#8634)	https://github.com/apache/seatunnel/commit/f3499a6eeb	2.3.10
[Feature][File] Support config null format for text file read (#8109)	https://github.com/apache/seatunnel/commit/2dbf02df47	2.3.9
[Improve][Connector-V2] Change File Read/WriteStrategy `setSeaTunnelRowTypeInfo` to `setCatalogTable` (#7829)	https://github.com/apache/seatunnel/commit/6b5f74e524	2.3.9
[Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786)	https://github.com/apache/seatunnel/commit/6b7c53d03c	2.3.9
[Feature][Connector-V2] Add Huawei Cloud OBS connector (#4578)	https://github.com/apache/seatunnel/commit/d266f4db64	2.3.6
