跳到主要内容
版本:Next

StarRocks

StarRocks 源连接器

描述

通过StarRocks读取外部数据源数据。 StarRocks源连接器的内部实现是从FE获取查询计划, 将查询计划作为参数传递给BE节点,然后从BE节点获取数据结果。

主要功能

配置选项

名称类型是否必须默认值
nodeUrlslist-
usernamestring-
passwordstring-
databasestring-
tablestring-
scan_filterstring-
schemaconfig-
table_listarray-
request_tablet_sizeintInteger.MAX_VALUE
scan_connect_timeout_msint30000
scan_query_timeout_secint3600
scan_keep_alive_minint10
scan_batch_rowsint1024
scan_mem_limitlong2147483648
max_retriesint3
scan.params.*string-

nodeUrls [list]

StarRocks 集群地址配置格式 ["fe_ip:fe_http_port", ...]

username [string]

StarRocks 用户名称。

password [string]

StarRocks 用户密码。

database [string]

StarRocks 数据库名。

table [string]

StarRocks 表名。

scan_filter [string]

过滤查询的表达式,该表达式透明地传输到StarRocksStarRocks 使用此表达式完成源端数据过滤。

例如

"tinyint_1 = 100"

schema [config]

fields [Config]

要生成的starRocksschema

示例

schema {
fields {
name = string
age = int
}
}

table_list [array]

StarRocks 表名列表,当需要同时读取多表时使用此配置代替 table

request_tablet_size [int]

与分区对应的StarRocks tablet的数量。此值设置得越小,生成的分区就越多。这将增加引擎的平行度,但同时也会给StarRocks造成更大的压力。

以下示例,用于解释如何使用request_tablet_size来控制分区的生成。

StarRocks 集群中表的 tablet 分布作为 follower

be_node_1 tablet[1, 2, 3, 4, 5]
be_node_2 tablet[6, 7, 8, 9, 10]
be_node_3 tablet[11, 12, 13, 14, 15]

1.如果没有设置 request_tablet_size,则单个分区中的 tablet 数量将没有限制。分区将按以下方式生成:

partition[0] 从 be_node_1 读取 tablet 数据:tablet[1, 2, 3, 4, 5]
partition[1] 从 be_node_2 读取 tablet 数据:tablet[6, 7, 8, 9, 10]
partition[2] 从 be_node_3 读取 tablet 数据:tablet[11, 12, 13, 14, 15]

2.如果设置了 request_tablet_size=3,则每个分区中最多包含 3 个 tablet。分区将按以下方式生成

partition[0] 从 be_node_1 读取 tablet 数据:tablet[1, 2, 3]
partition[1] 从 be_node_1 读取 tablet 数据:tablet[4, 5]
partition[2] 从 be_node_2 读取 tablet 数据:tablet[6, 7, 8]
partition[3] 从 be_node_2 读取 tablet 数据:tablet[9, 10]
partition[4] 从 be_node_3 读取 tablet 数据:tablet[11, 12, 13]
partition[5] 从 be_node_3 读取 tablet 数据:tablet[14,15]

scan_connect_timeout_ms [int]

发送到 StarRocks 的请求连接超时。

scan_query_timeout_sec [int]

StarRocks 中,查询超时时间的默认值为 1 小时,-1 表示没有超时限制。

scan_keep_alive_min [int]

查询任务的保持连接时长,单位是分钟,默认值为 10 分钟。我们建议将此参数设置为大于或等于 5 的值。

scan_batch_rows [int]

一次从 BE 节点读取的最大数据行数。增加此值可以减少引擎与 StarRocks 之间建立的连接数量,从而减轻由网络延迟引起的开销。

scan_mem_limit [long]

单个查询在 BE 节点上允许的最大内存空间,单位为字节,默认值为 2147483648 字节(即 2 GB)。

max_retries [int]

发送到 StarRocks 的重试请求次数。

scan.params. [string]

BE 节点扫描数据相关的参数。

示例 1

source {
StarRocks {
nodeUrls = ["starrocks_e2e:8030"]
username = root
password = ""
database = "test"
table = "e2e_table_source"
scan_batch_rows = 10
max_retries = 3
schema {
fields {
BIGINT_COL = BIGINT
LARGEINT_COL = STRING
SMALLINT_COL = SMALLINT
TINYINT_COL = TINYINT
BOOLEAN_COL = BOOLEAN
DECIMAL_COL = "DECIMAL(20, 1)"
DOUBLE_COL = DOUBLE
FLOAT_COL = FLOAT
INT_COL = INT
CHAR_COL = STRING
VARCHAR_11_COL = STRING
STRING_COL = STRING
DATETIME_COL = TIMESTAMP
DATE_COL = DATE
}
}
scan.params.scanner_thread_pool_thread_num = "3"

}
}

示例 2: 读取多表

source {
StarRocks {
nodeUrls = ["starrocks_e2e:8030"]
username = root
password = ""
database = "test"
table_list = [
{
table = "e2e_table_source"
schema = {
fields {
BIGINT_COL = BIGINT
LARGEINT_COL = STRING
SMALLINT_COL = SMALLINT
TINYINT_COL = TINYINT
BOOLEAN_COL = BOOLEAN
DECIMAL_COL = "DECIMAL(20, 1)"
DOUBLE_COL = DOUBLE
FLOAT_COL = FLOAT
INT_COL = INT
CHAR_COL = STRING
VARCHAR_11_COL = STRING
STRING_COL = STRING
DATETIME_COL = TIMESTAMP
DATE_COL = DATE
}
}
},
{
table = "e2e_table_source_2"
schema = {
fields {
BIGINT_COL_2 = BIGINT
LARGEINT_COL_2 = STRING
SMALLINT_COL_2 = SMALLINT
TINYINT_COL_2 = TINYINT
BOOLEAN_COL_2 = BOOLEAN
DECIMAL_COL_2 = "DECIMAL(20, 1)"
DOUBLE_COL_2 = DOUBLE
FLOAT_COL_2 = FLOAT
INT_COL_2 = INT
CHAR_COL_2 = STRING
VARCHAR_11_COL_2 = STRING
STRING_COL_2 = STRING
DATETIME_COL_2 = TIMESTAMP
DATE_COL_2 = DATE
}
}
}]
scan_batch_rows = 10
max_retries = 3
scan.params.scanner_thread_pool_thread_num = "3"

}
}

变更日志

Change Log
ChangeCommitVersion
[Improve][Connector-V2] Random pick the starrocks fe address which can be connected (#8898)https://github.com/apache/seatunnel/commit/bef76078f9dev
[Feature][Connector-v2] Support multi starrocks source (#8789)https://github.com/apache/seatunnel/commit/26b5529aafdev
[Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS (#8768)https://github.com/apache/seatunnel/commit/3c6f216135dev
[Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector (#8778)https://github.com/apache/seatunnel/commit/96b610eb7edev
[Improve] restruct connector common options (#8634)https://github.com/apache/seatunnel/commit/f3499a6eebdev
[improve] add StarRocks options (#8639)https://github.com/apache/seatunnel/commit/da8d9cbd35dev
[Fix][Connector-V2] fix starRocks automatically creates tables with comment (#8568)https://github.com/apache/seatunnel/commit/c4cb1fc4a3dev
[Fix][Connector-V2] Fixed adding table comments (#8514)https://github.com/apache/seatunnel/commit/edca75b0d6dev
[Feature][Connector-V2] Starrocks implements multi table sink (#8467)https://github.com/apache/seatunnel/commit/55eebfa8af2.3.9
[Improve][Connector-V2] Add pre-check starrocks version before exeucte alter table field name (#8237)https://github.com/apache/seatunnel/commit/c24e3b12ba2.3.9
[Fix][Connector-starrocks] Fix drop column bug for starrocks (#8216)https://github.com/apache/seatunnel/commit/082814da1f2.3.9
[Feature][Core] Support read arrow data (#8137)https://github.com/apache/seatunnel/commit/4710ea0f8d2.3.9
[Feature][Clickhouse] Support sink savemode (#8086)https://github.com/apache/seatunnel/commit/e6f92fd79b2.3.9
[Feature][Connector-V2] StarRocks-sink support schema evolution (#8082)https://github.com/apache/seatunnel/commit/d33b0da8ab2.3.9
[Improve][dist]add shade check rule (#8136)https://github.com/apache/seatunnel/commit/51ef8000162.3.9
[Improve][Connector-V2] Add doris/starrocks create table with comment (#7847)https://github.com/apache/seatunnel/commit/207b8c16fd2.3.9
[Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786)https://github.com/apache/seatunnel/commit/6b7c53d03c2.3.9
[Improve][API] Move catalog open to SaveModeHandler (#7439)https://github.com/apache/seatunnel/commit/8c2c5c79a12.3.8
[Improve][Connector-V2] Reuse connection in StarRocksCatalog (#7342)https://github.com/apache/seatunnel/commit/8ee129d20f2.3.8
[Improve][Connector-V2] Remove system table limit (#7391)https://github.com/apache/seatunnel/commit/adf888e0082.3.8
[Improve][Connector-V2] Close all ResultSet after used (#7389)https://github.com/apache/seatunnel/commit/853e9732122.3.8
[Feature][Core] Support using upstream table placeholders in sink options and auto replacement (#7131)https://github.com/apache/seatunnel/commit/c4ca74122c2.3.6
[Fix][Connector-V2] Fix starrocks Content-Length header already present error (#7034)https://github.com/apache/seatunnel/commit/a485a74eff2.3.6
[Feature][Connector-V2]Support StarRocks Fe Node HAhttps://github.com/apache/seatunnel/commit/9c36c458192.3.6
[Fix][Connector-v2] Fix the sql statement error of create table for doris and starrocks (#6679)https://github.com/apache/seatunnel/commit/88263cd69f2.3.6
[Fix][StarRocks] Fix NPE when upstream catalogtable table path only have table name part (#6540)https://github.com/apache/seatunnel/commit/5795b265cc2.3.5
[Fix][Connector-V2] Fixed doris/starrocks create table sql parse error (#6580)https://github.com/apache/seatunnel/commit/f2ed1fbde02.3.5
[Fix][Connector-V2] Fix connector support SPI but without no args constructor (#6551)https://github.com/apache/seatunnel/commit/5f3c9c36a52.3.5
[Improve] Add SaveMode log of process detail (#6375)https://github.com/apache/seatunnel/commit/b0d70ce2242.3.5
[Improve][Connector-V2] Support TableSourceFactory on StarRocks (#6498)https://github.com/apache/seatunnel/commit/aded56299c2.3.5
[Improve] StarRocksSourceReader use the existing client (#6480)https://github.com/apache/seatunnel/commit/1a02c571a92.3.5
[Improve][API] Unify type system api(data & type) (#5872)https://github.com/apache/seatunnel/commit/b38c7edcc92.3.5
[Feature][Connector] add starrocks save_mode (#6029)https://github.com/apache/seatunnel/commit/66b0f1e1d22.3.4
[Feature] Add unsupported datatype check for all catalog (#5890)https://github.com/apache/seatunnel/commit/b9791285a02.3.4
[Improve] StarRocks support create table template with unique key (#5905)https://github.com/apache/seatunnel/commit/25b01125e42.3.4
[Improve][StarRocksSink] add http socket timeout. (#5918)https://github.com/apache/seatunnel/commit/febdb262b62.3.4
[Improve] Support create varchar field type in StarRocks (#5911)https://github.com/apache/seatunnel/commit/60258951672.3.4
[Improve]Change System.out.println to log output. (#5912)https://github.com/apache/seatunnel/commit/bbedb07a9c2.3.4
[Improve][Common] Introduce new error define rule (#5793)https://github.com/apache/seatunnel/commit/9d1b2582b22.3.4
[Improve] Remove use SeaTunnelSink::getConsumedType method and mark it as deprecated (#5755)https://github.com/apache/seatunnel/commit/8de74081002.3.4
[Improve][Connector] Add field name to DataTypeConvertor to improve error message (#5782)https://github.com/apache/seatunnel/commit/ab60790f0d2.3.4
[feature][connector-jdbc]Add Save Mode function and Connector-JDBC (MySQL) connector has been realized (#5663)https://github.com/apache/seatunnel/commit/eff17ccbe52.3.4
[Improve] Add default implement for SeaTunnelSink::setTypeInfo (#5682)https://github.com/apache/seatunnel/commit/86cba874502.3.4
Support config column/primaryKey/constraintKey in schema (#5564)https://github.com/apache/seatunnel/commit/eac76b4e502.3.4
[Improve] Refactor CatalogTable and add SeaTunnelSource::getProducedCatalogTables (#5562)https://github.com/apache/seatunnel/commit/41173357f82.3.4
[Hotfix][Connector-V2][StarRocks] fix starrocks template sql parser #5071 (#5332)https://github.com/apache/seatunnel/commit/23d79b0d172.3.4
[Improve][Connector-V2] Remove scheduler in StarRocks sink (#5269)https://github.com/apache/seatunnel/commit/cb7b7949142.3.4
[Improve][CheckStyle] Remove useless 'SuppressWarnings' annotation of checkstyle. (#5260)https://github.com/apache/seatunnel/commit/51c0d709ba2.3.4
[Hotfix] Fix com.google.common.base.Preconditions to seatunnel shade one (#5284)https://github.com/apache/seatunnel/commit/ed5eadcf732.3.3
Fix StarRocksJsonSerializer will transform array/map/row to string (#5281)https://github.com/apache/seatunnel/commit/f9419537742.3.3
[Improve] Improve savemode api (#4767)https://github.com/apache/seatunnel/commit/4acd370d482.3.3
[Improve][Connector-V2] Improve StarRocks Auto Create Table To Support Use Primary Key Template In Field (#4487)https://github.com/apache/seatunnel/commit/e601cd4c372.3.2
Revert "[Improve][Catalog] refactor catalog (#4540)" (#4628)https://github.com/apache/seatunnel/commit/2d1933195d2.3.2
[hotfix][starrocks] fix error on get starrocks source typeInfo (#4619)https://github.com/apache/seatunnel/commit/f7b094f9eb2.3.2
[Improve][Catalog] refactor catalog (#4540)https://github.com/apache/seatunnel/commit/b0a701cb832.3.2
[Improve][Connector-V2] Throw StarRocks Serialize Error To Client (#4484)https://github.com/apache/seatunnel/commit/e2c107323b2.3.2
[Improve][Connector-V2] Improve StarRocks Serialize Error Message (#4458)https://github.com/apache/seatunnel/commit/465e75cbf52.3.2
[Hotfix][Zeta] Adapt StarRocks With Multi-Table And Single-Table Mode (#4324)https://github.com/apache/seatunnel/commit/c11c171d362.3.1
[improve][zeta] fix zeta bugshttps://github.com/apache/seatunnel/commit/3a82e8b39f2.3.1
[Improve][Zeta] Improve Client Job Info Messagehttps://github.com/apache/seatunnel/commit/56febf01182.3.1
[Fix][Connector-V2] Fix StarRocksSink Without Format Field In Headerhttps://github.com/apache/seatunnel/commit/463ae6437e2.3.1
[Improve] Support StarRocksCatalog Use JDBC URL With Custom Suffixhttps://github.com/apache/seatunnel/commit/d00ced6ecd2.3.1
[Improve] Support MySqlCatalog Use JDBC URL With Custom Suffixhttps://github.com/apache/seatunnel/commit/210d0ff1f82.3.1
[Improve] Change StarRocks Sink Default Format To Jsonhttps://github.com/apache/seatunnel/commit/87033578302.3.1
[Fix] Fix StarRocks Default Url Can't Usehttps://github.com/apache/seatunnel/commit/67c45d353a2.3.1
[hotfix] fixed schema options import errorhttps://github.com/apache/seatunnel/commit/656805f2df2.3.1
[chore] Code format with spotless plugin.https://github.com/apache/seatunnel/commit/291214ad6f2.3.1
Merge branch 'dev' into merge/cdchttps://github.com/apache/seatunnel/commit/4324ee19122.3.1
[Improve][Project] Code format with spotless plugin.https://github.com/apache/seatunnel/commit/423b5830382.3.1
[Fix] Fix StarRocks Default Url Can't Use (#4229)https://github.com/apache/seatunnel/commit/ed74d110902.3.1
[Bug] Remove StarRocks Auto Creat Table Default Value (#4220)https://github.com/apache/seatunnel/commit/80b5cd40ae2.3.1
[Feature] Add SaveMode For StarRocks (#4217)https://github.com/apache/seatunnel/commit/0674f10a532.3.1
[Improve] Improve StarRocks Catalog Base Url (#4215)https://github.com/apache/seatunnel/commit/6632a404732.3.1
[Improve] Improve StarRocks Sink Config (#4212)https://github.com/apache/seatunnel/commit/8d5712c1db2.3.1
[Hotfix][Zeta] keep deleteCheckpoint method synchronized (#4209)https://github.com/apache/seatunnel/commit/061f9b58722.3.1
[Improve] Improve StarRocks Auto Create Table (#4208)https://github.com/apache/seatunnel/commit/bc9cd6bf692.3.1
[hotfix][zeta] fix zeta multi-table parser error (#4193)https://github.com/apache/seatunnel/commit/98f2ad0c192.3.1
[feature][starrocks] add StarRocks factories (#4191)https://github.com/apache/seatunnel/commit/c485d887ec2.3.1
[Feature] Change StarRocks CreatTable Template (#4184)https://github.com/apache/seatunnel/commit/4cf07f3beb2.3.1
[Feature][Connector-V2] StarRocks source connector (#3679)https://github.com/apache/seatunnel/commit/9681173b102.3.1
[Improve][Connector-V2] [StarRocks] Starrocks Support Auto Create Table (#4177)https://github.com/apache/seatunnel/commit/7e0008e6fb2.3.1
[Improve][build] Give the maven module a human readable name (#4114)https://github.com/apache/seatunnel/commit/d7cd6010512.3.1
[Improve][Project] Code format with spotless plugin. (#4101)https://github.com/apache/seatunnel/commit/a2ab1665612.3.1
[Feature][Connector-v2][StarRocks] Support write cdc changelog event(INSERT/UPDATE/DELETE) (#3865)https://github.com/apache/seatunnel/commit/8e3d158c032.3.1
[Improve][Connector-V2] Change Connector Custom Config Prefix To Map (#3719)https://github.com/apache/seatunnel/commit/ef1b8b1bb52.3.1
[Improve][Connector-V2][StarRocks] Unified exception for StarRocks source and sink (#3593)https://github.com/apache/seatunnel/commit/612d0297a02.3.0
[Improve][Connector-V2][StarRocks] Delete the Mapper may not be used (#3579)https://github.com/apache/seatunnel/commit/1e868ecf282.3.0
[Hotfix][OptionRule] Fix option rule about all connectors (#3592)https://github.com/apache/seatunnel/commit/226dc6a1192.3.0
[Improve][Connector-V2][StarRocks]Add StarRocks connector option rules (#3402)https://github.com/apache/seatunnel/commit/5d187f69b72.3.0
[Bugfix][Connector-V2][StarRocks]Fix StarRocks StreamLoad retry bug and fix doc (#3406)https://github.com/apache/seatunnel/commit/071f9aa0552.3.0
[Feature][Connector-V2] Starrocks sink connector (#3164)https://github.com/apache/seatunnel/commit/3e6caf70532.3.0