Hbase
Hbase source connector
Description
Reads data from Apache HBase.
Key features
Options
Name | Type | Required | Default |
---|---|---|---|
zookeeper_quorum | string | Yes | - |
table | string | Yes | - |
schema | config | Yes | - |
hbase_extra_config | config | No | - |
caching | int | No | -1 |
batch | int | No | -1 |
cache_blocks | boolean | No | false |
common-options | | No | - |
zookeeper_quorum [string]
The ZooKeeper quorum hosts of the HBase cluster, for example: "hadoop001:2181,hadoop002:2181,hadoop003:2181"
table [string]
The name of the table to read from, for example: "seatunnel"
schema [config]
HBase stores data as byte arrays, so you must configure a data type for every column in the table. For more information, see: guide.
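To see why a per-column type is needed, note that the same stored bytes decode to different values depending on the type assumed. The following is an illustrative Python sketch, not connector code. It assumes the cell was written as 8 big-endian bytes, the layout HBase's `Bytes` utility uses for long and double values; the helper names `decode_bigint` and `decode_double` are hypothetical.

```python
import struct


def decode_bigint(cell: bytes) -> int:
    """Interpret an 8-byte cell as a signed 64-bit big-endian integer."""
    return struct.unpack(">q", cell)[0]


def decode_double(cell: bytes) -> float:
    """Interpret the same 8 bytes as a big-endian IEEE 754 double."""
    return struct.unpack(">d", cell)[0]


# A cell holding the double 1.5 (assumed encoding: 8 bytes, big-endian).
# The raw bytes carry no type information of their own.
cell = struct.pack(">d", 1.5)

as_double = decode_double(cell)  # the value that was written
as_bigint = decode_bigint(cell)  # a very different number from the same bytes

print(as_double, as_bigint)
```

Declaring `type = double` for such a column in `schema` is what tells the reader which of these interpretations to apply.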
hbase_extra_config [config]
Extra configuration for HBase.
caching
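The caching parameter sets the number of rows fetched from the server in one request during a scan. Fetching more rows per request reduces the number of round trips between client and server, which improves scan efficiency. Default: -1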
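The effect on round trips can be estimated with simple arithmetic. The sketch below is a deliberately simplified model, not HBase's actual behavior: it assumes every fetch returns exactly `caching` rows until the data runs out, and it ignores result-size limits and region boundaries. The function name `scan_round_trips` is hypothetical.

```python
import math


def scan_round_trips(total_rows: int, caching: int) -> int:
    """Estimate the number of fetches needed to scan total_rows rows.

    Simplified model: each fetch returns up to `caching` rows.
    """
    if caching <= 0:
        raise ValueError("this model needs a positive caching value")
    return math.ceil(total_rows / caching)


# Scanning one million rows under the simplified model:
small_fetches = scan_round_trips(1_000_000, 100)   # 10000 round trips
large_fetches = scan_round_trips(1_000_000, 1000)  # 1000 round trips

print(small_fetches, large_fetches)
```

Larger values trade fewer round trips for more rows held in client memory per fetch, so the right setting depends on row size.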
batch
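The batch parameter sets the maximum number of columns returned per call during a scan. This is especially useful for rows with many columns: it avoids returning too much data at once, which saves memory and improves performance.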
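The idea of limiting columns per result can be sketched as chunking a wide row. This is an illustrative Python model, not the HBase client: it assumes a row is a mapping from column name to value, and `split_row` is a hypothetical helper.

```python
from typing import Dict, Iterator, List, Tuple


def split_row(cells: Dict[str, bytes], batch: int) -> Iterator[List[Tuple[str, bytes]]]:
    """Yield the cells of one row in chunks of at most `batch` columns."""
    if batch <= 0:
        raise ValueError("this model needs a positive batch value")
    items = list(cells.items())
    for start in range(0, len(items), batch):
        yield items[start:start + batch]


# A wide row with 250 columns, returned at most 100 columns at a time.
wide_row = {f"cf:col{i}": b"v" for i in range(250)}
chunk_sizes = [len(chunk) for chunk in split_row(wide_row, 100)]

print(chunk_sizes)  # [100, 100, 50]
```

With `batch = 100`, the reader never has to hold more than 100 columns of a row in a single result.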
cache_blocks
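The cache_blocks parameter controls whether data blocks are cached during a scan. By default, HBase caches data blocks in the block cache while scanning. When set to false, data blocks are not cached during the scan, which reduces memory usage. In SeaTunnel the default value is false.
Common options
Common parameters of source plugins; see Source Common Options for details.
Example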
source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
    table = "seatunnel_test"
    caching = 1000
    batch = 100
    cache_blocks = false
    schema = {
      columns = [
        {
          name = "rowkey"
          type = string
        },
        {
          name = "columnFamily1:column1"
          type = boolean
        },
        {
          name = "columnFamily1:column2"
          type = double
        },
        {
          name = "columnFamily2:column1"
          type = bigint
        }
      ]
    }
  }
}
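The column names in this example appear to follow a `family:qualifier` pattern, with `rowkey` standing for the row key itself. That reading is inferred from the example rather than stated by the documentation. Under that assumption, a name could be split as follows; `split_column` is a hypothetical helper, not part of the connector.

```python
from typing import Optional, Tuple


def split_column(name: str) -> Tuple[Optional[str], str]:
    """Split a schema column name into (column family, qualifier).

    Assumption: "rowkey" denotes the row key and has no column family;
    every other name has the form "family:qualifier".
    """
    if name == "rowkey":
        return (None, "rowkey")
    family, sep, qualifier = name.partition(":")
    if not sep or not family or not qualifier:
        raise ValueError(f"expected 'family:qualifier', got {name!r}")
    return (family, qualifier)


names = ["rowkey", "columnFamily1:column1", "columnFamily2:column1"]
parsed = [split_column(n) for n in names]

print(parsed)
```

A check like this can catch a missing column family in a schema before a job is submitted.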
Change Log
Change | Commit | Version |
---|---|---|
[Improve] hbase options (#8923) | https://github.com/apache/seatunnel/commit/b6a702b58f | dev |
[Improve] restruct connector common options (#8634) | https://github.com/apache/seatunnel/commit/f3499a6eeb | dev |
[Improve][dist]add shade check rule (#8136) | https://github.com/apache/seatunnel/commit/51ef800016 | 2.3.9 |
[Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786) | https://github.com/apache/seatunnel/commit/6b7c53d03c | 2.3.9 |
[Fix][Connector-V2] Fix known directory create and delete ignore issues (#7700) | https://github.com/apache/seatunnel/commit/e2fb679577 | 2.3.8 |
[Feature][Connector-V2][Hbase] implement hbase catalog (#7516) | https://github.com/apache/seatunnel/commit/b978792cb1 | 2.3.8 |
[Feature][Connector-V2] Support multi-table sink feature for HBase (#7169) | https://github.com/apache/seatunnel/commit/025fa3bb88 | 2.3.8 |
[hotfix][connector-v2-hbase]fix and optimize hbase source problem (#7148) | https://github.com/apache/seatunnel/commit/34a6b8e9f6 | 2.3.7 |
[Improve][hbase] The specified column is written to the specified column family (#5234) | https://github.com/apache/seatunnel/commit/49d397c61d | 2.3.6 |
[feature][connector-v2-hbase-sink] Support Connector v2 HBase sink TTL data writing (#7116) | https://github.com/apache/seatunnel/commit/adafd80255 | 2.3.6 |
[E2E][HBase]Refactor hbase e2e (#6859) | https://github.com/apache/seatunnel/commit/1da9bd6ce4 | 2.3.6 |
[Connector]Add hbase source connector (#6348) | https://github.com/apache/seatunnel/commit/f108a5e658 | 2.3.6 |
[Feature][HbaseSink]support array data. (#6100) | https://github.com/apache/seatunnel/commit/b592014766 | 2.3.4 |
[Improve][Common] Introduce new error define rule (#5793) | https://github.com/apache/seatunnel/commit/9d1b2582b2 | 2.3.4 |
[Improve] Remove use SeaTunnelSink::getConsumedType method and mark it as deprecated (#5755) | https://github.com/apache/seatunnel/commit/8de7408100 | 2.3.4 |
[Hotfix][Connector-v2][HbaseSink]Fix default timestamp (#4958) | https://github.com/apache/seatunnel/commit/3d8f3bf902 | 2.3.3 |
[Improve][build] Give the maven module a human readable name (#4114) | https://github.com/apache/seatunnel/commit/d7cd601051 | 2.3.1 |
[Improve][Project] Code format with spotless plugin. (#4101) | https://github.com/apache/seatunnel/commit/a2ab166561 | 2.3.1 |
[Feature][Connector-V2][Hbase] Introduce hbase sink connector (#4049) | https://github.com/apache/seatunnel/commit/68bda94a4c | 2.3.1 |