Clickhouse
Clickhouse sink connector
Description
Used to write data to Clickhouse.
Key features
The Clickhouse sink plug-in can achieve exactly-once semantics by implementing idempotent writes. This requires a table engine that supports deduplication, such as AggregatingMergeTree.
Options
| name | type | required | default value | 
|---|---|---|---|
| host | string | yes | - | 
| database | string | yes | - | 
| table | string | yes | - | 
| username | string | yes | - | 
| password | string | yes | - | 
| clickhouse.config | map | no | - |
| bulk_size | number | no | 20000 |
| split_mode | boolean | no | false |
| sharding_key | string | no | - | 
| primary_key | string | no | - | 
| support_upsert | boolean | no | false | 
| allow_experimental_lightweight_delete | boolean | no | false | 
| common-options | | no | - |
host [string]
ClickHouse cluster address in the format host:port. Multiple hosts can be specified, separated by commas, for example "host1:8123,host2:8123".
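As a minimal sketch of the multi-host form described above, the block below lists two addresses in a single host string. The names host1 and host2 are the placeholders from the description, and the database, table, and credential values are illustrative assumptions rather than required values.

```hocon
sink {
  Clickhouse {
    # Comma-separated host:port pairs; host1 and host2 are placeholder names
    host = "host1:8123,host2:8123"
    database = "default"
    table = "fake_all"
    username = "default"
    password = ""
  }
}
```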
database [string]
The ClickHouse database
table [string]
The table name
username [string]
ClickHouse username
password [string]
ClickHouse user password
clickhouse.config [map]
In addition to the mandatory parameters above, users can specify any of the optional parameters supported by clickhouse-jdbc.
bulk_size [number]
The number of rows written through clickhouse-jdbc in each batch. The default is 20000.
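A minimal sketch of overriding the batch size. The value 5000 is arbitrary and chosen only for illustration; the connection values are placeholder assumptions.

```hocon
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "default"
    password = ""

    # Rows per batch write; 5000 is an illustrative value, the default is 20000
    bulk_size = 5000
  }
}
```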
split_mode [boolean]
This mode only supports ClickHouse tables that use the 'Distributed' engine, and the internal_replication option must be true. In this mode, SeaTunnel splits the data for the distributed table itself and writes directly to each shard. The shard weights defined in ClickHouse are taken into account.
sharding_key [string]
When split_mode is used, SeaTunnel must decide which node each row is sent to. By default a node is selected at random, but the 'sharding_key' parameter can specify the field used by the sharding algorithm. This option only takes effect when 'split_mode' is true.
primary_key [string]
Marks the primary key column of the ClickHouse table. INSERT/UPDATE/DELETE operations are executed against the table based on this primary key.
support_upsert [boolean]
Support upserting rows by querying the primary key
allow_experimental_lightweight_delete [boolean]
Allow experimental lightweight deletes on *MergeTree table engines
common options
Sink plugin common parameters. Please refer to Sink Common Options for details.
Examples
Simple
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "default"
    password = ""
    clickhouse.config = {
      max_rows_to_read = "100"
      read_overflow_mode = "throw"
    }
  }
}
Split mode
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "default"
    password = ""
    
    # split mode options
    split_mode = true
    sharding_key = "age"
  }
}
CDC (Change data capture)
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "default"
    password = ""
    
    # cdc options
    primary_key = "id"
    support_upsert = true
  }
}
CDC (Change data capture) for *MergeTree engine
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "default"
    password = ""
    
    # cdc options
    primary_key = "id"
    support_upsert = true
    allow_experimental_lightweight_delete = true
  }
}
Changelog
2.2.0-beta 2022-09-26
- Add ClickHouse Sink Connector
 
2.3.0-beta 2022-10-20
- [Improve] Clickhouse Support Int128, Int256 Type (3067)