Version: Next

Clickhouse

Clickhouse source connector

Supported Engines

Spark
Flink
SeaTunnel Zeta

Key Features​

Supports SQL queries, which can also be used to project (select) only the columns a job needs.
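As a rough illustration of the projection effect, the sql option can list only the required columns instead of using select *. The sketch below assumes a table named test with columns id, name, and age; the column names are hypothetical and the connection values are placeholders.

```hocon
source {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    # Hypothetical columns: only id, name, and age are read from the table
    sql = "select id, name, age from test where age = 20"
    username = "xxxxx"
    password = "xxxxx"
  }
}
```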

Description​

Used to read data from Clickhouse.

Supported DataSource Info​

In order to use the Clickhouse connector, the following dependencies are required. They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
|------------|--------------------|------------|
| Clickhouse | universal          | Download   |

Data Type Mapping​

| Clickhouse Data Type | SeaTunnel Data Type |
|----------------------|---------------------|
| String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon / MultiPolygon | STRING |
| Int8 / UInt8 / Int16 / UInt16 / Int32 | INT |
| UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond | BIGINT |
| Float64 | DOUBLE |
| Decimal | DECIMAL |
| Float32 | FLOAT |
| Date | DATE |
| DateTime | TIME |
| Array | ARRAY |
| Map | MAP |

Source Options​

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| host | String | Yes | - | ClickHouse cluster address in host:port format. Multiple hosts can be specified, separated by commas, such as "host1:8123,host2:8123". |
| database | String | Yes | - | The ClickHouse database. |
| sql | String | Yes | - | The SQL query used to read data through the Clickhouse server. |
| username | String | Yes | - | ClickHouse username. |
| password | String | Yes | - | ClickHouse user password. |
| clickhouse.config | Map | No | - | In addition to the mandatory parameters above, users can specify optional parameters, which cover all the parameters provided by clickhouse-jdbc. |
| server_time_zone | String | No | ZoneId.systemDefault() | The session time zone of the database server. If not set, ZoneId.systemDefault() is used to determine the server time zone. |
| common-options | | No | - | Source plugin common parameters. Please refer to Source Common Options for details. |
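The following is a minimal sketch of how several of these options might be combined, assuming a two-node cluster reachable at host1:8123 and host2:8123 (the host names are placeholders taken from the host description). The socket_timeout value is illustrative only; any other clickhouse-jdbc parameter could be passed the same way through clickhouse.config.

```hocon
source {
  Clickhouse {
    # Multiple hosts, comma-separated, each in host:port format
    host = "host1:8123,host2:8123"
    database = "default"
    sql = "select * from test limit 100"
    username = "xxxxx"
    password = "xxxxx"
    # Optional: set the session time zone explicitly instead of relying on ZoneId.systemDefault()
    server_time_zone = "UTC"
    # Optional: extra parameters handed to clickhouse-jdbc (value is illustrative)
    clickhouse.config = {
      "socket_timeout": "300000"
    }
  }
}
```

Setting server_time_zone explicitly avoids depending on the default time zone of whichever machine runs the job.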

How to Create a Clickhouse Data Synchronization Job

The following example demonstrates how to create a data synchronization job that reads data from Clickhouse and prints it on the local client:

# Set the basic configuration of the task to be performed
env {
  parallelism = 10
  job.mode = "BATCH"
}

# Create a source to connect to Clickhouse
source {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    sql = "select * from test where age = 20 limit 100"
    username = "xxxxx"
    password = "xxxxx"
    server_time_zone = "UTC"
    result_table_name = "test"
    clickhouse.config = {
      "socket_timeout": "300000"
    }
  }
}

# Console printing of the read Clickhouse data
sink {
  Console {
    parallelism = 1
  }
}

Tips​

1. SeaTunnel Deployment Document.