v2.3.13 live / Apache Top-Level Project

The open source
data integration
tool.

Connect about 200 sources and sinks with a single config file. Run on Flink, Spark, or the native Zeta engine for batch and streaming workloads.

Zeta Engine / Apache Flink / Apache Spark / CDC / Exactly-Once / Schema Evolution
Sources: MySQL CDC, PostgreSQL, Apache Kafka, Amazon S3, MongoDB, +195 more sources
In -> SeaTunnel (EtLT / Route) -> Out
Sinks: ClickHouse, Apache Iceberg, Elasticsearch, Apache Doris, Apache Paimon, +195 more sinks

9.4k+ GitHub Stars
~200 Data Connectors
3 Execution Engines
2.3k+ GitHub Forks

Architecture

Not ETL. EtLT.

SeaTunnel handles extract, lightweight transform, and load. Your warehouse or lakehouse can keep the heavy downstream transformation while SeaTunnel moves reliable, structured data between systems.

01 Upstream Sources
Database (OLTP / CDC): MySQL, PostgreSQL, Oracle, SQL Server, TiDB
Streaming (Pub/Sub): Apache Kafka, Apache Pulsar, RocketMQ, RabbitMQ
Data Lake (Object Storage): Amazon S3, Alibaba OSS, HDFS, LocalFile

02 EtLT Engine
SeaTunnel EtLT: Zeta Engine / Apache Flink / Apache Spark
CDC / Exactly-Once / Schema Evolution

03 Downstream Targets
Data Warehouse (OLAP): ClickHouse, Apache Doris, StarRocks, Snowflake
Lakehouse (Open Tables): Apache Iceberg, Apache Hudi, Apache Paimon, Delta Lake
AI & Analytics (Vector / LLM): Milvus, Qdrant, Elasticsearch, Typesense

Why SeaTunnel

Production-grade from day one.

Real fault tolerance, schema evolution, and multi-engine scale for production pipelines.

01

Schema changes? Handled automatically.

SeaTunnel detects upstream schema changes and propagates them downstream in real time, so teams do not need to pause the pipeline for every column change.

# Before -> After (detected automatically)

-- v1 schema -------------------------
id       BIGINT
name     VARCHAR(255)

-- v2 schema (auto-propagated) ------
id       BIGINT
name     VARCHAR(255)
email    VARCHAR(512)    NEW
phone    VARCHAR(32)     ADDED
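
As a rough sketch of how this might be switched on for a MySQL CDC source: the fragment below assumes the schema-changes.enabled source option described in the SeaTunnel schema evolution documentation, and reuses the host, user, and table names from the sample config on this page. Which DDL changes are propagated, and which sinks can apply them, depends on the release and the connectors involved, so treat it as illustrative rather than a guaranteed setup.

```hocon
source {
  MySQL-CDC {
    base-url = "jdbc:mysql://db.prod.internal:3306/orders"
    username = "reader"
    password = "${DB_PASS}"
    table-names = ["orders.events"]

    # Assumption: this option name and its behavior match your SeaTunnel release.
    # When enabled, schema change events (e.g. ADD COLUMN) are captured
    # alongside row changes and handed to the downstream sink.
    schema-changes.enabled = true
  }
}
```
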
02

Real-Time CDC

Capture INSERT, UPDATE, and DELETE events from MySQL, Oracle, PostgreSQL, SQL Server, MongoDB, and more with low latency.

03

Exactly-Once Semantics

Checkpoint-backed fault tolerance keeps records consistent across failures, retries, and restarts.
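
A minimal sketch of what this can look like in a job file, assuming a Kafka sink and its semantics option (the broker address and topic name are hypothetical). Whether a pipeline is exactly-once end to end depends on the source and sink connectors used, so check the connector documentation for your release.

```hocon
env {
  job.mode = "STREAMING"
  # Take a checkpoint every 10 seconds (value is in milliseconds).
  checkpoint.interval = 10000
}

# source { ... } omitted for brevity

sink {
  Kafka {
    bootstrap.servers = "kafka.internal:9092"  # hypothetical broker address
    topic = "orders_events"                    # hypothetical topic name
    format = json
    # Assumption: with EXACTLY_ONCE the sink writes inside Kafka transactions
    # that are committed when a checkpoint completes.
    semantics = EXACTLY_ONCE
  }
}
```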

04

Multi-Engine Support

Write one pipeline definition and run it on Zeta, Flink, or Spark without rewriting connector logic.
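
To make this concrete, here is a small job that needs no external systems, with the launcher commands for each engine listed as comments. The file name is hypothetical, and the exact launcher script names vary with the Flink and Spark versions your SeaTunnel distribution targets, so take the commands as examples rather than a fixed recipe.

```hocon
# engine-agnostic.conf (hypothetical file name)
#
# The same file could be submitted with the launcher matching each engine, e.g.:
#   Zeta:  ./bin/seatunnel.sh --config engine-agnostic.conf -m local
#   Flink: ./bin/start-seatunnel-flink-15-connector-v2.sh --config engine-agnostic.conf
#   Spark: ./bin/start-seatunnel-spark-3-connector-v2.sh --master local[4] --deploy-mode client --config engine-agnostic.conf

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # FakeSource generates synthetic rows matching the declared schema.
  FakeSource {
    row.num = 5
    schema = {
      fields {
        id = "bigint"
        name = "string"
      }
    }
  }
}

sink {
  # Console prints each row to standard output.
  Console {}
}
```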

~200 Native Connectors

If your data lives there,
SeaTunnel connects to it.

Native connectors across databases, streams, lakehouses, search systems, and object stores.

OLTP Databases
MySQL / MySQL CDC
PostgreSQL
Oracle
SQL Server
TiDB / MariaDB
+25 via JDBC
Streaming & Messaging
Apache Kafka
Apache Pulsar
RabbitMQ
RocketMQ
AWS SQS
ActiveMQ
OLAP & Analytics
ClickHouse
Apache Doris
StarRocks
Snowflake
Amazon Redshift
Cloudberry
Data Lakes & Storage
Amazon S3 / Hudi
Alibaba OSS
HDFS / LocalFile
Apache Iceberg
Delta Lake
Apache Paimon
MySQL / PostgreSQL / Oracle / SQL Server / TiDB / MariaDB / MongoDB / DynamoDB / Cassandra / HBase / Neo4j / DB2 / Greenplum / OceanBase / Apache Kafka / Apache Pulsar / RabbitMQ / RocketMQ / ActiveMQ / AWS SQS / Elasticsearch / Apache Druid / Typesense / ClickHouse / Apache Doris / StarRocks / Snowflake / Cloudberry / Amazon Redshift / Amazon S3 / HDFS / Alibaba OSS / LocalFile / Apache Iceberg / Apache Hudi / Delta Lake / Apache Paimon / Apache Kudu / Apache Hive / InfluxDB / Apache IoTDB / TDengine / Redis / Aerospike / FTP / SFTP / HTTP / GraphQL / Google Sheets / Google Firestore / Slack / DingTalk / Feishu / Email / MaxCompute / TableStore / SelectDB Cloud / Milvus / Qdrant / Lance / Apache Fluss / HugeGraph / Prometheus / SLS / Sentry / SensorsData / Web3j
Also: MongoDB / Redis / Elasticsearch / Neo4j / Cassandra / HBase / Druid / HugeGraph / IoTDB / InfluxDB / DynamoDB / Milvus / Qdrant ...

How it works

Extract. transform. Load. Transform.
Elegantly simple.

One config file defines the full EtLT pipeline, from ingestion through lightweight transformation and delivery.

01 Extract

Read from anywhere

MySQL-CDC / Kafka / PostgreSQL / MongoDB / Oracle / S3 / Hive / +193 more
02 Lightweight transform

Shape and enrich in-flight

SQL Transform / FieldMapper / Copy / Replace / FieldEncrypt / Split / FilterRowKind / Jsonpath
03 Load

Deliver anywhere

ClickHouse / Iceberg / Doris / StarRocks / Paimon / Kafka / Redis / +193 more
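
The three stages above can be sketched as a small batch job that runs without any external system. The table names raw_users and users are hypothetical, and the plugin_input / plugin_output keys are assumed to match recent 2.3.x releases (older releases used source_table_name / result_table_name instead).

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

# 01 Extract: generate a few synthetic rows.
source {
  FakeSource {
    plugin_output = "raw_users"   # hypothetical table name
    row.num = 10
    schema = {
      fields {
        id = "bigint"
        name = "string"
      }
    }
  }
}

# 02 Lightweight transform: keep both fields and rename "name" to "user_name".
transform {
  FieldMapper {
    plugin_input = "raw_users"
    plugin_output = "users"       # hypothetical table name
    field_mapper = {
      id = id
      name = user_name
    }
  }
}

# 03 Load: print the transformed rows to standard output.
sink {
  Console {
    plugin_input = "users"
  }
}
```

Swapping FakeSource and Console for real connectors changes only the source and sink blocks; the transform stays the same as long as the field names match.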

Simple by design

A config file.
That's all it takes.

Declare your source, transform, and sink in plain config, then deploy on any supported engine without rewriting the pipeline.

mysql-cdc-to-clickhouse.conf

# Real-time: MySQL CDC -> ClickHouse

env {
  parallelism = 4
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    hostname = "db.prod.internal"
    username = "reader"
    password = "${DB_PASS}"
    database-names = ["orders"]
    table-names = ["orders.events"]
    base-url = "jdbc:mysql://db.prod.internal:3306"
  }
}

transform {
  Sql {
    query = """
      SELECT *, NOW() AS synced_at
      FROM events WHERE status != 'deleted'
    """
  }
}

sink {
  ClickHouse {
    host = "ch.analytics:8123"
    database = "analytics"
    table = "events_realtime"
    primary_key = ["id"]
  }
}

Start moving your data
without limits.

Open source. Apache licensed. Free forever. Built for production teams that need a reliable integration layer instead of another fragile demo.