Skip to main content
Version: Next

Architecture

Overview

SeaTunnel is a distributed data integration platform with a pluggable architecture. It decouples the connector layer from the execution engine, allowing the same connectors to run on different engines.

┌─────────────────────────────────────────────────────────────┐
│ Job Configuration │
│ (HOCON / SQL / Web UI) │
└─────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ SeaTunnel Core │
│ (Job Parser, Coordinator, Scheduler) │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Source │────▶│ Transform │────▶│ Sink │
│ Connectors │ │ (Optional) │ │ Connectors │
└───────────────┘ └───────────────┘ └───────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Execution Engine │
│ SeaTunnel Engine (Zeta) / Flink / Spark │
└─────────────────────────────────────────────────────────────┘

Core Components

1. Connector API

Engine-independent API for developing Source, Transform, and Sink connectors.

ComponentDescription
SourceReads data from external systems (databases, files, message queues)
TransformPerforms data transformations (field mapping, filtering, type conversion)
SinkWrites data to target systems

2. Execution Engines

EngineBest For
SeaTunnel Engine (Zeta)Data synchronization, CDC, low resource usage
Apache FlinkComplex stream processing, existing Flink infrastructure
Apache SparkLarge-scale batch processing, existing Spark infrastructure

3. Translation Layer

Translates SeaTunnel's unified API to engine-specific implementations, enabling connector reuse across engines.

Data Flow

Source ──▶ [Split] ──▶ Reader ──▶ Transform ──▶ Writer ──▶ Sink
│ │ │
│ ▼ │
│ Checkpoint/State │
│ │ │
└───────────────────────┴────────────────────────┘
Fault Tolerance

Key Features:

  • Parallel reading with split-based distribution
  • Exactly-once semantics via distributed snapshots
  • Automatic failover and recovery

Module Structure

seatunnel/
├── seatunnel-api/ # Core API definitions
├── seatunnel-connectors-v2/ # Source & Sink connectors
├── seatunnel-transforms-v2/ # Transform plugins
├── seatunnel-engine/ # SeaTunnel Engine (Zeta)
├── seatunnel-translation/ # Engine adapters (Flink/Spark)
├── seatunnel-core/ # Job submission & CLI
├── seatunnel-formats/ # Data format handlers
└── seatunnel-e2e/ # End-to-end tests

Job Execution Flow

  1. Parse - Read and validate job configuration
  2. Plan - Generate execution plan with parallelism
  3. Schedule - Distribute tasks to workers
  4. Execute - Run Source → Transform → Sink pipeline
  5. Monitor - Track progress, metrics, and checkpoints

Next Steps