
How it works

What New Users Should Know First

You do not need to understand every internal module before running SeaTunnel. For most first-time users, the practical order is:

  1. run one job locally
  2. learn the config structure
  3. choose the right connectors and engine
  4. come back here when you want to understand the runtime model better

SeaTunnel is easiest to understand as a config-driven pipeline that runs on a chosen execution engine.

Overview

SeaTunnel is a distributed multimodal data integration tool with a pluggable architecture. It decouples the connector layer from the execution engine, allowing the same connectors to run on different engines.

This page is the shortest bridge between first-run docs and deeper architecture docs. Read it when you already know SeaTunnel at a high level but still need a practical mental model of how job config, plugins, and engines connect.

The Four Building Blocks

1. Job Configuration

Your config file describes what to read, how to transform it, where to write it, and which engine settings should be used.

2. SeaTunnel Core

SeaTunnel parses the config, builds an execution plan, loads plugins, and coordinates submission to the selected engine.
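
As a rough mental model of these first two building blocks, the sketch below represents a job config as a plain Python dictionary and resolves plugin names against a small registry. The section names (`env`, `source`, `transform`, `sink`) mirror the blocks of a SeaTunnel config file, and `FakeSource` and `Console` are existing connector names. Everything else (`PLUGIN_REGISTRY`, `Stage`, `build_plan`, and the rule that only `source` and `sink` are required) is a hypothetical simplification, not SeaTunnel's actual Java implementation.

```python
from dataclasses import dataclass

# Hypothetical registry: plugin name -> kind. SeaTunnel itself discovers
# plugins at runtime; this dictionary only stands in for that lookup.
PLUGIN_REGISTRY = {
    "FakeSource": "source",
    "Console": "sink",
}


@dataclass(frozen=True)
class Stage:
    kind: str      # "source", "transform", or "sink"
    plugin: str    # plugin name as written in the config
    options: dict  # plugin-specific options, passed through untouched


def build_plan(config: dict) -> list:
    """Validate the config and return stages in source -> transform -> sink order."""
    for required in ("source", "sink"):  # assumption of this sketch
        if not config.get(required):
            raise ValueError(f"missing required section: {required}")
    plan = []
    for kind in ("source", "transform", "sink"):
        for plugin, options in config.get(kind, {}).items():
            if PLUGIN_REGISTRY.get(plugin) != kind:
                raise ValueError(f"unknown {kind} plugin: {plugin}")
            plan.append(Stage(kind, plugin, options))
    return plan


# A dictionary that mirrors a minimal job config: engine settings plus one
# source and one sink, with no transform.
job = {
    "env": {"parallelism": 1, "job.mode": "BATCH"},
    "source": {"FakeSource": {"row.num": 16}},
    "sink": {"Console": {}},
}
plan = build_plan(job)
```

The point of the sketch is the ordering of responsibilities: the config only names plugins and options, and the core turns those names into an ordered plan before any engine is involved.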

3. Source -> Transform -> Sink

This is the data path most users should remember first:

  • Source reads from external systems
  • Transform optionally reshapes or filters the data
  • Sink writes the result to the target system
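
The data path above can be pictured as three chained stages. The following is a minimal sketch using Python generators; the function names (`fake_source`, `keep_even_ids`, `list_sink`, `run_pipeline`) are illustrative only and do not correspond to SeaTunnel's connector API.

```python
from typing import Callable, Iterable, Iterator, Optional

Row = dict


def fake_source(num_rows: int) -> Iterator[Row]:
    """Hypothetical source: generates rows instead of reading an external system."""
    for i in range(num_rows):
        yield {"id": i, "name": f"user_{i}"}


def keep_even_ids(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical transform: filters rows and reshapes the ones that remain."""
    for row in rows:
        if row["id"] % 2 == 0:
            yield {"id": row["id"], "name": row["name"].upper()}


def list_sink(rows: Iterable[Row], target: list) -> int:
    """Hypothetical sink: appends rows to a list and returns how many it wrote."""
    written = 0
    for row in rows:
        target.append(row)
        written += 1
    return written


def run_pipeline(
    source_rows: Iterable[Row],
    sink: Callable[[Iterable[Row]], int],
    transform: Optional[Callable[[Iterable[Row]], Iterable[Row]]] = None,
) -> int:
    """Wire source -> (optional) transform -> sink and return the rows written."""
    rows = source_rows
    if transform is not None:  # the transform stage is optional
        rows = transform(rows)
    return sink(rows)


output = []
written = run_pipeline(
    fake_source(4),
    lambda rows: list_sink(rows, output),
    transform=keep_even_ids,
)
```

Because each stage only consumes and produces rows, the transform can be dropped or swapped without touching the source or the sink.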

4. Execution Engine

The engine decides where the job runs. Most new users should start with SeaTunnel Engine (Zeta), then move to Flink or Spark only when their environment already depends on those platforms.

If you are building your first system-level understanding, read in this order:

Core Components

1. Connector API

Engine-independent API for developing Source, Transform, and Sink connectors.

| Component | Description |
| --- | --- |
| Source | Reads data from external systems (databases, files, message queues) |
| Transform | Performs data transformations (field mapping, filtering, type conversion) |
| Sink | Writes data to target systems |

2. Execution Engines

| Engine | Best For |
| --- | --- |
| SeaTunnel Engine (Zeta) | Data synchronization, CDC, low resource usage |
| Apache Flink | Complex stream processing, existing Flink infrastructure |
| Apache Spark | Large-scale batch processing, existing Spark infrastructure |

3. Translation Layer

Translates SeaTunnel's unified API to engine-specific implementations, enabling connector reuse across engines.
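
One way to picture the translation layer is as a set of adapters around a single connector interface. The sketch below is an assumption-laden illustration in Python: `UnifiedSource`, `NumberSource`, `PushEngineAdapter`, and `BatchEngineAdapter` are hypothetical names, and the two "engines" are invented calling conventions, not the real Flink or Spark APIs.

```python
from typing import Callable, Iterator, List, Protocol


class UnifiedSource(Protocol):
    """Engine-independent contract that a connector is written against once."""

    def read(self) -> Iterator[dict]: ...


class NumberSource:
    """Hypothetical connector that knows nothing about any engine."""

    def __init__(self, count: int) -> None:
        self.count = count

    def read(self) -> Iterator[dict]:
        for i in range(self.count):
            yield {"n": i}


class PushEngineAdapter:
    """Adapter for a made-up engine that expects rows pushed to a callback."""

    def __init__(self, source: UnifiedSource) -> None:
        self._source = source

    def run(self, emit: Callable[[dict], None]) -> None:
        for row in self._source.read():
            emit(row)


class BatchEngineAdapter:
    """Adapter for a made-up engine that pulls fixed-size batches."""

    def __init__(self, source: UnifiedSource, batch_size: int) -> None:
        self._source = source
        self._batch_size = batch_size

    def batches(self) -> Iterator[List[dict]]:
        batch: List[dict] = []
        for row in self._source.read():
            batch.append(row)
            if len(batch) == self._batch_size:
                yield batch
                batch = []
        if batch:  # flush the final partial batch
            yield batch
```

The same `NumberSource` instance works with either adapter unchanged, which is the property the translation layer is meant to provide for real connectors.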

Data Flow

Key Features:

  • Parallel reading with split-based distribution
  • Exactly-once semantics via distributed snapshots
  • Automatic failover and recovery
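
To make split-based distribution concrete, here is a small sketch in which the input is cut into splits and a fixed number of readers pull splits from a shared queue until none remain. It covers only the first feature in the list; snapshots and failover are not modeled. All names (`make_splits`, `read_split`, `reader`, `parallel_read`) are hypothetical, and the "data" is just a range of integers.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def make_splits(total_rows: int, split_size: int) -> list:
    """Cut the row range [0, total_rows) into half-open (start, end) splits."""
    return [
        (start, min(start + split_size, total_rows))
        for start in range(0, total_rows, split_size)
    ]


def read_split(split: tuple) -> list:
    """Stand-in for reading one split from an external system."""
    start, end = split
    return list(range(start, end))


def reader(pending: queue.Queue) -> list:
    """One parallel reader: keep taking splits until the queue is empty."""
    rows = []
    while True:
        try:
            split = pending.get_nowait()
        except queue.Empty:
            return rows
        rows.extend(read_split(split))


def parallel_read(total_rows: int, split_size: int, parallelism: int) -> list:
    """Return one list of rows per reader; which reader gets which split may vary."""
    pending: queue.Queue = queue.Queue()
    for split in make_splits(total_rows, split_size):
        pending.put(split)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(reader, pending) for _ in range(parallelism)]
        return [future.result() for future in futures]
```

Each split is read by exactly one reader, so the union of all readers' rows equals the full input regardless of how the splits happen to be distributed.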

Module Structure

| Module | Responsibility |
| --- | --- |
| seatunnel-api | Core API definitions |
| seatunnel-connectors-v2 | Source and sink connectors |
| seatunnel-transforms-v2 | Transform plugins |
| seatunnel-engine | SeaTunnel Engine (Zeta) |
| seatunnel-translation | Engine adapters for Flink and Spark |
| seatunnel-core | Job submission and CLI |
| seatunnel-formats | Data format handlers |
| seatunnel-e2e | End-to-end tests |

Job Execution Flow

  1. Parse - Read and validate job configuration
  2. Plan - Generate execution plan with parallelism
  3. Schedule - Distribute tasks to workers
  4. Execute - Run Source → Transform → Sink pipeline
  5. Monitor - Track progress, metrics, and checkpoints
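
The five phases can be sketched as a chain of small functions. This is a toy model under stated assumptions: the in-memory `rows` list under `source`, the worker names, and the functions `parse`, `plan`, `schedule`, and `execute` are all hypothetical, monitoring is reduced to per-worker row counts, and checkpoints are not modeled.

```python
def parse(config: dict) -> dict:
    """Parse: check the required sections and apply defaults."""
    for section in ("source", "sink"):
        if section not in config:
            raise ValueError(f"missing section: {section}")
    env = {"parallelism": 1, **config.get("env", {})}
    return {**config, "env": env}


def plan(job: dict) -> list:
    """Plan: create one task per parallel slot, each owning a slice of the rows."""
    parallelism = job["env"]["parallelism"]
    rows = job["source"]["rows"]  # hypothetical in-memory source
    return [
        {"task_id": i, "rows": rows[i::parallelism]}
        for i in range(parallelism)
    ]


def schedule(tasks: list, workers: list) -> dict:
    """Schedule: assign tasks to workers round-robin."""
    assignment = {worker: [] for worker in workers}
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment


def execute(assignment: dict, transform, sink: list) -> dict:
    """Execute and monitor: run every task and count rows written per worker."""
    metrics = {}
    for worker, tasks in assignment.items():
        count = 0
        for task in tasks:
            for row in task["rows"]:
                sink.append(transform(row))
                count += 1
        metrics[worker] = count
    return metrics


job = parse({
    "env": {"parallelism": 3},
    "source": {"rows": list(range(7))},
    "sink": {},
})
tasks = plan(job)
assignment = schedule(tasks, ["worker-a", "worker-b"])
written = []
metrics = execute(assignment, lambda row: row * 10, written)
```

In this run three tasks are spread over two workers, so `worker-a` handles two tasks and `worker-b` handles one; a real engine would run them concurrently rather than in a loop.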

Next Steps