Version: Next

SeaTunnel With Spark

Apache Spark is the right choice when your team already runs Spark and wants SeaTunnel jobs to fit into that batch or mixed workload environment. If you are evaluating SeaTunnel from scratch and do not need Spark specifically, start with SeaTunnel Engine first.

Start Here

Use this path if you want to run SeaTunnel on Spark:

When To Choose Spark

Spark is usually the right engine when:

your organization already runs Spark clusters in production
the surrounding workloads are mainly batch-oriented
you want SeaTunnel to align with an existing Spark ecosystem and deployment model

Spark-Specific Configuration

Spark-specific job options live in the env block and use the spark. prefix.

Example:

env {
  spark.app.name = "example"
  spark.sql.catalogImplementation = "hive"
  spark.executor.memory = "2g"
  spark.executor.instances = "2"
  spark.yarn.priority = "100"
  spark.dynamicAllocation.enabled = "false"
}

Command Line Example

Spark on YARN cluster mode:

./bin/start-seatunnel-spark-3-connector-v2.sh --master yarn --deploy-mode cluster --config config/example.conf

Spark on YARN client mode:

./bin/start-seatunnel-spark-3-connector-v2.sh --master yarn --deploy-mode client --config config/example.conf

Minimal Example Job

The example below runs on Spark and prints generated records to the console.

env {
  parallelism = 1

  spark.app.name = "example"
  spark.sql.catalogImplementation = "hive"
  spark.executor.memory = "2g"
  spark.executor.instances = "1"
  spark.yarn.priority = "100"
  spark.dynamicAllocation.enabled = "false"
}

source {
  FakeSource {
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    plugin_input = "fake"
    plugin_output = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    plugin_input = "fake1"
  }
}

If you need more transform options, see Transforms Catalog and Transform Common Options.

Running From A Source Checkout

If you are running examples from the repository source tree, the example module is:

seatunnel-examples/seatunnel-spark-connector-v2-example

The example entry point is:

org.apache.seatunnel.example.spark.v2.SeaTunnelApiExample

Next Steps

Quick Start With Spark
Spark Translation Layer
Transforms Catalog
SeaTunnel Engine if you want to compare against the default engine

SeaTunnel With Spark

Start Here​

When To Choose Spark​

Spark-Specific Configuration​

Command Line Example​

Minimal Example Job​

Running From A Source Checkout​

Next Steps​