Version: Next

Intro To Config File

If you are writing your first real SeaTunnel job, this page is the fastest way to understand the four blocks that appear in almost every config: env, source, transform, and sink.

SeaTunnel supports HOCON, JSON, and SQL config formats. HOCON is the most common format in quick starts and production examples. For the SQL format, see SQL configuration.

If you want the shortest first-run path before reading this page, start with Getting Started Overview and Quick Start With SeaTunnel Engine.

Example

Before you read on, you can find example configs here and in the binary package's config directory.

Config File Structure

A typical config file looks like this:

Warning: The old configuration names source_table_name and result_table_name are deprecated. Please migrate to the new names plugin_input and plugin_output as soon as possible.

hocon

env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    plugin_output = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}

transform {
  Filter {
    plugin_input = "fake"
    plugin_output = "fake1"
    fields = [name, card]
  }
}

sink {
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "seatunnel_console"
    fields = ["name", "card"]
    username = "default"
    password = ""
    plugin_input = "fake1"
  }
}

Most SeaTunnel jobs follow this structure: env, source, transform, and sink. Once you understand these four sections, it becomes much easier to read quick starts and connector examples.

env

Use env for job-level and engine-level settings such as job.mode, parallelism, checkpoint options, and engine-specific parameters.

Common parameters are shared across engines. Engine-specific parameters are separated by prefix. For Flink and Spark, see JobEnvConfig.

The most common env parameters are:

job.mode: chooses BATCH or STREAMING
job.name: sets the job name shown by the engine and UI
parallelism: controls how many parallel readers and writers SeaTunnel uses
checkpoint.interval: enables periodic checkpoints in streaming jobs
checkpoint.timeout: limits how long a checkpoint can run before the job fails
jars: loads extra third-party JARs needed by the job
shade.identifier: selects the config encryption or decryption strategy

For the full parameter list, engine-specific prefixes, and more examples, see JobEnvConfig.
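
As a minimal sketch, an env block that combines several of the parameters listed above might look like the following. The job name and numeric values are illustrative, and the checkpoint values are assumed to be in milliseconds; confirm units and defaults in JobEnvConfig before relying on them.

```hocon
env {
  # Run as a streaming job; use "BATCH" for a bounded job
  job.mode = "STREAMING"
  # Name shown by the engine and UI (illustrative value)
  job.name = "config_intro_demo"
  # Number of parallel readers and writers
  parallelism = 2
  # Periodic checkpoints; unit assumed to be milliseconds
  checkpoint.interval = 10000
  # Upper bound on how long one checkpoint may run; unit assumed to be milliseconds
  checkpoint.timeout = 60000
}
```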

source

source defines where SeaTunnel reads data from. You can declare multiple sources in one job. Each connector has its own parameters, plus common wiring fields such as plugin_output, which names the dataset produced by that source.

See the full list in Source Connectors.

transform

transform is optional. Use it when you need field mapping, filtering, type conversion, SQL processing, or other intermediate shaping between source and sink. If you do not need that layer, a job can go directly from source to sink, like this:

env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    plugin_output = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}

sink {
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "seatunnel_console"
    fields = ["name", "age", "card"]
    username = "default"
    password = ""
    plugin_input = "fake"
  }
}

Like source connectors, each transform has its own parameters. See Transforms.

sink

sink defines where the processed data is written. Sink connectors are similar to source connectors, but they focus on write behavior, destination schema, commit mode, and delivery guarantees.

See Supported Sinks.

How `plugin_output` And `plugin_input` Work

When a job contains multiple sources, transforms, or sinks, SeaTunnel needs a way to describe which dataset flows into which next step. That wiring is done by plugin_output and plugin_input.

plugin_output names the dataset produced by the current source or transform
plugin_input tells a transform or sink which upstream dataset to consume

In simple one-source jobs, you can often omit them because SeaTunnel uses a default convention and passes the previous module's output forward automatically.
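
As a rough sketch of this wiring, the fragment below reuses the connectors from the earlier examples and sends one source dataset down two paths: through a Filter transform to Console, and directly to Clickhouse. It assumes that a named dataset can be consumed by more than one downstream step. The dataset names and connection values are illustrative, and the env block is omitted for brevity.

```hocon
source {
  FakeSource {
    # Name the dataset this source produces
    plugin_output = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}

transform {
  Filter {
    # Consume "fake" and publish the narrowed result as "fake1"
    plugin_input = "fake"
    plugin_output = "fake1"
    fields = [name, card]
  }
}

sink {
  Console {
    # Prints the filtered dataset
    plugin_input = "fake1"
  }
  Clickhouse {
    # Writes the unfiltered dataset (assumes "fake" may feed several consumers)
    plugin_input = "fake"
    host = "clickhouse:8123"
    database = "default"
    table = "seatunnel_console"
    fields = ["name", "age", "card"]
    username = "default"
    password = ""
  }
}
```

Reading the plugin_input values from the sinks upward is usually the quickest way to trace which dataset each step receives.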

Multi-line Support

HOCON supports multi-line strings. Enclose the text in triple quotes (""") to include long passages without escaping newline characters or embedded double quotes. For example:

var = """
Apache SeaTunnel is a
next-generation high-performance,
distributed, massive data integration tool.
"""
sql = """ select * from "table" """
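
One place where this tends to help is a longer query in the sql transform. The fragment below is a sketch only: the dataset names, the selected columns, and the filter condition are illustrative assumptions, and the query refers to the upstream dataset by its plugin_input name in the same way as the variable substitution example on this page.

```hocon
transform {
  sql {
    plugin_input = "fake"
    plugin_output = "fake_adults"
    # Triple quotes keep the query readable across several lines
    query = """
      select name, age
      from fake
      where age >= 18
    """
  }
}
```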

JSON Format Support

To use the JSON format, make sure the config file name ends with .json.

{
  "env": {
    "job.mode": "batch"
  },
  "source": [
    {
      "plugin_name": "FakeSource",
      "plugin_output": "fake",
      "row.num": 100,
      "schema": {
        "fields": {
          "name": "string",
          "age": "int",
          "card": "int"
        }
      }
    }
  ],
  "transform": [
    {
      "plugin_name": "Filter",
      "plugin_input": "fake",
      "plugin_output": "fake1",
      "fields": ["name", "card"]
    }
  ],
  "sink": [
    {
      "plugin_name": "Clickhouse",
      "host": "clickhouse:8123",
      "database": "default",
      "table": "seatunnel_console",
      "fields": ["name", "card"],
      "username": "default",
      "password": "",
      "plugin_input": "fake1"
    }
  ]
}

Config Variable Substitution

In a config file, you can define variables and replace them at runtime. Variable substitution is supported only in HOCON files.

Usage of Variables:

${varName}: If the variable is not provided, an exception will be thrown.
${varName:default}: If the variable is not provided, the default value will be used. If you set a default value, it should be enclosed in double quotes.
${varName:}: If the variable is not provided, an empty string will be used.

If you do not set a variable through -i, SeaTunnel can also read its value from a system environment variable. For example, you can set the environment variable in a shell script as follows:

export varName="value with space"

Then you can use the variable in the config file.
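
For instance, a config fragment that picks up the exported value could look like the sketch below. It assumes the job is started from the same shell session and that no -i varName=... argument is passed; using the value as the job name is only an illustration.

```hocon
env {
  job.mode = "BATCH"
  # Resolved from the environment variable varName exported above
  job.name = ${varName}
}
```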

If a variable has no default value in the configuration file and you do not pass it during execution, the variable placeholder is retained and the system does not throw an exception. Make sure that the components that later read the value can parse the retained placeholder. For example, the Elasticsearch index setting needs to support a format like '${xxx}' to specify the index dynamically. If a component does not support it, the program may not run properly.

Example:

env {
  job.mode = "BATCH"
  job.name = ${jobName}
  parallelism = 2
}

source {
  FakeSource {
    plugin_output = "${resName:fake_test}_table"
    row.num = "${rowNum:50}"
    string.template = ${strTemplate}
    int.template = [20, 21]
    schema = {
      fields {
        name = "${nameType:string}"
        age = ${ageType}
      }
    }
  }
}

transform {
  sql {
    plugin_input = "${resName:fake_test}_table"
    plugin_output = "sql"
    query = "select * from ${resName:fake_test}_table where name = '${nameVal}' "
  }
}

sink {
  Console {
    plugin_input = "sql"
    username = ${username}
    password = ${password}
  }
}

The configuration above defines several variables, such as ${rowNum} and ${resName}. You can supply their values with the following shell command:

./bin/seatunnel.sh -c <this_config_file> \
-i jobName='this_is_a_job_name' \
-i strTemplate=['abc','d~f','hi'] \
-i ageType=int \
-i nameVal=abc \
-i username=seatunnel=2.3.1 \
-i password='$a^b%c.d~e0*9(' \
-m local

In this case, resName, rowNum, and nameType are not set, so they will take their default values.

The final submitted configuration would be:

env {
  job.mode = "BATCH"
  job.name = "this_is_a_job_name"
  parallelism = 2
}

source {
  FakeSource {
    plugin_output = "fake_test_table"
    row.num = 50
    string.template = ['abc','d~f','hi']
    int.template = [20, 21]
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  sql {
    plugin_input = "fake_test_table"
    plugin_output = "sql"
    query = "select * from fake_test_table where name = 'abc' "
  }
}

sink {
  Console {
    plugin_input = "sql"
    username = "seatunnel=2.3.1"
    password = "$a^b%c.d~e0*9("
  }
}

Important Notes:

If a value contains special characters like (, enclose it in single quotes (').
If the substitution variable contains double or single quotes (e.g., "resName" or "nameVal"), you need to include them with the value.
A value passed with -i cannot contain spaces (' '). For example, -i jobName='this is a job name' results in job.name = "this". To pass values that contain spaces, use environment variables.
For dynamic parameters, you can use the following format: -i date=$(date +"%Y%m%d").
System-reserved placeholders cannot be used as variable names, and -i does not replace them. These include ${database_name}, ${schema_name}, ${table_name}, ${schema_full_name}, ${table_full_name}, ${primary_key}, ${unique_key}, ${field_names}, and ${partition_keys}. For details, see Sink Parameter Placeholders.

What's More

Start writing your own config file: choose the connector you want to use and configure it according to the connector documentation.
See JobEnvConfig when you need engine-specific settings.
See HOCON if you want the full syntax details.
