JDBC to S3
Use this recipe when you want to export table data from a relational database to an S3-compatible object store.
Prerequisites
Finish Run your first job.
Install the plugins required by this recipe. Follow Deployment > Download The Connector Plugins, then keep only the plugins below in
config/plugin_config:
--seatunnel-connectors--
connector-jdbc
connector-file-s3
--end--
cd "${SEATUNNEL_HOME}"
sh bin/install-plugin.sh
ls connectors | rg 'connector-(jdbc|file-s3)'
- Put the source database JDBC driver into
${SEATUNNEL_HOME}/libfor SeaTunnel Zeta, then confirm the jar is visible:
ls "${SEATUNNEL_HOME}/lib" | rg 'mysql-connector'
- Put
hadoop-awsand the AWS SDK bundle required by the S3 connector into${SEATUNNEL_HOME}/lib, then confirm both jars are present:
ls "${SEATUNNEL_HOME}/lib" | rg 'hadoop-aws|aws-java-sdk-bundle'
- Prepare the source table in the relational database. This example uses MySQL and exports two rows from
analytics.orders:
CREATE DATABASE IF NOT EXISTS analytics;
CREATE TABLE IF NOT EXISTS analytics.orders (
id BIGINT PRIMARY KEY,
customer_id BIGINT,
total_amount DECIMAL(16, 2),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO analytics.orders (id, customer_id, total_amount, updated_at) VALUES
(5001, 101, 19.99, NOW()),
(5002, 102, 29.99, NOW());
- Prepare an S3 bucket and credentials with write permission. The example below assumes the bucket already exists and that
access_keyandsecret_keycan write tos3://company-data-lake/seatunnel/orders/.
Minimal configuration
This example exports a query result from MySQL to S3 in JSON lines format so that the output is easy to inspect.
env {
parallelism = 1
job.mode = "BATCH"
}
source {
Jdbc {
plugin_output = "orders_jdbc"
url = "jdbc:mysql://mysql:3306/analytics"
driver = "com.mysql.cj.jdbc.Driver"
username = "root"
password = "password"
query = "select id, customer_id, total_amount, updated_at from orders"
}
}
sink {
S3File {
plugin_input = "orders_jdbc"
bucket = "s3a://company-data-lake"
path = "/seatunnel/orders/"
fs.s3a.endpoint = "s3.us-east-1.amazonaws.com"
fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
access_key = "your-access-key"
secret_key = "your-secret-key"
file_format_type = "json"
row_delimiter = "\n"
custom_filename = true
file_name_expression = "orders"
filename_extension = "json"
single_file_mode = true
is_enable_transaction = false
schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
data_save_mode = "APPEND_DATA"
}
}
Run the job
Save the config as config/jdbc-to-s3.conf, then run SeaTunnel in local mode:
cd "${SEATUNNEL_HOME}"
./bin/seatunnel.sh --config ./config/jdbc-to-s3.conf -m local
Validation result
- Run the source query directly in the database and record the expected row count.
- Start the SeaTunnel job.
- Verify that new objects are written to the expected S3 prefix.
aws s3 ls s3://company-data-lake/seatunnel/orders/ --recursive
aws s3 cp s3://company-data-lake/seatunnel/orders/orders.json - | head
If objects are created under the target prefix and the exported content matches the source query, the pipeline is working.
Common pitfalls
- The JDBC driver is available on your workstation but not under
${SEATUNNEL_HOME}/lib. bucketandpathare mixed up. Keep the bucket inbucketand the prefix inpath.- The credential provider does not match the authentication method you configured.
- Large tables are exported through one unbounded query without filtering or partitioning.
- Fixed filenames are only safe for this single-file tutorial. If you enable transactions again, keep
${transactionId}infile_name_expression. - The target endpoint is S3-compatible, but the
fs.s3a.endpointvalue still points to AWS.