Version: 1.x

Introduction


Core Concepts​

Event​

Field Name​

A valid field name must not contain `.`, `@`, or any other characters not allowed by the ANSI SQL 2003 standard.

Reserved field names include:

  • __root__ means the top level of the event.
  • __metadata__ means the metadata field for internal use.

Metadata​

Metadata can be set like ordinary fields. All fields in metadata are invisible in the output; they are for internal use only.

Field Reference​

  • Single level: a
  • Multiple levels: a.b.c
  • Top-level (root) reference: __root__

[TODO] Note: this design should be compatible with Spark SQL.
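For instance, assuming Spark SQL's dot notation for nested fields, a multi-level reference such as a.b.c can be used directly in a query (the table and field names here are illustrative):

```sql
select * from mytable where a.b.c = 'value'
```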


Input​

Kafka​


Filters​

JSON​

Split​

Synopsis​

| Setting | Input type | Required | Default value |
| --- | --- | --- | --- |
| delimiter | string | no | `" "` |
| keys | array | yes | `[]` |
| source_field | string | yes | `""` |
| tag_on_failure | string | no | `"_tag"` |
| target_field | string | no | `"__root__"` |

Details​

  • delimiter

Regular expressions are supported.

  • keys

If the number of parts produced by splitting on delimiter is larger than the number of entries in keys, the extra parts on the right are ignored.

  • source_field

If source_field does not exist, nothing is done.

  • target_field

The field under which the split results are stored. Defaults to __root__, i.e. the top level of the event.
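Putting these settings together, a split filter block might look like the following sketch (the source field and key names are illustrative):

```
split {
    source_field = "message"
    delimiter = " "                 # a regular expression
    keys = ["ts", "level", "msg"]
    # if the input splits into more parts than there are keys,
    # the extra parts on the right are ignored
}
```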

SQL​

SQL can be used to filter and aggregate events; the underlying engine is Spark SQL.

For example, the following SQL filters events whose response_time is between 300 and 1200 milliseconds (inclusive):

select * from mytable where response_time >= 300 and response_time <= 1200

And this SQL counts sales for each city:

select city, count(sales) from mytable group by city

You can also combine these two SQLs into one for both filtering and aggregation:

select city, count(*) from mytable where response_time >= 300 and response_time <= 1200 group by city

Pipelining multiple SQLs:

sql {
    query {
        table_name = "mytable1"
        sql = "select * from mytable1 where response_time >= 300"  # illustrative condition
    }

    query {
        table_name = "mytable2"  # illustrative name
        sql = "select city, count(*) from mytable2 group by city"  # illustrative query
    }
}

Query​

Synopsis​

| Setting | Input type | Required | Default value |
| --- | --- | --- | --- |
| table_name | string | no | `"mytable"` |
| sql | string | yes | - |

TODO: maybe we could add a schema setting for explicitly defining the table schema. For now, the schema is auto-generated.

Details​

  • table_name

Registers a temporary table with the given name; the default is "mytable". You can refer to it in sql, for example:

select * from mytable where http_status >= 500
  • sql

Executes the SQL query given by the sql string.
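Combining the two settings, a single query block might look like this sketch (the table name and query are illustrative):

```
sql {
    query {
        table_name = "access_log"
        sql = "select * from access_log where http_status >= 500"
    }
}
```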


Output​

Kafka​

Serializer​

Raw​

The default serializer is raw. If no serializer is configured for an input/output, raw is used.

Synopsis​

| Setting | Input type | Required | Default value |
| --- | --- | --- | --- |
| charset | string | no | `"utf-8"` |

Details​

  • charset

Serializes or deserializes using the given charset.

Available charsets:

[TODO] list all supported charsets; refer to Logstash and these links:

  • https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
  • http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
  • http://www.iana.org/assignments/character-sets/character-sets.xhtml
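As a minimal sketch, a raw serializer with an explicit charset might be configured like this (the exact block syntax is an assumption based on the synopsis above):

```
serializer {
    raw {
        charset = "utf-8"
    }
}
```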

JSON​

Tar.gz​

Compressed codec.

Contact Us​

  • Mailing list: dev@seatunnel.apache.org. To subscribe, mail dev-subscribe@seatunnel.apache.org and follow the instructions in the reply.
  • Slack: to join the SeaTunnel Slack, send a request to the mailing list (dev@seatunnel.apache.org) and we will invite you.