Introduction
Core Concepts
Event
Field Name
A valid field name must not contain ., @, or any other character not allowed by the ANSI SQL 2003 standard.
Reserved field names include:
- __root__: refers to the top level of the event.
- __metadata__: the metadata field, for internal use only.
Metadata
Metadata can be set like ordinary fields, but all metadata fields are invisible in the output; they are for internal use only.
Field Reference
- Single level: a
- Multiple levels: a.b.c
- Top level (root) reference: __root__
[TODO] Note: this design should be compatible with Spark SQL.
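For example, assuming an event that contains a nested field a.b.c, a multi-level reference can be used directly in a Spark SQL query:
select a.b.c from mytable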
Input
Kafka
Filters
JSON
Split
Synopsis
Setting | Input type | Required | Default value |
---|---|---|---|
delimiter | string | no | " " |
keys | array | yes | [] |
source_field | string | yes | "" |
tag_on_failure | string | no | "_tag" |
target_field | string | no | "__root__" |
Details
- delimiter
Regular expressions are supported.
- keys
If the number of parts split by delimiter is larger than the number of entries in keys, the extra parts on the right side will be ignored.
- source_field
If source_field does not exist, nothing will be done.
- target_field
The field under which the split result is placed. Defaults to __root__, i.e. the top level of the event; see the sketch below.
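For example, a split filter that parses a space-delimited log line into three fields might be configured as follows (a sketch: the block name split follows the style of the sql filter below, and the field names are assumptions):
split {
    # illustrative settings; "message" is an assumed source field name
    source_field = "message"
    delimiter = " "
    keys = ["level", "module", "msg"]
    target_field = "__root__"
}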
SQL
SQL can be used to filter and aggregate events; the underlying engine is Spark SQL.
For example, the following SQL selects events whose response_time is between 300 and 1200 milliseconds:
select * from mytable where response_time >= 300 and response_time <= 1200
And this SQL counts sales for each city:
select city, count(sales) from mytable group by city
You can also combine these two into a single SQL statement that both filters and aggregates:
select city, count(*) from mytable where response_time >= 300 and response_time <= 1200 group by city
Pipelining multiple SQL queries (the completed query strings below are illustrative):
sql {
    query {
        table_name = "mytable1"
        # assumed: completing the where clause with the filter shown above
        sql = "select * from mytable1 where response_time >= 300 and response_time <= 1200"
    }
    query {
        # assumed: the second query consumes the first query's result,
        # registered under a new table name
        table_name = "mytable2"
        sql = "select city, count(*) from mytable2 group by city"
    }
}
Query
Synopsis
Setting | Input type | Required | Default value |
---|---|---|---|
table_name | string | no | "mytable" |
sql | string | yes | - |
[TODO] Maybe we can add a schema setting for explicitly defining the table schema. For now, the schema is auto-generated.
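As a sketch of what such a setting might look like (hypothetical; the schema key does not exist yet):
query {
    table_name = "mytable"
    # hypothetical schema setting; today the schema is auto-generated
    schema = "city string, sales bigint"
    sql = "select city, count(sales) from mytable group by city"
}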
Details
- table_name
Registers a temporary table using the given name; the default value is "mytable". You can reference it in sql, for example:
select * from mytable where http_status >= 500
- sql
Executes a SQL query using the given sql string.
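For instance, a minimal sql block that relies on the default table name could look like this (a sketch based on the settings above):
sql {
    query {
        # table_name is omitted, so the default "mytable" is used
        sql = "select * from mytable where http_status >= 500"
    }
}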
Output
Kafka
Serializer
Raw
The default serializer is raw. If no serializer is configured in an input/output, raw will be used.
Synopsis
Setting | Input type | Required | Default value |
---|---|---|---|
charset | string | no | "utf-8" |
Details
- charset
Serialize or deserialize using the given charset.
Available charsets are:
[TODO] List all supported charsets; refer to Logstash and these links:
- https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
- http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
- http://www.iana.org/assignments/character-sets/character-sets.xhtml
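For example, selecting the raw serializer with an explicit charset might look like this (a sketch; the exact nesting inside an input/output block is an assumption):
raw {
    # "utf-8" is the default charset
    charset = "utf-8"
}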
JSON
Tar.gz
compressed codec
Contact Us
- Mailing list: dev@seatunnel.apache.org. Mail to dev-subscribe@seatunnel.apache.org and follow the reply to subscribe to the mailing list.
- Slack: Send a "Request to join SeaTunnel slack" mail to the mailing list (dev@seatunnel.apache.org), and we will invite you in.