Version: Next

Http

Http source connector

Supported Engines

Spark
Flink
SeaTunnel Zeta

Key Features

Description

Used to read data from Http.

Supported DataSource Info

In order to use the Http connector, the following dependencies are required. They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
|---|---|---|
| Http | universal | Download |

Source Options

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | String | Yes | - | Http request url. |
| schema | Config | No | - | Mapping between the Http data structure and the SeaTunnel data structure. |
| schema.fields | Config | No | - | The schema fields of the upstream data. |
| json_field | Config | No | - | Helps you configure the schema, so it must be used together with schema. |
| pageing | Config | No | - | Configures paging queries. |
| pageing.page_field | String | No | - | Name of the page field in the request parameters. |
| pageing.total_page_size | Int | No | - | Total number of pages. |
| pageing.batch_size | Int | No | - | Batch size returned per request; used to decide whether to continue when the total number of pages is unknown. |
| pageing.start_page_number | Int | No | 1 | Page number from which synchronization starts. |
| content_field | String | No | - | Extracts part of the JSON response. If you only need the data in the 'book' section, configure content_field = "$.store.book.*". |
| format | String | No | text | Format of the upstream data. Only json and text are supported; the default is text. |
| method | String | No | get | Http request method. Only GET and POST are supported. |
| headers | Map | No | - | Http headers. |
| params | Map | No | - | Http params. |
| body | String | No | - | Http body. The program automatically adds the application/json header, and the body is sent as JSON. |
| poll_interval_millis | Int | No | - | Interval (in milliseconds) between Http API requests in stream mode. |
| retry | Int | No | - | Maximum number of retries when the Http request throws an IOException. |
| retry_backoff_multiplier_ms | Int | No | 100 | Retry backoff multiplier (in milliseconds) when the Http request fails. |
| retry_backoff_max_ms | Int | No | 10000 | Maximum retry backoff time (in milliseconds) when the Http request fails. |
| enable_multi_lines | Boolean | No | false | |
| connect_timeout_ms | Int | No | 12000 | Connection timeout; default 12s. |
| socket_timeout_ms | Int | No | 60000 | Socket timeout; default 60s. |
| common-options | | No | - | Source plugin common parameters; see Source Common Options for details. |
| keep_params_as_form | Boolean | No | false | Whether params are submitted as form data, for compatibility with legacy behavior. When true, the value of params is submitted through the form. |
| keep_page_param_as_http_param | Boolean | No | false | Whether to add the paging parameters to params, for compatibility with legacy behavior. |
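
As a rough illustration of how the retry and timeout options in the table fit together, the sketch below sets each of them explicitly. The URL is a placeholder and the retry count is an arbitrary choice; the backoff and timeout values simply repeat the defaults listed above.

```hocon
source {
  Http {
    url = "http://mockserver:1080/example/http"  # placeholder endpoint
    method = "GET"
    format = "text"

    # Retry up to 3 times when the request throws an IOException (illustrative value)
    retry = 3
    # Backoff settings in milliseconds, same as the documented defaults
    retry_backoff_multiplier_ms = 100
    retry_backoff_max_ms = 10000

    # Timeouts in milliseconds, same as the documented defaults
    connect_timeout_ms = 12000
    socket_timeout_ms = 60000
  }
}
```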

How to Create an Http Data Synchronization Job

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Http {
    plugin_output = "http"
    url = "http://mockserver:1080/example/http"
    method = "GET"
    format = "json"
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_bytes = bytes
        c_date = date
        c_decimal = "decimal(38, 18)"
        c_timestamp = timestamp
        c_row = {
          C_MAP = "map<string, string>"
          C_ARRAY = "array<int>"
          C_STRING = string
          C_BOOLEAN = boolean
          C_TINYINT = tinyint
          C_SMALLINT = smallint
          C_INT = int
          C_BIGINT = bigint
          C_FLOAT = float
          C_DOUBLE = double
          C_BYTES = bytes
          C_DATE = date
          C_DECIMAL = "decimal(38, 18)"
          C_TIMESTAMP = timestamp
        }
      }
    }
  }
}

# Console printing of the read Http data
sink {
  Console {
    parallelism = 1
  }
}
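
For comparison, a streaming variant of the same kind of job might look like the sketch below. It assumes the job runs with job.mode = "STREAMING" and uses poll_interval_millis from the options table to space out requests; the 5000 ms interval and the three-field schema are illustrative choices, not recommended values.

```hocon
env {
  parallelism = 1
  job.mode = "STREAMING"
}

source {
  Http {
    url = "http://mockserver:1080/example/http"  # placeholder endpoint
    method = "GET"
    format = "json"
    # Request the API every 5 seconds (illustrative interval)
    poll_interval_millis = 5000
    schema = {
      fields {
        code = int
        data = string
        success = boolean
      }
    }
  }
}

sink {
  Console {
    parallelism = 1
  }
}
```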

Parameter Interpretation

format

When format is set to json, you must also set the schema option. For example:

Given the following upstream data:

{
  "code": 200,
  "data": "get success",
  "success": true
}

Set the schema as follows:


schema {
  fields {
    code = int
    data = string
    success = boolean
  }
}

The connector generates the following data:

| code | data | success |
|---|---|---|
| 200 | get success | true |

When format is set to text, the connector passes the upstream data through unchanged. For example:

Given the following upstream data:

{
  "code": 200,
  "data": "get success",
  "success": true
}

The connector generates the following data:

| content |
|---|
| {"code": 200, "data": "get success", "success": true} |
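
A minimal source block for the text case could look like the following sketch. The URL is a placeholder, and the schema is left out on the assumption that it is only needed for json, since schema is optional in the options table.

```hocon
source {
  Http {
    url = "http://mockserver:1080/example/http"  # placeholder endpoint
    method = "GET"
    # Each response is emitted as-is in a single "content" column
    format = "text"
  }
}
```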

keep_params_as_form

Provided for compatibility with older versions of the Http connector. When set to true, <params> and <pageing> are submitted as form data. When set to false, <params> is added to the url path, and <pageing> is not added to the body or form; instead, its values replace the placeholders in params and body.
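
A sketch of the legacy form-style behavior, under the assumption that the target endpoint accepts form-encoded POST requests. The URL and the id parameter are placeholders chosen for illustration.

```hocon
source {
  Http {
    url = "http://localhost:8080/mock/queryData"  # placeholder endpoint
    method = "POST"
    format = "text"
    # Submit params through the form instead of adding them to the url
    keep_params_as_form = true
    headers {
      Content-Type = "application/x-www-form-urlencoded"
    }
    params = {
      id = "1"  # hypothetical parameter
    }
  }
}
```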

keep_page_param_as_http_param

Whether to add the paging parameters to params. When set to true, <pageing> is added to <params>. When set to false, the page value is replaced only where the page field already exists in <body> or <params>.

Example configuration when set to false:

body = """{"id":1,"page":"${page}"}"""
params = {
  page: "${page}"
}

params

By default, the parameters are added to the url path. To keep the behavior of older versions, see keep_params_as_form.

body

The HTTP body carries the actual data in requests or responses, such as JSON or form submissions.

The reference format is as follows:

body = """{"id":1,"name":"setunnel"}"""

For form submissions, set the content-type as follows:

headers {
  Content-Type = "application/x-www-form-urlencoded"
}
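
Putting the pieces together, a POST request with a JSON body might be configured as in the sketch below. The URL, the Accept header, and the schema fields are placeholders for illustration; per the options table, the application/json header for the body is added automatically.

```hocon
source {
  Http {
    url = "http://localhost:8080/mock/queryData"  # placeholder endpoint
    method = "POST"
    format = "json"
    headers {
      Accept = "application/json"  # illustrative extra header
    }
    # Triple quotes keep the inner double quotes of the JSON intact
    body = """{"id":1,"name":"seatunnel"}"""
    schema = {
      fields {
        name = string
        age = string
      }
    }
  }
}
```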

content_field

This parameter extracts part of the JSON response. If you only need the data in the 'book' section, configure content_field = "$.store.book.*".

Suppose the returned data looks like this:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

You can configure content_field = "$.store.book.*" and the result returned looks like this:

[
  {
    "category": "reference",
    "author": "Nigel Rees",
    "title": "Sayings of the Century",
    "price": 8.95
  },
  {
    "category": "fiction",
    "author": "Evelyn Waugh",
    "title": "Sword of Honour",
    "price": 12.99
  }
]

You can then get the desired result with a simpler schema, for example:

Http {
  url = "http://mockserver:1080/contentjson/mock"
  method = "GET"
  format = "json"
  content_field = "$.store.book.*"
  schema = {
    fields {
      category = string
      author = string
      title = string
      price = string
    }
  }
}


json_field

This parameter helps you configure the schema, so it must be used together with schema.

If your data looks something like this:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

You can get the contents of 'book' by configuring the task as follows:

source {
  Http {
    url = "http://mockserver:1080/jsonpath/mock"
    method = "GET"
    format = "json"
    json_field = {
      category = "$.store.book[*].category"
      author = "$.store.book[*].author"
      title = "$.store.book[*].title"
      price = "$.store.book[*].price"
    }
    schema = {
      fields {
        category = string
        author = string
        title = string
        price = string
      }
    }
  }
}

pageing

To append the page parameter to the URL, add it to params.

To send the page parameter in the body, add the key of the page parameter to body.

source {
  Http {
    url = "http://localhost:8080/mock/queryData"
    method = "POST"
    format = "json"
    body = """{"id":1,"page":"${page}"}"""
    content_field = "$.data.*"
    params = {
      page: "${page}"
    }
    pageing = {
      total_page_size = 20
      page_field = page
      # If total_page_size is unknown, use batch_size instead:
      # reading stops when a page returns fewer than batch_size rows, otherwise it continues
      #batch_size = 10
    }
    schema = {
      fields {
        name = string
        age = string
      }
    }
  }
}
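
When the total number of pages is not known in advance, the same job could rely on batch_size instead, as in the sketch below. The endpoint and schema are carried over from the example above as placeholders, and the values for start_page_number and batch_size are illustrative.

```hocon
source {
  Http {
    url = "http://localhost:8080/mock/queryData"  # placeholder endpoint
    method = "GET"
    format = "json"
    content_field = "$.data.*"
    params = {
      # Placeholder replaced with the current page number on each request
      page = "${page}"
    }
    pageing = {
      page_field = page
      # Start from the first page (the documented default)
      start_page_number = 1
      # Keep requesting pages until one returns fewer than 10 rows
      batch_size = 10
    }
    schema = {
      fields {
        name = string
        age = string
      }
    }
  }
}
```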


Changelog

| Change | Commit | Version |
|---|---|---|
| [Fix][connector-http] fix when post have param (#8434) | https://github.com/apache/seatunnel/commit/c1b2675ab0 | dev |
| [Improve] restruct connector common options (#8634) | https://github.com/apache/seatunnel/commit/f3499a6eeb | dev |
| [Improve][dist]add shade check rule (#8136) | https://github.com/apache/seatunnel/commit/51ef800016 | 2.3.9 |
| [Feature][Connector-V2] Add prometheus source and sink (#7265) | https://github.com/apache/seatunnel/commit/dde6f9fcbd | 2.3.9 |
| [Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786) | https://github.com/apache/seatunnel/commit/6b7c53d03c | 2.3.9 |
| [Fix][Connector-V2] Fix http source can not read streaming (#7703) | https://github.com/apache/seatunnel/commit/a0ffa7ba02 | 2.3.8 |
| [Feature][Connector-V2] Suport choose the start page in http paging (#7180) | https://github.com/apache/seatunnel/commit/ed15f0dcf9 | 2.3.8 |
| [Improve][Connector] Add multi-table sink option check (#7360) | https://github.com/apache/seatunnel/commit/2489f6446b | 2.3.7 |
| [Improve][API] Make sure the table name in TablePath not be null (#7252) | https://github.com/apache/seatunnel/commit/764d8b0bc8 | 2.3.7 |
| [Feature][Core] Support using upstream table placeholders in sink options and auto replacement (#7131) | https://github.com/apache/seatunnel/commit/c4ca74122c | 2.3.6 |
| [Feature][Kafka] Support multi-table source read (#5992) | https://github.com/apache/seatunnel/commit/60104602d1 | 2.3.6 |
| [Improve][CDC] Close idle subtasks gorup(reader/writer) in increment phase (#6526) | https://github.com/apache/seatunnel/commit/454c339b9c | 2.3.6 |
| Fix HttpSource bug (#6824) | https://github.com/apache/seatunnel/commit/c3ab84caa4 | 2.3.6 |
| [Hotfix] fix http source can not read yyyy-MM-dd HH:mm:ss format bug & Improve DateTime Utils (#6601) | https://github.com/apache/seatunnel/commit/19888e7969 | 2.3.5 |
| [Improve][Connector-V2]Support multi-table sink feature for httpsink (#6316) | https://github.com/apache/seatunnel/commit/e6c51a95c7 | 2.3.5 |
| [Improve][HttpConnector]Increase custom configuration timeout. (#6223) | https://github.com/apache/seatunnel/commit/fa5b7d3d83 | 2.3.4 |
| [Feature][Core] Upgrade flink source translation (#5100) | https://github.com/apache/seatunnel/commit/5aabb14a94 | 2.3.4 |
| [BUG][Connector-V2][Http] fix bug http config no schema option and improve e2e test add case (#5939) | https://github.com/apache/seatunnel/commit/8a71b9e072 | 2.3.4 |
| [Feature][Connector-V2] Support TableSourceFactory/TableSinkFactory on redis (#5901) | https://github.com/apache/seatunnel/commit/e84dcb8c10 | 2.3.4 |
| [Feature][Connector-V2] Support TableSourceFactory/TableSinkFactory on http (#5816) | https://github.com/apache/seatunnel/commit/6f49ec6ead | 2.3.4 |
| [Improve][Common] Introduce new error define rule (#5793) | https://github.com/apache/seatunnel/commit/9d1b2582b2 | 2.3.4 |
| [Feature][Transform] add JsonPath transform (#5632) | https://github.com/apache/seatunnel/commit/d908f0af40 | 2.3.4 |
| [Improve] Remove use SeaTunnelSink::getConsumedType method and mark it as deprecated (#5755) | https://github.com/apache/seatunnel/commit/8de7408100 | 2.3.4 |
| [Feature][Connector-V2] HTTP supports page increase #5477 (#5561) | https://github.com/apache/seatunnel/commit/bb180b2988 | 2.3.4 |
| [improve][Connector-V2][http] improve http e2e test (#5655) | https://github.com/apache/seatunnel/commit/f5867adcaa | 2.3.4 |
| Support config column/primaryKey/constraintKey in schema (#5564) | https://github.com/apache/seatunnel/commit/eac76b4e50 | 2.3.4 |
| [BUG][Connector-V2][http] fix httpheader cover (#5446) | https://github.com/apache/seatunnel/commit/cdd8e0a65e | 2.3.4 |
| [Feature][Connector][Http] Support multi-line text splits (#4698) | https://github.com/apache/seatunnel/commit/6a524981cb | 2.3.2 |
| Merge branch 'dev' into merge/cdc | https://github.com/apache/seatunnel/commit/4324ee1912 | 2.3.1 |
| [Improve][Project] Code format with spotless plugin. | https://github.com/apache/seatunnel/commit/423b583038 | 2.3.1 |
| [Feature][Connector-V2][Github] Adding Github Source Connector (#4155) | https://github.com/apache/seatunnel/commit/49d9172b10 | 2.3.1 |
| [improve][api] Refactoring schema parse (#4157) | https://github.com/apache/seatunnel/commit/b2f573a13e | 2.3.1 |
| [Improve][build] Give the maven module a human readable name (#4114) | https://github.com/apache/seatunnel/commit/d7cd601051 | 2.3.1 |
| [Improve][Project] Code format with spotless plugin. (#4101) | https://github.com/apache/seatunnel/commit/a2ab166561 | 2.3.1 |
| [Feature][Connector-V2][Persistiq]Add Persistiq source connector (#3460) | https://github.com/apache/seatunnel/commit/aec3912edf | 2.3.1 |
| [Feature][Connector] add get source method to all source connector (#3846) | https://github.com/apache/seatunnel/commit/417178fb84 | 2.3.1 |
| [Feature][Connector-V2][Notion] Add Notion source connector (#3470) | https://github.com/apache/seatunnel/commit/46abc6d943 | 2.3.0 |
| [Hotfix][seatunnel-connectors-v2] [connector-http] fix http json request error (#3629) | https://github.com/apache/seatunnel/commit/54f594d6ca | 2.3.0 |
| [Improve][Connector-V2][Http]Improve json parse option rule for all http connector (#3627) | https://github.com/apache/seatunnel/commit/589e4161ec | 2.3.0 |
| [Improve][Connector-V2][OneSignal]Unified exception for OneSignal connector (#3609) | https://github.com/apache/seatunnel/commit/97cce8c255 | 2.3.0 |
| [Feature][Connector-V2][HTTP] Use json-path parsing (#3510) | https://github.com/apache/seatunnel/commit/1807eb6c95 | 2.3.0 |
| [Improve][Connector-V2][Http]Unified exception for http source & sink… (#3594) | https://github.com/apache/seatunnel/commit/d798cd8670 | 2.3.0 |
| [Hotfix][OptionRule] Fix option rule about all connectors (#3592) | https://github.com/apache/seatunnel/commit/226dc6a119 | 2.3.0 |
| [Improve][Connector-V2][MyHours]Unified exception for MyHours connector (#3538) | https://github.com/apache/seatunnel/commit/48ab7c97d5 | 2.3.0 |
| [Improve][Connector-V2][Gitlab] Unified excetion for Gitlab connector and improve optione rule (#3533) | https://github.com/apache/seatunnel/commit/77f68f1eef | 2.3.0 |
| [Improve][Connector-V2][Klaviyo]Unified exception for Klaviyo connector (#3555) | https://github.com/apache/seatunnel/commit/08f8615078 | 2.3.0 |
| [Feature][Connector-V2][Jira]Add Jira source connector (#3473) | https://github.com/apache/seatunnel/commit/fb40162c07 | 2.3.0 |
| [Improve][Connector-V2][Lemlist] Unified exception for lemlist connector (#3534) | https://github.com/apache/seatunnel/commit/705728ebbb | 2.3.0 |
| [Feature][Connector V2] add gitlab source connector (#3408) | https://github.com/apache/seatunnel/commit/545595c6d2 | 2.3.0 |
| [Feature][Connector-V2][OneSignal]Add OneSignal source conector (#3454) | https://github.com/apache/seatunnel/commit/b318b3166f | 2.3.0 |
| [Feature][Connector-V2][Klaviyo]Add Klaviyo source connector (#3443) | https://github.com/apache/seatunnel/commit/fc00a2866b | 2.3.0 |
| [Feature][Connector-V2][Lemlist]Add Lemlist source connector (#3346) | https://github.com/apache/seatunnel/commit/12d66b4247 | 2.3.0 |
| [HotFix][Core][API] Fix OptionValidation error code (#3439) | https://github.com/apache/seatunnel/commit/ace219f376 | 2.3.0 |
| [Improve][Connector-V2][My Hours]Add http method enum && Improve My Hours connector option rule (#3390) | https://github.com/apache/seatunnel/commit/a86c9d90f7 | 2.3.0 |
| [Feature][Connector-V2][Http] Add option rules && Improve Myhours sink connector (#3351) | https://github.com/apache/seatunnel/commit/cc8bb60c83 | 2.3.0 |
| [Feature][Connector-V2][My Hours] Add My Hours Source Connector (#3228) | https://github.com/apache/seatunnel/commit/4104a3e30e | 2.3.0 |
| [Improve][all] change Log to @Slf4j (#3001) | https://github.com/apache/seatunnel/commit/6016100f12 | 2.3.0-beta |
| [Bug][format][json] Fix jackson package conflict with spark (#2934) | https://github.com/apache/seatunnel/commit/1a92b8369b | 2.3.0-beta |
| [Bug][Connector-V2] Fix wechat sink data serialization (#2856) | https://github.com/apache/seatunnel/commit/3aee11fc16 | 2.3.0-beta |
| [Improve][Connector-V2] Improve http connector (#2833) | https://github.com/apache/seatunnel/commit/5b3957bc52 | 2.2.0-beta |
| [DEV][Api] Replace SeaTunnelContext with JobContext and remove singleton pattern (#2706) | https://github.com/apache/seatunnel/commit/cbf82f755c | 2.2.0-beta |
| [Improve][build] Improved scope of maven-shade-plugin (#2665) | https://github.com/apache/seatunnel/commit/93bc8bd116 | 2.2.0-beta |
| [#2606]Dependency management split (#2630) | https://github.com/apache/seatunnel/commit/fc047be69b | 2.2.0-beta |
| [chore][connector-common] Rename SeatunnelSchema to SeaTunnelSchema (#2538) | https://github.com/apache/seatunnel/commit/7dc2a27388 | 2.2.0-beta |
| [Bug][Connector-V2] Fix the bug that set params by mistake (#2511) (#2513) | https://github.com/apache/seatunnel/commit/ead3d68b0e | 2.2.0-beta |
| [Improve][Connector-V2] Http source support user-defined schema (#2439) | https://github.com/apache/seatunnel/commit/793933b6b8 | 2.2.0-beta |
| [Feature][Connector-V2] Add Enterprise Wechat sink connector (#2412) | https://github.com/apache/seatunnel/commit/3e200e0a38 | 2.2.0-beta |
| [Improve][Connector-V2] Format SeaTunnelRow use seatunnel-format-json (#2435) | https://github.com/apache/seatunnel/commit/e4e8f7fbff | 2.2.0-beta |
| [Improve][Connector-V2] Make the attribute of http-connector from private to protected (#2418) | https://github.com/apache/seatunnel/commit/f3b00ef696 | 2.2.0-beta |
| [Feature][Connector-V2] Add feishu sink (#2381) | https://github.com/apache/seatunnel/commit/0fec8ca438 | 2.2.0-beta |
| [Feature][Connector-V2] Add http sink(Webhook) (#2348) | https://github.com/apache/seatunnel/commit/4b7207490a | 2.2.0-beta |
| [Improve][Http Connector-V2-Source] Refactor the code and make code more clearly (#2322) | https://github.com/apache/seatunnel/commit/a9a797ad85 | 2.2.0-beta |
| [Improve][Connector-V2] Fix the log information (#2317) | https://github.com/apache/seatunnel/commit/736983a708 | 2.2.0-beta |
| [Improve][Connector-V2] Http client provider improve (#2312) | https://github.com/apache/seatunnel/commit/cc950007c8 | 2.2.0-beta |
| [Improve][Connector-V2] Fix 'Singleton' word error (#2309) | https://github.com/apache/seatunnel/commit/12ebcb4a0d | 2.2.0-beta |
| [api-draft][Optimize] Optimize module name (#2062) | https://github.com/apache/seatunnel/commit/f79e3112b1 | 2.2.0-beta |