Github
Github 源连接器
描述
用于从 Github 读取数据。
关键特性
选项
| 名称 | 类型 | 必填 | 默认值 | 
|---|---|---|---|
| url | String | 是 | - | 
| access_token | String | 否 | - | 
| method | String | 否 | get | 
| schema.fields | Config | 否 | - | 
| format | String | 否 | json | 
| params | Map | 否 | - | 
| body | String | 否 | - | 
| json_field | Config | 否 | - | 
| content_json | String | 否 | - | 
| poll_interval_millis | int | 否 | - | 
| retry | int | 否 | - | 
| retry_backoff_multiplier_ms | int | 否 | 100 | 
| retry_backoff_max_ms | int | 否 | 10000 | 
| enable_multi_lines | boolean | 否 | false | 
| common-options | config | 否 | - | 
url [String]
HTTP 请求 URL。
access_token [String]
GitHub个人访问令牌,请参阅:创建个人访问令牌 - Github文档
method [String]
HTTP 请求方法。目前支持 GET 和 POST。
params [Map]
http 参数
body [String]
HTTP 请求体
poll_interval_millis [int]
流模式下请求 API 的间隔时间(毫秒)。
retry [int]
请求失败(IOException)时最大重试次数。
retry_backoff_multiplier_ms [int]
请求失败时的退避时间(毫秒)乘数。
retry_backoff_max_ms [int]
请求失败时的最大退避时间(毫秒)。
format [String]
上游数据的格式,现在仅支持json text,默认是json。
若你的数据格式为 json,需同时配置 schema 选项,例如:
上游数据如下:
{
  "code": 200,
  "data": "get success",
  "success": true
}
您应该配置 schema 为以下内容:
schema {
    fields {
        code = int
        data = string
        success = boolean
    }
}
连接器将生成如下数据:
| code | data | success | 
|---|---|---|
| 200 | get success | true | 
若你设置格式为 text,连接器不会对上游数据做出任何改变,示例:
上游数据如下:
{
  "code": 200,
  "data": "get success",
  "success": true
}
连接器将生成如下数据:
| content | 
|---|
| {"code": 200, "data": "get success", "success": true} | 
schema [Config]
fields [Config]
上游数据的字段定义。
content_json [String]
该参数可用于提取一些 json 数据。如果你只需要 “book” 部分的数据,可以配置 content_field = "$.store.book.*".
如果你的返回数据如下所示:
{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}
你可以配置 content_field = "$.store.book.*" 并且结果返回如下:
[
  {
    "category": "reference",
    "author": "Nigel Rees",
    "title": "Sayings of the Century",
    "price": 8.95
  },
  {
    "category": "fiction",
    "author": "Evelyn Waugh",
    "title": "Sword of Honour",
    "price": 12.99
  }
]
然后你可以通过更简单的 schema 配置获取所需的结果,例如:
Http {
  url = "http://mockserver:1080/contentjson/mock"
  method = "GET"
  format = "json"
  content_field = "$.store.book.*"
  schema = {
    fields {
      category = string
      author = string
      title = string
      price = string
    }
  }
}
这是一个例子:
- 测试数据可参考此链接:mockserver-config.json
- 任务配置示例可参考此链接:http_contentjson_to_assert.conf.
json_field [Config]
该参数用于帮助你配置 schema,因此必须与 schema 一起使用。
如果你的数据如下所示:
{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}
你可以通过如下任务配置获取 “book” 部分的内容:
source {
  Http {
    url = "http://mockserver:1080/jsonpath/mock"
    method = "GET"
    format = "json"
    json_field = {
      category = "$.store.book[*].category"
      author = "$.store.book[*].author"
      title = "$.store.book[*].title"
      price = "$.store.book[*].price"
    }
    schema = {
      fields {
        category = string
        author = string
        title = string
        price = string
      }
    }
  }
}
- 测试数据可参考此链接:mockserver-config.json
- 任务配置示例可参考此链接:http_jsonpath_to_assert.conf.
common options
源插件通用参数,请参考 常用选项获取详细说明。
示例
Github {
  url = "https://api.github.com/orgs/apache/repos"
  access_token = "xxxx"
  method = "GET"
  format = "json"
  schema = {
    fields {
      id = int
      name = string
      description = string
      html_url = string
      stargazers_count = int
      forks = int
    }
  }
}
变更日志
Change Log
| Change | Commit | Version | 
|---|---|---|
| [improve] http connector options (#8969) | https://github.com/apache/seatunnel/commit/63ff9f910a | 2.3.10 | 
| [Feature][Connector-V2] Support TableSourceFactory/TableSinkFactory on http (#5816) | https://github.com/apache/seatunnel/commit/6f49ec6ead | 2.3.4 | 
| [Feature][Connector-V2][Github] Adding Github Source Connector (#4155) | https://github.com/apache/seatunnel/commit/49d9172b10 | 2.3.1 |