FakeSource
FakeSource connector
Supported Engines
Spark
Flink
SeaTunnel Zeta
Description
The FakeSource is a virtual data source that randomly generates a configurable number of rows matching a user-defined schema. It is intended only for testing, for example type conversion checks or trying out new connector features.
Key Features
Source Options
| Name | Type | Required | Default | Description | 
|---|---|---|---|---|
| tables_configs | list | no | - | Defines multiple FakeSource tables; each item can contain the full FakeSource configuration described below |
| schema | config | yes | - | Defines the schema information |
| rows | config | no | - | The list of rows of fake data output per degree of parallelism; see the Options rows Case section |
| row.num | int | no | 5 | The total number of rows generated per degree of parallelism |
| split.num | int | no | 1 | The number of splits generated by the enumerator for each degree of parallelism |
| split.read-interval | long | no | 1 | The interval (in milliseconds) between two split reads in a reader |
| map.size | int | no | 5 | The size of map type values generated by the connector |
| array.size | int | no | 5 | The size of array type values generated by the connector |
| bytes.length | int | no | 5 | The length of bytes type values generated by the connector |
| string.length | int | no | 5 | The length of string type values generated by the connector |
| string.fake.mode | string | no | range | The fake mode for generating string data. Supports `range` and `template`; defaults to `range`. If set to `template`, `string.template` must also be configured |
| string.template | list | no | - | The template list for string values. If configured, the connector randomly selects an item from this list |
| tinyint.fake.mode | string | no | range | The fake mode for generating tinyint data. Supports `range` and `template`; defaults to `range`. If set to `template`, `tinyint.template` must also be configured |
| tinyint.min | tinyint | no | 0 | The minimum tinyint value generated by the connector |
| tinyint.max | tinyint | no | 127 | The maximum tinyint value generated by the connector |
| tinyint.template | list | no | - | The template list for tinyint values. If configured, the connector randomly selects an item from this list |
| smallint.fake.mode | string | no | range | The fake mode for generating smallint data. Supports `range` and `template`; defaults to `range`. If set to `template`, `smallint.template` must also be configured |
| smallint.min | smallint | no | 0 | The minimum smallint value generated by the connector |
| smallint.max | smallint | no | 32767 | The maximum smallint value generated by the connector |
| smallint.template | list | no | - | The template list for smallint values. If configured, the connector randomly selects an item from this list |
| int.fake.mode | string | no | range | The fake mode for generating int data. Supports `range` and `template`; defaults to `range`. If set to `template`, `int.template` must also be configured |
| int.min | int | no | 0 | The minimum int value generated by the connector |
| int.max | int | no | 0x7fffffff | The maximum int value generated by the connector |
| int.template | list | no | - | The template list for int values. If configured, the connector randomly selects an item from this list |
| bigint.fake.mode | string | no | range | The fake mode for generating bigint data. Supports `range` and `template`; defaults to `range`. If set to `template`, `bigint.template` must also be configured |
| bigint.min | bigint | no | 0 | The minimum bigint value generated by the connector |
| bigint.max | bigint | no | 0x7fffffffffffffff | The maximum bigint value generated by the connector |
| bigint.template | list | no | - | The template list for bigint values. If configured, the connector randomly selects an item from this list |
| float.fake.mode | string | no | range | The fake mode for generating float data. Supports `range` and `template`; defaults to `range`. If set to `template`, `float.template` must also be configured |
| float.min | float | no | 0 | The minimum float value generated by the connector |
| float.max | float | no | 0x1.fffffeP+127 | The maximum float value generated by the connector |
| float.template | list | no | - | The template list for float values. If configured, the connector randomly selects an item from this list |
| double.fake.mode | string | no | range | The fake mode for generating double data. Supports `range` and `template`; defaults to `range`. If set to `template`, `double.template` must also be configured |
| double.min | double | no | 0 | The minimum double value generated by the connector |
| double.max | double | no | 0x1.fffffffffffffP+1023 | The maximum double value generated by the connector |
| double.template | list | no | - | The template list for double values. If configured, the connector randomly selects an item from this list |
| common-options | | no | - | Common source plugin parameters; see Source Common Options for details |
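The two fake modes described in the table can be illustrated with a short Python sketch. `fake_int` is a hypothetical helper, not part of SeaTunnel, and it mimics only the documented behavior: `range` draws between the min and max options, `template` picks from the template list. It assumes both range bounds are inclusive, which the option descriptions do not state.

```python
import random
from typing import Optional, Sequence


def fake_int(
    mode: str = "range",
    min_value: int = 0,
    max_value: int = 0x7FFFFFFF,
    template: Optional[Sequence[int]] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one int the way the *.fake.mode options describe (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    if mode == "template":
        # template mode needs a non-empty *.template list; pick one item at random
        if not template:
            raise ValueError("template mode requires a non-empty template list")
        return rng.choice(list(template))
    if mode == "range":
        # range mode draws between *.min and *.max; inclusive bounds are an assumption
        return rng.randint(min_value, max_value)
    raise ValueError(f"unsupported fake mode: {mode!r}")


if __name__ == "__main__":
    rng = random.Random(42)
    print([fake_int("range", 20, 29, rng=rng) for _ in range(5)])
    print([fake_int("template", template=[20, 21, 22], rng=rng) for _ in range(5)])
```

Passing a seeded `random.Random` keeps runs reproducible, which is convenient when a test depends on the generated values.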
Task Example
Simple example:
This example randomly generates data of the specified types. To learn how to declare field types, refer to the schema documentation.
schema = {
  fields {
    c_map = "map<string, array<int>>"
    c_map_nest = "map<string, {c_int = int, c_string = string}>"
    c_array = "array<int>"
    c_string = string
    c_boolean = boolean
    c_tinyint = tinyint
    c_smallint = smallint
    c_int = int
    c_bigint = bigint
    c_float = float
    c_double = double
    c_decimal = "decimal(30, 8)"
    c_null = "null"
    c_bytes = bytes
    c_date = date
    c_timestamp = timestamp
    c_row = {
      c_map = "map<string, map<string, string>>"
      c_array = "array<int>"
      c_string = string
      c_boolean = boolean
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
      c_decimal = "decimal(30, 8)"
      c_null = "null"
      c_bytes = bytes
      c_date = date
      c_timestamp = timestamp
    }
  }
}
Random Generation
Generate 16 rows of random data matching the schema field types:
source {
  # This is an example input plugin **only for testing and demonstrating the feature**
  FakeSource {
    row.num = 16
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_decimal = "decimal(30, 8)"
        c_null = "null"
        c_bytes = bytes
        c_date = date
        c_timestamp = timestamp
      }
    }
    result_table_name = "fake"
  }
}
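Because `row.num` counts rows per degree of parallelism, the number of rows a job sees grows with parallelism. The sketch below shows that arithmetic. `split_row_counts` additionally assumes that each reader divides its rows as evenly as possible across `split.num` splits; this is an illustration, not the enumerator's documented behavior. Both function names are hypothetical.

```python
from __future__ import annotations


def total_rows(row_num: int, parallelism: int) -> int:
    """Rows one FakeSource table emits: row.num is counted per degree of parallelism."""
    return row_num * parallelism


def split_row_counts(row_num: int, split_num: int) -> list[int]:
    """HYPOTHETICAL even split of one reader's row.num rows across split.num splits.

    The real enumerator may assign rows differently; this only illustrates the idea.
    """
    if split_num < 1:
        raise ValueError("split.num must be at least 1")
    base, extra = divmod(row_num, split_num)
    # the first `extra` splits take one leftover row each
    return [base + (1 if i < extra else 0) for i in range(split_num)]


if __name__ == "__main__":
    print(total_rows(16, 1))        # 16
    print(total_rows(16, 4))        # 64
    print(split_row_counts(16, 3))  # [6, 5, 5]
```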
Customized data content example:
This example defines the data explicitly: each row declares its change kind (INSERT, UPDATE_BEFORE, UPDATE_AFTER, or DELETE) and the value of every field, in schema order.
source {
  FakeSource {
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_decimal = "decimal(30, 8)"
        c_null = "null"
        c_bytes = bytes
        c_date = date
        c_timestamp = timestamp
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [{"a": "b"}, [101], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
      }
      {
        kind = UPDATE_BEFORE
        fields = [{"a": "c"}, [102], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
      }
      {
        kind = UPDATE_AFTER
        fields = [{"a": "e"}, [103], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
      }
      {
        kind = DELETE
        fields = [{"a": "f"}, [104], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"]
      }
    ]
  }
}
Due to the constraints of the HOCON specification, users cannot directly create byte sequence objects, so FakeSource uses strings to assign bytes type values. In the example above, the bytes type field is assigned "bWlJWmo=", which is "miIZj" encoded with base64. Therefore, when assigning values to bytes type fields, use base64-encoded strings.
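Standard base64 encoding is enough to produce such a string. The sketch below uses Python's standard `base64` module; the two helper names are hypothetical and only wrap the encode and decode calls.

```python
import base64


def to_fake_bytes_literal(raw: bytes) -> str:
    """Encode raw bytes as the base64 string a FakeSource bytes field expects."""
    return base64.b64encode(raw).decode("ascii")


def from_fake_bytes_literal(literal: str) -> bytes:
    """Decode a base64 string from a FakeSource rows config back to raw bytes."""
    return base64.b64decode(literal)


if __name__ == "__main__":
    print(to_fake_bytes_literal(b"miIZj"))      # bWlJWmo=
    print(from_fake_bytes_literal("bWlJWmo="))  # b'miIZj'
```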
Specified data count example:
This example specifies the number of rows generated and the size or length of the generated map, array, bytes, and string values.
FakeSource {
  row.num = 10
  map.size = 10
  array.size = 10
  bytes.length = 10
  string.length = 10
  schema = {
    fields {
      c_map = "map<string, array<int>>"
      c_array = "array<int>"
      c_string = string
      c_boolean = boolean
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
      c_decimal = "decimal(30, 8)"
      c_null = "null"
      c_bytes = bytes
      c_date = date
      c_timestamp = timestamp
      c_row = {
        c_map = "map<string, map<string, string>>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_decimal = "decimal(30, 8)"
        c_null = "null"
        c_bytes = bytes
        c_date = date
        c_timestamp = timestamp
      }
    }
  }
}
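As a rough illustration of what these size options control, the hypothetical helpers below generate values of a fixed size or length. The character set, the int value range, and applying `array.size` to arrays nested inside maps are all assumptions made for this sketch, not documented FakeSource behavior.

```python
import random
import string
from typing import Dict, List


def fake_string(length: int, rng: random.Random) -> str:
    # string.length: each string has exactly `length` characters (ASCII letters here, an assumption)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def fake_bytes(length: int, rng: random.Random) -> bytes:
    # bytes.length: each value holds exactly `length` random bytes
    return bytes(rng.randrange(256) for _ in range(length))


def fake_int_array(size: int, rng: random.Random) -> List[int]:
    # array.size: each array<int> has exactly `size` elements (value range is an assumption)
    return [rng.randint(0, 100) for _ in range(size)]


def fake_map(map_size: int, array_size: int, key_length: int, rng: random.Random) -> Dict[str, List[int]]:
    # map.size: keep adding random keys until the map has `map_size` distinct entries
    result: Dict[str, List[int]] = {}
    while len(result) < map_size:
        result[fake_string(key_length, rng)] = fake_int_array(array_size, rng)
    return result


if __name__ == "__main__":
    rng = random.Random(7)
    # Mirrors map.size = 10, array.size = 10, bytes.length = 10, string.length = 10
    row = {
        "c_map": fake_map(10, 10, 10, rng),
        "c_array": fake_int_array(10, rng),
        "c_string": fake_string(10, rng),
        "c_bytes": fake_bytes(10, rng),
    }
    print({name: len(value) for name, value in row.items()})
```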
Template data example:
Values are randomly selected from the specified template lists.
FakeSource {
  row.num = 5
  string.fake.mode = "template"
  string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"]
  tinyint.fake.mode = "template"
  tinyint.template = [1, 2, 3, 4, 5, 6, 7, 8, 9]
  smallint.fake.mode = "template"
  smallint.template = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
  int.fake.mode = "template"
  int.template = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
  bigint.fake.mode = "template"
  bigint.template = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
  float.fake.mode = "template"
  float.template = [40.0, 41.0, 42.0, 43.0]
  double.fake.mode = "template"
  double.template = [44.0, 45.0, 46.0, 47.0]
  schema {
    fields {
      c_string = string
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
    }
  }
}
Range data example:
Values are randomly generated within the specified min/max ranges.
FakeSource {
  row.num = 5
  string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"]
  tinyint.min = 1
  tinyint.max = 9
  smallint.min = 10
  smallint.max = 19
  int.min = 20
  int.max = 29
  bigint.min = 30
  bigint.max = 39
  float.min = 40.0
  float.max = 43.0
  double.min = 44.0
  double.max = 47.0
  schema {
    fields {
      c_string = string
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
    }
  }
}
Generate Multiple tables
This example generates two tables, test.table1 and test.table2, with 16 and 17 rows per degree of parallelism respectively.
FakeSource {
  tables_configs = [
    {
      row.num = 16
      schema {
        table = "test.table1"
        fields {
          c_string = string
          c_tinyint = tinyint
          c_smallint = smallint
          c_int = int
          c_bigint = bigint
          c_float = float
          c_double = double
        }
      }
    },
    {
      row.num = 17
      schema {
        table = "test.table2"
        fields {
          c_string = string
          c_tinyint = tinyint
          c_smallint = smallint
          c_int = int
          c_bigint = bigint
          c_float = float
          c_double = double
        }
      }
    }
  ]
}
Options rows Case
rows = [
  {
    kind = INSERT
    fields = [1, "A", 100]
  },
  {
    kind = UPDATE_BEFORE
    fields = [1, "A", 100]
  },
  {
    kind = UPDATE_AFTER
    fields = [1, "A_1", 100]
  },
  {
    kind = DELETE
    fields = [1, "A_1", 100]
  }
]
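The four rows above form a complete changelog for one record: it is inserted, updated from "A" to "A_1", and then deleted. The minimal Python sketch below replays the events to show the net effect a changelog-aware sink would see. It assumes the first field acts as the primary key; FakeSource itself does not define a key, and `apply_changelog` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]


def apply_changelog(events: List[Tuple[str, Row]]) -> Dict[int, Row]:
    """Replay row kinds into a table keyed by the first field (an assumption)."""
    table: Dict[int, Row] = {}
    for kind, row in events:
        key = row[0]
        if kind in ("INSERT", "UPDATE_AFTER"):
            table[key] = row  # new or updated image of the row
        elif kind in ("UPDATE_BEFORE", "DELETE"):
            table.pop(key, None)  # retract the old image of the row
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return table


events = [
    ("INSERT", (1, "A", 100)),
    ("UPDATE_BEFORE", (1, "A", 100)),
    ("UPDATE_AFTER", (1, "A_1", 100)),
    ("DELETE", (1, "A_1", 100)),
]

print(apply_changelog(events[:3]))  # {1: (1, 'A_1', 100)}
print(apply_changelog(events))      # {}
```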
Options table-names Case
source {
  # This is an example source plugin **only for testing and demonstrating the feature**
  FakeSource {
    table-names = ["test.table1", "test.table2", "test.table3"]
    parallelism = 1
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
Changelog
2.2.0-beta 2022-09-26
- Add FakeSource Source Connector
2.3.0-beta 2022-10-20
- [Improve] Supports direct definition of data values (row) (2839)
- [Improve] Improve fake source connector (2944)
  - Support user-defined map size
  - Support user-defined array size
  - Support user-defined string length
  - Support user-defined bytes length
- [Improve] Support multiple splits for fake source connector (2974)
- [Improve] Supports setting the number of splits per parallelism and the reading interval between two splits (3098)