Version: Next

Hive

Hive sink connector

Description

Write data to Hive.

tip

To use this connector, make sure your Spark/Flink cluster is already integrated with Hive. The tested Hive versions are 2.3.9 and 3.1.3.

If you use SeaTunnel Engine, you need to put seatunnel-hadoop3-3.1.4-uber.jar, hive-exec-3.1.3.jar, and libfb303-0.9.3.jar in the $SEATUNNEL_HOME/lib/ directory.

Key features

By default, the connector uses two-phase commit (2PC) to ensure exactly-once semantics.

Options

name	type	required	default value
table_name	string	yes	-
metastore_uri	string	yes	-
compress_codec	string	no	none
hdfs_site_path	string	no	-
hive_site_path	string	no	-
hive.hadoop.conf	Map	no	-
hive.hadoop.conf-path	string	no	-
krb5_path	string	no	/etc/krb5.conf
kerberos_principal	string	no	-
kerberos_keytab_path	string	no	-
abort_drop_partition_metadata	boolean	no	true
parquet_avro_write_timestamp_as_int96	boolean	no	false
overwrite	boolean	no	false
data_save_mode	enum	no	APPEND_DATA
schema_save_mode	enum	no	CREATE_SCHEMA_WHEN_NOT_EXIST
save_mode_create_template	string	no	-
common-options		no	-

table_name [string]

Target Hive table name, e.g. db1.table1. If the source runs in multi-table mode, you can use ${database_name}.${table_name} to generate the table name; ${database_name} and ${table_name} are replaced with the values from the CatalogTable generated by the source.

metastore_uri [string]

The Hive metastore URI, e.g. thrift://namenode001:9083.

hdfs_site_path [string]

The path to hdfs-site.xml, used to load the HA configuration of the NameNodes.

hive_site_path [string]

The path to hive-site.xml.
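As a minimal sketch, both files can be referenced from the sink block. The two file paths below are hypothetical placeholders, not defaults; the table name and metastore URI are reused from the basic example on this page.

```hocon
sink {
  Hive {
    table_name = "default.seatunnel_orc"
    metastore_uri = "thrift://namenode001:9083"
    # Hypothetical paths: adjust to where your cluster keeps these files
    hdfs_site_path = "/etc/hadoop/conf/hdfs-site.xml"
    hive_site_path = "/etc/hive/conf/hive-site.xml"
  }
}
```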

hive.hadoop.conf [map]

Hadoop configuration properties (those normally set in 'core-site.xml', 'hdfs-site.xml', or 'hive-site.xml').

hive.hadoop.conf-path [string]

The directory from which the 'core-site.xml', 'hdfs-site.xml', and 'hive-site.xml' files are loaded.

krb5_path [string]

The path to krb5.conf, used for Kerberos authentication.

When Kerberos is enabled, hive-site.xml (see hive_site_path) is used to authenticate against the Hive metastore.

kerberos_principal [string]

The Kerberos principal.

kerberos_keytab_path [string]

The path to the Kerberos keytab file.

abort_drop_partition_metadata [boolean]

Flag to decide whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore. The data in the partition (data generated during the synchronization process) is always deleted.
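A minimal sketch of a sink that keeps partition metadata in the metastore when a job aborts. The table name and metastore URI are placeholders reused from the Kerberos example on this page; only the last option is the point of the snippet.

```hocon
sink {
  Hive {
    table_name = "default.test_hive_sink_on_hdfs_with_kerberos"
    metastore_uri = "thrift://metastore:9083"
    # Default is true; false leaves the partition metadata in place on abort.
    # The data written during the failed run is still deleted.
    abort_drop_partition_metadata = false
  }
}
```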

parquet_avro_write_timestamp_as_int96 [boolean]

Whether to write timestamps as Parquet INT96. Only valid for Parquet files.
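A minimal sketch of enabling this option, assuming the target table is stored as Parquet. The table name is hypothetical; the metastore URI is a placeholder taken from the other examples on this page.

```hocon
sink {
  Hive {
    # Hypothetical Parquet-backed table
    table_name = "default.events_parquet"
    metastore_uri = "thrift://metastore:9083"
    # Default is false; only takes effect for Parquet files
    parquet_avro_write_timestamp_as_int96 = true
  }
}
```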

overwrite [boolean]

Flag to decide whether to use overwrite mode when inserting data into Hive. If set to true, existing data is deleted before new data is inserted: for non-partitioned tables, the existing data in the table; for partitioned tables, the data in the relevant partitions.

data_save_mode [enum]

Select how to handle existing data on the target before writing new data.

APPEND_DATA (default): Keep existing data and append new records.
DROP_DATA: Behaves the same as overwrite=true. Before commit, delete the existing data in the target path (for non-partitioned tables, delete the table directory; for partitioned tables, delete the related partition directories), then write new data.
CUSTOM_PROCESSING / ERROR_WHEN_DATA_EXISTS: Currently not recommended for Hive sink unless you have specific requirements.

Note: overwrite=true and data_save_mode=DROP_DATA are equivalent. Use either one; do not set both.
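A minimal sketch of a sink that replaces existing data instead of appending. It uses data_save_mode rather than overwrite, since the two should not be set together. The table name and metastore URI are placeholders reused from the examples on this page.

```hocon
sink {
  Hive {
    table_name = "test_hive.test_hive_sink_text_simple"
    metastore_uri = "thrift://ctyun7:9083"
    # Equivalent to overwrite = true; do not set both options
    data_save_mode = "DROP_DATA"
  }
}
```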

schema_save_mode [enum]

Before the synchronization task starts, choose how to handle an existing table structure on the target side.

Default value: CREATE_SCHEMA_WHEN_NOT_EXIST

Option values:

RECREATE_SCHEMA: Create the table if it does not exist; drop and recreate it if it exists.
CREATE_SCHEMA_WHEN_NOT_EXIST: Create the table if it does not exist; skip creation if it exists.
ERROR_WHEN_SCHEMA_NOT_EXIST: Report an error if the table does not exist.
IGNORE: Do not process the table.
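A minimal sketch of a sink that refuses to run against a missing table instead of creating it. The table name and metastore URI are placeholders reused from the auto table creation example on this page.

```hocon
sink {
  Hive {
    table_name = "warehouse.employees"
    metastore_uri = "thrift://metastore:9083"
    # Fail the job if warehouse.employees does not already exist
    schema_save_mode = "ERROR_WHEN_SCHEMA_NOT_EXIST"
  }
}
```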

save_mode_create_template [string]

Hive tables are created automatically from a template. The table creation statement is generated from the upstream data types and schema, and the default template can be modified as needed. Available template variables: ${database}, ${table}, ${rowtype_fields}, ${rowtype_partition_fields}, ${table_location}.

Default value: when not specified, a default template for a non-partitioned PARQUET table is used:

CREATE TABLE IF NOT EXISTS `${database}`.`${table}` (
  ${rowtype_fields}
)
STORED AS PARQUET
LOCATION '${table_location}'

common options

Sink plugin common parameters; please refer to Sink Common Options for details.

Example

  Hive {
    table_name = "default.seatunnel_orc"
    metastore_uri = "thrift://namenode001:9083"
  }

example 1

We have a source table like this:

create table test_hive_source(
     test_tinyint                          TINYINT,
     test_smallint                       SMALLINT,
     test_int                                INT,
     test_bigint                           BIGINT,
     test_boolean                       BOOLEAN,
     test_float                             FLOAT,
     test_double                         DOUBLE,
     test_string                           STRING,
     test_binary                          BINARY,
     test_timestamp                  TIMESTAMP,
     test_decimal                       DECIMAL(8,2),
     test_char                             CHAR(64),
     test_varchar                        VARCHAR(64),
     test_date                             DATE,
     test_array                            ARRAY<INT>,
     test_map                              MAP<STRING, FLOAT>,
     test_struct                           STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
     )
PARTITIONED BY (test_par1 STRING, test_par2 STRING);

We need to read data from the source table and write it to another table:

create table test_hive_sink_text_simple(
     test_tinyint                          TINYINT,
     test_smallint                       SMALLINT,
     test_int                                INT,
     test_bigint                           BIGINT,
     test_boolean                       BOOLEAN,
     test_float                             FLOAT,
     test_double                         DOUBLE,
     test_string                           STRING,
     test_binary                          BINARY,
     test_timestamp                  TIMESTAMP,
     test_decimal                       DECIMAL(8,2),
     test_char                             CHAR(64),
     test_varchar                        VARCHAR(64),
     test_date                             DATE
     )
PARTITIONED BY (test_par1 STRING, test_par2 STRING);

The job config file can look like this:

env {
  parallelism = 3
  job.name="test_hive_source_to_hive"
}

source {
  Hive {
    table_name = "test_hive.test_hive_source"
    metastore_uri = "thrift://ctyun7:9083"
  }
}

sink {
  # write the data to the Hive sink table
  Hive {
    table_name = "test_hive.test_hive_sink_text_simple"
    metastore_uri = "thrift://ctyun7:9083"
    hive.hadoop.conf = {
      bucket = "s3a://mybucket"
      fs.s3a.aws.credentials.provider="com.amazonaws.auth.InstanceProfileCredentialsProvider"
    }
  }
}

example2: Kerberos

sink {
  Hive {
    table_name = "default.test_hive_sink_on_hdfs_with_kerberos"
    metastore_uri = "thrift://metastore:9083"
    hive_site_path = "/tmp/hive-site.xml"
    kerberos_principal = "hive/metastore.seatunnel@EXAMPLE.COM"
    kerberos_keytab_path = "/tmp/hive.keytab"
    krb5_path = "/tmp/krb5.conf"
  }
}

Description:

hive_site_path: The path to the hive-site.xml file.
kerberos_principal: The principal for Kerberos authentication.
kerberos_keytab_path: The keytab file path for Kerberos authentication.
krb5_path: The path to the krb5.conf file used for Kerberos authentication.

Run the case:

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    schema = {
      fields {
        pk_id = bigint
        name = string
        score = int
      }
      primaryKey {
        name = "pk_id"
        columnNames = [pk_id]
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [1, "A", 100]
      },
      {
        kind = INSERT
        fields = [2, "B", 100]
      },
      {
        kind = INSERT
        fields = [3, "C", 100]
      }
    ]
  }
}

sink {
  Hive {
    table_name = "default.test_hive_sink_on_hdfs_with_kerberos"
    metastore_uri = "thrift://metastore:9083"
    hive_site_path = "/tmp/hive-site.xml"
    kerberos_principal = "hive/metastore.seatunnel@EXAMPLE.COM"
    kerberos_keytab_path = "/tmp/hive.keytab"
    krb5_path = "/tmp/krb5.conf"
  }
}

Hive on s3

Step 1

Create the lib directory for the Hive plugin on EMR.

mkdir -p ${SEATUNNEL_HOME}/plugins/Hive/lib

Step 2

Download the jars from Maven Central into the lib directory.

cd ${SEATUNNEL_HOME}/plugins/Hive/lib
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.6.5/hadoop-aws-2.6.5.jar
wget https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.9/hive-exec-2.3.9.jar

Step 3

Copy the jars from your EMR environment to the lib directory.

cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-2.60.0.jar ${SEATUNNEL_HOME}/plugins/Hive/lib
cp /usr/share/aws/emr/hadoop-state-pusher/lib/hadoop-common-3.3.6-amzn-1.jar ${SEATUNNEL_HOME}/plugins/Hive/lib
cp /usr/share/aws/emr/hadoop-state-pusher/lib/javax.inject-1.jar ${SEATUNNEL_HOME}/plugins/Hive/lib
cp /usr/share/aws/emr/hadoop-state-pusher/lib/aopalliance-1.0.jar ${SEATUNNEL_HOME}/plugins/Hive/lib

Step 4

Run the case.

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    schema = {
      fields {
        pk_id = bigint
        name = string
        score = int
      }
      primaryKey {
        name = "pk_id"
        columnNames = [pk_id]
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [1, "A", 100]
      },
      {
        kind = INSERT
        fields = [2, "B", 100]
      },
      {
        kind = INSERT
        fields = [3, "C", 100]
      }
    ]
  }
}

sink {
  Hive {
    table_name = "test_hive.test_hive_sink_on_s3"
    metastore_uri = "thrift://ip-192-168-0-202.cn-north-1.compute.internal:9083"
    hive.hadoop.conf-path = "/home/ec2-user/hadoop-conf"
    hive.hadoop.conf = {
       bucket="s3://ws-package"
       fs.s3a.aws.credentials.provider="com.amazonaws.auth.InstanceProfileCredentialsProvider"
    }
  }
}

Hive on oss

Step 1

Create the lib directory for the Hive plugin on EMR.

mkdir -p ${SEATUNNEL_HOME}/plugins/Hive/lib

Step 2

Download the jars from Maven Central into the lib directory.

cd ${SEATUNNEL_HOME}/plugins/Hive/lib
wget https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.9/hive-exec-2.3.9.jar

Step 3

Copy the jars from your EMR environment to the lib directory and delete the conflicting jar.

cp -r /opt/apps/JINDOSDK/jindosdk-current/lib/jindo-*.jar ${SEATUNNEL_HOME}/plugins/Hive/lib
rm -f ${SEATUNNEL_HOME}/lib/hadoop-aliyun-*.jar

Step 4

Run the case.

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    schema = {
      fields {
        pk_id = bigint
        name = string
        score = int
      }
      primaryKey {
        name = "pk_id"
        columnNames = [pk_id]
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [1, "A", 100]
      },
      {
        kind = INSERT
        fields = [2, "B", 100]
      },
      {
        kind = INSERT
        fields = [3, "C", 100]
      }
    ]
  }
}

sink {
  Hive {
    table_name = "test_hive.test_hive_sink_on_oss"
    metastore_uri = "thrift://master-1-1.c-1009b01725b501f2.cn-wulanchabu.emr.aliyuncs.com:9083"
    hive.hadoop.conf-path = "/tmp/hadoop"
    hive.hadoop.conf = {
        bucket="oss://emr-osshdfs.cn-wulanchabu.oss-dls.aliyuncs.com"
    }
  }
}

example 2

We have multiple source tables like this:

create table test_1(
)
PARTITIONED BY (xx);

create table test_2(
)
PARTITIONED BY (xx);
...

We need to read data from these source tables and write it to other tables:

The job config file can look like this:

env {
  # You can set flink configuration here
  parallelism = 3
  job.name="test_hive_source_to_hive"
}

source {
  Hive {
    tables_configs = [
      {
        table_name = "test_hive.test_1"
        metastore_uri = "thrift://ctyun6:9083"
      },
      {
        table_name = "test_hive.test_2"
        metastore_uri = "thrift://ctyun7:9083"
      }
    ]
  }
}

sink {
  # the table_name placeholders are replaced with the values from each upstream table
  Hive {
    table_name = "${database_name}.${table_name}"
    metastore_uri = "thrift://ctyun7:9083"
  }
}

Auto Table Creation Examples

Example 1: Basic Auto Table Creation

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    schema = {
      fields {
        id = bigint
        name = string
        department = string
        salary = decimal(10,2)
        hire_date = date
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [1, "John Doe", "Engineering", 75000.50, "2022-01-15"]
      }
    ]
  }
}

sink {
  Hive {
    table_name = "warehouse.employees"
    metastore_uri = "thrift://metastore:9083"
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    save_mode_create_template = """
      CREATE TABLE IF NOT EXISTS `${database}`.`${table}` (
        ${rowtype_fields}
      )
      PARTITIONED BY (
        department string COMMENT 'Department partition'
      )
      STORED AS PARQUET
      LOCATION '${table_location}'
      TBLPROPERTIES (
        'seatunnel.creation.mode' = 'template'
      )
    """
  }
}

Changelog

Change	Commit	Version
[Feature][File] Add markdown parser #9714	https://github.com/apache/seatunnel/commit/8b3c07844	dev
[Improve][API] Optimize the enumerator API semantics and reduce lock calls at the connector level (#9671)	https://github.com/apache/seatunnel/commit/9212a77140	2.3.12
[Feature][connector-hive] hive sink connector support overwrite mode #7843 (#7891)	https://github.com/apache/seatunnel/commit/6fafe6f4d3	2.3.12
[Fix][Connector-V2] Fix hive client thread unsafe (#9282)	https://github.com/apache/seatunnel/commit/5dc25897a9	2.3.11
[improve] update file connectors config (#9034)	https://github.com/apache/seatunnel/commit/8041d59dc2	2.3.11
[Improve] Refactor file enumerator to prevent duplicate put split (#8989)	https://github.com/apache/seatunnel/commit/fdf1beae9c	2.3.11
Revert " [improve] update localfile connector config" (#9018)	https://github.com/apache/seatunnel/commit/cdc79e13ad	2.3.10
[improve] update localfile connector config (#8765)	https://github.com/apache/seatunnel/commit/def369a85f	2.3.10
[Improve][connector-hive] Improved hive file allocation algorithm for subtasks (#8876)	https://github.com/apache/seatunnel/commit/89d1878ade	2.3.10
[Improve] restruct connector common options (#8634)	https://github.com/apache/seatunnel/commit/f3499a6eeb	2.3.10
[Fix][Hive] Writing parquet files supports the optional timestamp int96 (#8509)	https://github.com/apache/seatunnel/commit/856aea1952	2.3.10
[Fix] Set all snappy dependency use one version (#8423)	https://github.com/apache/seatunnel/commit/3ac977c8d3	2.3.9
[Fix][Connector-V2] Fix hive krb5 path not work (#8228)	https://github.com/apache/seatunnel/commit/e18a4d07b4	2.3.9
[Improve][dist]add shade check rule (#8136)	https://github.com/apache/seatunnel/commit/51ef800016	2.3.9
[Feature][File] Support config null format for text file read (#8109)	https://github.com/apache/seatunnel/commit/2dbf02df47	2.3.9
[Improve][API] Unified tables_configs and table_list (#8100)	https://github.com/apache/seatunnel/commit/84c0b8d660	2.3.9
[Feature][Core] Rename `result_table_name`/`source_table_name` to `plugin_input/plugin_output` (#8072)	https://github.com/apache/seatunnel/commit/c7bbd322db	2.3.9
[Feature][E2E] Add hive3 e2e test case (#8003)	https://github.com/apache/seatunnel/commit/9a24fac2c4	2.3.9
[Improve][Connector-V2] Change File Read/WriteStrategy `setSeaTunnelRowTypeInfo` to `setCatalogTable` (#7829)	https://github.com/apache/seatunnel/commit/6b5f74e524	2.3.9
[Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786)	https://github.com/apache/seatunnel/commit/6b7c53d03c	2.3.9
[Improve][Zeta] Split the classloader of task group (#7580)	https://github.com/apache/seatunnel/commit/3be0d1cc61	2.3.8
[Feature][Core] Support using upstream table placeholders in sink options and auto replacement (#7131)	https://github.com/apache/seatunnel/commit/c4ca74122c	2.3.6
[Improve][Hive] Close resources when exception occurs (#7205)	https://github.com/apache/seatunnel/commit/561171528b	2.3.6
[Hotfix][Hive Connector] Fix Hive hdfs-site.xml and hive-site.xml not be load error (#7069)	https://github.com/apache/seatunnel/commit/c23a577f34	2.3.6
Fix hive load hive_site_path and hdfs_site_path too late (#7017)	https://github.com/apache/seatunnel/commit/e2578a5b4d	2.3.6
[Bug][connector-hive] Eanble login with kerberos for hive (#6893)	https://github.com/apache/seatunnel/commit/26e433e472	2.3.6
[Feature][S3 File] Make S3 File Connector support multiple table write (#6698)	https://github.com/apache/seatunnel/commit/8f2049b2f1	2.3.6
[Feature] Hive Source/Sink support multiple table (#5929)	https://github.com/apache/seatunnel/commit/4d9287fce4	2.3.6
[Improve][Hive] udpate hive3 version (#6699)	https://github.com/apache/seatunnel/commit/1184c05c29	2.3.6
[HiveSink]Fix the risk of resource leakage. (#6721)	https://github.com/apache/seatunnel/commit/c23804f13b	2.3.6
[Improve][Connector-v2] The hive connector support multiple filesystem (#6648)	https://github.com/apache/seatunnel/commit/8a4c01fe35	2.3.6
[Fix][Connector-V2] Fix add hive partition error when partition already existed (#6577)	https://github.com/apache/seatunnel/commit/2a0a0b9d19	2.3.5
Fix HiveMetaStoreProxy#enableKerberos will return true if doesn't enable kerberos (#6307)	https://github.com/apache/seatunnel/commit/1dad6f7061	2.3.4
[Feature][Engine] Unify job env parameters (#6003)	https://github.com/apache/seatunnel/commit/2410ab38f0	2.3.4
[Refactor][File Connector] Put Multiple Table File API to File Base Module (#6033)	https://github.com/apache/seatunnel/commit/c324d663b4	2.3.4
Support using multiple hadoop account (#5903)	https://github.com/apache/seatunnel/commit/d69d88d1aa	2.3.4
[Improve][Common] Introduce new error define rule (#5793)	https://github.com/apache/seatunnel/commit/9d1b2582b2	2.3.4
Support config column/primaryKey/constraintKey in schema (#5564)	https://github.com/apache/seatunnel/commit/eac76b4e50	2.3.4
[Hotfix][Connector-V2][Hive] fix the bug that hive-site.xml can not be injected in HiveConf (#5261)	https://github.com/apache/seatunnel/commit/04ce22ac1e	2.3.4
[Improve][Connector-v2][HiveSink]remove drop partition when abort. (#4940)	https://github.com/apache/seatunnel/commit/edef87b523	2.3.3
[feature][web] hive add option because web need (#5154)	https://github.com/apache/seatunnel/commit/5e1511ff0d	2.3.3
[Hotfix][Connector-V2][Hive] Support user-defined hive-site.xml (#4965)	https://github.com/apache/seatunnel/commit/2a064bcdb0	2.3.3
Change file type to file_format_type in file source/sink (#4249)	https://github.com/apache/seatunnel/commit/973a2fae3c	2.3.1
[hotfix] fixed schema options import error	https://github.com/apache/seatunnel/commit/656805f2df	2.3.1
[chore] Code format with spotless plugin.	https://github.com/apache/seatunnel/commit/291214ad6f	2.3.1
Merge branch 'dev' into merge/cdc	https://github.com/apache/seatunnel/commit/4324ee1912	2.3.1
[Improve][Project] Code format with spotless plugin.	https://github.com/apache/seatunnel/commit/423b583038	2.3.1
[Imprve][Connector-V2][Hive] Support read text table & Column projection (#4105)	https://github.com/apache/seatunnel/commit/717620f542	2.3.1
[Hotfix][Connector-V2][Hive] Fix hive unknownhost (#4141)	https://github.com/apache/seatunnel/commit/f1a1dfe4af	2.3.1
[Improve][build] Give the maven module a human readable name (#4114)	https://github.com/apache/seatunnel/commit/d7cd601051	2.3.1
[Improve][Project] Code format with spotless plugin. (#4101)	https://github.com/apache/seatunnel/commit/a2ab166561	2.3.1
[Improve][Connector-V2][Hive] Support assign partitions (#3842)	https://github.com/apache/seatunnel/commit/6a4a850b4c	2.3.1
[Improve][Connector-V2][Hive] Improve config check logic (#3886)	https://github.com/apache/seatunnel/commit/b4348f6f44	2.3.1
[Feature][Connector-V2] Support kerberos in hive and hdfs file connector (#3840)	https://github.com/apache/seatunnel/commit/055ad9d836	2.3.1
[Feature][Connector] add get source method to all source connector (#3846)	https://github.com/apache/seatunnel/commit/417178fb84	2.3.1
[Improve][Connector-V2] The log outputs detailed exception stack information (#3805)	https://github.com/apache/seatunnel/commit/d0c6217f27	2.3.1
[Feature][Shade] Add seatunnel hadoop3 uber (#3755)	https://github.com/apache/seatunnel/commit/5a024bdf8f	2.3.0
[Feature][Connector-V2][File] Optimize filesystem utils (#3749)	https://github.com/apache/seatunnel/commit/ac4e880fb5	2.3.0
[Hotfix][OptionRule] Fix option rule about all connectors (#3592)	https://github.com/apache/seatunnel/commit/226dc6a119	2.3.0
[Hotfix][Connector-V2][Hive] Fix npe of getting file system (#3506)	https://github.com/apache/seatunnel/commit/e1fc3d1b01	2.3.0
[Improve][Connector-V2][Hive] Unified exceptions for hive source & sink connector (#3541)	https://github.com/apache/seatunnel/commit/12c0fb91d2	2.3.0
[Feature][Connector-V2][File] Add option and factory for file connectors (#3375)	https://github.com/apache/seatunnel/commit/db286e8631	2.3.0
[Hotfix][Connector-V2][Hive] Fix the bug that when write data to hive throws NullPointerException (#3258)	https://github.com/apache/seatunnel/commit/777bf6b42e	2.3.0
[Improve][Connector-V2][Hive] Hive Sink Support msck partitions (#3133)	https://github.com/apache/seatunnel/commit/a8738ef3c4	2.3.0-beta
unify `flatten-maven-plugin` version (#3078)	https://github.com/apache/seatunnel/commit/ed743fddcc	2.3.0-beta
[Engine][Merge] fix merge problem	https://github.com/apache/seatunnel/commit/0e9ceeefc9	2.3.0-beta
Merge remote-tracking branch 'upstream/dev' into st-engine	https://github.com/apache/seatunnel/commit/ca80df779a	2.3.0-beta
update hive.metastore.version to hive.exec.version (#2879)	https://github.com/apache/seatunnel/commit/018ee0a3db	2.2.0-beta
[Bug][Connector-V2] Fix hive sink bug (#2870)	https://github.com/apache/seatunnel/commit/d661fa011e	2.2.0-beta
[Fix][Connector-V2] Fix HiveSource Connector read orc table error (#2845)	https://github.com/apache/seatunnel/commit/61720306e7	2.2.0-beta
[Bug][Connector-V2] Fix hive source text table name (#2797)	https://github.com/apache/seatunnel/commit/563637ebd1	2.2.0-beta
[Improve][Connector-V2] Refactor hive source & sink connector (#2708)	https://github.com/apache/seatunnel/commit/a357dca365	2.2.0-beta
[DEV][Api] Replace SeaTunnelContext with JobContext and remove singleton pattern (#2706) (#2731)	https://github.com/apache/seatunnel/commit/e8929ab605	2.3.0-beta
[DEV][Api] Replace SeaTunnelContext with JobContext and remove singleton pattern (#2706)	https://github.com/apache/seatunnel/commit/cbf82f755c	2.2.0-beta
[#2606]Dependency management split (#2630)	https://github.com/apache/seatunnel/commit/fc047be69b	2.2.0-beta
[Improve][Connector-V2] Refactor the package of hdfs file connector (#2402)	https://github.com/apache/seatunnel/commit/87d0624c5b	2.2.0-beta
[Feature][Connector-V2] Add orc file support in connector hive sink (#2311) (#2374)	https://github.com/apache/seatunnel/commit/81cb80c050	2.2.0-beta
[improve][UT] Upgrade junit to 5.+ (#2305)	https://github.com/apache/seatunnel/commit/362319ff3e	2.2.0-beta
Decide table format using outputFormat in HiveSinkConfig #2303	https://github.com/apache/seatunnel/commit/3a2586f6dc	2.2.0-beta
[Feature][Connector-V2-Hive] Add parquet file format support to Hive Sink (#2310)	https://github.com/apache/seatunnel/commit/4ab3c21b8d	2.2.0-beta
Add BaseHiveCommitInfo for common hive commit info (#2306)	https://github.com/apache/seatunnel/commit/0d2f6f4d7c	2.2.0-beta
Remove same code to independent method in HiveSinkWriter (#2307)	https://github.com/apache/seatunnel/commit/e99e6ee726	2.2.0-beta
Avoid potential null pointer risk in HiveSinkWriter#snapshotState (#2302)	https://github.com/apache/seatunnel/commit/e7d817f7d2	2.2.0-beta
[Connector-V2] Add file type check logic in hive connector (#2275)	https://github.com/apache/seatunnel/commit/5488337c67	2.2.0-beta
[Connector-V2] Add parquet file reader for Hive Source Connector (#2199) (#2237)	https://github.com/apache/seatunnel/commit/59db97ed34	2.2.0-beta
Merge from dev to st-engine (#2243)	https://github.com/apache/seatunnel/commit/41e530afd5	2.3.0-beta
StateT of SeaTunnelSource should extend `Serializable` (#2214)	https://github.com/apache/seatunnel/commit/8c426ef850	2.2.0-beta
[Bug][connector-hive] filter '_SUCCESS' file in file list (#2235) (#2236)	https://github.com/apache/seatunnel/commit/db04651523	2.2.0-beta
[Bug][hive-connector-v2] Resolve the schema inconsistency bug (#2229) (#2230)	https://github.com/apache/seatunnel/commit/62ca075915	2.2.0-beta
[Bug][spark-connector-v2-example] fix the bug of no class found. (#2191) (#2192)	https://github.com/apache/seatunnel/commit/5dbc2df17e	2.2.0-beta
[Connector-V2] Add Hive sink connector v2 (#2158)	https://github.com/apache/seatunnel/commit/23ad4ee735	2.2.0-beta
[Connector-V2] Add File Sink Connector (#2117)	https://github.com/apache/seatunnel/commit/e2283da64f	2.2.0-beta
[Connector-V2]Hive Source (#2123)	https://github.com/apache/seatunnel/commit/ffcf3f59e2	2.2.0-beta
[api-draft][Optimize] Optimize module name (#2062)	https://github.com/apache/seatunnel/commit/f79e3112b1	2.2.0-beta
