GoogleBigtable

Google Bigtable sink connector

Description

Writes data to Google Cloud Bigtable using the native Bigtable Data v2 Java client.

Key features

Options

name	type	required	default value
project_id	string	yes	-
instance_id	string	yes	-
table	string	yes	-
rowkey_column	list	yes	-
column_family	config	yes	-
credentials_path	string	no	-
rowkey_delimiter	string	no	""
version_column	string	no	-
null_mode	string	no	skip
batch_mutation_size	int	no	100
schema_save_mode	enum	no	RECREATE_SCHEMA
data_save_mode	enum	no	APPEND_DATA
multi_table_sink_replica	int	no	-
common-options		no	-

project_id [string]

Google Cloud project ID. Example: "my-gcp-project"

instance_id [string]

Bigtable instance ID. Example: "my-bigtable-instance"

table [string]

The Bigtable table name to write to. Example: "my-table"

rowkey_column [list]

Column names used to compose the Bigtable row key. Example: ["id"] or ["tenant", "id"].

When multiple columns are specified they are joined with rowkey_delimiter.

column_family [config]

Mapping from column name to column family name. Use all_columns as the key to set a default family for every column that has no explicit mapping.

column_family {
  name = "info"
  age  = "stats"
}

or to put everything in one family:

column_family {
  all_columns = "cf"
}
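The lookup order described above (explicit mapping first, then the all_columns default) can be sketched in a few lines of Python. This is an illustration only, not the connector's code: the function name resolve_family is hypothetical, and raising an error when neither an explicit mapping nor all_columns exists is an assumption, since the behaviour for that case is not stated here.

```python
ALL_COLUMNS_KEY = "all_columns"


def resolve_family(field_name, column_family):
    """Pick the column family for one field (hypothetical helper).

    An explicit entry for the field wins; otherwise the all_columns
    default is used.
    """
    if field_name in column_family:
        return column_family[field_name]
    if ALL_COLUMNS_KEY in column_family:
        return column_family[ALL_COLUMNS_KEY]
    # Assumption: a field with no mapping and no default is treated as an error.
    raise KeyError(f"no column family configured for field {field_name!r}")


mapping = {"all_columns": "data", "event_type": "meta"}
print(resolve_family("event_type", mapping))  # meta
print(resolve_family("payload", mapping))     # data
```

Mixing explicit entries with all_columns keeps the configuration short when only a few columns belong in a separate family.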

credentials_path [string]

Path to the Google Cloud service account JSON key file.

If not set, Application Default Credentials (ADC) are used. ADC works automatically on GCE/GKE, or when GOOGLE_APPLICATION_CREDENTIALS is set in the environment.

rowkey_delimiter [string]

Delimiter used to join multiple row-key column values. Default is "" (empty string, no delimiter).
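To make the effect of the delimiter concrete, here is a minimal Python sketch of joining row-key column values. It is not the connector's implementation: compose_row_key is a hypothetical name, and converting each value with str() and encoding the result as UTF-8 are assumptions made for the illustration.

```python
def compose_row_key(row, rowkey_columns, delimiter=""):
    """Join the row-key column values in order and return the key as bytes.

    Assumption: each value is converted with str() and the joined key is
    UTF-8 encoded.
    """
    parts = [str(row[name]) for name in rowkey_columns]
    return delimiter.join(parts).encode("utf-8")


row = {"tenant_id": "acme", "event_id": 42, "event_type": "click"}
key_with_delimiter = compose_row_key(row, ["tenant_id", "event_id"], "#")
key_without_delimiter = compose_row_key(row, ["tenant_id", "event_id"])
print(key_with_delimiter)     # b'acme#42'
print(key_without_delimiter)  # b'acme42'
```

With the default empty delimiter, different value combinations can produce the same key: ("ab", "c") and ("a", "bc") both yield "abc". Choosing a delimiter that cannot occur in the column values avoids such collisions when several row-key columns are used.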

version_column [string]

Column name whose BIGINT value is used as the Bigtable cell timestamp (microseconds since epoch). If not set, the current system time is used.
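Because the version column must hold microseconds since the epoch, upstream values in other units need converting before they reach the sink. The Python sketch below shows the arithmetic; the helper names are hypothetical and the code is independent of the connector.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(moment):
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division by a one-microsecond timedelta avoids float rounding.
    return (moment - EPOCH) // timedelta(microseconds=1)


def millis_to_micros(epoch_millis):
    """Convert milliseconds since the epoch (a common upstream unit) to microseconds."""
    return epoch_millis * 1_000


event_ts = to_epoch_micros(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(event_ts)                          # 1704067200000000
print(millis_to_micros(1704067200000))   # 1704067200000000
```

Bigtable tables use millisecond timestamp granularity, so it is worth checking how values that are not a multiple of 1,000 microseconds are treated before relying on sub-millisecond versions.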

null_mode [string]

How to handle null field values. Supported: skip (default), empty.

skip — the cell is omitted from the mutation
empty — an empty byte array is written to the cell
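The difference between the two modes can be sketched as follows. This is a simplified Python model rather than the connector's code: build_cells is a hypothetical name, non-null values are encoded with str() purely to keep the example short, and rejecting unknown modes with an error is an assumption.

```python
def build_cells(row, rowkey_columns, null_mode="skip"):
    """Return {qualifier: value bytes} for the non-row-key fields of one row."""
    cells = {}
    for name, value in row.items():
        if name in rowkey_columns:
            continue  # row-key fields are not written as cells
        if value is None:
            if null_mode == "skip":
                continue           # no cell is added to the mutation
            if null_mode == "empty":
                cells[name] = b""  # the cell is written with an empty value
                continue
            raise ValueError(f"unsupported null_mode: {null_mode!r}")
        # Simplification for this sketch: real values use per-type encodings.
        cells[name] = str(value).encode("utf-8")
    return cells


row = {"event_id": "e1", "event_type": "click", "referrer": None}
print(build_cells(row, ["event_id"], "skip"))   # {'event_type': b'click'}
print(build_cells(row, ["event_id"], "empty"))  # {'event_type': b'click', 'referrer': b''}
```

With skip, a cell written by an earlier job is left as it was, because the mutation does not touch it. With empty, a reader sees a new, empty value for that column.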

batch_mutation_size [int]

Number of row mutations to accumulate before sending a BulkMutation to Bigtable. Default is 100. Increase for higher throughput at the cost of higher per-task memory usage.
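The accumulate-then-send behaviour can be pictured with a small Python buffer. The class below is a hypothetical model, not the connector's implementation: the send callback stands in for the bulk write, and flushing leftover mutations at the end is an assumption about how a final partial batch would be handled.

```python
class MutationBuffer:
    """Collect row mutations and hand them to `send` in batches (illustrative only)."""

    def __init__(self, batch_size, send):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._batch_size = batch_size
        self._send = send
        self._pending = []

    def add(self, mutation):
        self._pending.append(mutation)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, including a final partial batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)


sent = []
buffer = MutationBuffer(batch_size=100, send=sent.append)
for i in range(250):
    buffer.add(("row-%d" % i, {"cf:value": b"x"}))
buffer.flush()
print([len(batch) for batch in sent])  # [100, 100, 50]
```

In this model at most batch_size mutations are held in memory at once, which is the trade-off behind a larger setting: fewer requests, but more buffered data per task.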

schema_save_mode [enum]

Schema save mode. RECREATE_SCHEMA is the only value currently supported.

Regardless of this setting, the connector does not create Bigtable tables or column families. Create the target table and all column families before the job starts.

data_save_mode [enum]

Data save mode. APPEND_DATA is the only value currently supported.

DROP_DATA and ERROR_WHEN_DATA_EXISTS are not implemented for this connector. If you need a clean target, truncate or recreate the Bigtable table before running the job.

multi_table_sink_replica [int]

The number of sink replicas used for multi-table writing. For details, see Sink Common Options.

common options

Common sink plugin parameters. See Sink Common Options for details.

Data Types

The following SeaTunnel types are supported:

SeaTunnel type	Storage format in Bigtable
TINYINT	1-byte binary
SMALLINT	2-byte big-endian binary
INT	4-byte big-endian binary
BIGINT	8-byte big-endian binary
FLOAT	4-byte IEEE 754 big-endian
DOUBLE	8-byte IEEE 754 big-endian
BOOLEAN	1-byte (1 = true, 0 = false)
BYTES	Raw bytes
STRING	UTF-8 text
DECIMAL	UTF-8 plain string
DATE	UTF-8 `yyyy-MM-dd`
TIME	UTF-8 `HH:mm:ss`
TIMESTAMP	UTF-8 `yyyy-MM-dd HH:mm:ss`
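Anyone reading these cells back with another client has to decode the bytes themselves. The Python sketch below mirrors the table using the standard struct module. It is a reader-side illustration, not the connector's code: encode_cell and decode_cell are hypothetical names, and treating the integer types as signed two's-complement values is an assumption, since the table only specifies width and byte order.

```python
import struct

# Big-endian struct formats for the fixed-width types in the table.
# Assumption: integer types are signed (two's complement).
_FORMATS = {
    "TINYINT": ">b",
    "SMALLINT": ">h",
    "INT": ">i",
    "BIGINT": ">q",
    "FLOAT": ">f",
    "DOUBLE": ">d",
}


def encode_cell(seatunnel_type, value):
    """Encode one value following the storage formats listed above."""
    if seatunnel_type in _FORMATS:
        return struct.pack(_FORMATS[seatunnel_type], value)
    if seatunnel_type == "BOOLEAN":
        return b"\x01" if value else b"\x00"
    if seatunnel_type == "BYTES":
        return bytes(value)
    # STRING, DECIMAL, DATE, TIME, TIMESTAMP: `value` must already be text
    # in the format shown in the table.
    return value.encode("utf-8")


def decode_cell(seatunnel_type, raw):
    """Decode cell bytes back into a Python value."""
    if seatunnel_type in _FORMATS:
        return struct.unpack(_FORMATS[seatunnel_type], raw)[0]
    if seatunnel_type == "BOOLEAN":
        return raw == b"\x01"
    if seatunnel_type == "BYTES":
        return raw
    return raw.decode("utf-8")


print(encode_cell("INT", 1).hex())       # 00000001
print(encode_cell("DOUBLE", 1.5).hex())  # 3ff8000000000000
print(decode_cell("BIGINT", encode_cell("BIGINT", -1)))  # -1
print(decode_cell("DATE", encode_cell("DATE", "2024-01-01")))  # 2024-01-01
```

Note that numeric cells are binary, so they are not human-readable in tools that display cell values as text, while DECIMAL, DATE, TIME, and TIMESTAMP are stored as plain strings.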

tip

Bigtable does not have relational columns. The sink writes every non-row-key field as a Bigtable cell. The target column family is selected by column_family; the Bigtable qualifier is the SeaTunnel field name.

Example

Basic — Application Default Credentials

sink {
  GoogleBigtable {
    project_id  = "my-gcp-project"
    instance_id = "my-bigtable-instance"
    table       = "events"
    rowkey_column = ["event_id"]
    column_family {
      all_columns = "cf"
    }
  }
}

Service Account Key File

sink {
  GoogleBigtable {
    project_id       = "my-gcp-project"
    instance_id      = "my-bigtable-instance"
    table            = "events"
    credentials_path = "/secrets/sa-key.json"
    rowkey_column    = ["tenant_id", "event_id"]
    rowkey_delimiter = "#"
    column_family {
      all_columns = "data"
    }
    batch_mutation_size = 500
  }
}

Multiple Column Families

sink {
  GoogleBigtable {
    project_id  = "my-gcp-project"
    instance_id = "my-bigtable-instance"
    table       = "user_profile"
    rowkey_column = ["user_id"]
    column_family {
      name        = "identity"
      email       = "identity"
      age         = "stats"
      last_login  = "stats"
    }
  }
}

Use a version column and empty null values

sink {
  GoogleBigtable {
    project_id       = "my-gcp-project"
    instance_id      = "my-bigtable-instance"
    table            = "events"
    rowkey_column    = ["tenant_id", "event_id"]
    rowkey_delimiter = "#"
    version_column   = "event_ts"
    null_mode        = "empty"
    column_family {
      all_columns = "data"
      event_type  = "meta"
    }
  }
}

Changelog

Change	Commit	Version
[Feature][Connector-V2] Add Google Cloud Bigtable Source and Sink connector	https://github.com/apache/seatunnel/commit/8e57c04	dev
