GoogleBigtable

Google Bigtable sink connector

Description

Writes data to Google Cloud Bigtable using the native Bigtable Data v2 Java client.

Key features

Options

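| name                | type   | required | default value |
|---------------------|--------|----------|---------------|
| project_id          | string | yes      | -             |
| instance_id         | string | yes      | -             |
| table               | string | yes      | -             |
| rowkey_column       | list   | yes      | -             |
| column_family       | config | yes      | -             |
| credentials_path    | string | no       | -             |
| rowkey_delimiter    | string | no       | ""            |
| version_column      | string | no       | -             |
| null_mode           | string | no       | skip          |
| batch_mutation_size | int    | no       | 100           |
| common-options      |        | no       | -             |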

project_id [string]

Google Cloud project ID. Example: "my-gcp-project"

instance_id [string]

Bigtable instance ID. Example: "my-bigtable-instance"

table [string]

The Bigtable table name to write to. Example: "my-table"

rowkey_column [list]

Column names used to compose the Bigtable row key. Example: ["id"] or ["tenant", "id"].

When multiple columns are specified they are joined with rowkey_delimiter.

column_family [config]

Mapping from column name to column family name. Use the key all_columns to set a default family for every column that has no explicit mapping.

column_family {
  name = "info"
  age = "stats"
}

Or, to put everything in one family:

column_family {
  all_columns = "cf"
}
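
The lookup rule described above can be sketched in Python. This illustrates the documented behavior and is not the connector's code: the function name resolve_family is hypothetical, and raising an error for a column that has neither an explicit mapping nor an all_columns default is an assumption, since this page does not say what happens in that case.

```python
DEFAULT_KEY = "all_columns"  # special key named in the option description


def resolve_family(column, mapping):
    """Return the column family for a column (hypothetical helper)."""
    # An explicit mapping for the column wins.
    if column in mapping:
        return mapping[column]
    # Otherwise fall back to the default family, if one is configured.
    if DEFAULT_KEY in mapping:
        return mapping[DEFAULT_KEY]
    # Assumption: an unmapped column without a default is treated as an error.
    raise KeyError(f"no column family configured for column {column!r}")


mapping = {"name": "info", "age": "stats", "all_columns": "cf"}
family_for_name = resolve_family("name", mapping)    # explicit entry
family_for_email = resolve_family("email", mapping)  # falls back to default
```

Mixing explicit entries with all_columns in a single column_family block, as in the mapping above, is shown here only to exercise both branches of the sketch.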

credentials_path [string]

Path to the Google Cloud service account JSON key file.

If not set, Application Default Credentials (ADC) are used. ADC works automatically on GCE/GKE, or when GOOGLE_APPLICATION_CREDENTIALS is set in the environment.

rowkey_delimiter [string]

Delimiter used to join multiple row-key column values. Default is "" (empty string, no delimiter).
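
As a rough illustration of how rowkey_column and rowkey_delimiter combine, here is a minimal Python sketch. It is not the connector's implementation: the helper name compose_row_key is hypothetical, and converting each value with str() and encoding the result as UTF-8 are assumptions made for the example.

```python
def compose_row_key(row, rowkey_columns, delimiter=""):
    """Join the row-key column values into a single key (hypothetical helper)."""
    # Assumption: every value is rendered as text before joining.
    parts = [str(row[name]) for name in rowkey_columns]
    # Assumption: the joined key is stored as UTF-8 bytes.
    return delimiter.join(parts).encode("utf-8")


row = {"tenant": "acme", "id": 42, "name": "Ada"}
single_key = compose_row_key(row, ["id"])
undelimited_key = compose_row_key(row, ["tenant", "id"])
delimited_key = compose_row_key(row, ["tenant", "id"], "#")
```

With the default empty delimiter, different value combinations can produce the same key (for example "ab" + "c" and "a" + "bc"), so a non-empty delimiter is usually the safer choice for composite keys.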

version_column [string]

Column name whose BIGINT value is used as the Bigtable cell timestamp (microseconds since epoch). If not set, the current system time is used.
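
Because the version column is interpreted as microseconds since the epoch, upstream data held in other units has to be converted first. The following Python sketch shows one way to derive such a value from a timezone-aware datetime. The helper name to_epoch_micros is hypothetical, and the sketch only illustrates the unit; it does not show how the connector reads the column.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(moment):
    """Convert a timezone-aware datetime to whole microseconds since the epoch."""
    # Integer division of two timedeltas yields an exact int, avoiding float rounding.
    return (moment - EPOCH) // timedelta(microseconds=1)


event_time = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
version = to_epoch_micros(event_time)
```

A column that stores epoch milliseconds would need to be multiplied by 1000 before being used as version_column.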

null_mode [string]

How to handle null field values. Supported: skip (default), empty.

  • skip — the cell is omitted from the mutation
  • empty — an empty byte array is written to the cell
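
The difference between the two modes can be sketched in Python as follows. This is an illustration only: build_cells is a hypothetical helper, and encoding non-null values with str() is a placeholder for the type-specific encodings listed under Data Types.

```python
def build_cells(row, null_mode="skip"):
    """Return a {column: bytes} dict for one row (hypothetical helper)."""
    cells = {}
    for column, value in row.items():
        if value is None:
            if null_mode == "skip":
                continue             # cell is left out of the mutation
            if null_mode == "empty":
                cells[column] = b""  # cell is written with an empty byte array
                continue
            raise ValueError(f"unsupported null_mode: {null_mode!r}")
        # Placeholder encoding, used only to keep the sketch short.
        cells[column] = str(value).encode("utf-8")
    return cells


row = {"name": "Ada", "email": None}
skipped = build_cells(row, "skip")
emptied = build_cells(row, "empty")
```

One practical consequence, assuming the sink only sets cells and never deletes them: with skip, a null leaves any value previously stored in that cell untouched, whereas empty writes a new empty value.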

batch_mutation_size [int]

Number of row mutations to accumulate before sending a BulkMutation to Bigtable. Default is 100. Increase for higher throughput at the cost of higher per-task memory usage.
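
The accumulate-then-send pattern behind batch_mutation_size can be sketched in Python as below. The class BatchBuffer and its send callback are hypothetical stand-ins for the connector's writer and the Bigtable bulk call. The final flush() for a partially filled batch is an assumption about how leftover rows would be handled.

```python
class BatchBuffer:
    """Collect mutations and hand them to `send` in fixed-size batches."""

    def __init__(self, batch_size, send):
        self._batch_size = batch_size
        self._send = send          # callback standing in for the bulk write
        self._pending = []

    def add(self, mutation):
        self._pending.append(mutation)
        # Send as soon as the configured number of mutations has accumulated.
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered, even if the batch is not full.
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()


sent = []
buffer = BatchBuffer(3, sent.append)
for mutation in range(7):
    buffer.add(mutation)
full_batches = list(sent)  # two full batches so far; one mutation still buffered
buffer.flush()             # assumption: leftovers are sent on close or checkpoint
```

The buffer holds at most batch_size mutations at a time, which is where the memory cost of a larger setting comes from.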

common options

Common parameters of sink plugins. Refer to Sink Common Options for details.

Data Types

All SeaTunnel types are supported:

| SeaTunnel type | Storage format in Bigtable     |
|----------------|--------------------------------|
| TINYINT        | 1-byte binary                  |
| SMALLINT       | 2-byte big-endian binary       |
| INT            | 4-byte big-endian binary       |
| BIGINT         | 8-byte big-endian binary       |
| FLOAT          | 4-byte IEEE 754 big-endian     |
| DOUBLE         | 8-byte IEEE 754 big-endian     |
| BOOLEAN        | 1-byte (1 = true, 0 = false)   |
| BYTES          | Raw bytes                      |
| STRING         | UTF-8 text                     |
| DECIMAL        | UTF-8 plain string             |
| DATE           | UTF-8 yyyy-MM-dd               |
| TIMESTAMP      | UTF-8 yyyy-MM-dd HH:mm:ss      |
| TIME           | UTF-8 HH:mm:ss                 |
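
For readers who consume these cells from another client, the following Python sketch reproduces the storage formats in the table using the standard struct module. It is an approximation built from the table alone, not the connector's code. Treating the integer types as signed two's complement, rendering DECIMAL without an exponent, and dropping fractional seconds are assumptions.

```python
import struct
from datetime import date, datetime, time
from decimal import Decimal

# ">" selects big-endian byte order in struct format strings.
ENCODERS = {
    "TINYINT": lambda v: struct.pack(">b", v),   # assumption: signed
    "SMALLINT": lambda v: struct.pack(">h", v),  # assumption: signed
    "INT": lambda v: struct.pack(">i", v),       # assumption: signed
    "BIGINT": lambda v: struct.pack(">q", v),    # assumption: signed
    "FLOAT": lambda v: struct.pack(">f", v),
    "DOUBLE": lambda v: struct.pack(">d", v),
    "BOOLEAN": lambda v: b"\x01" if v else b"\x00",
    "BYTES": bytes,
    "STRING": lambda v: v.encode("utf-8"),
    "DECIMAL": lambda v: format(v, "f").encode("utf-8"),  # no exponent form
    "DATE": lambda v: v.isoformat().encode("utf-8"),
    "TIME": lambda v: v.strftime("%H:%M:%S").encode("utf-8"),
    "TIMESTAMP": lambda v: v.strftime("%Y-%m-%d %H:%M:%S").encode("utf-8"),
}


def encode(type_name, value):
    """Encode one value according to the table above (hypothetical helper)."""
    return ENCODERS[type_name](value)


int_bytes = encode("INT", 1)
double_bytes = encode("DOUBLE", 1.0)
decimal_bytes = encode("DECIMAL", Decimal("1E+3"))
timestamp_bytes = encode("TIMESTAMP", datetime(2024, 1, 31, 13, 5, 9))
```

Decoding on the read side is the mirror image, for example struct.unpack(">i", cell_bytes)[0] for an INT cell.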

Example

Basic — Application Default Credentials

sink {
  GoogleBigtable {
    project_id = "my-gcp-project"
    instance_id = "my-bigtable-instance"
    table = "events"
    rowkey_column = ["event_id"]
    column_family {
      all_columns = "cf"
    }
  }
}

Service Account Key File

sink {
  GoogleBigtable {
    project_id = "my-gcp-project"
    instance_id = "my-bigtable-instance"
    table = "events"
    credentials_path = "/secrets/sa-key.json"
    rowkey_column = ["tenant_id", "event_id"]
    rowkey_delimiter = "#"
    column_family {
      all_columns = "data"
    }
    batch_mutation_size = 500
  }
}

Multiple Column Families

sink {
  GoogleBigtable {
    project_id = "my-gcp-project"
    instance_id = "my-bigtable-instance"
    table = "user_profile"
    rowkey_column = ["user_id"]
    column_family {
      name = "identity"
      email = "identity"
      age = "stats"
      last_login = "stats"
    }
  }
}

Changelog

| Change | Commit | Version |
|--------|--------|---------|
| [Feature][Connector-V2] Add Google Cloud Bigtable Source and Sink connector | https://github.com/apache/seatunnel/commit/8e57c04 | dev |