BigQuery Destinations

Bizon offers three BigQuery destination variants optimized for different workloads.

```bash
pip install bizon[bigquery]
```

| Variant | Method | Best For | Latency |
|---|---|---|---|
| `bigquery` | GCS Load Jobs | Large batch loads | Minutes |
| `bigquery_streaming` | Legacy Streaming | Low-latency, small volume | Seconds |
| `bigquery_streaming_v2` | Storage Write API | High-throughput streaming | Seconds |

The `bigquery` destination stages data in GCS and then loads it with BigQuery load jobs. It is best suited to large batch workloads.

```yaml
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: my_dataset
    dataset_location: US
    gcs_buffer_bucket: my-staging-bucket
    gcs_buffer_format: parquet
    buffer_size: 400
    time_partitioning: DAY
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `project_id` | string | Yes | - | GCP project ID |
| `dataset_id` | string | Yes | - | BigQuery dataset ID |
| `dataset_location` | string | No | `US` | Dataset location |
| `gcs_buffer_bucket` | string | Yes | - | GCS bucket for staging |
| `gcs_buffer_format` | enum | No | `parquet` | Staging format: `parquet` or `csv` |
| `time_partitioning` | enum | No | `DAY` | Partition granularity: `DAY`, `HOUR`, `MONTH`, `YEAR` |
| `buffer_size` | int | No | 400 | Buffer size in MB |
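
For example, a minimal sketch that swaps in the non-default options from the table above (the project and bucket names are the same placeholders used throughout):

```yaml
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: my_dataset
    gcs_buffer_bucket: my-staging-bucket
    gcs_buffer_format: csv    # stage as CSV instead of the default parquet
    time_partitioning: HOUR   # hourly partitions instead of the default DAY
```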

The `bigquery_streaming` destination uses the legacy streaming API. It is a good fit for low-latency pipelines with moderate volume.

```yaml
destination:
  name: bigquery_streaming
  config:
    project_id: my-project
    dataset_id: my_dataset
    dataset_location: US
    buffer_size: 50
    buffer_flush_timeout: 300
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `project_id` | string | Yes | - | GCP project ID |
| `dataset_id` | string | Yes | - | BigQuery dataset ID |
| `dataset_location` | string | No | `US` | Dataset location |
| `buffer_size` | int | No | 50 | Buffer size in MB |
| `buffer_flush_timeout` | int | No | 600 | Max buffer age in seconds |

The `bigquery_streaming_v2` destination uses the Storage Write API for high-throughput streaming. It is the best choice for production real-time pipelines.

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: my_dataset
    dataset_location: US
    buffer_size: 100
    buffer_flush_timeout: 300
    bq_max_rows_per_request: 5000
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `project_id` | string | Yes | - | GCP project ID |
| `dataset_id` | string | Yes | - | BigQuery dataset ID |
| `dataset_location` | string | No | `US` | Dataset location |
| `buffer_size` | int | No | 50 | Buffer size in MB |
| `buffer_flush_timeout` | int | No | 600 | Max buffer age in seconds |
| `bq_max_rows_per_request` | int | No | 5000 | Max rows per API call (max 10000) |
| `time_partitioning.type` | enum | No | `DAY` | Partition type |
| `time_partitioning.field` | string | No | `_bizon_loaded_at` | Partition field |
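
The nested `time_partitioning` fields from the table are set as a block in the config; a minimal sketch, reusing the default partition field (the same form appears in the complete example at the end of this page):

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: my_dataset
    time_partitioning:
      type: DAY                 # partition granularity
      field: _bizon_loaded_at   # default partition column
```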

To authenticate, provide a service account JSON key:

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    authentication:
      service_account_key: |
        {
          "type": "service_account",
          "project_id": "my-project",
          ...
        }
```

Or reference it from an environment variable:

```yaml
authentication:
  service_account_key: BIZON_ENV_GCP_SERVICE_ACCOUNT
```
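
Assuming Bizon reads the value of the referenced environment variable as the raw JSON key, the variable can be populated before starting the pipeline (the key path is a placeholder):

```bash
# Assumption: BIZON_ENV_GCP_SERVICE_ACCOUNT holds the JSON key contents
export BIZON_ENV_GCP_SERVICE_ACCOUNT="$(cat /path/to/service-account.json)"
```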

If running on GCP (GKE, Cloud Run, etc.) or with local `gcloud` credentials, omit the authentication section; Application Default Credentials (ADC) are used instead:

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    # No authentication section - uses ADC
```
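
For local development, ADC is typically set up once with the standard gcloud command (a general GCP step, not specific to Bizon):

```bash
# Store user credentials where Application Default Credentials can find them
gcloud auth application-default login
```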

To load records as structured columns, set `unnest: true` and define a record schema for each destination table:

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    unnest: true
    record_schemas:
      - destination_id: my-project.analytics.events
        record_schema:
          - name: event_id
            type: STRING
            mode: REQUIRED
          - name: event_type
            type: STRING
            mode: REQUIRED
          - name: user_id
            type: STRING
            mode: NULLABLE
          - name: payload
            type: JSON
            mode: NULLABLE
          - name: created_at
            type: TIMESTAMP
            mode: REQUIRED
        clustering_keys:
          - event_type
          - user_id
```

| Type | Description |
|---|---|
| `STRING` | Text |
| `INTEGER` / `INT64` | 64-bit integer |
| `FLOAT` / `FLOAT64` | 64-bit float |
| `BOOLEAN` | True/false |
| `TIMESTAMP` | Date and time |
| `DATE` | Date only |
| `DATETIME` | Date and time (no timezone) |
| `TIME` | Time only |
| `JSON` | JSON object |
| `BYTES` | Binary data |
| `NUMERIC` / `BIGNUMERIC` | Exact numeric |
| `GEOGRAPHY` | Geographic data |

| Mode | Description |
|---|---|
| `NULLABLE` | Can be null (default) |
| `REQUIRED` | Cannot be null |
| `REPEATED` | Array of values |
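
As an illustration of `REPEATED` mode, a hypothetical `tags` column holding an array of strings could be declared like this:

```yaml
record_schema:
  - name: tags       # hypothetical column: an array of string values
    type: STRING
    mode: REPEATED
```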

Route records to different tables based on destination_id:

```yaml
source:
  name: kafka
  stream: topic
  topics:
    - name: user-events
      destination_id: my-project.analytics.user_events
    - name: order-events
      destination_id: my-project.analytics.order_events

destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    unnest: true
    record_schemas:
      - destination_id: my-project.analytics.user_events
        record_schema:
          - name: user_id
            type: STRING
            mode: REQUIRED
          - name: action
            type: STRING
            mode: REQUIRED
      - destination_id: my-project.analytics.order_events
        record_schema:
          - name: order_id
            type: STRING
            mode: REQUIRED
          - name: amount
            type: FLOAT64
            mode: REQUIRED
```

Partition tables by time for better query performance:

```yaml
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: analytics
    gcs_buffer_bucket: my-bucket
    time_partitioning: DAY # DAY, HOUR, MONTH, YEAR
```

Add clustering keys to speed up queries that filter on those columns:

```yaml
record_schemas:
  - destination_id: my-project.analytics.events
    record_schema:
      - name: event_type
        type: STRING
        mode: REQUIRED
      - name: user_id
        type: STRING
        mode: NULLABLE
    clustering_keys:
      - event_type
      - user_id
```

To choose a variant:

- Is latency critical (< 1 minute required)?
  - No → `bigquery` (GCS Load Jobs)
  - Yes → Is throughput high (> 1000 records/sec)?
    - Yes → `bigquery_streaming_v2` (Storage Write API)
    - No → `bigquery_streaming` (Legacy API)

A complete production example, streaming from Kafka to BigQuery:

```yaml
name: kafka-to-bigquery

source:
  name: kafka
  stream: topic
  sync_mode: stream
  bootstrap_servers: kafka:9092
  topics:
    - name: events
      destination_id: my-project.analytics.events

destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    dataset_location: US
    buffer_size: 100
    buffer_flush_timeout: 300
    bq_max_rows_per_request: 5000
    unnest: true
    time_partitioning:
      type: DAY
      field: _bizon_loaded_at
    record_schemas:
      - destination_id: my-project.analytics.events
        record_schema:
          - name: event_id
            type: STRING
            mode: REQUIRED
          - name: event_type
            type: STRING
            mode: REQUIRED
          - name: payload
            type: JSON
            mode: NULLABLE
          - name: timestamp
            type: TIMESTAMP
            mode: REQUIRED
        clustering_keys:
          - event_type
    authentication:
      service_account_key: BIZON_ENV_GCP_SERVICE_ACCOUNT

engine:
  backend:
    type: postgres
    config:
      host: db.example.com
      database: bizon
      schema: public

runner:
  type: stream
  log_level: INFO
```