BigQuery Destinations
Bizon offers three BigQuery destination variants optimized for different workloads.
Installation
```bash
pip install bizon[bigquery]
```

Destination Variants

| Variant | Method | Best For | Latency |
|---|---|---|---|
| bigquery | GCS Load Jobs | Large batch loads | Minutes |
| bigquery_streaming | Legacy Streaming API | Low-latency, small volume | Seconds |
| bigquery_streaming_v2 | Storage Write API | High-throughput streaming | Seconds |
BigQuery (Batch Load)
Uses GCS as a staging area, then loads via BigQuery load jobs. Best for large batch workloads.

```yaml
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: my_dataset
    dataset_location: US
    gcs_buffer_bucket: my-staging-bucket
    gcs_buffer_format: parquet
    buffer_size: 400
    time_partitioning: DAY
```

Configuration
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| project_id | string | Yes | - | GCP project ID |
| dataset_id | string | Yes | - | BigQuery dataset ID |
| dataset_location | string | No | US | Dataset location |
| gcs_buffer_bucket | string | Yes | - | GCS bucket for staging |
| gcs_buffer_format | enum | No | parquet | Staging format: parquet or csv |
| time_partitioning | enum | No | DAY | Partition granularity: DAY, HOUR, MONTH, YEAR |
| buffer_size | int | No | 400 | Buffer size in MB |
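
The optional fields above combine freely when the defaults do not fit a workload. A minimal sketch with purely illustrative values, staging as CSV and partitioning the target table by month:

```yaml
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: my_dataset
    gcs_buffer_bucket: my-staging-bucket
    gcs_buffer_format: csv     # stage as CSV instead of the default parquet
    buffer_size: 800           # larger buffer (MB) for fewer, bigger load jobs
    time_partitioning: MONTH   # coarser partitions for lower-volume tables
```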
BigQuery Streaming
Uses the legacy streaming API. Good for low-latency delivery at moderate volumes.

```yaml
destination:
  name: bigquery_streaming
  config:
    project_id: my-project
    dataset_id: my_dataset
    dataset_location: US
    buffer_size: 50
    buffer_flush_timeout: 300
```

Configuration
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| project_id | string | Yes | - | GCP project ID |
| dataset_id | string | Yes | - | BigQuery dataset ID |
| dataset_location | string | No | US | Dataset location |
| buffer_size | int | No | 50 | Buffer size in MB |
| buffer_flush_timeout | int | No | 600 | Max buffer age in seconds |
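
As a sketch of a latency-oriented tuning (the numbers are illustrative, not recommendations), shrinking the buffer and flush timeout makes rows reach BigQuery sooner at the cost of more API calls:

```yaml
destination:
  name: bigquery_streaming
  config:
    project_id: my-project
    dataset_id: my_dataset
    buffer_size: 10            # small buffer (MB) so batches flush quickly
    buffer_flush_timeout: 60   # flush at least once a minute, even under light traffic
```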
BigQuery Streaming V2 (Recommended)
Uses the Storage Write API for high-throughput streaming. Best for production real-time pipelines.

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: my_dataset
    dataset_location: US
    buffer_size: 100
    buffer_flush_timeout: 300
    bq_max_rows_per_request: 5000
```

Configuration
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| project_id | string | Yes | - | GCP project ID |
| dataset_id | string | Yes | - | BigQuery dataset ID |
| dataset_location | string | No | US | Dataset location |
| buffer_size | int | No | 50 | Buffer size in MB |
| buffer_flush_timeout | int | No | 600 | Max buffer age in seconds |
| bq_max_rows_per_request | int | No | 5000 | Max rows per API call (max 10000) |
| time_partitioning.type | enum | No | DAY | Partition type |
| time_partitioning.field | string | No | _bizon_loaded_at | Partition field |
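
The nested time_partitioning keys let the table be partitioned on your own timestamp column instead of _bizon_loaded_at. A minimal sketch, assuming records carry a created_at TIMESTAMP field (the field name is hypothetical):

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: my_dataset
    time_partitioning:
      type: DAY
      field: created_at   # hypothetical event timestamp; defaults to _bizon_loaded_at
```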
Authentication
Service Account Key

Provide a service account JSON key:

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    authentication:
      service_account_key: |
        {
          "type": "service_account",
          "project_id": "my-project",
          ...
        }
```

Or reference it from the environment:

```yaml
authentication:
  service_account_key: BIZON_ENV_GCP_SERVICE_ACCOUNT
```

Application Default Credentials
If running on GCP (GKE, Cloud Run, etc.) or with gcloud auth, omit the authentication section:

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    # No authentication section - uses ADC
```

Schema Configuration
Define schemas for structured loading with unnest: true:

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    unnest: true
    record_schemas:
      - destination_id: my-project.analytics.events
        record_schema:
          - name: event_id
            type: STRING
            mode: REQUIRED
          - name: event_type
            type: STRING
            mode: REQUIRED
          - name: user_id
            type: STRING
            mode: NULLABLE
          - name: payload
            type: JSON
            mode: NULLABLE
          - name: created_at
            type: TIMESTAMP
            mode: REQUIRED
        clustering_keys:
          - event_type
          - user_id
```

Column Types
| Type | Description |
|---|---|
| STRING | Text |
| INTEGER / INT64 | 64-bit integer |
| FLOAT / FLOAT64 | 64-bit float |
| BOOLEAN | True/false |
| TIMESTAMP | Date and time |
| DATE | Date only |
| DATETIME | Date and time (no timezone) |
| TIME | Time only |
| JSON | JSON object |
| BYTES | Binary data |
| NUMERIC / BIGNUMERIC | Exact numeric |
| GEOGRAPHY | Geographic data |
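
To show how these types look inside a record_schema, here is a hypothetical entry mixing several of them (the field names are invented for illustration):

```yaml
record_schema:
  - name: order_id
    type: STRING
    mode: REQUIRED
  - name: amount
    type: NUMERIC        # exact numeric, suited to monetary values
    mode: REQUIRED
  - name: shipped_on
    type: DATE
    mode: NULLABLE
  - name: warehouse_location
    type: GEOGRAPHY
    mode: NULLABLE
  - name: raw_payload
    type: BYTES
    mode: NULLABLE
```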
Column Modes
| Mode | Description |
|---|---|
| NULLABLE | Can be null (default) |
| REQUIRED | Cannot be null |
| REPEATED | Array of values |
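
For example, a REPEATED column stores an array of values per row. A short sketch with an invented tags field:

```yaml
record_schema:
  - name: event_id
    type: STRING
    mode: REQUIRED
  - name: tags
    type: STRING
    mode: REPEATED   # each row holds an array of STRING values
```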
Multi-Table Routing
Route records to different tables based on destination_id:

```yaml
source:
  name: kafka
  stream: topic
  topics:
    - name: user-events
      destination_id: my-project.analytics.user_events
    - name: order-events
      destination_id: my-project.analytics.order_events

destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    unnest: true
    record_schemas:
      - destination_id: my-project.analytics.user_events
        record_schema:
          - name: user_id
            type: STRING
            mode: REQUIRED
          - name: action
            type: STRING
            mode: REQUIRED
      - destination_id: my-project.analytics.order_events
        record_schema:
          - name: order_id
            type: STRING
            mode: REQUIRED
          - name: amount
            type: FLOAT64
            mode: REQUIRED
```

Time Partitioning
Partition tables by time for better query performance. For the batch destination:

```yaml
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: analytics
    gcs_buffer_bucket: my-bucket
    time_partitioning: DAY   # DAY, HOUR, MONTH, YEAR
```

For the streaming v2 destination:

```yaml
destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    time_partitioning:
      type: DAY
      field: _bizon_loaded_at   # Or your custom timestamp field
```

Clustering
Add clustering keys for better query performance:

```yaml
record_schemas:
  - destination_id: my-project.analytics.events
    record_schema:
      - name: event_type
        type: STRING
        mode: REQUIRED
      - name: user_id
        type: STRING
        mode: NULLABLE
    clustering_keys:
      - event_type
      - user_id
```

Choosing the Right Variant
- Is latency critical (< 1 minute required)?
  - No → use bigquery (GCS load jobs).
  - Yes → Is throughput high (> 1000 records/sec)?
    - Yes → use bigquery_streaming_v2 (Storage Write API).
    - No → use bigquery_streaming (legacy streaming API).

Production Configuration

Complete production example:
```yaml
name: kafka-to-bigquery

source:
  name: kafka
  stream: topic
  sync_mode: stream
  bootstrap_servers: kafka:9092
  topics:
    - name: events
      destination_id: my-project.analytics.events

destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    dataset_location: US
    buffer_size: 100
    buffer_flush_timeout: 300
    bq_max_rows_per_request: 5000
    unnest: true
    time_partitioning:
      type: DAY
      field: _bizon_loaded_at
    record_schemas:
      - destination_id: my-project.analytics.events
        record_schema:
          - name: event_id
            type: STRING
            mode: REQUIRED
          - name: event_type
            type: STRING
            mode: REQUIRED
          - name: payload
            type: JSON
            mode: NULLABLE
          - name: timestamp
            type: TIMESTAMP
            mode: REQUIRED
        clustering_keys:
          - event_type
    authentication:
      service_account_key: BIZON_ENV_GCP_SERVICE_ACCOUNT

engine:
  backend:
    type: postgres
    config:
      host: db.example.com
      database: bizon
      schema: public

runner:
  type: stream
  log_level: INFO
```

Next Steps
- Kafka to BigQuery Tutorial - Complete streaming guide
- Kafka Source - Configure Kafka consumption
- Engine Configuration - Production backend setup