# Engine Configuration
The engine configuration controls how Bizon stores state, passes messages between pipeline stages, and executes workloads.
## Overview
```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
  queue:
    type: python_queue
    config:
      max_nb_messages: 1000000
  runner:
    type: thread
    config:
      max_workers: 2
      log_level: INFO
```
## Backends

Backends store pipeline state: job status, checkpoints, and cursor positions for recovery.
### Backend Types

| Type | Use Case | Persistence |
|---|---|---|
| `sqlite` | Development | File-based |
| `sqlite_in_memory` | Testing | None |
| `postgres` | Production | Full |
| `bigquery` | Serverless production | Full |
### SQLite (Development)
Stores state in a local `bizon.db` file:

```yaml
engine:
  backend:
    type: sqlite
    config:
      database: bizon
      schema: public
```
### SQLite In-Memory (Testing)

No persistence; useful for unit tests:

```yaml
engine:
  backend:
    type: sqlite_in_memory
    config:
      database: bizon
      schema: public
```
### PostgreSQL (Production)

Production-ready with full durability:

```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
      username: bizon_user
      password: BIZON_ENV_POSTGRES_PASSWORD
```
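Note that the `password` field references an environment variable rather than a literal secret. A minimal sketch of supplying it before a run, assuming Bizon substitutes `BIZON_ENV_`-prefixed config values from environment variables of the same name (an assumption based on the naming convention used throughout this page); the secret value is hypothetical:

```shell
# Export the secret before launching the pipeline. The BIZON_ENV_ prefix
# marks the config value for environment substitution (assumed convention).
export BIZON_ENV_POSTGRES_PASSWORD="s3cr3t"
```

This keeps credentials out of the YAML file, so the config can be committed to version control safely.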
### BigQuery (Serverless)

Stores state in BigQuery tables:

```yaml
engine:
  backend:
    type: bigquery
    config:
      project_id: my-project
      dataset_id: bizon_state
      database: bizon
      schema: public
```
### Backend Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| `database` | string | Required | Database name |
| `schema` | string | Required | Schema name |
| `syncCursorInDBEvery` | int | 10 | Sync cursor to DB every N iterations |
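Lowering `syncCursorInDBEvery` checkpoints the cursor more often, trading a little extra database traffic for finer-grained recovery after a failure. A sketch using the PostgreSQL backend shown above:

```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
      syncCursorInDBEvery: 5  # checkpoint twice as often as the default of 10
```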
## Queues

Queues handle asynchronous message passing between the producer (source) and consumer (destination) stages.
### Queue Types

| Type | Use Case | Throughput |
|---|---|---|
| `python_queue` | Development | Low-Medium |
| `kafka` | Production streaming | Very High |
| `rabbitmq` | Production reliability | High |
### Python Queue (Development)

In-memory queue with no external dependencies:

```yaml
engine:
  queue:
    type: python_queue
    config:
      max_nb_messages: 1000000
```
### Kafka (Production)

For high-throughput streaming workloads:

```yaml
engine:
  queue:
    type: kafka
    config:
      bootstrap_servers: kafka:9092
      max_nb_messages: 10000000
```
### RabbitMQ (Production)

For reliable message delivery:

```yaml
engine:
  queue:
    type: rabbitmq
    config:
      host: rabbitmq
      port: 5672
      username: guest
      password: guest
      max_nb_messages: 1000000
```
### Queue Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| `max_nb_messages` | int | 1000000 | Maximum queue size |
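Because `max_nb_messages` caps how many messages can be queued at once, it also bounds memory use for the in-process `python_queue`. A sketch lowering it for a memory-constrained environment (the exact back-pressure behavior when the queue fills is not specified on this page):

```yaml
engine:
  queue:
    type: python_queue
    config:
      max_nb_messages: 50000  # smaller cap on in-flight messages to bound memory
```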
## Runners

Runners control how the pipeline executes: the concurrency model and logging.
### Runner Types

| Type | Concurrency | Use Case |
|---|---|---|
| `thread` | Async I/O | Default, most workloads |
| `process` | True parallelism | CPU-intensive transforms |
| `stream` | Single-threaded | Low-latency streaming |
### Thread Runner (Default)

Uses `ThreadPoolExecutor` for concurrent I/O:

```yaml
engine:
  runner:
    type: thread
    config:
      max_workers: 2
      consumer_start_delay: 2
      is_alive_check_interval: 2
      log_level: INFO
```
### Process Runner

Uses `ProcessPoolExecutor` for CPU-bound work:

```yaml
engine:
  runner:
    type: process
    config:
      max_workers: 4
      log_level: INFO
```
### Stream Runner

Single-threaded synchronous execution for the lowest latency:

```yaml
engine:
  runner:
    type: stream
    log_level: INFO
```

Use with the `--runner stream` CLI flag:

```shell
bizon run config.yml --runner stream
```
### Runner Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| `max_workers` | int | 2 | Number of worker threads/processes |
| `consumer_start_delay` | int | 2 | Seconds to wait before starting consumer |
| `is_alive_check_interval` | int | 2 | Seconds between liveness checks |
| `log_level` | enum | INFO | Logging level (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL) |
## Production Configuration

Recommended production setup:
```yaml
name: production-pipeline

source:
  name: kafka
  stream: topic
  sync_mode: stream

destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    buffer_size: 100
    buffer_flush_timeout: 300

engine:
  backend:
    type: postgres
    config:
      host: db.example.com
      port: 5432
      database: bizon
      schema: public
      username: bizon
      password: BIZON_ENV_DB_PASSWORD
      syncCursorInDBEvery: 5
  queue:
    type: kafka
    config:
      bootstrap_servers: kafka.example.com:9092
      max_nb_messages: 10000000
  runner:
    type: stream
    log_level: INFO
```
## Development Configuration

Simple local setup:
```yaml
name: dev-pipeline

source:
  name: dummy
  stream: creatures

destination:
  name: logger
  config: {}

# Defaults applied:
# engine:
#   backend:
#     type: sqlite
#   queue:
#     type: python_queue
#   runner:
#     type: thread
```
## Next Steps

- Configuration Reference - Complete YAML options
- Checkpointing - Understand fault tolerance
- Queues - Queue system deep dive