
# Engine Configuration

The engine configuration controls how Bizon stores state, passes messages between pipeline stages, and executes workloads.

```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
  queue:
    type: python_queue
    config:
      max_nb_messages: 1000000
  runner:
    type: thread
    config:
      max_workers: 2
    log_level: INFO
```

## Backends

Backends store pipeline state: job status, checkpoints, and cursor positions for recovery.

| Type | Use Case | Persistence |
| --- | --- | --- |
| `sqlite` | Development | File-based |
| `sqlite_in_memory` | Testing | None |
| `postgres` | Production | Full |
| `bigquery` | Serverless production | Full |

### SQLite

Stores state in a local `bizon.db` file:

```yaml
engine:
  backend:
    type: sqlite
    config:
      database: bizon
      schema: public
```

### SQLite In-Memory

No persistence; useful for unit tests:

```yaml
engine:
  backend:
    type: sqlite_in_memory
    config:
      database: bizon
      schema: public
```

### PostgreSQL

Production-ready with full durability:

```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
      username: bizon_user
      password: BIZON_ENV_POSTGRES_PASSWORD
```
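The `BIZON_ENV_` prefix in the password value suggests the secret is resolved from the environment at runtime rather than stored in the file. Assuming the placeholder names the environment variable to read (an assumption, not confirmed by this page), you would export it before starting the pipeline:

```sh
# Hypothetical: assumes Bizon reads the variable named by the placeholder
export BIZON_ENV_POSTGRES_PASSWORD='s3cret'
bizon run config.yml
```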

### BigQuery

Stores state in BigQuery tables:

```yaml
engine:
  backend:
    type: bigquery
    config:
      project_id: my-project
      dataset_id: bizon_state
      database: bizon
      schema: public
```
Common backend configuration fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `database` | string | Required | Database name |
| `schema` | string | Required | Schema name |
| `syncCursorInDBEvery` | int | 10 | Sync cursor to DB every N iterations |
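For example, to checkpoint the cursor more aggressively than the default, a sketch reusing the `postgres` backend from above:

```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
      # Persist the cursor every 2 iterations instead of the default 10
      syncCursorInDBEvery: 2
```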

## Queues

Queues handle async message passing between the producer (source) and consumer (destination) stages.

| Type | Use Case | Throughput |
| --- | --- | --- |
| `python_queue` | Development | Low-Medium |
| `kafka` | Production streaming | Very High |
| `rabbitmq` | Production reliability | High |

### Python Queue

In-memory queue with no external dependencies:

```yaml
engine:
  queue:
    type: python_queue
    config:
      max_nb_messages: 1000000
```

### Kafka

For high-throughput streaming workloads:

```yaml
engine:
  queue:
    type: kafka
    config:
      bootstrap_servers: kafka:9092
      max_nb_messages: 10000000
```
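Standard Kafka clients accept `bootstrap_servers` as a comma-separated `host:port` list; assuming Bizon forwards the value to its Kafka client unchanged, a multi-broker cluster might look like:

```yaml
engine:
  queue:
    type: kafka
    config:
      # Comma-separated broker list (standard Kafka client convention;
      # assumes Bizon passes this value through unchanged)
      bootstrap_servers: kafka-1:9092,kafka-2:9092,kafka-3:9092
      max_nb_messages: 10000000
```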

### RabbitMQ

For reliable message delivery:

```yaml
engine:
  queue:
    type: rabbitmq
    config:
      host: rabbitmq
      port: 5672
      username: guest
      password: guest
      max_nb_messages: 1000000
```
Common queue configuration fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_nb_messages` | int | 1000000 | Maximum queue size |

## Runners

Runners control how the pipeline executes: concurrency model and logging.

| Type | Concurrency | Use Case |
| --- | --- | --- |
| `thread` | Async I/O | Default, most workloads |
| `process` | True parallelism | CPU-intensive transforms |
| `stream` | Single-threaded | Low-latency streaming |

### Thread

Uses `ThreadPoolExecutor` for concurrent I/O:

```yaml
engine:
  runner:
    type: thread
    config:
      max_workers: 2
      consumer_start_delay: 2
      is_alive_check_interval: 2
    log_level: INFO
```

### Process

Uses `ProcessPoolExecutor` for CPU-bound work:

```yaml
engine:
  runner:
    type: process
    config:
      max_workers: 4
    log_level: INFO
```

### Stream

Single-threaded synchronous execution for the lowest latency:

```yaml
engine:
  runner:
    type: stream
    log_level: INFO
```

Alternatively, select it at runtime with the `--runner stream` CLI flag:

```sh
bizon run config.yml --runner stream
```
Runner configuration fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_workers` | int | 2 | Number of worker threads/processes |
| `consumer_start_delay` | int | 2 | Seconds to wait before starting the consumer |
| `is_alive_check_interval` | int | 2 | Seconds between liveness checks |
| `log_level` | enum | INFO | Logging level (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL) |
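For example, to get more verbose output while debugging, a minimal sketch raising the log level on the `thread` runner:

```yaml
engine:
  runner:
    type: thread
    config:
      max_workers: 2
    # One of TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL
    log_level: DEBUG
```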

## Complete Examples

A recommended production setup, combining the `postgres` backend, `kafka` queue, and `stream` runner:

```yaml
name: production-pipeline

source:
  name: kafka
  stream: topic
  sync_mode: stream

destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    buffer_size: 100
    buffer_flush_timeout: 300

engine:
  backend:
    type: postgres
    config:
      host: db.example.com
      port: 5432
      database: bizon
      schema: public
      username: bizon
      password: BIZON_ENV_DB_PASSWORD
      syncCursorInDBEvery: 5
  queue:
    type: kafka
    config:
      bootstrap_servers: kafka.example.com:9092
      max_nb_messages: 10000000
  runner:
    type: stream
    log_level: INFO
```

A simple local setup that relies on the engine defaults:

```yaml
name: dev-pipeline

source:
  name: dummy
  stream: creatures

destination:
  name: logger
  config: {}

# Defaults applied:
# engine:
#   backend:
#     type: sqlite
#   queue:
#     type: python_queue
#   runner:
#     type: thread
```
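Either pipeline runs the same way; point the CLI at the config file (the filename here is illustrative):

```sh
bizon run dev-pipeline.yml
```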