# Engine Configuration
The engine configuration controls how Bizon stores state, passes messages between pipeline stages, and executes workloads.
## Overview
```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
  queue:
    type: python_queue
    config:
      max_nb_messages: 1000000
  runner:
    type: thread
    config:
      max_workers: 2
      log_level: INFO
```
## Backends

Backends store pipeline state: job status, checkpoints, and cursor positions for recovery.
### Backend Types

| Type | Use Case | Persistence |
|---|---|---|
| `sqlite` | Development | File-based |
| `sqlite_in_memory` | Testing | None |
| `postgres` | Production | Full |
| `bigquery` | Serverless production | Full |
### SQLite (Development)
Stores state in a local `bizon.db` file:

```yaml
engine:
  backend:
    type: sqlite
    config:
      database: bizon
      schema: public
```
### SQLite In-Memory (Testing)

No persistence; useful for unit tests:

```yaml
engine:
  backend:
    type: sqlite_in_memory
    config:
      database: bizon
      schema: public
```
### PostgreSQL (Production)

Production-ready with full durability:

```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
      username: bizon_user
      password: BIZON_ENV_POSTGRES_PASSWORD
```
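Note that the `password` field references an environment variable rather than a literal secret. A minimal sketch of supplying it before a run, assuming Bizon substitutes `BIZON_ENV_`-prefixed config values from environment variables of the same name (an assumption based on the naming convention used throughout this page); the secret value is hypothetical:

```shell
# Export the secret before launching the pipeline. The BIZON_ENV_ prefix
# marks the config value for environment substitution (assumed convention).
export BIZON_ENV_POSTGRES_PASSWORD="s3cr3t"
```

This keeps credentials out of the YAML file, so the config can be committed to version control safely.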
### BigQuery (Serverless)

Stores state in BigQuery tables:

```yaml
engine:
  backend:
    type: bigquery
    config:
      project_id: my-project
      dataset_id: bizon_state
      database: bizon
      schema: public
```
### Backend Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| `database` | string | Required | Database name |
| `schema` | string | Required | Schema name |
| `syncCursorInDBEvery` | int | 10 | Sync cursor to DB every N iterations |
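Lowering `syncCursorInDBEvery` checkpoints the cursor more often, trading a little extra database traffic for finer-grained recovery after a failure. A sketch using the PostgreSQL backend shown above:

```yaml
engine:
  backend:
    type: postgres
    config:
      host: localhost
      port: 5432
      database: bizon
      schema: public
      syncCursorInDBEvery: 5  # checkpoint twice as often as the default of 10
```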
## Queues

Queues handle asynchronous message passing between the producer (source) and consumer (destination) stages.
### Queue Types

| Type | Use Case | Throughput |
|---|---|---|
| `python_queue` | Development | Low-Medium |
| `kafka` | Production streaming | Very High |
| `rabbitmq` | Production reliability | High |
### Python Queue (Development)

In-memory queue with no external dependencies:

```yaml
engine:
  queue:
    type: python_queue
    config:
      max_nb_messages: 1000000
```
### Kafka (Production)

For high-throughput streaming workloads:

```yaml
engine:
  queue:
    type: kafka
    config:
      bootstrap_servers: kafka:9092
      max_nb_messages: 10000000
```
### RabbitMQ (Production)

For reliable message delivery:

```yaml
engine:
  queue:
    type: rabbitmq
    config:
      host: rabbitmq
      port: 5672
      username: guest
      password: guest
      max_nb_messages: 1000000
```
### Queue Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| `max_nb_messages` | int | 1000000 | Maximum queue size |
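Because `max_nb_messages` caps how many messages can be queued at once, it also bounds memory use for the in-process `python_queue`. A sketch lowering it for a memory-constrained environment (the exact back-pressure behavior when the queue fills is not specified on this page):

```yaml
engine:
  queue:
    type: python_queue
    config:
      max_nb_messages: 50000  # smaller cap on in-flight messages to bound memory
```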
## Runners

Runners control how the pipeline executes: the concurrency model and logging.
### Runner Types

| Type | Concurrency | Use Case |
|---|---|---|
| `thread` | Async I/O | Default, most workloads |
| `process` | True parallelism | CPU-intensive transforms |
| `stream` | Single-threaded | Low-latency streaming |
### Thread Runner (Default)

Uses `ThreadPoolExecutor` for concurrent I/O:

```yaml
engine:
  runner:
    type: thread
    config:
      max_workers: 2
      consumer_start_delay: 2
      is_alive_check_interval: 2
      log_level: INFO
```
### Process Runner

Uses `ProcessPoolExecutor` for CPU-bound work:

```yaml
engine:
  runner:
    type: process
    config:
      max_workers: 4
      log_level: INFO
```
### Stream Runner

Single-threaded synchronous execution for the lowest latency:

```yaml
engine:
  runner:
    type: stream
    log_level: INFO
```

Use with the `--runner stream` CLI flag:

```shell
bizon run config.yml --runner stream
```
### Runner Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| `max_workers` | int | 2 | Number of worker threads/processes |
| `consumer_start_delay` | int | 2 | Seconds to wait before starting consumer |
| `is_alive_check_interval` | int | 2 | Seconds between liveness checks |
| `log_level` | enum | INFO | Logging level (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL) |
## Production Configuration

Recommended production setup:
```yaml
name: production-pipeline

source:
  name: kafka
  stream: topic
  sync_mode: stream

destination:
  name: bigquery_streaming_v2
  config:
    project_id: my-project
    dataset_id: analytics
    buffer_size: 100
    buffer_flush_timeout: 300

engine:
  backend:
    type: postgres
    config:
      host: db.example.com
      port: 5432
      database: bizon
      schema: public
      username: bizon
      password: BIZON_ENV_DB_PASSWORD
      syncCursorInDBEvery: 5
  queue:
    type: kafka
    config:
      bootstrap_servers: kafka.example.com:9092
      max_nb_messages: 10000000
  runner:
    type: stream
    log_level: INFO
```
## Development Configuration

Simple local setup:
```yaml
name: dev-pipeline

source:
  name: dummy
  stream: creatures

destination:
  name: logger
  config: {}

# Defaults applied:
# engine:
#   backend:
#     type: sqlite
#   queue:
#     type: python_queue
#   runner:
#     type: thread
```
## Next Steps

- Configuration Reference - Complete YAML options
- Checkpointing - Understand fault tolerance
- Queues - Queue system deep dive