Architecture

Bizon is designed as a modular, queue-agnostic ELT framework that prioritizes reliability and observability.

┌─────────────┐      ┌─────────────┐      ┌─────────────────┐
│   Source    │─────▶│    Queue    │─────▶│   Destination   │
│  (Extract)  │      │  (Buffer)   │      │     (Load)      │
└─────────────┘      └─────────────┘      └─────────────────┘
       │                    │                      │
       └────────────────────┴──────────────────────┘
                     ┌──────┴──────┐
                     │   Backend   │
                     │   (State)   │
                     └─────────────┘

Sources are responsible for extracting data from external systems. Bizon provides a clean interface for implementing custom sources, with built-in support for pagination, rate limiting, and error handling.
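
To make the extraction pattern concrete, here is a minimal standalone sketch of a paginated, rate-limited HTTP extractor. The class and endpoint shape (PaginatedSource, fetch_page, a cursor-based API) are hypothetical illustrations of the pattern, not Bizon's actual source interface.

    import time
    from typing import Any, Iterator

    import requests


    class PaginatedSource:
        """Illustrative extractor: paginated HTTP API with naive rate limiting.

        Hypothetical example only -- not Bizon's real source interface.
        """

        def __init__(self, base_url: str, page_size: int = 100, max_rps: float = 5.0):
            self.base_url = base_url
            self.page_size = page_size
            self.min_interval = 1.0 / max_rps  # simple client-side rate limit

        def fetch_page(self, cursor: str | None) -> dict[str, Any]:
            params: dict[str, Any] = {"limit": self.page_size}
            if cursor:
                params["cursor"] = cursor
            resp = requests.get(self.base_url, params=params, timeout=30)
            resp.raise_for_status()  # surface HTTP errors to the caller
            return resp.json()

        def extract(self) -> Iterator[dict[str, Any]]:
            cursor = None
            while True:
                page = self.fetch_page(cursor)
                yield from page["records"]
                cursor = page.get("next_cursor")
                if cursor is None:
                    break  # no more pages
                time.sleep(self.min_interval)  # throttle before the next request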

The queue acts as a buffer between extraction and loading, enabling:

  • Decoupling: Sources and destinations operate independently
  • Backpressure handling: Prevents fast sources from overwhelming the destination
  • Fault tolerance: Data is preserved if the destination fails

Supported queues:

  • Python Queue (for development/testing)
  • RabbitMQ (production, high throughput)
  • Kafka / Redpanda (production, with persistence)
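
To illustrate the decoupling and backpressure behavior, the sketch below uses the standard-library Python Queue listed above for development. A bounded queue blocks the producer when the consumer falls behind; the producer/consumer wiring here is illustrative and not how Bizon assembles its components.

    import queue
    import threading

    buffer: queue.Queue = queue.Queue(maxsize=1000)  # bounded: provides backpressure
    _SENTINEL = object()


    def produce(records):
        for record in records:
            buffer.put(record)  # blocks when the buffer is full (backpressure)
        buffer.put(_SENTINEL)   # signal end of stream


    def consume(load_batch, batch_size=500):
        batch = []
        while True:
            item = buffer.get()
            if item is _SENTINEL:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                load_batch(batch)  # e.g. write to the destination
                batch = []
        if batch:
            load_batch(batch)  # flush the remainder


    # Source and destination run independently, connected only by the buffer.
    producer = threading.Thread(target=produce, args=(({"id": i} for i in range(10_000)),))
    consumer = threading.Thread(target=consume, args=(lambda b: print(f"loaded {len(b)} rows"),))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()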

Destinations handle loading data into target systems. Bizon uses efficient batch loading with:

  • Memory-efficient buffering via Polars
  • Parquet support via PyArrow
  • Configurable batch sizes
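
The buffering pattern can be sketched with Polars and PyArrow as follows; the batch threshold and file layout are invented for illustration and are not Bizon's defaults.

    import polars as pl
    import pyarrow.parquet as pq


    class ParquetBuffer:
        """Illustrative batch buffer: accumulate records, flush to Parquet."""

        def __init__(self, path: str, batch_size: int = 50_000):
            self.path = path
            self.batch_size = batch_size
            self.records: list[dict] = []
            self.part = 0

        def add(self, record: dict) -> None:
            self.records.append(record)
            if len(self.records) >= self.batch_size:
                self.flush()

        def flush(self) -> None:
            if not self.records:
                return
            df = pl.DataFrame(self.records)   # columnar, memory-efficient buffer
            table = df.to_arrow()             # hand off to PyArrow as an Arrow table
            pq.write_table(table, f"{self.path}/part-{self.part:05d}.parquet")
            self.part += 1
            self.records = []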

The backend stores pipeline state, including checkpoints for recovery. Supported backends:

  • SQLite (in-memory, for development)
  • PostgreSQL (production, frequent updates)
  • BigQuery (production, lightweight setup)
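
Conceptually, the backend persists a cursor per stream so that an interrupted run can resume from its last committed checkpoint. The sqlite3 sketch below shows the idea; the table name and columns are made up for illustration and do not reflect Bizon's actual schema.

    import sqlite3

    conn = sqlite3.connect(":memory:")  # in-memory backend, as used for development
    conn.execute(
        """CREATE TABLE IF NOT EXISTS checkpoints (
               pipeline TEXT,
               stream   TEXT,
               cursor   TEXT,
               PRIMARY KEY (pipeline, stream)
           )"""
    )


    def save_checkpoint(pipeline: str, stream: str, cursor: str) -> None:
        # Upsert the latest committed cursor for this stream.
        conn.execute(
            "INSERT INTO checkpoints (pipeline, stream, cursor) VALUES (?, ?, ?) "
            "ON CONFLICT (pipeline, stream) DO UPDATE SET cursor = excluded.cursor",
            (pipeline, stream, cursor),
        )
        conn.commit()


    def load_checkpoint(pipeline: str, stream: str) -> str | None:
        row = conn.execute(
            "SELECT cursor FROM checkpoints WHERE pipeline = ? AND stream = ?",
            (pipeline, stream),
        ).fetchone()
        return row[0] if row else None  # None means start from scratch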

Bizon supports multiple execution modes:

Mode       Description                       Use Case
Thread     Async execution with threads      General purpose
Process    Async execution with processes    CPU-bound work
Stream     Synchronous streaming             Low latency, real-time pipelines
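
The Thread/Process split mirrors Python's own concurrency primitives: threads share memory and suit I/O-bound extraction, while separate processes sidestep the GIL for CPU-bound work. The sketch below illustrates that trade-off with concurrent.futures; it is not Bizon's runner code.

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    import requests


    def fetch(url: str) -> int:
        # I/O-bound: threads are enough, the GIL is released while waiting on the network.
        return len(requests.get(url, timeout=30).content)


    def transform(payload: bytes) -> int:
        # CPU-bound: separate processes sidestep GIL contention.
        return sum(payload)


    if __name__ == "__main__":
        urls = ["https://example.com"] * 4

        with ThreadPoolExecutor(max_workers=4) as pool:    # analogue of the Thread mode
            sizes = list(pool.map(fetch, urls))

        with ProcessPoolExecutor(max_workers=4) as pool:   # analogue of the Process mode
            checksums = list(pool.map(transform, [b"abc" * 1_000] * 4))

        print(sizes, checksums)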

The Stream runner is designed for real-time, low-latency pipelines. Unlike Thread and Process modes, Stream mode:

  • Bypasses the queue buffer — Data flows directly from source to destination without intermediate buffering, reducing latency and memory overhead
  • Offset committing — Supports stream checkpointing where offsets are committed after successful writes, enabling exactly-once semantics for sources like Kafka
  • Synchronous execution — Processes records in a tight loop for predictable performance
┌─────────────┐                 ┌─────────────────┐
│   Source    │────────────────▶│   Destination   │
│  (Extract)  │  (direct flow)  │     (Load)      │
└─────────────┘                 └─────────────────┘
       │                                 │
       └───────── Offset Commit ─────────┘
                 ┌──────┴──────┐
                 │   Backend   │
                 │   (State)   │
                 └─────────────┘
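
A stripped-down version of that loop, written directly against confluent-kafka rather than through Bizon, might look like the following; the broker address, topic, group id, and write_rows destination call are placeholders.

    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "bizon-demo",          # placeholder group id
        "enable.auto.commit": False,       # offsets are committed manually, after the write
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["events"])          # placeholder topic


    def write_rows(rows: list[bytes]) -> None:
        """Placeholder for the destination write (e.g. a BigQuery streaming insert)."""
        print(f"wrote {len(rows)} rows")


    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue                    # nothing new on the topic
            if msg.error():
                raise RuntimeError(msg.error())
            write_rows([msg.value()])       # load first ...
            consumer.commit(message=msg, asynchronous=False)  # ... then commit the offset
    finally:
        consumer.close()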

Use Stream mode when:

  • You need sub-second latency (e.g., Kafka → BigQuery streaming)
  • Your source supports offset-based checkpointing
  • You want to minimize memory usage by avoiding buffering