Architecture
Bizon is designed as a modular, queue-agnostic ELT framework that prioritizes reliability and observability.
High-Level Overview

┌─────────────┐     ┌─────────────┐     ┌─────────────────┐
│   Source    │────▶│    Queue    │────▶│   Destination   │
│  (Extract)  │     │  (Buffer)   │     │     (Load)      │
└─────────────┘     └─────────────┘     └─────────────────┘
       │                   │                     │
       └───────────────────┴─────────────────────┘
                           │
                    ┌──────┴──────┐
                    │   Backend   │
                    │   (State)   │
                    └─────────────┘

Core Components
Sources
Sources are responsible for extracting data from external systems. Bizon provides a clean interface for implementing custom sources, with built-in support for pagination, rate limiting, and error handling.
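To make the pagination contract concrete, here is a minimal sketch of a cursor-paginated source. The names (`Page`, `PaginatedSource`, `extract`) are illustrative assumptions, not Bizon's actual interface:

```python
from dataclasses import dataclass
from typing import Iterator, Optional

# Hypothetical sketch of a cursor-paginated source;
# Bizon's real interface names may differ.
@dataclass
class Page:
    records: list
    next_cursor: Optional[str]  # None when there are no more pages

class PaginatedSource:
    """Extracts records page by page, following a cursor."""

    def __init__(self, fetch_page):
        # fetch_page(cursor) -> Page; injected so the sketch stays testable
        self._fetch_page = fetch_page

    def extract(self) -> Iterator[list]:
        cursor = None
        while True:
            page = self._fetch_page(cursor)
            yield page.records  # one batch per page
            if page.next_cursor is None:
                break
            cursor = page.next_cursor

# Usage: a fake two-page API backed by a dict
pages = {None: Page([1, 2], "p2"), "p2": Page([3], None)}
records = [r for batch in PaginatedSource(pages.__getitem__).extract() for r in batch]
```

A real source would add rate limiting (sleep or token bucket between `fetch_page` calls) and retry-on-error around the fetch.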
Queue

The queue acts as a buffer between extraction and loading, enabling:
- Decoupling: Sources and destinations operate independently
- Backpressure handling: Prevent overwhelming destinations
- Fault tolerance: Data is preserved if the destination fails
Supported queues:
- Python Queue (for development/testing)
- RabbitMQ (production, high throughput)
- Kafka / Redpanda (production, with persistence)
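The decoupling and backpressure behavior can be illustrated with Python's standard library queue (presumably what the development-oriented "Python Queue" option builds on). A bounded queue gives natural backpressure: the producer blocks when the buffer is full instead of overwhelming the consumer:

```python
import queue
import threading

# Bounded queue: put() blocks when full, throttling the source
# (backpressure) instead of overwhelming the destination.
buf = queue.Queue(maxsize=100)
loaded = []

def extract():
    for i in range(500):
        buf.put({"id": i})   # blocks if the destination lags behind
    buf.put(None)            # sentinel: extraction finished

def load():
    while True:
        record = buf.get()
        if record is None:
            break
        loaded.append(record)

t_src = threading.Thread(target=extract)
t_dst = threading.Thread(target=load)
t_src.start(); t_dst.start()
t_src.join(); t_dst.join()
```

Swapping the in-memory queue for RabbitMQ or Kafka keeps the same producer/consumer shape but adds durability: records survive a destination crash.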
Destinations
Destinations handle loading data into target systems. Bizon uses efficient batch loading with:
- Memory-efficient buffering via Polars
- Parquet support via PyArrow
- Configurable batch sizes
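The batching logic itself is simple; here is a dependency-free sketch of a configurable batch buffer (plain lists stand in for the Polars DataFrame Bizon buffers with, and each flushed batch is where a Parquet write via PyArrow would happen):

```python
from typing import Iterable, Iterator

def batches(records: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Group an incoming record stream into fixed-size batches.

    In Bizon the buffer is Polars-backed and each flush can be written
    as Parquet via PyArrow; plain lists keep this sketch self-contained.
    """
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch      # flush a full batch to the destination
            batch = []
    if batch:
        yield batch          # flush the final partial batch

# Usage: 7 records with batch_size=3 -> batches of 3, 3, 1
sizes = [len(b) for b in batches(({"id": i} for i in range(7)), batch_size=3)]
```

Tuning `batch_size` trades memory use against per-write overhead at the destination.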
Backend
The backend stores pipeline state, including checkpoints for recovery. Supported backends:
- SQLite (in-memory, for development)
- PostgreSQL (production, frequent updates)
- BigQuery (production, lightweight setup)
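At its core, a state backend is durable checkpoint reads and writes. A minimal sketch against SQLite (the table and column names here are illustrative, not Bizon's actual schema):

```python
import sqlite3
from typing import Optional

class SqliteStateBackend:
    """Stores one checkpoint cursor per pipeline (illustrative schema)."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "pipeline TEXT PRIMARY KEY, cursor TEXT)"
        )

    def save(self, pipeline: str, cursor: str) -> None:
        # Each save overwrites the previous checkpoint for that pipeline
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints (pipeline, cursor) VALUES (?, ?)",
            (pipeline, cursor),
        )
        self.conn.commit()

    def load(self, pipeline: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT cursor FROM checkpoints WHERE pipeline = ?", (pipeline,)
        ).fetchone()
        return row[0] if row else None

backend = SqliteStateBackend()
backend.save("orders", "offset-42")
backend.save("orders", "offset-99")   # overwrites the earlier checkpoint
resumed = backend.load("orders")
```

On restart, a pipeline reads its last committed cursor and resumes extraction from there instead of starting over.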
Runner Modes
Bizon supports multiple execution modes:
| Mode | Description | Use Case |
|---|---|---|
| Thread | Async execution with threads | General purpose |
| Process | Async execution with processes | CPU-bound work |
| Stream | Synchronous streaming | Low latency, real-time pipelines |
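The Thread/Process distinction mirrors Python's executor choice: threads share memory and suit I/O-bound extract/load steps, while separate processes sidestep the GIL for CPU-bound work. A sketch of the idea (not Bizon's runner API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_step(chunk):
    # Stand-in for one extract/transform/load step on a chunk of records
    return sum(chunk)

chunks = [[1, 2], [3, 4], [5]]

# Thread mode: lightweight, shared memory, good for I/O-bound steps.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_step, chunks))

# Process mode would swap in concurrent.futures.ProcessPoolExecutor:
# separate interpreters avoid the GIL for CPU-bound steps, at the cost
# of pickling work between processes.
```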
Stream Mode
The Stream runner is designed for real-time, low-latency pipelines. Unlike Thread and Process modes, Stream mode:
- Bypasses the queue buffer — Data flows directly from source to destination without intermediate buffering, reducing latency and memory overhead
- Offset committing — Supports stream checkpointing where offsets are committed after successful writes, enabling exactly-once semantics for sources like Kafka
- Synchronous execution — Processes records in a tight loop for predictable performance
┌─────────────┐                        ┌─────────────────┐
│   Source    │──────────────────────▶│   Destination   │
│  (Extract)  │     (direct flow)     │     (Load)      │
└─────────────┘                        └─────────────────┘
       │                                       │
       └──────────── Offset Commit ────────────┘
                          │
                   ┌──────┴──────┐
                   │   Backend   │
                   │   (State)   │
                   └─────────────┘

Use Stream mode when:
- You need sub-second latency (e.g., Kafka → BigQuery streaming)
- Your source supports offset-based checkpointing
- You want to minimize memory usage by avoiding buffering
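The exactly-once guarantee comes from ordering: write first, commit the offset only after the write succeeds. A minimal sketch of that loop (function names are illustrative, not Bizon's API):

```python
def stream(source_records, write, commit_offset):
    """Tight synchronous loop with no intermediate buffer.

    write() may raise; because commit_offset() runs only after a
    successful write, a crash replays the uncommitted record on
    restart instead of losing it — the basis of exactly-once
    semantics for offset-based sources like Kafka.
    """
    for offset, record in source_records:
        write(record)          # load directly into the destination
        commit_offset(offset)  # checkpoint only after success

# Usage with in-memory stand-ins for the destination and backend
written, committed = [], []
stream(
    enumerate(["a", "b", "c"]),
    write=written.append,
    commit_offset=committed.append,
)
```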
Next Steps
- Learn about Checkpointing for fault tolerance
- Explore Queue options for your infrastructure