Checkpointing

Checkpointing is at the heart of Bizon’s reliability. It enables pipelines to recover from failures without reprocessing all data from the beginning.

How It Works

Bizon tracks progress at the record level using a checkpointing mechanism:

1. During extraction: the source reports its position (cursor) after each batch.
2. After loading: checkpoints are saved to the backend.
3. On failure: the pipeline resumes from the last saved checkpoint.

┌─────────────┐
│   Source    │
│             │──────▶ Cursor position saved
└─────────────┘
       │
       ▼
┌─────────────┐
│   Backend   │◀────── Checkpoint persisted
│             │
└─────────────┘
       │
       ▼
   On restart:
   Resume from last checkpoint
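
The flow above can be sketched in a few lines of Python. This is an illustrative sketch only, not Bizon's implementation: the function names (make_backend, load_checkpoint, save_checkpoint, run), the checkpoints table, and the use of an in-memory SQLite database as the backend are all assumptions made for the example. It shows a cursor being saved only after a batch has been loaded, and a second run resuming from that saved cursor after a simulated failure.

```python
import sqlite3


def make_backend():
    """Hypothetical checkpoint store: one row per pipeline holding the last cursor."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checkpoints (pipeline TEXT PRIMARY KEY, cursor_pos INTEGER)"
    )
    return conn


def load_checkpoint(conn, pipeline):
    """Return the last saved cursor, or 0 if the pipeline has never checkpointed."""
    row = conn.execute(
        "SELECT cursor_pos FROM checkpoints WHERE pipeline = ?", (pipeline,)
    ).fetchone()
    return row[0] if row else 0


def save_checkpoint(conn, pipeline, cursor_pos):
    """Persist the cursor so a later run can resume from it."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (pipeline, cursor_pos) VALUES (?, ?)",
        (pipeline, cursor_pos),
    )
    conn.commit()


def run(conn, pipeline, records, destination, batch_size=3, fail_at_batch=None):
    """Extract in batches, load each batch, then checkpoint the new cursor."""
    cursor_pos = load_checkpoint(conn, pipeline)  # resume point (0 on first run)
    batch_no = 0
    while cursor_pos < len(records):
        batch = records[cursor_pos:cursor_pos + batch_size]
        if batch_no == fail_at_batch:
            raise ConnectionError("simulated destination outage")
        destination.extend(batch)                      # load the batch
        cursor_pos += len(batch)
        save_checkpoint(conn, pipeline, cursor_pos)    # checkpoint only after loading
        batch_no += 1
    return cursor_pos


# Demo: the first run fails on its third batch, the second run resumes.
records = list(range(10))
destination = []
backend = make_backend()

try:
    run(backend, "demo", records, destination, fail_at_batch=2)
except ConnectionError:
    pass

saved_after_failure = load_checkpoint(backend, "demo")
final_cursor = run(backend, "demo", records, destination)
```

Because the checkpoint is written only after the batch reaches the destination, the second run starts at the first record that was not yet loaded and nothing is loaded twice in this simplified setup.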

Benefits

No Full Restarts

When a pipeline fails (network issues, destination downtime, etc.), Bizon doesn’t need to re-extract everything. It resumes from the last saved checkpoint, so only the work done after that checkpoint has to be repeated.

Predictable Long-Running Loads

For pipelines processing billions of records, checkpointing ensures:

- Consistent progress tracking
- Reduced manual intervention
- Faster recovery times

Less Debugging Guesswork

With checkpoints, you always know:

- What data has been successfully loaded
- Where the pipeline stopped
- What needs to be reprocessed (if anything)

Configuration

Checkpointing is automatic, but you can configure the checkpoint interval:

backend:
  type: postgres
  config:
    connection_string: postgresql://...
    checkpoint_interval: 1000  # Checkpoint every 1000 records
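
To get a feel for what the interval means in practice, the following sketch estimates how much work is repeated after a failure. It assumes a checkpoint is written each time exactly checkpoint_interval records have been processed, which may not match Bizon's actual behavior; the function names resume_position and records_to_replay are hypothetical and exist only for this illustration.

```python
def resume_position(records_processed: int, checkpoint_interval: int) -> int:
    """Record count at the last checkpoint written before the failure.

    Assumes one checkpoint per `checkpoint_interval` processed records.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    return (records_processed // checkpoint_interval) * checkpoint_interval


def records_to_replay(records_processed: int, checkpoint_interval: int) -> int:
    """Records processed after the last checkpoint, which must be handled again."""
    return records_processed - resume_position(records_processed, checkpoint_interval)


# A pipeline that fails after 4,750 records with checkpoint_interval: 1000
resume_at = resume_position(4750, 1000)
replayed = records_to_replay(4750, 1000)
```

Under these assumptions, a smaller interval means less repeated work after a failure at the cost of more frequent writes to the backend.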

Backend Options

Backend     Best For                       Checkpoint Frequency
SQLite      Development, testing           In-memory only
PostgreSQL  Production, frequent updates   High
BigQuery    Production, lightweight setup  Medium

Next Steps

Configure Queues for your pipeline
See the Configuration Reference