Checkpointing

Checkpointing is at the heart of Bizon’s reliability. It enables pipelines to recover from failures without reprocessing all data from the beginning.

Bizon tracks progress at the record level using a checkpointing mechanism:

  1. During extraction: The source reports its position (cursor) after each batch
  2. After loading: Checkpoints are saved to the backend
  3. On failure: The pipeline resumes from the last saved checkpoint

┌─────────────┐
│   Source    │
│             │──────▶ Cursor position saved
└─────────────┘

┌─────────────┐
│   Backend   │◀────── Checkpoint persisted
│             │
└─────────────┘

On restart:
  Resume from last checkpoint

When a pipeline fails (network issues, destination downtime, etc.), Bizon doesn’t need to re-extract everything. It resumes from the last saved checkpoint, so only records processed after that checkpoint are handled again.
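
The extract, load, checkpoint, and resume cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bizon's actual API: `InMemoryBackend`, `read_batches`, `run_pipeline`, and the `fail_at_cursor` parameter are hypothetical names invented for the example, and the cursor is assumed to be a simple integer offset.

```python
from typing import Iterator, List, Optional, Tuple


class InMemoryBackend:
    """Hypothetical stand-in for a checkpoint backend (not Bizon's API)."""

    def __init__(self) -> None:
        self.cursor: Optional[int] = None

    def save_checkpoint(self, cursor: int) -> None:
        self.cursor = cursor

    def load_checkpoint(self) -> Optional[int]:
        return self.cursor


def read_batches(records: List[int], start: int, batch_size: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (batch, cursor) pairs; cursor is the offset of the next unread record."""
    for i in range(start, len(records), batch_size):
        batch = records[i:i + batch_size]
        yield batch, i + len(batch)


def run_pipeline(
    records: List[int],
    backend: InMemoryBackend,
    destination: List[int],
    batch_size: int = 2,
    fail_at_cursor: Optional[int] = None,
) -> None:
    # Resume from the last saved checkpoint, or from the beginning if none exists.
    start = backend.load_checkpoint() or 0
    for batch, cursor in read_batches(records, start, batch_size):
        if fail_at_cursor is not None and cursor == fail_at_cursor:
            # Simulated outage: this batch is neither loaded nor checkpointed.
            raise ConnectionError("simulated destination outage")
        destination.extend(batch)        # load the batch first
        backend.save_checkpoint(cursor)  # then persist the cursor


records = list(range(10))
backend = InMemoryBackend()
destination: List[int] = []

# First run fails while handling the third batch (records 4 and 5).
try:
    run_pipeline(records, backend, destination, fail_at_cursor=6)
except ConnectionError:
    pass
cursor_after_failure = backend.load_checkpoint()  # 4
loaded_after_failure = list(destination)          # [0, 1, 2, 3]

# Second run resumes at offset 4 instead of re-extracting records 0 to 3.
run_pipeline(records, backend, destination)
```

Saving the cursor only after the batch has been loaded means a crash between the two steps causes that batch to be loaded again on restart rather than skipped. Whether Bizon orders the steps this way internally is an assumption of this sketch.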

For pipelines processing billions of records, checkpointing ensures:

  • Consistent progress tracking
  • Reduced manual intervention
  • Faster recovery times

With checkpoints, you always know:

  • What data has been successfully loaded
  • Where the pipeline stopped
  • What needs to be reprocessed (if anything)

Checkpointing is automatic, but you can configure the checkpoint interval:

backend:
  type: postgres
  config:
    connection_string: postgresql://...
    checkpoint_interval: 1000  # Checkpoint every 1000 records
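
To get a feel for what the interval controls, the sketch below estimates the trade-off between checkpoint writes and the amount of work redone after a failure. It assumes a checkpoint is written each time the loaded-record count reaches a multiple of `checkpoint_interval`. Bizon's exact behavior may differ, and `checkpoint_tradeoff` is a hypothetical helper, not part of Bizon.

```python
from typing import Tuple


def checkpoint_tradeoff(records_loaded: int, checkpoint_interval: int) -> Tuple[int, int]:
    """Return (checkpoint_writes, records_to_replay) for a run that fails
    after `records_loaded` records, assuming one checkpoint per full interval."""
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    checkpoint_writes = records_loaded // checkpoint_interval
    records_to_replay = records_loaded % checkpoint_interval
    return checkpoint_writes, records_to_replay


# A run that fails after 2,550 loaded records, under three different intervals.
for interval in (100, 1000, 10000):
    writes, replay = checkpoint_tradeoff(2550, interval)
    print(f"interval={interval}: {writes} checkpoint writes, {replay} records to replay")
```

Under this assumption, a smaller interval means more writes to the backend but less data to reprocess after a failure, while a larger interval reduces backend load at the cost of a longer replay.
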
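
| Backend    | Best For                     | Checkpoint Frequency |
|------------|------------------------------|----------------------|
| SQLite     | Development, testing         | In-memory only       |
| PostgreSQL | Production, frequent updates | High                 |
| BigQuery   | Production, lightweight setup | Medium               |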