Checkpointing
Checkpointing is at the heart of Bizon’s reliability. It enables pipelines to recover from failures without reprocessing all data from the beginning.
How It Works
Bizon tracks progress at the record level using a checkpointing mechanism:
- During extraction: The source reports its position (cursor) after each batch
- After loading: Checkpoints are saved to the backend
- On failure: The pipeline resumes from the last saved checkpoint
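
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Bizon's actual implementation: `InMemoryBackend`, `read_batches`, `run_pipeline`, and the `fail_after_batches` switch are hypothetical names invented for this example, and the cursor is assumed to be a simple integer offset.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class InMemoryBackend:
    """Hypothetical checkpoint store; a real backend would persist to a database."""
    cursor: Optional[int] = None

    def save(self, cursor: int) -> None:
        self.cursor = cursor

    def load(self) -> Optional[int]:
        return self.cursor


def read_batches(records: List[str], start: int, batch_size: int) -> Iterator[Tuple[List[str], int]]:
    """Yield (batch, cursor), where cursor is the offset of the next unread record."""
    for i in range(start, len(records), batch_size):
        batch = records[i:i + batch_size]
        yield batch, i + len(batch)


def run_pipeline(records, backend, destination, batch_size=2, fail_after_batches=None):
    # On (re)start: resume from the last saved checkpoint, or from the beginning.
    start = backend.load() or 0
    for n, (batch, cursor) in enumerate(read_batches(records, start, batch_size), start=1):
        if fail_after_batches is not None and n > fail_after_batches:
            raise ConnectionError("simulated destination outage")
        destination.extend(batch)  # load the batch
        backend.save(cursor)       # checkpoint only after the load succeeded


records = ["a", "b", "c", "d", "e"]
backend = InMemoryBackend()
destination: List[str] = []

try:
    # First run: fails after one batch has been loaded and checkpointed.
    run_pipeline(records, backend, destination, fail_after_batches=1)
except ConnectionError:
    pass

# Second run: picks up at the saved cursor instead of re-extracting "a" and "b".
run_pipeline(records, backend, destination)
print(destination)
```

Saving the cursor only after the load succeeds is the design choice that matters here: a crash between the load and the save can cause a batch to be replayed, but never skipped.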
```
┌─────────────┐
│   Source    │
│             │──────▶ Cursor position saved
└─────────────┘
       │
       ▼
┌─────────────┐
│   Backend   │◀────── Checkpoint persisted
│             │
└─────────────┘
       │
       ▼
On restart: Resume from last checkpoint
```
Benefits
No Full Restarts
When a pipeline fails (network issues, destination downtime, etc.), Bizon doesn’t need to re-extract everything. It resumes from the last saved checkpoint.
Predictable Long-Running Loads
For pipelines processing billions of records, checkpointing ensures:
- Consistent progress tracking
- Reduced manual intervention
- Faster recovery times
Less Debugging Guesswork
With checkpoints, you always know:
- What data has been successfully loaded
- Where the pipeline stopped
- What needs to be reprocessed (if anything)
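
As a rough worked example of the last point, the arithmetic below estimates how much work is replayed after a failure. It assumes a checkpoint is written every fixed number of records counted from the start of the run (the checkpoint interval described under Configuration). The function names are hypothetical, and whether replayed records are deduplicated at the destination is not covered here.

```python
def last_checkpoint(records_loaded: int, checkpoint_interval: int) -> int:
    """Number of records covered by the most recent checkpoint."""
    return (records_loaded // checkpoint_interval) * checkpoint_interval


def records_to_reprocess(records_loaded: int, checkpoint_interval: int) -> range:
    """0-based offsets loaded after the last checkpoint but before the failure."""
    return range(last_checkpoint(records_loaded, checkpoint_interval), records_loaded)


failed_at = 2350   # records loaded when the pipeline stopped
interval = 1000    # checkpoint every 1000 records

resume_from = last_checkpoint(failed_at, interval)
replay = records_to_reprocess(failed_at, interval)
print(resume_from)  # 2000
print(len(replay))  # 350
```

Under these assumptions, a smaller interval shrinks the replay window at the cost of more frequent writes to the backend.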
Configuration
Checkpointing is automatic, but you can configure the checkpoint interval:
```yaml
backend:
  type: postgres
  config:
    connection_string: postgresql://...
    checkpoint_interval: 1000  # Checkpoint every 1000 records
```
Backend Options
| Backend | Best For | Checkpoint Frequency |
|---|---|---|
| SQLite | Development, testing | In-memory only |
| PostgreSQL | Production, frequent updates | High |
| BigQuery | Production, lightweight setup | Medium |
Next Steps
- Configure Queues for your pipeline
- See the Configuration Reference