Sync Modes
Bizon supports three sync modes that determine how data is extracted from sources. Choose the right mode based on your data freshness requirements and source capabilities.
Overview
| Mode | Creates New Job | Resumes From | Best For |
|---|---|---|---|
| `full_refresh` | Yes, every run | Beginning | Small datasets, complete snapshots |
| `incremental` | Yes, every run | Last sync timestamp | Large datasets, append-only data |
| `stream` | No, reuses job | Last committed offset | Real-time data, Kafka/queues |
Full Refresh
Full refresh mode syncs all data from the source on every run.

    source:
      name: hubspot
      stream: contacts
      sync_mode: full_refresh

How it works:
- Creates a new StreamJob each run
- Extracts all records from the source
- Continues until pagination is exhausted
- Marks job as SUCCEEDED or FAILED
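
The steps above can be sketched as a simple pagination loop. This is a minimal illustration, not Bizon's implementation: `Page`, `StreamJob`, and `run_full_refresh` are hypothetical names, and the in-memory `PAGES` dictionary stands in for a paginated source API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    records: list
    next_cursor: Optional[str]  # None means pagination is exhausted


@dataclass
class StreamJob:  # hypothetical stand-in, not Bizon's own class
    status: str = "RUNNING"
    records: list = field(default_factory=list)


def run_full_refresh(fetch_page: Callable[[Optional[str]], Page]) -> StreamJob:
    job = StreamJob()  # a new job is created on every run
    cursor = None      # full refresh always starts from the beginning
    try:
        while True:
            page = fetch_page(cursor)
            job.records.extend(page.records)
            if page.next_cursor is None:
                break  # pagination exhausted
            cursor = page.next_cursor
        job.status = "SUCCEEDED"
    except Exception:
        job.status = "FAILED"
    return job


# Two fake pages keyed by cursor (assumed data for the demo).
PAGES = {None: Page([1, 2], "p2"), "p2": Page([3], None)}
job = run_full_refresh(lambda cursor: PAGES[cursor])
print(job.status, job.records)  # SUCCEEDED [1, 2, 3]
```

Because the cursor is never persisted between runs, every run re-reads the whole source, which is why the destination has to cope with records it has already seen.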
Use cases:
- Small to medium datasets where full extraction is fast
- Data that changes unpredictably (no reliable timestamp)
- Initial loads before switching to incremental
- Sources that don’t support incremental queries
Considerations:
- Higher data transfer costs for large datasets
- Longer sync times as data grows
- Destination must handle duplicate/updated records
Incremental
Incremental mode only syncs new or updated records since the last successful sync.

    source:
      name: hubspot
      stream: contacts
      sync_mode: incremental

How it works:
- Gets the `start_date` from the last successful StreamJob
- Creates a new StreamJob
- Queries source for records updated after `start_date`
- Syncs until pagination is exhausted
- Stores the new sync timestamp for next run
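
The flow above can be sketched as a timestamp filter plus a stored checkpoint. This is an illustration under assumptions: `Record`, `run_incremental`, and the `utc` helper are hypothetical, and using the run's start time as the stored timestamp is a guess at one reasonable choice, not a statement about what Bizon stores.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class Record:
    id: int
    updated_at: datetime


def run_incremental(
    source: List[Record],
    last_sync: Optional[datetime],
    run_started_at: datetime,
) -> Tuple[List[Record], datetime]:
    """Return the records to sync and the timestamp to store for the next run."""
    if last_sync is None:
        changed = list(source)  # no previous job: behaves like a full refresh
    else:
        changed = [r for r in source if r.updated_at > last_sync]
    return changed, run_started_at


def utc(day: int) -> datetime:
    return datetime(2024, 1, day, tzinfo=timezone.utc)


source = [Record(1, utc(1))]
first, checkpoint = run_incremental(source, None, utc(2))

source.append(Record(2, utc(3)))  # new data arrives after the first run
second, checkpoint = run_incremental(source, checkpoint, utc(4))

print([r.id for r in first], [r.id for r in second])  # [1] [2]
```

The filter only sees rows that still exist and carry a newer timestamp, so a record deleted at the source simply never shows up in a later run.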
Use cases:
- Large datasets where full refresh is too slow
- Append-only data (logs, events, transactions)
- APIs with efficient date-filtered queries
- Cost optimization for metered APIs
Requirements:
- Source must implement `get_records_after()` method
- Source API must support filtering by modification date
- Check stream support with `bizon stream list <source>`
Configuration:
    source:
      name: hubspot
      stream: contacts
      sync_mode: incremental
      # Optional: force restart from scratch
      force_ignore_checkpoint: false

Considerations:
- Relies on source’s timestamp accuracy
- Deleted records may not be captured
- First run behaves like full refresh
Stream
Stream mode enables continuous, real-time data ingestion with offset tracking.

    source:
      name: kafka
      stream: topic
      sync_mode: stream

How it works:
- Uses a single, persistent RUNNING StreamJob
- Tracks offsets/positions in the source system
- Commits progress periodically to backend
- Runs continuously until stopped
- Resumes from last committed offset on restart
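
Periodic commits and resume-on-restart can be sketched with an in-memory log. Everything here is hypothetical (`InMemoryBackend`, `consume`, `commit_every`, `max_messages`); a real Kafka consumer and a persistent backend would replace the list and the class, and `max_messages` only simulates the process being stopped.

```python
from typing import List


class InMemoryBackend:  # hypothetical; stands in for a persistent backend
    def __init__(self) -> None:
        self.committed_offset = 0

    def commit(self, offset: int) -> None:
        self.committed_offset = offset


def consume(
    log: List[str],
    backend: InMemoryBackend,
    commit_every: int,
    max_messages: int,
) -> List[str]:
    """Read from the last committed offset, committing every `commit_every` messages."""
    offset = backend.committed_offset  # resume point
    delivered: List[str] = []
    while offset < len(log) and len(delivered) < max_messages:
        delivered.append(log[offset])
        offset += 1
        if offset - backend.committed_offset >= commit_every:
            backend.commit(offset)  # periodic progress commit
    return delivered


backend = InMemoryBackend()
log = ["e0", "e1", "e2", "e3", "e4"]

first = consume(log, backend, commit_every=2, max_messages=3)
offset_after_stop = backend.committed_offset
second = consume(log, backend, commit_every=2, max_messages=3)

print(first, offset_after_stop, second)
# ['e0', 'e1', 'e2'] 2 ['e2', 'e3', 'e4']
```

In this sketch `e2` is delivered twice because it was read but not yet committed when the first run stopped. Committing periodically rather than per message trades fewer backend writes for possible redelivery after a restart.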
Use cases:
- Real-time data pipelines
- Kafka/Redpanda topic consumption
- Event streaming architectures
- Low-latency data delivery
Configuration:
    source:
      name: kafka
      stream: topic
      sync_mode: stream
      bootstrap_servers: kafka:9092
      group_id: my-consumer-group
      topics:
        - name: events
          destination_id: project.dataset.events

    engine:
      runner:
        type: stream  # Use stream runner for continuous operation

Considerations:
- Requires stream-capable source (Kafka, RabbitMQ)
- Use `--runner stream` for continuous operation
- Backend must be persistent (postgres, bigquery) for production
Choosing the Right Mode
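
The decision flow in this section reduces to three questions, which can be written as a small helper. `choose_sync_mode` and its parameters are hypothetical, and reading the ">1M" threshold as a record count is an assumption.

```python
def choose_sync_mode(
    is_real_time: bool,
    supports_incremental: bool,
    record_count: int,
) -> str:
    """Mirror the decision flow: real-time first, then incremental support, then size."""
    if is_real_time:
        return "stream"
    # Assumption: "large" means more than one million records.
    if supports_incremental and record_count > 1_000_000:
        return "incremental"
    return "full_refresh"


print(choose_sync_mode(False, True, 5_000_000))  # incremental
print(choose_sync_mode(False, True, 10_000))     # full_refresh
```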
    ┌─────────────────────────────────────────────────────────────┐
    │                   Is your data real-time?                   │
    │                   (Kafka, message queue)                    │
    └─────────────────────────────┬───────────────────────────────┘
                                  │
                  ┌───────────────┴───────────────┐
                  │                               │
                 Yes                              No
                  │                               │
                  ▼                               ▼
              ┌────────┐          ┌───────────────────────────────┐
              │ STREAM │          │      Does source support      │
              └────────┘          │     incremental queries?      │
                                  └───────────────┬───────────────┘
                                                  │
                                  ┌───────────────┴───────────────┐
                                  │                               │
                                 Yes                              No
                                  │                               │
                                  ▼                               ▼
                         ┌─────────────────┐            ┌──────────────────┐
                         │   Is dataset    │            │   FULL_REFRESH   │
                         │  large (>1M)?   │            └──────────────────┘
                         └────────┬────────┘
                                  │
                  ┌───────────────┴───────────────┐
                  │                               │
                 Yes                              No
                  │                               │
                  ▼                               ▼
           ┌─────────────┐              ┌──────────────────┐
           │ INCREMENTAL │              │   FULL_REFRESH   │
           └─────────────┘              └──────────────────┘

Force Checkpoint Reset
To ignore existing checkpoints and start fresh:

    source:
      name: hubspot
      stream: contacts
      sync_mode: incremental
      force_ignore_checkpoint: true  # Start from scratch

This is useful when:
- Schema changes require re-syncing all data
- Source data was corrected/backfilled
- Testing or debugging pipelines
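
One way to picture the flag is as an override applied before the sync starts. The helper below is a hypothetical sketch (`resolve_start_date` is not a Bizon function); it assumes `None` means "sync from the beginning".

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_start_date(
    checkpoint: Optional[datetime],
    force_ignore_checkpoint: bool,
) -> Optional[datetime]:
    """Return the point to resume from; None means start from scratch."""
    if force_ignore_checkpoint:
        return None  # discard the stored checkpoint for this run
    return checkpoint


saved = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(resolve_start_date(saved, force_ignore_checkpoint=False) == saved)  # True
print(resolve_start_date(saved, force_ignore_checkpoint=True))            # None
```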
Next Steps
- Sources Overview - Learn about source connectors
- Checkpointing - Understand fault tolerance
- Engine Configuration - Configure backends for checkpoint storage