# Pipeline Management
Pipelines are the core unit of work in Bizon Platform. Each pipeline defines a data flow from a source to a destination.
## Creating a Pipeline

1. **Navigate to Pipelines**

   Click “Pipelines” in the sidebar, then “Create Pipeline”.

2. **Configure Source**

   Select a source connector and configure authentication:

   - Choose from available sources (HubSpot, Kafka, etc.)
   - Select the stream to sync (contacts, orders, etc.)
   - Enter authentication credentials

3. **Configure Destination**

   Select where data should be written:

   - Choose a destination (BigQuery, Logger, etc.)
   - Configure connection settings
   - Set buffer options for performance

4. **Set Schedule (Optional)**

   Define when the pipeline runs:

   - Cron expression for recurring runs
   - Leave empty for manual-only execution

5. **Review and Create**

   Review the configuration and create the pipeline.
## Pipeline Configuration

A pipeline configuration consists of:

```yaml
name: "hubspot-contacts-to-bigquery"
source:
  name: hubspot
  stream: contacts
  authentication:
    type: api_key
    params:
      token: "pat-xxx"
destination:
  name: bigquery
  config:
    project_id: "my-project"
    dataset: "raw_data"
  buffer_size: 100
  buffer_flush_timeout: 300
```

### Source Options
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Source connector name |
| `stream` | Yes | Data stream to sync |
| `authentication` | Yes | Auth configuration |
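The required fields above can be checked before a pipeline is saved. A minimal validation sketch for the source block — the function and error messages are illustrative, not part of the Bizon Platform API:

```python
# Validate the "source" block of a pipeline config against the
# field requirements in the table above.
REQUIRED_SOURCE_FIELDS = ("name", "stream", "authentication")

def validate_source(source: dict) -> list[str]:
    """Return a list of error messages (empty if the source block is valid)."""
    errors = [f"missing required field: {field}"
              for field in REQUIRED_SOURCE_FIELDS
              if field not in source]
    auth = source.get("authentication")
    if isinstance(auth, dict) and "type" not in auth:
        errors.append("authentication block must declare a 'type'")
    return errors

print(validate_source({"name": "hubspot", "stream": "contacts",
                       "authentication": {"type": "api_key"}}))  # []
print(validate_source({"name": "kafka"}))
```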
### Destination Options

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Destination connector name |
| `config` | Yes | Destination-specific settings |
| `buffer_size` | No | Buffer size in MB (default: 50) |
| `buffer_flush_timeout` | No | Max seconds before flush (default: 600) |
| `max_concurrent_threads` | No | Parallel write threads (default: 10) |
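To make the buffer options concrete, here is a sketch of how `buffer_size` (MB) and `buffer_flush_timeout` (seconds) could interact in a destination writer. This illustrates the semantics described in the table, not the platform's actual implementation:

```python
import json
import time

class BufferedWriter:
    """Accumulate records and flush by size (MB) or elapsed time (seconds)."""

    def __init__(self, write_batch, buffer_size_mb=50, flush_timeout_s=600):
        self.write_batch = write_batch  # callable taking a list of records
        self.max_bytes = buffer_size_mb * 1024 * 1024
        self.flush_timeout_s = flush_timeout_s
        self.buffer, self.buffered_bytes = [], 0
        self.last_flush = time.monotonic()

    def add(self, record: dict) -> None:
        self.buffer.append(record)
        self.buffered_bytes += len(json.dumps(record).encode())
        # Flush when the buffer is full or the timeout has elapsed.
        if (self.buffered_bytes >= self.max_bytes
                or time.monotonic() - self.last_flush >= self.flush_timeout_s):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.write_batch(self.buffer)
        self.buffer, self.buffered_bytes = [], 0
        self.last_flush = time.monotonic()
```

Smaller buffers use less memory but call the destination more often; larger buffers batch more aggressively at the cost of memory and latency.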
## Running Pipelines

### Manual Execution

Click “Run” on any pipeline to trigger immediate execution.
### Scheduled Execution

Set a cron expression for automatic runs:

| Expression | Schedule |
|---|---|
| `0 * * * *` | Every hour |
| `0 */6 * * *` | Every 6 hours |
| `0 0 * * *` | Daily at midnight |
| `0 0 * * 0` | Weekly on Sunday |
| `0 0 1 * *` | Monthly on the 1st |
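A 5-field cron expression reads minute, hour, day-of-month, month, day-of-week. A minimal sketch of how a scheduler might match one against a timestamp — it supports only `*`, plain numbers, and `*/n` steps, which covers every expression in the table (this is an illustration, not the platform's scheduler):

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    if field == "*":
        return True
    if field.startswith("*/"):          # step values, e.g. */6
        return value % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr: str, dt: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, dt.minute)
            and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day)
            and field_matches(month, dt.month)
            and field_matches(dow, dt.isoweekday() % 7))  # 0 = Sunday

print(cron_matches("0 */6 * * *", datetime(2024, 1, 1, 12, 0)))  # True
```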
### Run Status

| Status | Description |
|---|---|
| `pending` | Queued, waiting for a worker |
| `running` | Currently executing |
| `success` | Completed successfully |
| `failed` | Failed with an error |
| `cancelled` | Manually cancelled |
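The statuses above imply a run lifecycle. A sketch of the state machine as an enum with a transition map — the allowed transitions are an assumption for illustration, not a documented guarantee of the platform:

```python
from enum import Enum

class RunStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"

# Assumed lifecycle: pending -> running -> terminal; cancellation is
# possible while queued or executing. Terminal states have no exits.
TRANSITIONS = {
    RunStatus.PENDING: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.RUNNING: {RunStatus.SUCCESS, RunStatus.FAILED, RunStatus.CANCELLED},
    RunStatus.SUCCESS: set(),
    RunStatus.FAILED: set(),
    RunStatus.CANCELLED: set(),
}

def can_transition(current: RunStatus, new: RunStatus) -> bool:
    return new in TRANSITIONS[current]
```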
## Monitoring Runs

### Run History

View all runs for a pipeline with:

- Status and duration
- Records processed
- Error messages (if failed)

Access detailed logs for debugging:

- Step-by-step execution trace
- Record counts per batch
- Error stack traces

### Output Files

For file-based destinations, download output files directly from the UI.
## Transforms

Apply Python transformations to records:

```yaml
transforms:
  - label: "Normalize email"
    python: |
      record['email'] = record.get('email', '').lower()
      return record
  - label: "Add timestamp"
    python: |
      from datetime import datetime
      record['synced_at'] = datetime.utcnow().isoformat()
      return record
```

### Transform Rules

- Each transform receives a `record` dict
- Must return the modified record
- Has access to standard library modules (datetime, json, re)
- Dangerous imports are blocked for security
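One way to picture these rules: each `python` snippet is wrapped in a function that receives the record and must return it, and imports are checked against an allow-list first. A simplified sketch — the allow-list check is a stand-in for the platform's real sandboxing, not its implementation:

```python
ALLOWED_IMPORTS = {"datetime", "json", "re"}

def compile_transform(src: str):
    """Compile a transform snippet into a callable taking a record dict."""
    for line in src.splitlines():
        stripped = line.strip()
        if stripped.startswith(("import ", "from ")):
            module = stripped.split()[1].split(".")[0]
            if module not in ALLOWED_IMPORTS:
                raise ValueError(f"import of '{module}' is blocked")
    # Indent the snippet into a function body so 'return record' works.
    body = "\n".join("    " + line for line in src.splitlines())
    namespace = {}
    exec("def _transform(record):\n" + body + "\n", namespace)
    return namespace["_transform"]

normalize = compile_transform(
    "record['email'] = record.get('email', '').lower()\nreturn record")
print(normalize({"email": "USER@Example.COM"}))  # {'email': 'user@example.com'}
```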
## Advanced Settings

### Engine Configuration

Control checkpoint behavior:

```yaml
engine:
  syncCursorInDBEvery: 50  # Lower = more durable, slower
```

### Domain Assignment

Organize pipelines by team:

```yaml
domain_id: "marketing-team-uuid"
```

## Best Practices
- **Use saved connectors** - Store credentials once, reuse across pipelines
- **Set appropriate buffers** - Balance memory vs. write frequency
- **Add transforms carefully** - Keep them simple and fast
- **Monitor run history** - Check for failures regularly
- **Use domains** - Organize by team for easier management