
Pipeline Management

Pipelines are the core unit of work in Bizon Platform. Each pipeline defines a data flow from a source to a destination.

  1. Navigate to Pipelines

    Click “Pipelines” in the sidebar, then “Create Pipeline”.

  2. Configure Source

    Select a source connector and configure authentication:

    • Choose from available sources (HubSpot, Kafka, etc.)
    • Select the stream to sync (contacts, orders, etc.)
    • Enter authentication credentials
  3. Configure Destination

    Select where data should be written:

    • Choose destination (BigQuery, Logger, etc.)
    • Configure connection settings
    • Set buffer options for performance
  4. Set Schedule (Optional)

    Define when the pipeline runs:

    • Cron expression for recurring runs
    • Leave empty for manual-only execution
  5. Review and Create

    Review the configuration and create the pipeline.

A pipeline configuration consists of:

    name: "hubspot-contacts-to-bigquery"
    source:
      name: hubspot
      stream: contacts
      authentication:
        type: api_key
        params:
          token: "pat-xxx"
    destination:
      name: bigquery
      config:
        project_id: "my-project"
        dataset: "raw_data"
      buffer_size: 100
      buffer_flush_timeout: 300
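
The same shape can be modeled with plain Python dataclasses — a sketch for illustration only (the class names mirror the YAML keys, and the defaults follow the field tables below; this is not Bizon's actual API):

```python
from dataclasses import dataclass

@dataclass
class Authentication:
    type: str
    params: dict

@dataclass
class Source:
    name: str
    stream: str
    authentication: Authentication

@dataclass
class Destination:
    name: str
    config: dict
    buffer_size: int = 50            # MB, default per the field table
    buffer_flush_timeout: int = 600  # seconds, default per the field table

@dataclass
class Pipeline:
    name: str
    source: Source
    destination: Destination

# The example configuration above, expressed as objects:
pipeline = Pipeline(
    name="hubspot-contacts-to-bigquery",
    source=Source("hubspot", "contacts",
                  Authentication("api_key", {"token": "pat-xxx"})),
    destination=Destination("bigquery",
                            {"project_id": "my-project", "dataset": "raw_data"},
                            buffer_size=100, buffer_flush_timeout=300),
)
```

Fields left out of a destination fall back to their documented defaults, e.g. `Destination("logger", {})` gets `buffer_size=50`.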
Source fields:

    Field           Required  Description
    name            Yes       Source connector name
    stream          Yes       Data stream to sync
    authentication  Yes       Auth configuration

Destination fields:

    Field                   Required  Description
    name                    Yes       Destination connector name
    config                  Yes       Destination-specific settings
    buffer_size             No        Buffer size in MB (default: 50)
    buffer_flush_timeout    No        Max seconds before flush (default: 600)
    max_concurrent_threads  No        Parallel write threads (default: 10)
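
The required/optional split in these tables can be enforced with a small check before submitting a configuration — a minimal sketch (the `validate` helper is illustrative, not part of Bizon):

```python
# Required fields per section, and defaults for the optional destination
# fields, taken from the tables above.
REQUIRED = {
    "source": {"name", "stream", "authentication"},
    "destination": {"name", "config"},
}
DEFAULTS = {"buffer_size": 50, "buffer_flush_timeout": 600,
            "max_concurrent_threads": 10}

def validate(section: str, cfg: dict) -> dict:
    """Reject configs missing required fields; fill in documented defaults."""
    missing = REQUIRED[section] - cfg.keys()
    if missing:
        raise ValueError(f"{section} missing required fields: {sorted(missing)}")
    if section == "destination":
        cfg = {**DEFAULTS, **cfg}  # explicit values override the defaults
    return cfg
```

For example, a destination with only `name` and `config` passes validation and comes back with `buffer_size` set to 50.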

Click “Run” on any pipeline to trigger immediate execution.

Set a cron expression for automatic runs:

    Expression   Schedule
    0 * * * *    Every hour
    0 */6 * * *  Every 6 hours
    0 0 * * *    Daily at midnight
    0 0 * * 0    Weekly on Sunday
    0 0 1 * *    Monthly on the 1st
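
To illustrate how these five-field expressions are read, here is a deliberately minimal matcher that covers only the forms in the table above (`*`, `*/n`, and plain numbers) — a sketch, not the scheduler Bizon actually uses:

```python
from datetime import datetime

def field_matches(expr: str, value: int) -> bool:
    if expr == "*":
        return True
    if expr.startswith("*/"):            # step values, e.g. */6
        return value % int(expr[2:]) == 0
    return value == int(expr)

def cron_matches(expr: str, dt: datetime) -> bool:
    """True if dt falls on the schedule. Fields: minute hour day month weekday."""
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, dt.minute)
            and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day)
            and field_matches(month, dt.month)
            # cron numbers weekdays with Sunday = 0; Python's weekday()
            # uses Monday = 0, so shift accordingly
            and field_matches(dow, (dt.weekday() + 1) % 7))
```

So `0 */6 * * *` matches 06:00, 12:00, 18:00, and `0 0 * * 0` matches midnight on Sundays only.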

Each run has one of the following statuses:

    Status     Description
    pending    Queued, waiting for a worker
    running    Currently executing
    success    Completed successfully
    failed     Failed with an error
    cancelled  Manually cancelled
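
When polling runs programmatically, it helps to know when to stop. A small sketch — note that which statuses are terminal is inferred from the descriptions above, not stated by Bizon:

```python
from enum import Enum

class RunStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"

# Inferred: success, failed and cancelled are final states;
# pending and running can still change.
TERMINAL = {RunStatus.SUCCESS, RunStatus.FAILED, RunStatus.CANCELLED}

def is_terminal(status: RunStatus) -> bool:
    """True once a run can no longer change state."""
    return status in TERMINAL
```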

The run history lists every run for a pipeline, showing:

  • Status and duration
  • Records processed
  • Error messages (if failed)
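
As a sketch of how these per-run fields roll up, the history could be summarized like this (the record keys `status` and `records_processed` are illustrative, not Bizon's actual schema):

```python
def summarize(runs: list[dict]) -> dict:
    """Aggregate run records into the headline figures shown in the UI."""
    return {
        "total": len(runs),
        "failed": sum(1 for r in runs if r["status"] == "failed"),
        "records": sum(r.get("records_processed", 0) for r in runs),
    }
```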

Access detailed logs for debugging:

  • Step-by-step execution trace
  • Record counts per batch
  • Error stack traces

For file-based destinations, download output files directly from the UI.

Apply Python transformations to records:

    transforms:
      - label: "Normalize email"
        python: |
          record['email'] = record.get('email', '').lower()
          return record
      - label: "Add timestamp"
        python: |
          from datetime import datetime
          record['synced_at'] = datetime.utcnow().isoformat()
          return record
  • Each transform receives a record dict
  • Must return the modified record
  • Has access to standard library (datetime, json, re)
  • Dangerous imports are blocked for security
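
The contract above — each transform takes a record dict and must return it — can be sketched as plain functions chained in order (function names are illustrative; this runs outside Bizon's sandbox, which is why it can use a timezone-aware timestamp instead of `datetime.utcnow()`):

```python
from datetime import datetime, timezone

def normalize_email(record: dict) -> dict:
    record['email'] = record.get('email', '').lower()
    return record

def add_timestamp(record: dict) -> dict:
    record['synced_at'] = datetime.now(timezone.utc).isoformat()
    return record

def apply_transforms(record: dict, transforms) -> dict:
    """Apply each transform in order; each must return the (modified) record."""
    for fn in transforms:
        record = fn(record)
    return record

out = apply_transforms({'email': 'Jane@Example.COM'},
                       [normalize_email, add_timestamp])
```

After the chain runs, `out['email']` is lowercased and `out['synced_at']` carries the sync timestamp.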

Control checkpoint behavior:

    engine:
      syncCursorInDBEvery: 50  # Lower = more durable, slower
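
The effect of `syncCursorInDBEvery` can be sketched as follows — the cursor is persisted once every N processed records, so a lower value means less rework after a crash at the cost of more database writes (the in-memory list here is a stand-in for Bizon's actual cursor store):

```python
def process(records, sync_every: int = 50) -> list[int]:
    """Process records, checkpointing the cursor every `sync_every` records."""
    checkpoints = []                 # stand-in for cursor rows written to the DB
    for i, record in enumerate(records, start=1):
        # ... write record to the destination ...
        if i % sync_every == 0:      # lower sync_every = more durable, slower
            checkpoints.append(i)    # persist "processed up to record i"
    return checkpoints
```

For example, 120 records with `sync_every=50` persists the cursor after records 50 and 100; a crash at record 110 would resume from 100 rather than from the start.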

Organize pipelines by team:

    domain_id: "marketing-team-uuid"

  1. Use saved connectors - Store credentials once, reuse across pipelines
  2. Set appropriate buffers - Balance memory vs. write frequency
  3. Add transforms carefully - Keep them simple and fast
  4. Monitor run history - Check for failures regularly
  5. Use domains - Organize by team for easier management