Notion Source
The Notion source extracts databases, pages, blocks, and users from Notion workspaces via the Notion API. It supports incremental sync for efficiently fetching only updated content.
Installation
```bash
pip install "bizon[notion]"
```
Quick Start
```yaml
name: notion-pipeline

source:
  name: notion
  stream: all_pages
  sync_mode: full_refresh
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN

destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion_data
    gcs_buffer_bucket: my-bucket
```
Available Streams
Check available streams:
```bash
bizon stream list notion
```
| Stream | Description | Incremental |
|---|---|---|
| databases | Databases from configured database_ids | Yes |
| data_sources | Data sources from configured databases | No |
| pages | Pages from databases or specific page_ids | Yes |
| blocks | All blocks recursively from pages/databases | Yes |
| blocks_markdown | Blocks converted to markdown format | Yes |
| users | Workspace users | No |
| all_pages | All accessible pages via Search API | Yes |
| all_databases | All accessible databases | Yes |
| all_data_sources | All accessible data sources | No |
| all_blocks_markdown | All blocks converted to markdown | Yes |
Configuration
Source Configuration
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| database_ids | list[str] | No | [] | Notion database IDs to fetch |
| page_ids | list[str] | No | [] | Specific page IDs to fetch |
| fetch_blocks_recursively | bool | No | true | Fetch nested (child) blocks recursively |
| max_recursion_depth | int | No | 5 | Maximum block nesting depth to fetch (1-100) |
| page_size | int | No | 100 | Results per API request (max 100) |
| max_workers | int | No | 3 | Number of concurrent workers (1-10) |
| database_filters | dict | No | {} | Mapping of database ID to a Notion API filter object |
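Both database_ids and page_ids expect Notion IDs. If you only have the URL of a page or database, the ID is the 32-character hex string at the end of the URL path. Any `?v=` parameter that follows it identifies a view, not the database. The helper below is a hypothetical sketch and is not part of Bizon. It extracts the ID and formats it in the dashed form used in the examples on this page:

```python
import re
import uuid
from urllib.parse import urlparse


def notion_id_from_url(url_or_id: str) -> str:
    """Return a Notion ID in dashed UUID form, from a URL or a raw ID.

    Hypothetical helper for filling in database_ids / page_ids; not part of Bizon.
    """
    # Keep only the path: the ?v= query parameter holds a view ID, not the database ID.
    path = urlparse(url_or_id.strip()).path
    # Drop dashes so "Title-<32 hex>" paths and dashed UUIDs both end in 32 hex digits.
    compact = path.replace("-", "").rstrip("/")
    match = re.search(r"[0-9a-fA-F]{32}$", compact)
    if match is None:
        raise ValueError(f"no Notion ID found in {url_or_id!r}")
    return str(uuid.UUID(match.group(0)))


print(notion_id_from_url(
    "https://www.notion.so/acme/Roadmap-a1b2c3d4e5f67890abcdef1234567890"
    "?v=0123456789abcdef0123456789abcdef"
))
# a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

The helper also accepts an ID that is already in dashed form, so you can apply it to every entry in database_ids without checking its format first.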
Authentication
Notion uses API key (internal integration token) authentication:
```yaml
source:
  name: notion
  stream: all_pages
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN
```
To get your token:
- Go to Notion Integrations
- Create a new internal integration
- Copy the integration token
- Share the databases and pages you want to sync with the integration; the API only returns content the integration has been granted access to
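Before you run a pipeline, you can confirm that the token works by calling Notion's "retrieve your token's bot user" endpoint, `GET https://api.notion.com/v1/users/me`. The sketch below uses only the Python standard library, and its function names are hypothetical. It makes two assumptions. First, the token is exported as a `NOTION_TOKEN` environment variable. Second, the `Notion-Version` value shown is a placeholder, so use the API version your setup targets:

```python
import json
import os
import urllib.request

NOTION_API = "https://api.notion.com/v1"
NOTION_VERSION = "2022-06-28"  # assumption: set to the API version your setup targets


def build_whoami_request(token: str) -> urllib.request.Request:
    """Build GET /v1/users/me, which returns the bot user behind an integration token."""
    return urllib.request.Request(
        f"{NOTION_API}/users/me",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
        },
        method="GET",
    )


def check_token(token: str) -> str:
    """Return the bot user's name; raises urllib.error.HTTPError (e.g. 401) for a bad token."""
    with urllib.request.urlopen(build_whoami_request(token), timeout=10) as resp:
        return json.load(resp).get("name", "")


# Only hits the network when run as a script with NOTION_TOKEN set.
if __name__ == "__main__" and os.environ.get("NOTION_TOKEN"):
    print("Token OK, integration:", check_token(os.environ["NOTION_TOKEN"]))
```

A 401 response means the token is invalid. If the token works but pages come back empty, the content has probably not been shared with the integration yet.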
Sync Modes
Full Refresh
Syncs all records from scratch:
```yaml
source:
  name: notion
  stream: all_pages
  sync_mode: full_refresh
```
Incremental
Syncs only records updated since the last sync, based on their last_edited_time:
```yaml
source:
  name: notion
  stream: all_pages
  sync_mode: incremental
```
To check which streams support incremental sync:
```bash
bizon stream list notion
# [Supports incremental] - pages, blocks, blocks_markdown, all_pages, databases, all_databases, all_blocks_markdown
# [Full refresh only] - users, data_sources, all_data_sources
```
Example Configurations
All Pages to BigQuery
Extract all accessible pages:
```yaml
name: notion-all-pages

source:
  name: notion
  stream: all_pages
  sync_mode: incremental
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN

destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket
```
Specific Database Pages
Extract pages from specific databases:
```yaml
name: notion-database-pages

source:
  name: notion
  stream: pages
  sync_mode: incremental
  database_ids:
    - "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
    - "b2c3d4e5-f6a7-8901-bcde-f23456789012"
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN

destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket
```
Blocks as Markdown
Convert all content blocks to markdown format:
```yaml
name: notion-blocks-markdown

source:
  name: notion
  stream: all_blocks_markdown
  sync_mode: full_refresh
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN

destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket
```
With Database Filters
Filter pages by property values:
```yaml
name: notion-filtered

source:
  name: notion
  stream: pages
  database_ids:
    - "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  database_filters:
    "a1b2c3d4-e5f6-7890-abcd-ef1234567890":
      property: "Status"
      select:
        equals: "Published"
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN

destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket
```
Database Filtering
Use database_filters to filter pages by Notion properties. Each key is a database ID, and each value is a filter written in the Notion API filter syntax, including compound filters such as and:
```yaml
database_filters:
  "database-id-1":
    property: "Status"
    select:
      equals: "Published"
  "database-id-2":
    and:
      - property: "Type"
        select:
          equals: "Article"
      - property: "Published"
        checkbox:
          equals: true
```
Data Structure
Page Records
| Field | Description |
|---|---|
| id | Notion page ID |
| object | Always "page" |
| created_time | Page creation timestamp |
| last_edited_time | Last modification timestamp |
| parent | Parent database or page info |
| properties | Page properties (title, etc.) |
| url | Notion page URL |
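Incremental sync is keyed on last_edited_time. The sketch below only illustrates the idea and is not Bizon's implementation. It keeps the newest last_edited_time seen as a cursor, and on the next run it emits only records edited at or after that cursor. The function names and the cursor handling are hypothetical:

```python
from datetime import datetime


def parse_notion_time(value: str) -> datetime:
    """Parse a Notion ISO 8601 timestamp such as "2024-05-01T12:30:00.000Z"."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def incremental_batch(records, cursor=None):
    """Return (records edited at/after cursor, new cursor); cursor is a timestamp string or None."""
    since = parse_notion_time(cursor) if cursor else None
    # Use >= so records sharing the cursor's timestamp are re-read rather than missed;
    # deduplicate downstream on id.
    fresh = [
        r for r in records
        if since is None or parse_notion_time(r["last_edited_time"]) >= since
    ]
    new_cursor = max(
        (r["last_edited_time"] for r in fresh), key=parse_notion_time, default=cursor
    )
    return fresh, new_cursor


pages = [
    {"id": "p1", "last_edited_time": "2024-05-01T09:00:00.000Z"},
    {"id": "p2", "last_edited_time": "2024-05-02T10:15:00.000Z"},
]
batch, cursor = incremental_batch(pages, cursor="2024-05-01T12:00:00.000Z")
print([p["id"] for p in batch], cursor)  # ['p2'] 2024-05-02T10:15:00.000Z
```

The inclusive comparison trades a few duplicate rows for not losing edits that share a timestamp with the previous cursor.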
Block Records
| Field | Description |
|---|---|
| id | Notion block ID |
| type | Block type (paragraph, heading_1, etc.) |
| parent_block_id | Parent block ID |
| source_page_id | Page this block belongs to |
| depth | Nesting depth (0 = top level) |
| block_order | Position within parent |
| page_order | Global reading order in page |
| has_children | Whether block has nested blocks |
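Each block record carries parent_block_id and block_order, so you can rebuild a page's nesting after loading the rows. Below is a minimal sketch with hypothetical helper names. It assumes the rows are dicts using the field names above, already filtered to one source_page_id, and that top-level blocks have a parent_block_id of None. That last point is an assumption, so check what your destination actually stores for top-level blocks:

```python
from collections import defaultdict


def build_children(blocks):
    """Group block rows by parent_block_id, each group sorted by block_order."""
    children = defaultdict(list)
    for block in blocks:
        children[block.get("parent_block_id")].append(block)
    for siblings in children.values():
        siblings.sort(key=lambda b: b["block_order"])
    return children


def walk(children, parent_id=None, indent=0):
    """Yield (indent, block) pairs depth-first, which is the page's reading order."""
    for block in children.get(parent_id, []):
        yield indent, block
        if block.get("has_children"):
            yield from walk(children, block["id"], indent + 1)


# Hypothetical rows for one page; top-level blocks use parent_block_id=None here.
rows = [
    {"id": "b2", "type": "paragraph", "parent_block_id": None, "block_order": 1, "has_children": False},
    {"id": "b1", "type": "toggle", "parent_block_id": None, "block_order": 0, "has_children": True},
    {"id": "b1a", "type": "paragraph", "parent_block_id": "b1", "block_order": 0, "has_children": False},
]
for indent, block in walk(build_children(rows)):
    print("  " * indent + f"{block['type']} ({block['id']})")
# toggle (b1)
#   paragraph (b1a)
# paragraph (b2)
```

If your rows store the page ID as the parent of top-level blocks, pass that ID as `parent_id` when calling `walk`. If you only need reading order and not the hierarchy, sorting by page_order is simpler.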
Markdown Block Records
| Field | Description |
|---|---|
| block_id | Notion block ID |
| block_type | Block type |
| markdown | Converted markdown content |
| source_page_id | Page this block belongs to |
| depth | Nesting depth |
| page_order | Reading order in page |
| block_raw | Original block data |
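To rebuild a page as a single markdown document, filter the markdown block records by source_page_id and join their markdown values in page_order. The sketch below uses a hypothetical function name. It assumes each markdown value is a self-contained snippet; whether nested blocks already carry their indentation inside markdown is also an assumption to verify against your data:

```python
def page_markdown(rows, page_id):
    """Join the markdown of one page's blocks in reading order."""
    blocks = sorted(
        (r for r in rows if r["source_page_id"] == page_id),
        key=lambda r: r["page_order"],
    )
    # A blank line between blocks keeps consecutive paragraphs from merging into one.
    return "\n\n".join(r["markdown"] for r in blocks if r["markdown"])


rows = [
    {"source_page_id": "p1", "page_order": 1, "markdown": "First paragraph."},
    {"source_page_id": "p1", "page_order": 0, "markdown": "# Title"},
    {"source_page_id": "p2", "page_order": 0, "markdown": "Other page"},
]
print(page_markdown(rows, "p1"))
```

Separating blocks with a blank line is the safe default for paragraphs. It does turn consecutive list items into a "loose" list, which still renders as a list.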
Transforms
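Extract specific properties from pages. This example assumes the database has a title property named Name and a select property named Status:
```yaml
transforms:
  - label: extract-properties
    python: |
      props = data.get('properties', {})
      # A title is a list of rich-text segments and may be empty; join all segments
      title_parts = (props.get('Name') or {}).get('title') or []
      # An unset select property is returned as null, so fall back to {}
      status = (props.get('Status') or {}).get('select') or {}
      data = {
          'id': data.get('id'),
          'title': ''.join(part.get('plain_text', '') for part in title_parts),
          'status': status.get('name', ''),
          'created_at': data.get('created_time'),
          'updated_at': data.get('last_edited_time'),
          'url': data.get('url'),
      }
```
Rate Limiting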
Notion limits each integration to an average of three requests per second and returns HTTP 429 when the limit is exceeded. Bizon retries rate-limited requests automatically. Keep max_workers low and adjust retry_limit as needed:
```yaml
source:
  name: notion
  stream: all_pages
  max_workers: 3  # Keep low to respect rate limits
  api_config:
    retry_limit: 10  # Max retries on rate limit
```
Next Steps
- Sources Overview - Learn about source connectors
- Sync Modes - Understand incremental sync
- Authentication - Configure auth methods
- Transforms - Transform extracted data