Notion Source

The Notion source extracts databases, pages, blocks, and users from Notion workspaces via the Notion API. It supports incremental sync for efficiently fetching only updated content.

Install Bizon with the Notion extra:

pip install bizon[notion]

A minimal pipeline configuration:

name: notion-pipeline
source:
  name: notion
  stream: all_pages
  sync_mode: full_refresh
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion_data
    gcs_buffer_bucket: my-bucket

Check available streams:

bizon stream list notion

| Stream | Description | Incremental |
| --- | --- | --- |
| databases | Databases from configured database_ids | Yes |
| data_sources | Data sources from configured databases | No |
| pages | Pages from databases or specific page_ids | Yes |
| blocks | All blocks recursively from pages/databases | Yes |
| blocks_markdown | Blocks converted to markdown format | Yes |
| users | Workspace users | No |
| all_pages | All accessible pages via Search API | Yes |
| all_databases | All accessible databases | Yes |
| all_data_sources | All accessible data sources | No |
| all_blocks_markdown | All blocks converted to markdown | Yes |

Configuration options:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| database_ids | list[str] | No | [] | Notion database IDs to fetch |
| page_ids | list[str] | No | [] | Specific page IDs to fetch |
| fetch_blocks_recursively | bool | No | true | Fetch nested blocks recursively |
| max_recursion_depth | int | No | 5 | Max nesting depth (1-100) |
| page_size | int | No | 100 | Results per page (max 100) |
| max_workers | int | No | 3 | Concurrent workers (1-10) |
| database_filters | dict | No | {} | Database ID to filter mapping |
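
The documented ranges can be checked before a pipeline runs. The sketch below is a hypothetical mirror of the options table, not Bizon's own configuration class; the class name `NotionSourceOptions` and the lower bound of 1 for `page_size` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NotionSourceOptions:
    """Hypothetical mirror of the documented options (not Bizon's own class)."""

    database_ids: list[str] = field(default_factory=list)
    page_ids: list[str] = field(default_factory=list)
    fetch_blocks_recursively: bool = True
    max_recursion_depth: int = 5
    page_size: int = 100
    max_workers: int = 3
    database_filters: dict[str, dict] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the options are in range."""
        problems = []
        if not 1 <= self.max_recursion_depth <= 100:
            problems.append("max_recursion_depth must be between 1 and 100")
        # Assumption: at least one result per page; the table only states the maximum.
        if not 1 <= self.page_size <= 100:
            problems.append("page_size must be between 1 and 100")
        if not 1 <= self.max_workers <= 10:
            problems.append("max_workers must be between 1 and 10")
        return problems


print(NotionSourceOptions(max_workers=11).validate())
```

Returning a list of problems rather than raising on the first one makes it easier to report every out-of-range value at once.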

The Notion source authenticates with an API key (an internal integration token):

source:
  name: notion
  stream: all_pages
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN

To get your token:

  1. Go to Notion Integrations
  2. Create a new internal integration
  3. Copy the integration token
  4. Share your databases/pages with the integration
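
After completing these steps, it can help to confirm that Notion accepts the token before wiring it into a pipeline. The sketch below is independent of Bizon: it calls the Notion API's "retrieve your token's bot user" endpoint (`GET /v1/users/me`) using only the standard library. The function names and the pinned `Notion-Version` value are assumptions; use whichever API version your integration targets.

```python
import urllib.error
import urllib.request

# Assumption: pin whichever Notion-Version your integration supports.
NOTION_VERSION = "2022-06-28"


def build_token_check(token: str) -> urllib.request.Request:
    """Build a GET /v1/users/me request, which returns the integration's bot user."""
    return urllib.request.Request(
        "https://api.notion.com/v1/users/me",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
        },
        method="GET",
    )


def token_is_valid(token: str) -> bool:
    """Return True when Notion accepts the token (performs a network call)."""
    try:
        with urllib.request.urlopen(build_token_check(token), timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Notion answers with an HTTP error status (e.g. 401) for a bad token.
        return False
```

A valid token only proves the integration exists; pages and databases still have to be shared with it (step 4) before they show up in a sync.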

Full refresh syncs all records from scratch:

source:
  name: notion
  stream: all_pages
  sync_mode: full_refresh

Incremental sync fetches only the pages/blocks updated since the last sync, using last_edited_time:

source:
  name: notion
  stream: all_pages
  sync_mode: incremental
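
The idea behind a `last_edited_time` cursor can be sketched in a few lines. This is an illustration only, not Bizon's implementation (which may, for instance, filter on the Notion side); the helper names `parse_notion_time` and `updated_since` are hypothetical.

```python
from datetime import datetime, timezone


def parse_notion_time(value: str) -> datetime:
    """Parse a Notion ISO 8601 timestamp such as 2024-05-01T12:30:00.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def updated_since(records: list[dict], cursor: datetime) -> list[dict]:
    """Keep only records edited strictly after the previous sync cursor."""
    return [r for r in records if parse_notion_time(r["last_edited_time"]) > cursor]


pages = [
    {"id": "a", "last_edited_time": "2024-05-01T09:00:00.000Z"},
    {"id": "b", "last_edited_time": "2024-05-03T09:00:00.000Z"},
]
cursor = datetime(2024, 5, 2, tzinfo=timezone.utc)  # time of the previous sync
changed = updated_since(pages, cursor)
print([p["id"] for p in changed])  # ['b']
```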

Check incremental support:

bizon stream list notion
# [Supports incremental] - pages, blocks, blocks_markdown, all_pages, databases, all_databases, all_blocks_markdown
# [Full refresh only] - users, data_sources, all_data_sources

Extract all accessible pages:

name: notion-all-pages
source:
  name: notion
  stream: all_pages
  sync_mode: incremental
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket

Extract pages from specific databases:

name: notion-database-pages
source:
  name: notion
  stream: pages
  sync_mode: incremental
  database_ids:
    - "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
    - "b2c3d4e5-f6a7-8901-bcde-f23456789012"
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket

Convert all content blocks to markdown format:

name: notion-blocks-markdown
source:
  name: notion
  stream: all_blocks_markdown
  sync_mode: full_refresh
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket

Filter pages by property values:

name: notion-filtered
source:
  name: notion
  stream: pages
  database_ids:
    - "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  database_filters:
    "a1b2c3d4-e5f6-7890-abcd-ef1234567890":
      property: "Status"
      select:
        equals: "Published"
  authentication:
    type: api_key
    params:
      token: BIZON_ENV_NOTION_TOKEN
destination:
  name: bigquery
  config:
    project_id: my-project
    dataset_id: notion
    gcs_buffer_bucket: my-staging-bucket

Use database_filters to filter pages by Notion properties. The filter format follows the Notion API filter syntax:

database_filters:
  "database-id-1":
    property: "Status"
    select:
      equals: "Published"
  "database-id-2":
    and:
      - property: "Type"
        select:
          equals: "Article"
      - property: "Published"
        checkbox:
          equals: true
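
When filters are generated programmatically, small helper functions keep the nested structure readable. The sketch below rebuilds the same mapping shown above as plain dictionaries. The helper names (`select_equals`, `checkbox_equals`, `all_of`) are hypothetical and not part of Bizon or the Notion SDK; only the resulting dictionary shape follows the Notion filter syntax.

```python
import json


def select_equals(prop: str, value: str) -> dict:
    """Condition matching a select property with the given option name."""
    return {"property": prop, "select": {"equals": value}}


def checkbox_equals(prop: str, value: bool) -> dict:
    """Condition matching a checkbox property that is checked or unchecked."""
    return {"property": prop, "checkbox": {"equals": value}}


def all_of(*conditions: dict) -> dict:
    """Compound filter that requires every condition to match."""
    return {"and": list(conditions)}


database_filters = {
    "database-id-1": select_equals("Status", "Published"),
    "database-id-2": all_of(
        select_equals("Type", "Article"),
        checkbox_equals("Published", True),
    ),
}

# Dump as JSON (which is also valid YAML) to paste under `database_filters`.
print(json.dumps(database_filters, indent=2))
```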

Page records:

| Field | Description |
| --- | --- |
| id | Notion page ID |
| object | Always "page" |
| created_time | Page creation timestamp |
| last_edited_time | Last modification timestamp |
| parent | Parent database or page info |
| properties | Page properties (title, etc.) |
| url | Notion page URL |

Block records:

| Field | Description |
| --- | --- |
| id | Notion block ID |
| type | Block type (paragraph, heading_1, etc.) |
| parent_block_id | Parent block ID |
| source_page_id | Page this block belongs to |
| depth | Nesting depth (0 = top level) |
| block_order | Position within parent |
| page_order | Global reading order in page |
| has_children | Whether block has nested blocks |

Markdown block records:

| Field | Description |
| --- | --- |
| block_id | Notion block ID |
| block_type | Block type |
| markdown | Converted markdown content |
| source_page_id | Page this block belongs to |
| depth | Nesting depth |
| page_order | Reading order in page |
| block_raw | Original block data |
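
Because each markdown record carries `source_page_id` and `page_order`, whole documents can be rebuilt downstream. The sketch below groups records by page and joins them in reading order. It assumes each `markdown` value is a standalone fragment that can be separated by a blank line, which may not hold for tightly nested lists; the function name `assemble_pages` is hypothetical.

```python
from collections import defaultdict


def assemble_pages(records: list[dict]) -> dict[str, str]:
    """Group markdown block records by page and join them in reading order."""
    by_page = defaultdict(list)
    for record in records:
        by_page[record["source_page_id"]].append(record)
    return {
        page_id: "\n\n".join(
            block["markdown"]
            for block in sorted(blocks, key=lambda b: b["page_order"])
        )
        for page_id, blocks in by_page.items()
    }


records = [
    {"source_page_id": "page-1", "page_order": 1, "markdown": "First paragraph."},
    {"source_page_id": "page-1", "page_order": 0, "markdown": "# Title"},
    {"source_page_id": "page-2", "page_order": 0, "markdown": "Other page."},
]
documents = assemble_pages(records)
print(documents["page-1"])
```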

Extract specific properties from pages:

transforms:
  - label: extract-properties
    python: |
      props = data.get('properties', {})
      # Notion returns an empty list for a blank title and null for an unset select
      title = props.get('Name', {}).get('title') or [{}]
      status = props.get('Status', {}).get('select') or {}
      data = {
        'id': data.get('id'),
        'title': title[0].get('plain_text', ''),
        'status': status.get('name', ''),
        'created_at': data.get('created_time'),
        'updated_at': data.get('last_edited_time'),
        'url': data.get('url')
      }

Notion rate-limits API requests (an average of 3 requests per second per integration). Bizon handles this automatically by retrying; concurrency and the retry limit can be tuned:

source:
  name: notion
  stream: all_pages
  max_workers: 3 # Keep low to respect rate limits
  api_config:
    retry_limit: 10 # Max retries on rate limit
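
For intuition, a retry loop of the kind `retry_limit` controls might look like the sketch below. It is not Bizon's implementation: the `RateLimited` exception, the `call_with_retry` function, and the backoff constants are all hypothetical. It prefers the `Retry-After` hint that Notion sends with HTTP 429 responses and otherwise falls back to exponential backoff with a little jitter.

```python
import random
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical error a caller raises when the API answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from the Retry-After header, if any


def call_with_retry(
    request: Callable[[], dict],
    retry_limit: int = 10,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call request(), retrying up to retry_limit times when it is rate limited."""
    for attempt in range(retry_limit + 1):
        try:
            return request()
        except RateLimited as error:
            if attempt == retry_limit:
                raise  # retries exhausted, surface the error
            # Prefer the server's hint; otherwise back off exponentially with jitter.
            delay = error.retry_after
            if delay is None:
                delay = base_delay * 2**attempt + random.uniform(0, 0.1)
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. With several workers sharing one integration token, every worker draws from the same rate limit, which is why a low `max_workers` is recommended.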