
Dynamic Sync System - Complete Guide

Overview

The sync system is dynamic: it handles any platform that has a sync service (Notion, Airtable, etc.) and filters sources by platform, application, component, vault, user, and more.

Quick Start

# See what's available
./dj.sh sync-cheat

# Sync with defaults (all Notion sources)
./dj.sh sync

# Dry run first
./dj.sh sync-dry

# Sync only what needs syncing
./dj.sh sync-due

Architecture

File Structure

vault/api/
├── core/
│   ├── services/sync/
│   │   ├── __init__.py                    # Package exports
│   │   ├── notion_optimized.py            # Main optimized service
│   │   ├── connection_pool.py             # Connection pooling
│   │   ├── rate_limiter.py                # Rate limiting & retry
│   │   ├── metrics.py                     # Metrics tracking
│   │   ├── streaming.py                   # Streaming fetcher
│   │   ├── incremental_store.py           # Incremental storage
│   │   └── health_check.py                # Health monitoring
│   │
│   └── management/commands/
│       ├── appsync.py                     # Dynamic sync command
│       ├── sync_cheatsheet.py             # Interactive reference
│       └── inspect_sync_sources.py        # DB inspector
│
└── dj.sh                                  # Shell script wrapper

Key Components

  1. Dynamic Query Builder: Handles any combination of filters
  2. Platform Detection: Auto-detects platform type from your models
  3. Connection Pooling: Reuses connections across syncs
  4. Rate Limiting: Token bucket with burst support
  5. Incremental Sync: Only updates changed records
  6. Streaming: Memory-efficient batch processing
  7. Metrics: Comprehensive performance tracking
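
As an illustration of component 4, here is a minimal token bucket sketch. It is not the code in rate_limiter.py; the class name, the try_acquire method, and the injectable clock are assumptions made for the example. The bucket starts full, which is what allows a short burst before the steady rate applies.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock          # injectable so the bucket can be tested without sleeping
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, capped at `burst`."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check try_acquire() before each API request and wait or retry when it returns False.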

Command Reference

Discovery Commands

# Interactive cheatsheet (shows YOUR data)
./dj.sh sync-cheat

# Quick reference
./dj.sh sync-quick

# Real-world examples
./dj.sh sync-examples

# Inspect database structure
./dj.sh sync-inspect

# List all available targets
./dj.sh sync-list

Basic Sync

# Default: sync all Notion sources
./dj.sh sync

# Sync specific platform
./dj.sh sync-notion
./dj.sh sync-airtable

# Sync all platforms
./dj.sh sync-all

Filtered Sync

# By application
./dj.sh sync-app my-app-name

# By component type
./dj.sh sync-component artefact
./dj.sh sync-artefacts  # Shortcut

# By source
./dj.sh sync-source my-source-alias

# By vault
./dj.sh sync-vault my-vault

# By user
./dj.sh sync-user username

Smart Sync

# Only sources that need syncing
./dj.sh sync-due

# Only enabled sources
./dj.sh sync-enabled

# Test without syncing
./dj.sh sync-dry

Performance Modes

# High performance (20 pool, 10 concurrent)
./dj.sh sync-fast

# Conservative (5 pool, 2 concurrent)
./dj.sh sync-safe

# Full replacement (no incremental)
./dj.sh sync-full

# Legacy mode (no optimizations)
./dj.sh sync-legacy

Advanced Combinations

# App + Component
./dj.sh sync-app-component skyflow artefact

# Platform + Component
./dj.sh sync-platform-component notion feature

# App + Fast mode
./dj.sh sync-app-fast my-app

# Custom (pass any flags)
./dj.sh sync-custom --platform notion --only-due --verbose

Monitoring

# Health check (last 24h)
./dj.sh sync-health

# Connection pool stats
./dj.sh sync-stats

Direct Python Usage

# Basic syntax
python manage.py appsync <target> [options]

# Examples
python manage.py appsync notion
python manage.py appsync skyflow --component artefact
python manage.py appsync all --only-due --verbose
python manage.py appsync all --platform notion --app skyflow

Available Options

Option            Description                       Default
--platform        Filter by platform type           None
--app             Filter by application name/slug   None
--source          Filter by source alias            None
--component       Filter by component type          None
--vault           Filter by vault name/ID           None
--user            Filter by username/ID             None
--only-enabled    Only sync enabled sources         False
--only-due        Only sync sources due now         False
--pool-size       Connection pool size              10
--max-concurrent  Max concurrent syncs              5
--batch-size      Streaming batch size              100
--rate-limit      API calls per second              3.0
--no-streaming    Disable streaming                 False
--no-incremental  Full replacement sync             False
--dry-run         Show plan without syncing         False
--verbose         Detailed output                   False
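
To make the defaults concrete, the options above could be declared roughly as follows. This sketch uses plain argparse so it runs on its own; the real appsync.py is a Django management command and may declare its arguments differently. The build_parser function is a name made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented appsync options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="appsync")
    parser.add_argument("target")  # platform type, app name, or "all"

    # Filters: all optional, all default to None (no filtering).
    for name in ("platform", "app", "source", "component", "vault", "user"):
        parser.add_argument(f"--{name}", default=None)

    # Boolean switches default to False.
    for flag in ("only-enabled", "only-due", "no-streaming",
                 "no-incremental", "dry-run", "verbose"):
        parser.add_argument(f"--{flag}", action="store_true")

    # Performance settings.
    parser.add_argument("--pool-size", type=int, default=10)
    parser.add_argument("--max-concurrent", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=100)
    parser.add_argument("--rate-limit", type=float, default=3.0)
    return parser
```

For example, build_parser().parse_args(["all", "--only-due"]) yields a namespace with target set to "all", only_due set to True, and every other option at its default.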

How It Works

Query Building

The system resolves the target as a platform type, an app name, or the keyword all:
# These all work:
./dj.sh sync notion        # Platform type
./dj.sh sync skyflow       # App name
./dj.sh sync all           # Everything

# With filters:
./dj.sh sync-custom --platform notion --component artefact --only-due
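
One way to picture the target resolution and filter combination is the sketch below. It is an assumption about the approach, not the code in appsync.py: the lookup paths in LOOKUPS, the enabled field, and the build_filters function are hypothetical names. The --only-due filter is left out because it depends on scheduling fields this guide does not describe.

```python
KNOWN_PLATFORMS = {"notion", "airtable"}  # assumption: the platform types named in this guide

# Hypothetical ORM lookup paths; the real field names may differ.
LOOKUPS = {
    "platform": "platform__platform_type",
    "app": "application__slug",
    "source": "alias",
    "component": "component_type",
    "vault": "vault__name",
    "user": "owner__username",
}


def build_filters(target: str, **options) -> dict:
    """Translate a target plus optional flags into a dict of filter lookups."""
    filters = {}
    if target in KNOWN_PLATFORMS:
        filters[LOOKUPS["platform"]] = target
    elif target != "all":
        filters[LOOKUPS["app"]] = target  # anything else is treated as an app name

    for name, lookup in LOOKUPS.items():
        value = options.get(name)
        if value is not None:
            filters[lookup] = value  # an explicit flag overrides the target

    if options.get("only_enabled"):
        filters["enabled"] = True
    return filters
```

With a Django model, the resulting dict could be passed as keyword arguments to a queryset filter call.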

Platform Detection

The system automatically:
  1. Detects V01tDataResource.platform.platform_type
  2. Groups sources by platform
  3. Selects appropriate sync service (Notion, Airtable, etc.)
  4. Routes to correct handler
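
The four steps above can be sketched as a group-then-dispatch routine. Only the platform.platform_type attribute path comes from this guide; group_by_platform, route, the handlers mapping, and the make_source stand-in are hypothetical names for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_platform(sources):
    """Bucket sources by their platform.platform_type attribute."""
    groups = defaultdict(list)
    for source in sources:
        groups[source.platform.platform_type].append(source)
    return dict(groups)


def route(groups, handlers):
    """Call the handler registered for each platform; fail loudly on unknown ones."""
    results = {}
    for platform_type, sources in groups.items():
        handler = handlers.get(platform_type)
        if handler is None:
            raise ValueError(f"No sync service registered for {platform_type!r}")
        results[platform_type] = handler(sources)
    return results


def make_source(alias, platform_type):
    """Stand-in for a V01tDataResource row, for demonstration only."""
    return SimpleNamespace(
        alias=alias,
        platform=SimpleNamespace(platform_type=platform_type),
    )
```

Raising on an unregistered platform type is a design choice for the sketch: it makes a missing handler visible instead of silently skipping sources.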

Sync Flow

1. Build Query

2. Fetch Sources (with all filters)

3. Group by Platform

4. For Each Platform:
   a. Initialize Service
   b. Apply Rate Limiting
   c. Sync in Parallel (respecting max_concurrent)
   d. Track Metrics

5. Update Database Status

6. Display Results
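
Step 4c is the part most worth illustrating. The flow does not say whether the parallel step uses threads or asyncio; the sketch below assumes asyncio and caps concurrency with a semaphore. sync_all and sync_one are hypothetical names, and sync_one stands in for whatever coroutine syncs a single source.

```python
import asyncio


async def sync_all(sources, sync_one, max_concurrent=5):
    """Run sync_one(source) for every source, at most max_concurrent at a time.

    Returns one (source, result, error) tuple per source, in input order.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(source):
        async with semaphore:  # blocks while max_concurrent syncs are already running
            try:
                return source, await sync_one(source), None
            except Exception as exc:  # one failed source should not stop the rest
                return source, None, exc

    return await asyncio.gather(*(guarded(s) for s in sources))
```

Collecting errors per source, instead of letting the first exception propagate, lets the final steps report every success and failure together.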

Common Workflows

Daily Sync

# Morning routine
./dj.sh sync-health      # Check status
./dj.sh sync-due         # Sync what's needed

After Content Changes

./dj.sh sync-artefacts   # Quick artefact sync

Troubleshooting

./dj.sh sync-inspect     # Check configuration
./dj.sh sync-dry         # Test sync plan
./dj.sh sync-safe        # Conservative sync

Large Import

./dj.sh sync-fast        # High performance

Specific App Update

./dj.sh sync-app skyflow

Performance Tuning

Settings Explained

Pool Size (--pool-size):
  • Number of reusable connections
  • Higher = more memory, faster for many sources
  • Recommended: 5-20
Max Concurrent (--max-concurrent):
  • How many syncs run in parallel
  • Higher = faster but more API load
  • Recommended: 2-10
Batch Size (--batch-size):
  • Records fetched per API call
  • Higher = fewer calls, more memory
  • Recommended: 50-200
Rate Limit (--rate-limit):
  • API calls per second
  • Match your platform’s limits
  • Notion: 3.0, Airtable: 5.0
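
To see how batch size and rate limit interact, a rough estimate helps. The function below is an illustrative calculation, not part of the sync system. It assumes one API call per batch and that the rate limit is the only bottleneck, so it gives a lower bound on fetch time.

```python
import math


def estimate_sync_seconds(records: int, batch_size: int = 100, rate_limit: float = 3.0) -> float:
    """Lower bound on fetch time: one API call per batch, rate_limit calls per second."""
    calls = math.ceil(records / batch_size)
    return calls / rate_limit
```

Under these assumptions, 10,000 records at the defaults need 100 calls, or about 33 seconds; halving the batch size to 50 doubles the calls and the time.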

Presets

Mode     Pool  Concurrent  Use When
Safe     5     2           Errors, instability
Default  10    5           Normal use
Fast     20    10          Large imports

Extending the System

Adding New Platforms

  1. Create a service in core/services/sync/:
# core/services/sync/airtable_optimized.py
class AirtableSyncServiceOptimized:
    # Implement the same interface as NotionSyncServiceOptimized
    pass
  2. Update appsync.py to route the new platform type:
def _sync_platform_sources(self, platform_type, sources, options):
    if platform_type == 'notion':
        return self._sync_notion_sources(sources, options)
    elif platform_type == 'airtable':
        return self._sync_airtable_sources(sources, options)
    else:
        # Fail loudly rather than silently returning None
        raise ValueError(f"Unsupported platform type: {platform_type}")
  3. The shell script needs no changes.

Custom Metrics

Add to SyncMetrics in metrics.py:
@dataclass
class SyncMetrics:
    # Add your custom metrics
    custom_metric: int = 0
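
As a usage sketch, a custom field behaves like any other dataclass field: it can be incremented during a sync and exported with the rest. The records_synced and errors fields below are placeholders, since this guide does not list the real fields of SyncMetrics.

```python
from dataclasses import asdict, dataclass


@dataclass
class SyncMetrics:
    # Placeholder built-in fields; the real class in metrics.py may differ.
    records_synced: int = 0
    errors: int = 0
    # Custom metric added for illustration.
    custom_metric: int = 0


metrics = SyncMetrics()
metrics.records_synced += 25
metrics.custom_metric += 1

# asdict() turns the dataclass into a plain dict, e.g. for logging or JSON output.
summary = asdict(metrics)
```

Giving every new field a default keeps existing code that constructs SyncMetrics() without arguments working.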

Troubleshooting

No Sources Found

./dj.sh sync-list        # See what's available
./dj.sh sync-inspect     # Check configuration

Sync Failures

./dj.sh sync-health      # Check health
./dj.sh sync-safe        # Try conservative mode
./dj.sh sync-custom --verbose  # Debug mode

Rate Limits

# Reduce rate
./dj.sh sync-custom --rate-limit 1.0

# Or use safe mode
./dj.sh sync-safe

Memory Issues

# Reduce batch size and concurrency