Multi-Platform Django Scraper Implementation Guide

Overview

This guide shows how to transform your single-platform Notion scraper into a multi-platform architecture that supports multiple data sources with tiered monetization.

Key Features

🏗️ Multi-Platform Architecture

  • Support for multiple platforms (Notion, Airtable, Monday.com, etc.)
  • Grouped datasets per platform
  • Unified API interface across platforms

💰 Tiered Monetization Strategy

  • Free Tier: 1,000 requests/month, 1 platform, 1 dataset
  • Basic Tier: 10,000 requests/month, 2 platforms, 3 datasets per platform
  • Pro Tier: 50,000 requests/month, 5 platforms, 10 datasets per platform
  • Enterprise Tier: 200,000 requests/month, unlimited platforms and datasets
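
The tier limits above can be captured in a small lookup table. The following is a minimal sketch, not the project's actual model: the names TierLimits, TIER_LIMITS and can_add_platform are hypothetical, and -1 is assumed to mean "unlimited", matching the enterprise values shown in the subscription response later in this guide.

```python
from dataclasses import dataclass

UNLIMITED = -1  # assumed sentinel for "no cap", as in the enterprise tier


@dataclass(frozen=True)
class TierLimits:
    monthly_requests: int
    platforms_limit: int
    datasets_per_platform: int


# Values copied from the tier list above; the structure itself is illustrative.
TIER_LIMITS = {
    "free": TierLimits(1_000, 1, 1),
    "basic": TierLimits(10_000, 2, 3),
    "pro": TierLimits(50_000, 5, 10),
    "enterprise": TierLimits(200_000, UNLIMITED, UNLIMITED),
}


def can_add_platform(tier: str, current_platforms: int) -> bool:
    """Return True if a user on `tier` may register one more platform."""
    limit = TIER_LIMITS[tier].platforms_limit
    return limit == UNLIMITED or current_platforms < limit
```

Keeping the limits in one table means the enforcement code and the subscription endpoint can read the same numbers.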

🔒 Access Control & Usage Tracking

  • Per-user subscription management
  • Request counting and rate limiting
  • Feature-based access control

Implementation Steps

1. Database Setup

# Run migrations
python manage.py makemigrations scraper
python manage.py migrate

# Set up initial tiers and platforms
python manage.py setup_tiers

2. Settings Configuration

Add to your settings.py:
# vault/settings/base.py

MIDDLEWARE = [
    # ... existing middleware
    'scraper.middleware.tier_enforcement.TierEnforcementMiddleware',
]

# Add scraper settings
SCRAPER_SETTINGS = {
    'DEFAULT_CACHE_TIMEOUT': 3600,  # 1 hour
    'MAX_CONCURRENT_REQUESTS': 10,
    'ENABLE_USAGE_TRACKING': True,
    'ENABLE_TIER_ENFORCEMENT': True,
}
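
To illustrate how the ENABLE_TIER_ENFORCEMENT flag might be used, here is a rough sketch of the decision a tier-enforcement middleware could delegate to. It is framework-free so it can be tried on its own; check_request_allowed is a hypothetical helper, and returning HTTP 429 for an exhausted quota is an assumption, not something the guide specifies.

```python
from typing import Optional, Tuple

# Subset of the settings shown above, repeated so the sketch is self-contained.
SCRAPER_SETTINGS = {
    "ENABLE_USAGE_TRACKING": True,
    "ENABLE_TIER_ENFORCEMENT": True,
}


def check_request_allowed(
    requests_used: int,
    requests_limit: int,
    settings: dict = SCRAPER_SETTINGS,
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, error message) for an incoming request."""
    if not settings.get("ENABLE_TIER_ENFORCEMENT", False):
        return 200, None  # enforcement switched off: always allow
    if requests_used >= requests_limit:
        return 429, "Monthly request limit reached"
    return 200, None
```

A real middleware would look up the user's subscription, call a check like this before passing the request on, and increment the usage counter afterwards.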

3. Platform Configuration Examples

Setting up a Notion Platform

from core.models import Platform, PlatformDataset

# Create Notion platform
notion_platform = Platform.objects.create(
    name="My Notion Workspace",
    platform_type="notion",
    api_config={
        "notion_token": "secret_xxx...",
        "notion_version": "2022-06-28",
        "timeout": 30
    }
)

# Create datasets for this platform
features_dataset = PlatformDataset.objects.create(
    platform=notion_platform,
    name="Product Features",
    description="Main product features database",
    database_config={
        "feature_db_id": "abc123...",
        "tier_db_id": "def456...",
        "gateway_db_id": "ghi789...",
        "tag_db_id": "jkl012...",
        "keyword_db_id": "mno345..."
    }
)

analytics_dataset = PlatformDataset.objects.create(
    platform=notion_platform,
    name="Analytics Data",
    description="Analytics and metrics database",
    database_config={
        "analytics_db_id": "pqr678...",
        "metrics_db_id": "stu901..."
    }
)

Setting up User Subscriptions

from scraper.services.subscription_manager import SubscriptionManager

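# These calls are coroutines, so they must run inside an async function
# (for example, an async view).

# Create free subscription for new user
subscription = await SubscriptionManager.create_free_subscription(user)

# Upgrade user to Pro tier
result = await SubscriptionManager.upgrade_subscription(user, 'pro')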

4. API Usage Examples

Get User’s Available Platforms

GET /api/scraper/platforms/

Response:
{
    "success": true,
    "data": {
        "platforms": [
            {
                "id": 1,
                "name": "My Notion Workspace",
                "platform_type": "notion",
                "datasets": [
                    {
                        "id": 1,
                        "name": "Product Features",
                        "description": "Main product features database",
                        "is_active": true
                    }
                ]
            }
        ],
        "usage_info": {
            "tier": "pro",
            "requests_used": 1250,
            "requests_limit": 50000,
            "requests_remaining": 48750,
            "usage_percentage": 2.5
        }
    }
}
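
The usage_info fields in this response follow directly from the used and limit counters. A minimal sketch of that arithmetic, assuming a hypothetical helper named build_usage_info:

```python
def build_usage_info(tier: str, requests_used: int, requests_limit: int) -> dict:
    """Derive the usage_info fields from a used/limit pair."""
    remaining = max(requests_limit - requests_used, 0)
    # Guard against a zero limit to avoid ZeroDivisionError.
    if requests_limit > 0:
        percentage = round(100 * requests_used / requests_limit, 2)
    else:
        percentage = 0.0
    return {
        "tier": tier,
        "requests_used": requests_used,
        "requests_limit": requests_limit,
        "requests_remaining": remaining,
        "usage_percentage": percentage,
    }
```

Calling build_usage_info("pro", 1250, 50000) reproduces the numbers in the sample response: 48,750 requests remaining and 2.5% used.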

Get Features from Specific Dataset

GET /api/scraper/datasets/1/features/

Response:
{
    "success": true,
    "data": {
        "dataset_info": {
            "id": 1,
            "name": "Product Features",
            "platform": "My Notion Workspace"
        },
        "features": [
            {
                "id": "feature_123",
                "name": "User Authentication",
                "status": "active",
                "tier": "basic",
                // ... other feature data
            }
        ]
    }
}

Check Subscription Status

GET /api/scraper/subscription/

Response:
{
    "success": true,
    "data": {
        "tier": "pro",
        "is_active": true,
        "expires_at": null,
        "usage": {
            "requests_used": 1250,
            "requests_limit": 50000,
            "requests_remaining": 48750,
            "usage_percentage": 2.5
        },
        "platforms_count": 3,
        "tier_limits": {
            "monthly_requests": 50000,
            "platforms_limit": 5,
            "datasets_per_platform": 10,
            "can_use_analytics": true,
            "can_export": true
        },
        "upgrade_options": {
            "enterprise": {
                "monthly_requests": 200000,
                "platforms_limit": -1,
                "datasets_per_platform": -1
            }
        }
    }
}
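
One plausible way to derive upgrade_options is to list every tier ranked above the current one. The sketch below assumes that rule and a hypothetical TIER_ORDER list; the real endpoint may select options differently.

```python
from typing import List

# Assumed ordering, lowest to highest, taken from the tier list in this guide.
TIER_ORDER = ["free", "basic", "pro", "enterprise"]


def upgrade_options(current_tier: str) -> List[str]:
    """Return the tiers a user on `current_tier` could upgrade to."""
    index = TIER_ORDER.index(current_tier)  # raises ValueError for unknown tiers
    return TIER_ORDER[index + 1:]
```

Under this rule a Pro user is offered only Enterprise, which is consistent with the sample response above.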

5. Adding New Platform Support

To add support for a new platform (e.g., Airtable):
  1. Create Repository Class:
# v01t.io/api/scraper/repository/airtable.py

from .base import BaseRepository
from typing import Dict, List, Any

class AirtableRepository(BaseRepository):
    def __init__(self, api_key: str, base_id: str, **kwargs):
        self.api_key = api_key
        self.base_id = base_id

    async def get_features(self, **kwargs) -> List[Dict[str, Any]]:
        # Implement Airtable-specific feature fetching
        pass

    async def get_gateways(self, **kwargs) -> List[Dict[str, Any]]:
        # Implement Airtable-specific gateway fetching
        pass

    # ... implement other required methods
  2. Register in Platform Manager:
# Update vault/api/scraper/services/platform_manager.py

PLATFORM_REPOSITORIES = {
    'notion': NotionRepository,
    'airtable': AirtableRepository,  # Add this line
    # 'monday': MondayRepository,
}
  3. Update Platform Type Choices:
# In vault/api/scraper/models/platform.py

from django.db import models

class PlatformType(models.TextChoices):
    NOTION = 'notion', 'Notion'
    AIRTABLE = 'airtable', 'Airtable'
    MONDAY = 'monday', 'Monday.com'
    ASANA = 'asana', 'Asana'
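
The three steps above amount to a registry pattern: a platform type string maps to a repository class, which is instantiated from the platform's api_config. The following self-contained sketch shows that lookup. The BaseRepository here is a stand-in whose interface may differ from the real one, and InMemoryRepository and build_repository are hypothetical names used only for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type


class BaseRepository(ABC):
    """Stand-in for the guide's BaseRepository; the real interface may differ."""

    @abstractmethod
    async def get_features(self, **kwargs) -> List[Dict[str, Any]]:
        """Return feature records for the configured dataset."""


class InMemoryRepository(BaseRepository):
    """Hypothetical repository used only to exercise the registry."""

    def __init__(self, **api_config: Any) -> None:
        self.api_config = api_config

    async def get_features(self, **kwargs) -> List[Dict[str, Any]]:
        return [{"id": "feature_123", "name": "User Authentication"}]


PLATFORM_REPOSITORIES: Dict[str, Type[BaseRepository]] = {
    "memory": InMemoryRepository,
}


def build_repository(platform_type: str, api_config: Dict[str, Any]) -> BaseRepository:
    """Look up the repository class for a platform type and instantiate it."""
    try:
        repo_cls = PLATFORM_REPOSITORIES[platform_type]
    except KeyError:
        raise ValueError(f"Unsupported platform type: {platform_type!r}") from None
    return repo_cls(**api_config)


if __name__ == "__main__":
    repo = build_repository("memory", {"token": "test_token"})
    print(asyncio.run(repo.get_features()))
```

Failing with an explicit error for unregistered types makes a missing registry entry easy to spot when a new platform is only partly wired in.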

6. Revenue Analytics

# Generate revenue projections
from scraper.utils.monetization import calculate_monthly_revenue_projection

subscriptions_by_tier = {
    'free': 1000,
    'basic': 150,
    'pro': 75,
    'enterprise': 10
}

revenue_data = calculate_monthly_revenue_projection(subscriptions_by_tier)
# Returns projected monthly revenue breakdown
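
For orientation, a projection like this can be as simple as multiplying subscriber counts by a price per tier. The sketch below is a stand-in, not the project's calculate_monthly_revenue_projection: the function name project_monthly_revenue and the prices in TIER_PRICES_USD are hypothetical, since the guide does not state any pricing.

```python
from typing import Dict

# Hypothetical monthly prices in USD; replace with real pricing.
TIER_PRICES_USD: Dict[str, int] = {
    "free": 0,
    "basic": 19,
    "pro": 49,
    "enterprise": 199,
}


def project_monthly_revenue(subscriptions_by_tier: Dict[str, int]) -> Dict[str, object]:
    """Return revenue per tier and the overall monthly total."""
    by_tier = {
        tier: count * TIER_PRICES_USD[tier]
        for tier, count in subscriptions_by_tier.items()
    }
    return {"by_tier": by_tier, "total": sum(by_tier.values())}
```

Free-tier users contribute nothing to the total here, but keeping them in the input makes it easy to add conversion-rate assumptions later.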

7. Monitoring and Maintenance

Reset Monthly Usage (Cron Job)

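# Add to crontab to run at 00:00 on the 1st of each month.
# cron does not start in the project directory, so cd there first
# (/path/to/project is a placeholder).
0 0 1 * * cd /path/to/project && python manage.py reset_monthly_usage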

Monitor Platform Health

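# Check the health of every active platform and dataset
from core.models import Platform
from scraper.services.platform_manager import PlatformManager

async def check_platform_health():
    manager = PlatformManager()
    # Async queryset iteration requires Django 4.1+; a plain "for" loop over
    # a queryset raises SynchronousOnlyOperation inside a coroutine.
    async for platform in Platform.objects.filter(is_active=True):
        async for dataset in platform.datasets.filter(is_active=True):
            repo = await manager.get_platform_repository(platform, dataset)
            health = await repo.health_check()
            print(f"{platform.name} - {dataset.name}: {health['status']}")

# Run from an async context, or with asyncio.run(check_platform_health())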
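
When many datasets are monitored, the checks can run concurrently and a failing platform should not abort the whole sweep. Here is a minimal sketch of that idea using asyncio.gather. FakeRepository, check_one and check_all are hypothetical names, and the 5-second timeout is an arbitrary assumption.

```python
import asyncio
from typing import Any, Dict, List


class FakeRepository:
    """Hypothetical stand-in for a platform repository."""

    def __init__(self, healthy: bool) -> None:
        self.healthy = healthy

    async def health_check(self) -> Dict[str, str]:
        if not self.healthy:
            raise ConnectionError("platform unreachable")
        return {"status": "healthy"}


async def check_one(label: str, repo: Any, timeout: float = 5.0) -> Dict[str, str]:
    """Run one health check, converting failures into an error record."""
    try:
        result = await asyncio.wait_for(repo.health_check(), timeout=timeout)
        return {"target": label, "status": result["status"]}
    except Exception as exc:  # report the failure instead of aborting the sweep
        return {"target": label, "status": "error", "detail": str(exc)}


async def check_all(repos: Dict[str, Any]) -> List[Dict[str, str]]:
    """Check every repository concurrently; results keep the input order."""
    return await asyncio.gather(
        *(check_one(label, repo) for label, repo in repos.items())
    )
```

The resulting list can be logged or exposed through a status endpoint, with one record per platform and dataset pair.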

Migration Strategy

Phase 1: Backward Compatibility

  • Keep existing single-platform endpoints working
  • Add new multi-platform endpoints alongside
  • Gradually migrate users to the new API

Phase 2: Data Migration

# Create migration script to convert existing setup
from django.conf import settings

from core.models import Platform, PlatformDataset

# Create default platform for existing users
default_platform = Platform.objects.create(
    name="Legacy Notion Workspace",
    platform_type="notion",
    api_config={
        "notion_token": settings.NOTION_TOKEN,
        # ... other existing config
    }
)

# Create default dataset
default_dataset = PlatformDataset.objects.create(
    platform=default_platform,
    name="Default Dataset",
    database_config={
        "feature_db_id": settings.FEATURE_DB_ID,
        # ... other existing database IDs
    }
)

Phase 3: Full Migration

  • Deprecate old endpoints
  • Move all remaining users to the new multi-platform system
  • Remove legacy code

Security Considerations

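  1. API Token Storage: Encrypt platform API tokens at rest. Django has no built-in field encryption, so use an encryption library (e.g., Fernet from the cryptography package) or an external secrets manager rather than storing tokens in plain text in api_config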
  2. Rate Limiting: Implement per-user rate limiting to prevent abuse
  3. Access Control: Ensure users can only access their authorized datasets
  4. Audit Logging: Log all API requests for monitoring and debugging
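
As an illustration of the rate-limiting point, here is a minimal in-memory sliding-window limiter keyed by user. It is a sketch under simplifying assumptions: the class name is hypothetical, state lives in a single process, and a multi-worker deployment would need a shared store such as the cache backend instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per user within `window_seconds`."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record a request and return whether it is within the limit."""
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

This short-window limiter complements the monthly tier quota: the quota caps total usage, while the limiter smooths out bursts from a single user.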

Performance Optimizations

  1. Caching: Cache repository instances and frequently accessed data
  2. Connection Pooling: Reuse HTTP connections for external API calls
  3. Async Processing: Use async/await for all external API calls
  4. Background Tasks: Use Celery for heavy data processing operations
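
The caching point can be sketched as a small time-to-live cache that builds a repository instance once per key and reuses it until it expires. RepositoryCache is a hypothetical name, and the default timeout simply mirrors the DEFAULT_CACHE_TIMEOUT value of 3600 seconds from the settings section.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class RepositoryCache:
    """Cache expensive-to-build objects for a fixed number of seconds."""

    def __init__(
        self,
        timeout_seconds: float = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, building it if missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = factory()
        self._entries[key] = (now + self.timeout_seconds, value)
        return value
```

A key such as (platform_id, dataset_id) keeps instances for different datasets separate.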

Testing Strategy

# v01t.io/api/scraper/tests/test_multiplatform.py

import pytest
from django.test import TestCase
from django.contrib.auth.models import User
from core.models import Platform, PlatformDataset, UserSubscription

class MultiPlatformTestCase(TestCase):
    def setUp(self):
        self.user = User.objects.create_user('testuser', 'test@example.com')
        self.platform = Platform.objects.create(
            name="Test Platform",
            platform_type="notion",
            api_config={"token": "test_token"}
        )

    async def test_user_can_access_authorized_dataset(self):
        # Test access control logic
        pass

    async def test_usage_tracking_increments_correctly(self):
        # Test usage tracking
        pass

This architecture gives you a foundation for scaling the scraper across multiple platforms while monetizing it through tiered subscriptions.