
TASKSET 5 (SPAWN) - STRATEGIC BUILD PLAN

Component Overview

SPAWN (Semantic Preparation And Navigation Yield) is the Hydration System for Clari. It enriches documents with metadata, populates MDX templates, extracts semantic content, and prepares content for real-time collaboration and search.

Previous Layer: CAST (Tag Resolution & Linking) ✅ Complete
Current Layer: SPAWN (Hydration & Metadata Extraction) - Starting Now
Next Layer: STITCH (Content Coordination System)

Architecture Overview

Document Input (Markdown/MDX)
                 │
                 ▼
    ┌─────────────────────────┐
    │  SPAWN Subsystem        │
    │  ─────────────────────  │
    │  • Metadata Extractor   │
    │  • MDX Template Engine  │
    │  • Content Enrichment   │
    │  • FrontMatter Handler  │
    │  • Asset Resolver       │
    │  • Semantic Parser      │
    └─────────────────────────┘
                 │
                 ▼
    Enriched Document Output
    (with metadata, templates, refs)

4-Stage Build Strategy

STAGE 1: Metadata & FrontMatter (35% Effort)

Deliverables: 500 lines, 12+ tests
  • Metadata extractor (YAML/JSON frontmatter parsing)
  • Schema validator (type-safe metadata validation)
  • Template engine (Handlebars/Liquid-like syntax)
  • Asset resolver (images, diagrams, code blocks)
Key Methods:
  • ExtractMetadata() - Parse document frontmatter
  • ValidateSchema() - Validate against schema definitions
  • ResolveAssets() - Find and link embedded assets
  • BuildMetadataIndex() - Create searchable metadata

STAGE 2: Content Enrichment (30% Effort)

Deliverables: 450 lines, 14+ tests
  • Semantic tagging integration (with CAST)
  • Table of contents generation
  • Section summaries extraction
  • Code block highlighting metadata
  • Diagram metadata extraction
Key Methods:
  • EnrichContent() - Add semantic information
  • GenerateTOC() - Create hierarchical TOC
  • ExtractSummary() - Section summaries
  • AnalyzeCodeBlocks() - Extract code metadata
  • ExtractDiagrams() - Parse Mermaid/SVG metadata

STAGE 3: Template Population (20% Effort)

Deliverables: 350 lines, 10+ tests
  • MDX component registration
  • Template variable substitution
  • Conditional content rendering
  • Dynamic section rendering
  • Cross-reference resolution
Key Methods:
  • RegisterComponents() - Register MDX components
  • PopulateTemplate() - Substitute variables
  • RenderConditional() - Conditional content
  • ResolveCrossReferences() - @link/@reference/@see resolution
  • BuildComponentTree() - Hierarchical component structure
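The plan does not specify the concrete syntax of @link/@reference/@see. The sketch below assumes a hypothetical `@link(doc-id)` form and a lookup table of targets, which CAST link resolution would supply in practice. Unresolved IDs are left in place and returned to the caller, which matches the graceful-fallback approach listed under Risk Mitigation. `Target` and `ResolveCrossReferences` are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Target is a hypothetical resolved link target (title and URL).
type Target struct {
	Title string
	URL   string
}

// refRe matches the assumed syntax @link(id), @reference(id), @see(id).
var refRe = regexp.MustCompile(`@(link|reference|see)\(([A-Za-z0-9_-]+)\)`)

// ResolveCrossReferences rewrites each known reference as a Markdown link.
// Unknown IDs are left untouched and returned so callers can log them
// instead of failing the whole document.
func ResolveCrossReferences(text string, targets map[string]Target) (string, []string) {
	var missing []string
	out := refRe.ReplaceAllStringFunc(text, func(m string) string {
		id := refRe.FindStringSubmatch(m)[2]
		t, ok := targets[id]
		if !ok {
			missing = append(missing, id)
			return m
		}
		return fmt.Sprintf("[%s](%s)", t.Title, t.URL)
	})
	return out, missing
}

func main() {
	targets := map[string]Target{"doc-123": {Title: "Advanced Features", URL: "/docs/doc-123"}}
	out, missing := ResolveCrossReferences("See @see(doc-123) and @link(doc-999).", targets)
	fmt.Println(out, missing)
}
```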

STAGE 4: Integration & API (15% Effort)

Deliverables: 280 lines, 8+ tests
  • REST API endpoints
  • Service orchestration
  • Cache integration
  • Pipeline orchestration
  • WebSocket preparation
API Endpoints:
  • POST /api/v1/spawn/enrich - Enrich single document
  • POST /api/v1/spawn/batch - Batch enrichment
  • GET /api/v1/spawn/metadata/:doc_id - Retrieve metadata
  • POST /api/v1/spawn/template/:template_id - Render template
  • GET /api/v1/spawn/health - Health check

Implementation Schedule

Phase     Duration  Files                                 Tests      LOC
Stage 1   2-3h      metadata.go, schema.go                12+        500
Stage 2   2-3h      enricher.go                           14+        450
Stage 3   1-2h      templates.go                          10+        350
Stage 4   1-2h      service.go, handlers.go               8+         280
Testing   1-2h      spawn_test.go, integration_test.go    20+        400
Total     7-12h     ~10 files                             64+ tests  1,980 lines

Technical Decisions

1. Metadata Format

  • Primary: YAML frontmatter (compatible with most doc systems)
  • Secondary: JSON frontmatter support
  • Schema: JSON Schema for validation
  • Example:
---
id: doc-123
title: "Advanced Features"
author: "John Doe"
date: "2025-12-05"
tags: ["feature", "advanced"]
seo:
    keywords: ["documentation", "features"]
    description: "Advanced features guide"
toc: true
---

2. Enrichment Strategy

  • Semantic tagging: Integration with CAST tag resolver
  • Table of contents: Auto-generated from headings
  • Cross-references: Resolved via CAST link resolution
  • Code metadata: Language, line count, syntax highlighting info
  • Asset resolution: CDN URLs, local paths, remote resources

3. Template Engine

  • Format: Handlebars-compatible syntax
  • Variables: {{ variable_name }}
  • Conditionals: {{#if condition}} ... {{/if}}
  • Loops: {{#each array}} ... {{/each}}
  • Helpers: {{ toUpperCase text }}

4. Caching Strategy

  • Metadata cache: 1-hour TTL
  • Template cache: 24-hour TTL (persistent for static templates)
  • Enrichment cache: Document version-based invalidation
  • Asset cache: 7-day TTL with ETag validation

Dependencies

Internal

  • pkg/cast (CAST subsystem for tag resolution)
  • pkg/sift (SIFT subsystem for document quality)

External

  • gopkg.in/yaml.v3 - YAML parsing
  • github.com/Masterminds/sprig - Template helper functions for Go's text/template (not a Handlebars implementation)

Standard Library

  • encoding/json - JSON parsing
  • regexp - Pattern matching for content extraction

Success Criteria

Criterion      Target         Definition
Test Count     50+ tests      Unit + integration tests for all components
Pass Rate      100%           All tests passing, 0 failures
Performance    <100ms         Metadata extraction <50ms, enrichment <100ms
Compilation    0 errors       Clean Go build, type-safe throughout
Code Quality   Zero warnings  Consistent style, proper error handling
Functions      100+           Comprehensive function suite
Lines          2,000+         Production-grade implementation

Risk Mitigation

Risk                          Probability  Impact  Mitigation
Template syntax errors        Medium       High    Comprehensive validation, early error reporting
Asset resolution failures     Medium       Medium  Graceful fallback, detailed logging
Performance degradation       Low          High    Aggressive caching, benchmarking
CAST integration issues       Low          High    Mock CAST for independent testing
Schema validation complexity  Low          Medium  Reusable validation library

Go/No-Go Checkpoints

Before Starting

  • CAST subsystem is production-ready (verify DELIVERY.md)
  • Go 1.21+ environment configured
  • PostgreSQL available for schema testing
  • Development machine ready (4GB+ RAM recommended)

After Stage 1

  • All metadata extraction tests pass
  • Schema validation comprehensive
  • Asset resolution working
  • <50ms metadata extraction confirmed

After Stage 2

  • Content enrichment tests pass
  • CAST integration verified
  • TOC generation accurate
  • Code block analysis working

After Stage 3

  • Template engine tests pass
  • MDX component registration working
  • Cross-references resolved
  • Conditional rendering accurate

After Stage 4

  • All API endpoints working
  • Health check responsive
  • Service orchestration complete
  • Cache integration verified

Final Verification

  • 50+ tests passing
  • 100% pass rate confirmed
  • Performance targets met
  • Zero compilation errors
  • Code coverage >80%

File Structure

/Users/alexarno/materi/clari/backend/pkg/spawn/
├── types.go              # Data models & enums (150 lines)
├── metadata.go           # Metadata extraction (220 lines)
├── schema.go             # Schema validation (180 lines)
├── enricher.go           # Content enrichment (280 lines)
├── templates.go          # Template engine (170 lines)
├── service.go            # Service layer (280 lines)
├── spawn_test.go         # Unit tests (400 lines)
├── integration_test.go   # Integration tests (320 lines)
└── DELIVERY.md           # Complete documentation
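The data models planned for types.go could mirror the example frontmatter field for field. This sketch decodes JSON frontmatter, the secondary format, with encoding/json. The `yaml` struct tags let the same struct be passed to `yaml.Unmarshal` from gopkg.in/yaml.v3. Several details are assumptions: the required-field rule (id and title), the YYYY-MM-DD date format, and the names `Metadata` and `DecodeJSONMetadata`. The checks are a stand-in for full JSON Schema validation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// SEO mirrors the nested "seo" block of the example frontmatter.
type SEO struct {
	Keywords    []string `json:"keywords" yaml:"keywords"`
	Description string   `json:"description" yaml:"description"`
}

// Metadata mirrors the example frontmatter; Date stays a string and is
// checked during decoding rather than parsed into time.Time.
type Metadata struct {
	ID     string   `json:"id" yaml:"id"`
	Title  string   `json:"title" yaml:"title"`
	Author string   `json:"author" yaml:"author"`
	Date   string   `json:"date" yaml:"date"`
	Tags   []string `json:"tags" yaml:"tags"`
	SEO    SEO      `json:"seo" yaml:"seo"`
	TOC    bool     `json:"toc" yaml:"toc"`
}

// DecodeJSONMetadata decodes strictly (unknown fields are rejected), then
// applies the assumed required-field and date-format rules.
func DecodeJSONMetadata(raw []byte) (Metadata, error) {
	var m Metadata
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&m); err != nil {
		return Metadata{}, fmt.Errorf("metadata: %w", err)
	}
	if m.ID == "" || m.Title == "" {
		return Metadata{}, errors.New("metadata: id and title are required")
	}
	if m.Date != "" {
		if _, err := time.Parse("2006-01-02", m.Date); err != nil {
			return Metadata{}, fmt.Errorf("metadata: bad date %q: %w", m.Date, err)
		}
	}
	return m, nil
}

func main() {
	raw := []byte(`{"id":"doc-123","title":"Advanced Features","author":"John Doe","date":"2025-12-05","tags":["feature","advanced"],"toc":true}`)
	m, err := DecodeJSONMetadata(raw)
	fmt.Printf("%+v %v\n", m, err)
}
```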

Next Phase: STITCH

After SPAWN is complete, work moves to STITCH (Content Coordination System):
  • Repository synchronization
  • Cross-document linking
  • Event-driven updates
  • Real-time collaboration bridge

Status: Ready for Implementation
Authority: CTO Approval
Recommended: Proceed with Stage 1 immediately