TekTree Non-Functional Requirements

Version: 1.0.0
Last Updated: 2025-12-16
Status: Foundation (Pre-Implementation)

Document Purpose

This document defines the non-functional requirements (NFRs) for TekTree, including performance, scalability, reliability, security, observability, and operational targets. These requirements are measurable and testable.

1. Performance Requirements

NFR-1.1: API Response Time

Requirement ID: NFR-PERF-001
Priority: P0
Targets:
| Metric | Target | Measurement Method |
| --- | --- | --- |
| p50 (median) | < 100ms | Prometheus histogram |
| p95 | < 200ms | Prometheus histogram |
| p99 | < 500ms | Prometheus histogram |
| p99.9 | < 1000ms | Prometheus histogram |
Scope: All REST API endpoints (excluding file uploads and long-running reports)
Acceptance Criteria:
  • At least 95% of requests complete within the 200ms p95 target during normal operation
  • Performance degradation alerts trigger at p95 > 300ms
  • Load testing validates targets at expected traffic levels
  • Database query optimization eliminates N+1 query problems
Measurement:
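// Prometheus histogram observed in HTTP middleware.
// path should be the route template (a pattern such as /api/v1/questions/:id),
// not the raw URL, so that label cardinality stays bounded.
httpDuration.WithLabelValues(method, path, statusCode).Observe(duration.Seconds())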

NFR-1.2: WebSocket Latency

Requirement ID: NFR-PERF-002
Priority: P0
Targets:
| Metric | Target |
| --- | --- |
| Message delivery latency (p95) | < 50ms |
| Connection establishment | < 100ms |
| Reconnection time | < 2s |
Acceptance Criteria:
  • Real-time notifications delivered within 50ms of event emission
  • Presence updates propagated within 100ms
  • WebSocket ping/pong heartbeat every 30s
  • Graceful reconnection with exponential backoff
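
The reconnection criterion above can be sketched as a delay schedule. This is a minimal illustration, not the project's client code: the 100ms base delay, the 2s cap (borrowed from the reconnection-time target), and the helper names `backoffDelay` and `withJitter` are all assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffDelay returns the upper bound on the wait before reconnection
// attempt n (0-based): base * 2^n, capped at max. Values are assumptions.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// withJitter maps d into [d/2, d) using r, a random value in [0, 1), so that
// many clients dropped at the same moment do not reconnect in lockstep.
func withJitter(d time.Duration, r float64) time.Duration {
	half := d / 2
	return half + time.Duration(r*float64(d-half))
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		ceiling := backoffDelay(attempt, 100*time.Millisecond, 2*time.Second)
		fmt.Printf("attempt %d: ceiling %v, jittered %v\n",
			attempt, ceiling, withJitter(ceiling, rand.Float64()))
	}
}
```

Capping the delay keeps a long outage from pushing individual waits past the reconnection target, while the jitter spreads load on the Real-Time Service when it comes back.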

NFR-1.3: Database Query Performance

Requirement ID: NFR-PERF-003
Priority: P0
Targets:
| Query Type | Target |
| --- | --- |
| Point reads (by ID) | < 10ms |
| Index scans (paginated list) | < 50ms |
| Full-text search | < 200ms |
| Aggregations | < 500ms |
Acceptance Criteria:
  • All collections have appropriate indexes
  • No queries without index usage (logged and alerted)
  • Query explain plans reviewed for slow queries (> 100ms)
  • Connection pooling configured (min: 10, max: 100)
Indexes Required:
// Users collection
db.users.createIndex({ "email": 1 }, { unique: true })
db.users.createIndex({ "handle": 1 }, { unique: true })

// Questions collection
db.questions.createIndex({ "user": 1, "created": -1 })
db.questions.createIndex({ "areas": 1, "created": -1 })
db.questions.createIndex({ "tags": 1 })

// Full-text search
db.questions.createIndex({ "title": "text", "body": "text" })

NFR-1.4: Cache Hit Rate

Requirement ID: NFR-PERF-004
Priority: P1
Targets:
| Cache Layer | Target Hit Rate |
| --- | --- |
| User profile cache | > 90% |
| Content cache (hot data) | > 80% |
| Leaderboard cache | > 95% |
Acceptance Criteria:
  • Redis L2 cache with TTL-based expiration
  • Cache warming for popular content
  • Cache invalidation on domain events
  • Cache hit/miss metrics exported to Prometheus

NFR-1.5: Page Load Time (Frontend)

Requirement ID: NFR-PERF-005
Priority: P1
Targets:
| Metric | Target |
| --- | --- |
| First Contentful Paint (FCP) | < 1.5s |
| Largest Contentful Paint (LCP) | < 2.5s |
| Time to Interactive (TTI) | < 3.5s |
| Cumulative Layout Shift (CLS) | < 0.1 |
Acceptance Criteria:
  • Lighthouse score > 90 for performance
  • Code splitting for route-based bundles
  • Image optimization (WebP, lazy loading)
  • CDN for static assets

2. Scalability Requirements

NFR-2.1: Horizontal Scaling

Requirement ID: NFR-SCALE-001
Priority: P0
Targets:
| Service | Min Instances | Max Instances | Scale Trigger |
| --- | --- | --- | --- |
| API Gateway | 2 | 10 | CPU > 70% or RPS > 1000 |
| User Service | 2 | 5 | CPU > 70% |
| Knowledge Service | 2 | 10 | CPU > 70% or RPS > 500 |
| Gamification Service | 2 | 5 | Event queue > 1000 |
| Payment Service | 2 | 3 | CPU > 70% |
| Real-Time Service | 2 | 10 | Active connections > 5000 |
Acceptance Criteria:
  • All services are stateless (session in Redis)
  • Auto-scaling configured in Railway
  • Scale-up time < 60s
  • Scale-down grace period 300s (connection draining)
  • Load balancing with health checks

NFR-2.2: Concurrent User Capacity

Requirement ID: NFR-SCALE-002
Priority: P0
Targets:
| Phase | Concurrent Users | Total Users |
| --- | --- | --- |
| MVP (3 months) | 1,000 | 10,000 |
| Growth (6 months) | 5,000 | 50,000 |
| Scale (12 months) | 10,000 | 100,000 |
Acceptance Criteria:
  • Load testing validates targets with synthetic traffic
  • No service degradation at target concurrency
  • Database connection pooling prevents exhaustion
  • Rate limiting prevents abuse

NFR-2.3: Data Growth

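Requirement ID: NFR-SCALE-003
Priority: P1
Projected Growth:
| Metric | Year 1 | Year 2 | Year 3 |
| --- | --- | --- | --- |
| Users | 100K | 500K | 1M |
| Questions | 500K | 2.5M | 5M |
| Insights | 100K | 500K | 1M |
| Total documents | 2M | 10M | 20M |
| Storage | 100GB | 500GB | 1TB |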
Acceptance Criteria:
  • MongoDB sharding plan documented (shard by user_id)
  • Archival strategy for old content (> 2 years)
  • Data partitioning for analytics queries
  • Index size monitoring and optimization

NFR-2.4: Event Bus Throughput

Requirement ID: NFR-SCALE-004
Priority: P0
Targets:
| Metric | Target |
| --- | --- |
| Events per second (sustained) | 1,000 |
| Events per second (burst) | 5,000 |
| Event processing latency (p95) | < 100ms |
| Event backlog size (max) | 10,000 |
Acceptance Criteria:
  • Redis Streams consumer groups for parallel processing
  • Dead letter queue for failed events (retry 3x)
  • Event rate limiting per service
  • Circuit breaker for downstream failures

3. Reliability Requirements

NFR-3.1: Availability (Uptime)

Requirement ID: NFR-REL-001
Priority: P0
SLA Targets:
| Tier | Uptime Target | Downtime Budget (Monthly) |
| --- | --- | --- |
| Free | 99% | 7h 18m |
| Pro | 99.5% | 3h 39m |
| Team | 99.9% | 43m 50s |
| Enterprise | 99.95% | 21m 55s |
Acceptance Criteria:
  • Health checks on all services (liveness, readiness)
  • Multi-AZ deployment for critical services
  • Automated failover for database
  • Status page with uptime history (e.g., status.tektree.com)
  • Incident response runbook for common failures
Measurement:
  • Uptime calculated as: (Total time - Downtime) / Total time * 100
  • Downtime = Any period where service returns 5xx errors for > 1 minute
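
The arithmetic behind the downtime budgets can be checked with a short sketch. It assumes an average month of 365.2425 / 12 days (2,629,746 seconds), which is not stated in this document but reproduces the monthly budgets in the SLA table; the function names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// avgMonth is the mean Gregorian month: 365.2425 days / 12 = 30.436875 days,
// which is 2,629,746 seconds. The choice of month length is an assumption.
const avgMonth = 2629746 * time.Second

// downtimeBudget returns the downtime allowed per average month for an
// uptime target given in percent (for example 99.9), rounded to the second.
func downtimeBudget(uptimePercent float64) time.Duration {
	allowed := (100 - uptimePercent) / 100
	return time.Duration(allowed * float64(avgMonth)).Round(time.Second)
}

// uptimePercent applies the measurement formula:
// (total time - downtime) / total time * 100.
func uptimePercent(total, downtime time.Duration) float64 {
	return float64(total-downtime) / float64(total) * 100
}

func main() {
	for _, target := range []float64{99, 99.5, 99.9, 99.95} {
		fmt.Printf("%.2f%% uptime -> %v downtime per month\n",
			target, downtimeBudget(target))
	}
	// Example: 20 minutes of downtime in one average month.
	fmt.Printf("%.3f%% measured uptime\n", uptimePercent(avgMonth, 20*time.Minute))
}
```

Under this assumption the 99.9% and 99.95% targets come out at 43m50s and 21m55s, matching the Team and Enterprise rows.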

NFR-3.2: Fault Tolerance

Requirement ID: NFR-REL-002
Priority: P0
Requirements:
  • Circuit Breaker: Wrap all external service calls (Polar API, email service)
    • Open after 5 consecutive failures
    • Half-open after 30s
    • Close after 3 successful requests
  • Retry Logic: Exponential backoff with jitter (max 3 retries)
  • Graceful Degradation: Core features continue if non-critical services fail
    • Gamification service down → XP events queued, no blocking
    • Email service down → Emails queued for retry
    • Search service down → Fallback to basic filter
  • Bulkhead Pattern: Isolate failure domains (separate connection pools per service)
Acceptance Criteria:
  • Chaos engineering tests validate resilience (kill random service, verify recovery)
  • No cascading failures
  • Circuit breaker metrics exported to Prometheus
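
The circuit breaker thresholds above (open after 5 consecutive failures, half-open after 30s, close after 3 successes) can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the planned implementation: `Breaker`, `fakeClock`, and `errDownstream` are hypothetical names, and the sketch lets concurrent trial calls through while half-open, which a production breaker would probably restrict.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Thresholds taken from the requirement above.
const (
	failureThreshold = 5                // consecutive failures before opening
	openTimeout      = 30 * time.Second // wait before allowing trial calls
	successThreshold = 3                // trial successes before closing
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

var (
	// ErrOpen is returned when a call is rejected without being attempted.
	ErrOpen = errors.New("circuit breaker is open")
	// errDownstream is a hypothetical failure used only by the demo.
	errDownstream = errors.New("downstream failure")
)

// fakeClock is a hypothetical clock so the demo does not need to sleep.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

// Breaker is a minimal circuit breaker; it starts in the closed state.
type Breaker struct {
	mu        sync.Mutex
	state     state
	failures  int // consecutive failures
	successes int // consecutive successes while half-open
	openedAt  time.Time
	now       func() time.Time
}

func NewBreaker(now func() time.Time) *Breaker { return &Breaker{now: now} }

// Call runs fn unless the breaker is open, then records the outcome.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.state == open {
		if b.now().Sub(b.openedAt) < openTimeout {
			b.mu.Unlock()
			return ErrOpen
		}
		b.state, b.successes = halfOpen, 0 // timeout elapsed: allow trials
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		// Any failure while half-open reopens the breaker immediately.
		if b.state == halfOpen || b.failures >= failureThreshold {
			b.state, b.openedAt = open, b.now()
		}
		return err
	}
	b.failures = 0
	if b.state == halfOpen {
		b.successes++
		if b.successes >= successThreshold {
			b.state = closed
		}
	}
	return nil
}

func main() {
	clk := &fakeClock{}
	b := NewBreaker(clk.Now)
	for i := 0; i < 6; i++ { // the sixth call is rejected without running
		fmt.Println(b.Call(func() error { return errDownstream }))
	}
	clk.Advance(openTimeout)
	fmt.Println(b.Call(func() error { return nil })) // trial call succeeds
}
```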

NFR-3.3: Data Durability

Requirement ID: NFR-REL-003
Priority: P0
Targets:
  • RPO (Recovery Point Objective): < 1 hour
  • RTO (Recovery Time Objective): < 4 hours
Acceptance Criteria:
  • MongoDB replica set with at least 3 nodes
  • Automated daily backups to object storage
  • Point-in-time recovery (PITR) enabled
  • Backup restoration tested quarterly
  • Event store append-only log for critical events
  • Backup retention: 30 days (daily), 12 months (monthly)

NFR-3.4: Disaster Recovery

Requirement ID: NFR-REL-004
Priority: P1
Scenarios Covered:
  • Database corruption or data loss
  • Service region outage
  • Accidental data deletion
  • Security breach requiring rollback
Acceptance Criteria:
  • Disaster recovery runbook documented
  • DR drills conducted bi-annually
  • Backup restoration tested (< 4 hour RTO)
  • Cross-region backup replication for Enterprise tier
  • Immutable backups (versioned in object storage)

4. Security Requirements

NFR-4.1: Authentication Security

Requirement ID: NFR-SEC-001
Priority: P0
Requirements:
  • Password Security:
    • Bcrypt hashing (cost factor 12)
    • Min 8 characters, 1 uppercase, 1 number, 1 special char
    • Password breach detection (Have I Been Pwned API)
    • Password reset with time-limited tokens (1 hour expiry)
  • JWT Security:
    • RS256 signing (asymmetric keys)
    • Short-lived access tokens (15 min)
    • Refresh tokens in HTTP-only, Secure, SameSite cookies
    • Token revocation list for logout
  • Session Management:
    • Redis-backed sessions
    • Session timeout after 7 days inactivity
    • Concurrent session limit (max 5 devices)
Acceptance Criteria:
  • Security audit passes (OWASP Top 10 compliance)
  • Penetration testing validates authentication
  • Rate limiting on login attempts (5 attempts, 15 min lockout)
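
The password composition rule (min 8 characters, 1 uppercase, 1 number, 1 special char) could be checked along these lines. This is a sketch only: `checkPassword` is a hypothetical helper, and counting Unicode code points rather than bytes, and treating any punctuation or symbol as a special character, are assumptions the requirement does not settle.

```go
package main

import (
	"fmt"
	"unicode"
)

// checkPassword returns the policy rules a candidate password violates.
// An empty result means the password satisfies the composition rules; breach
// detection and hashing are separate steps not shown here.
func checkPassword(pw string) []string {
	var hasUpper, hasDigit, hasSpecial bool
	length := 0
	for _, r := range pw {
		length++
		switch {
		case unicode.IsUpper(r):
			hasUpper = true
		case unicode.IsDigit(r):
			hasDigit = true
		case unicode.IsPunct(r) || unicode.IsSymbol(r):
			hasSpecial = true
		}
	}
	var problems []string
	if length < 8 {
		problems = append(problems, "fewer than 8 characters")
	}
	if !hasUpper {
		problems = append(problems, "no uppercase letter")
	}
	if !hasDigit {
		problems = append(problems, "no number")
	}
	if !hasSpecial {
		problems = append(problems, "no special character")
	}
	return problems
}

func main() {
	for _, pw := range []string{"short", "longenough1A", "Tr33-house"} {
		fmt.Printf("%q: %v\n", pw, checkPassword(pw))
	}
}
```

Returning every violated rule at once, rather than failing on the first, lets the signup form show the user all problems in a single round trip.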

NFR-4.2: Authorization Security

Requirement ID: NFR-SEC-002
Priority: P0
Requirements:
  • RBAC (Role-Based Access Control):
    • Roles: User, Moderator, Admin
    • Permissions checked at API Gateway and service level
  • Tier-Based Access:
    • Feature flags enforce tier limits
    • Quota validation on resource creation
  • Object-Level Authorization:
    • User can only edit their own content
    • Moderators can edit any content
    • Privacy settings respected (public, connections, private)
Acceptance Criteria:
  • Authorization checked on every request
  • No privilege escalation vulnerabilities
  • Audit log for privileged actions

NFR-4.3: Data Protection

Requirement ID: NFR-SEC-003
Priority: P0
Requirements:
  • Encryption at Rest:
    • MongoDB encryption with AES-256
    • Redis encryption enabled
    • Encrypted backups
  • Encryption in Transit:
    • TLS 1.3 for all external communication
    • Certificate auto-renewal (Let’s Encrypt)
    • HSTS headers enforced
  • PII Protection:
    • Email addresses hashed for analytics
    • User data export API (GDPR compliance)
    • Data deletion API (right to be forgotten)
Acceptance Criteria:
  • SSL Labs rating A+ for HTTPS configuration
  • No PII in logs or metrics
  • Data anonymization for deleted users

NFR-4.4: API Security

Requirement ID: NFR-SEC-004
Priority: P0
Requirements:
  • Input Validation:
    • JSON schema validation at API Gateway
    • Max request body size: 10MB
    • SQL/NoSQL injection prevention (parameterized queries)
    • XSS prevention (content sanitization)
  • Rate Limiting:
    • Per-user rate limits by tier (see Functional Requirements)
    • IP-based rate limiting for unauthenticated endpoints (100 req/hour)
    • Retry-After header on rate-limited responses so clients can back off
  • CORS:
    • Whitelist allowed origins (no wildcard in production)
    • Credentials allowed only for whitelisted origins
Acceptance Criteria:
  • OWASP ZAP security scan passes
  • Fuzz testing validates input handling
  • Rate limit bypass attempts logged and alerted
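
The IP-based limit for unauthenticated endpoints (100 req/hour) can be illustrated with a fixed-window counter. This is a single-process sketch under stated assumptions: a real deployment with several stateless instances would need shared state (for example in Redis), and `Limiter`, `Allow`, and `demoStart` are hypothetical names. The second return value is the wait a handler could report in a Retry-After header.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoStart is an arbitrary instant used only by the demo below.
var demoStart = time.Date(2025, 12, 16, 14, 30, 0, 0, time.UTC)

type window struct {
	start time.Time
	count int
}

// Limiter is a fixed-window request counter keyed by client identifier.
type Limiter struct {
	mu      sync.Mutex
	limit   int
	period  time.Duration
	windows map[string]*window
}

func NewLimiter(limit int, period time.Duration) *Limiter {
	return &Limiter{limit: limit, period: period, windows: make(map[string]*window)}
}

// Allow records a request from key at time now. It reports whether the
// request is within the limit and, if not, how long until the window resets.
func (l *Limiter) Allow(key string, now time.Time) (bool, time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	w, ok := l.windows[key]
	if !ok || now.Sub(w.start) >= l.period {
		w = &window{start: now} // first request, or the old window expired
		l.windows[key] = w
	}
	if w.count >= l.limit {
		return false, w.start.Add(l.period).Sub(now)
	}
	w.count++
	return true, 0
}

func main() {
	l := NewLimiter(100, time.Hour)
	ip := "203.0.113.7" // documentation-range address
	for i := 0; i < 100; i++ {
		l.Allow(ip, demoStart)
	}
	ok, wait := l.Allow(ip, demoStart.Add(time.Minute))
	fmt.Printf("allowed=%v Retry-After=%d seconds\n", ok, int(wait.Seconds()))
}
```

A fixed window is the simplest scheme but allows a burst of up to twice the limit across a window boundary; a sliding window or token bucket would smooth that at the cost of more state.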

NFR-4.5: Secret Management

Requirement ID: NFR-SEC-005
Priority: P0
Requirements:
  • All secrets stored in Railway environment variables
  • No secrets in code, logs, or version control
  • Secret rotation policy (90 days for API keys)
  • Separate secrets per environment (dev, staging, prod)
Acceptance Criteria:
  • GitHub secret scanning enabled
  • Pre-commit hooks prevent secret commits
  • Secret access audit log

5. Observability Requirements

NFR-5.1: Logging

Requirement ID: NFR-OBS-001
Priority: P0
Requirements:
  • Structured Logging:
    • JSON format for all logs
    • Required fields: timestamp, level, service, trace_id, message
    • Log levels: DEBUG, INFO, WARN, ERROR, FATAL
  • Log Aggregation:
    • Centralized log storage (Railway logs or external service)
    • Searchable and filterable by service, level, trace_id
    • Retention: 7 days (INFO), 30 days (ERROR)
  • Security Logging:
    • All authentication events
    • Authorization failures
    • Data access (PII reads)
    • Configuration changes
Acceptance Criteria:
  • No PII in logs
  • Correlation IDs link related logs across services
  • Log sampling for high-volume endpoints (1% sample rate)
Example Log Entry:
{
  "timestamp": "2025-12-16T14:30:00Z",
  "level": "INFO",
  "service": "knowledge-service",
  "trace_id": "abc123xyz",
  "user_id": "usr_123",
  "endpoint": "/api/v1/questions",
  "method": "POST",
  "status": 201,
  "duration_ms": 45,
  "message": "Question created successfully"
}
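
A log entry of this shape could be produced with Go's standard `log/slog` package (Go 1.21 or later). The sketch below is one possible approach, not the chosen logging stack: `newLogger`, `missingFields`, and `exampleLine` are hypothetical helpers, and the key renaming exists because slog's JSON handler emits `time` and `msg` by default rather than the required `timestamp` and `message`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

// newLogger returns a JSON logger that renames slog's built-in "time" and
// "msg" keys to the required "timestamp" and "message" fields.
func newLogger(w io.Writer, service string) *slog.Logger {
	opts := &slog.HandlerOptions{
		Level: slog.LevelInfo,
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) > 0 {
				return a // only rename top-level keys
			}
			switch a.Key {
			case slog.TimeKey:
				a.Key = "timestamp"
			case slog.MessageKey:
				a.Key = "message"
			}
			return a
		},
	}
	return slog.New(slog.NewJSONHandler(w, opts)).With("service", service)
}

// missingFields lists the required fields absent from one JSON log line.
func missingFields(line []byte) []string {
	var entry map[string]any
	if err := json.Unmarshal(line, &entry); err != nil {
		return []string{"(invalid JSON)"}
	}
	var missing []string
	for _, k := range []string{"timestamp", "level", "service", "trace_id", "message"} {
		if _, ok := entry[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

// exampleLine logs one entry into a buffer and returns the raw JSON line.
func exampleLine() []byte {
	var buf bytes.Buffer
	log := newLogger(&buf, "knowledge-service")
	log.Info("Question created successfully",
		"trace_id", "abc123xyz", "endpoint", "/api/v1/questions",
		"method", "POST", "status", 201, "duration_ms", 45)
	return buf.Bytes()
}

func main() {
	line := exampleLine()
	fmt.Print(string(line))
	fmt.Println("missing required fields:", missingFields(line))
}
```

A check like `missingFields` could run in a unit test so that a service cannot silently drop `trace_id` from its log output.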

NFR-5.2: Metrics

Requirement ID: NFR-OBS-002
Priority: P0
Required Metrics:
RED Metrics (Rate, Errors, Duration):
  • Request rate (requests per second)
  • Error rate (errors per second, by type)
  • Duration (latency histogram)
USE Metrics (Utilization, Saturation, Errors):
  • CPU utilization (%)
  • Memory utilization (%)
  • Disk I/O (IOPS, throughput)
  • Network I/O (bytes in/out)
Business Metrics:
  • Active users (DAU, MAU)
  • Content creation rate (questions, insights per hour)
  • XP earned per user per day
  • Subscription conversions (free → paid)
  • Revenue (MRR, ARR)
Acceptance Criteria:
  • Prometheus metrics endpoint on all services (:8080/metrics)
  • Metrics scraped every 15s
  • Metrics retention: 15 days (raw), 90 days (aggregated)
  • Grafana dashboards for each service

NFR-5.3: Distributed Tracing

Requirement ID: NFR-OBS-003
Priority: P1
Requirements:
  • OpenTelemetry SDK integrated in all services
  • Trace context propagated via HTTP headers (traceparent)
  • Spans created for:
    • HTTP requests
    • Database queries
    • Cache operations
    • Event publishing
    • External API calls
  • Trace sampling: 1% in production, 100% in development
Acceptance Criteria:
  • Jaeger or Zipkin backend for trace visualization
  • Traces link to logs via trace_id
  • Slow trace detection (p95 > 200ms)
  • Trace retention: 7 days
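
To show what propagating context via `traceparent` involves, here is a sketch of parsing the W3C Trace Context header (version 00: `version-traceid-parentid-flags`, all lowercase hex) to recover the `trace_id` used for log correlation. In practice the OpenTelemetry SDK's propagators handle this; the hand-written `parseTraceparent` below is hypothetical and only accepts version 00.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceContext holds the fields carried by a version-00 traceparent header.
type TraceContext struct {
	TraceID  string // 32 lowercase hex characters
	ParentID string // 16 lowercase hex characters (the caller's span ID)
	Sampled  bool   // lowest bit of the trace-flags byte
}

var errInvalid = errors.New("invalid traceparent header")

// parseTraceparent validates and splits a header such as
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
func parseTraceparent(h string) (TraceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" ||
		len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return TraceContext{}, errInvalid
	}
	for _, p := range parts[1:] {
		// hex.DecodeString accepts uppercase, which the format does not.
		if p != strings.ToLower(p) {
			return TraceContext{}, errInvalid
		}
		if _, err := hex.DecodeString(p); err != nil {
			return TraceContext{}, errInvalid
		}
	}
	// All-zero trace and parent IDs are reserved as invalid.
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return TraceContext{}, errInvalid
	}
	flags, _ := hex.DecodeString(parts[3]) // validated in the loop above
	return TraceContext{
		TraceID:  parts[1],
		ParentID: parts[2],
		Sampled:  flags[0]&1 == 1,
	}, nil
}

func main() {
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tc.TraceID, tc.ParentID, tc.Sampled, err)
}
```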

NFR-5.4: Alerting

Requirement ID: NFR-OBS-004
Priority: P0
Alert Definitions:
Critical Alerts (P0 - Page On-Call):
  • Service down (health check failing for > 1 min)
  • Error rate > 5% (5xx responses)
  • p95 latency > 500ms
  • Database connection failures
  • Disk usage > 90%
Warning Alerts (P1 - Slack Notification):
  • p95 latency > 300ms
  • Error rate > 1%
  • Cache hit rate < 70%
  • Event queue backlog > 5000
  • CPU usage > 80% sustained for 5 min
Info Alerts (P2 - Log Only):
  • Slow queries (> 100ms)
  • Rate limit exceeded
  • Webhook failures
Acceptance Criteria:
  • Alertmanager configured with routing rules
  • Alerts sent to Slack/PagerDuty
  • Alert runbooks linked in alert description
  • Alert escalation policy defined

NFR-5.5: Dashboards

Requirement ID: NFR-OBS-005
Priority: P1
Required Dashboards:
  • System Health Dashboard: CPU, memory, disk, network across all services
  • API Performance Dashboard: Request rate, error rate, latency by endpoint
  • Database Dashboard: Query performance, connection pool, cache hit rate
  • Gamification Dashboard: XP earned, achievements unlocked, leaderboard activity
  • Business Metrics Dashboard: DAU/MAU, content creation, subscription conversions
  • Real-Time Dashboard: Active WebSocket connections, message latency
Acceptance Criteria:
  • Grafana dashboards with auto-refresh (30s)
  • Dashboards accessible to all engineers
  • Dashboard as code (JSON in version control)

6. Operational Requirements

NFR-6.1: Deployment

Requirement ID: NFR-OPS-001
Priority: P0
Requirements:
  • CI/CD Pipeline:
    • Automated build on commit to main branch
    • Run tests (unit, integration) before deploy
    • Deploy to staging automatically
    • Manual approval for production deploy
  • Deployment Strategy:
    • Rolling deployment (25% instances at a time)
    • Health check before promoting next batch
    • Automatic rollback on health check failure
  • Deployment Frequency:
    • Target: 3-5 deploys per week
    • Hotfixes deployed within 1 hour
Acceptance Criteria:
  • Zero-downtime deployments
  • Deployment time < 10 minutes
  • Rollback time < 5 minutes
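
The rolling strategy (25% of instances at a time, health check before the next batch, rollback on failure) can be sketched as follows. This is an illustration of the control flow only: in practice the platform performs the rollout, `batchSizes` and `rollOut` are hypothetical names, and rounding the batch size down with a floor of one instance is an assumption.

```go
package main

import "fmt"

// batchSizes splits a fleet into rolling-deployment batches of at most
// percent of the instances each (rounded down, but never fewer than one).
func batchSizes(instances, percent int) []int {
	if instances <= 0 || percent <= 0 {
		return nil
	}
	size := instances * percent / 100
	if size < 1 {
		size = 1
	}
	var batches []int
	for remaining := instances; remaining > 0; remaining -= size {
		if remaining < size {
			batches = append(batches, remaining)
		} else {
			batches = append(batches, size)
		}
	}
	return batches
}

// rollOut updates one batch at a time and stops at the first batch that
// fails its health check. It returns how many instances were updated and
// whether a rollback is needed. healthy is a hypothetical health-check hook
// that receives the 0-based batch index.
func rollOut(instances, percent int, healthy func(batch int) bool) (updated int, rollback bool) {
	for i, n := range batchSizes(instances, percent) {
		updated += n
		if !healthy(i) {
			return updated, true
		}
	}
	return updated, false
}

func main() {
	fmt.Println(batchSizes(10, 25))
	updated, rollback := rollOut(8, 25, func(batch int) bool { return batch != 1 })
	fmt.Printf("updated=%d rollback=%v\n", updated, rollback)
}
```

With the minimum of 2 instances per service, a 25% batch rounds up to one instance, so half the fleet is replaced at a time; that is worth keeping in mind when judging the zero-downtime criterion.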

NFR-6.2: Database Migrations

Requirement ID: NFR-OPS-002
Priority: P0
Requirements:
  • Migration Strategy:
    • Backward-compatible migrations (add before remove)
    • Schema versioning in code
    • Automated migration on deploy
    • Migration rollback script for each migration
  • Migration Testing:
    • Test migrations on staging before production
    • Performance test migrations (< 1 minute for < 1M documents)
Acceptance Criteria:
  • No data loss during migrations
  • Migrations logged and audited
  • Failed migrations trigger alerts and block deploy

NFR-6.3: Configuration Management

Requirement ID: NFR-OPS-003
Priority: P1
Requirements:
  • Configuration Sources:
    • Railway environment variables for secrets
    • Config files (YAML) for feature flags
    • Database for dynamic config (rate limits, feature toggles)
  • Configuration Changes:
    • No service restart required for dynamic config
    • Configuration changes logged and audited
    • Rollback capability for config changes

NFR-6.4: Documentation

Requirement ID: NFR-OPS-004
Priority: P1
Required Documentation:
  • Architecture diagrams (up-to-date)
  • API documentation (OpenAPI spec, auto-generated)
  • Runbooks for common operations (deploy, rollback, scale)
  • Incident response playbooks
  • Onboarding guide for new engineers
Acceptance Criteria:
  • Documentation reviewed quarterly
  • Documentation searchable and accessible
  • Code comments for complex logic

7. Usability Requirements

NFR-7.1: Accessibility

Requirement ID: NFR-USE-001
Priority: P1
Requirements:
  • WCAG 2.1 Level AA compliance
  • Keyboard navigation support
  • Screen reader compatibility (ARIA labels)
  • Color contrast ratio of at least 4.5:1 for normal text
  • Focus indicators visible
Acceptance Criteria:
  • Lighthouse accessibility score > 90
  • Automated accessibility tests in CI
  • Manual accessibility audit with assistive technologies

NFR-7.2: Browser Compatibility

Requirement ID: NFR-USE-002
Priority: P1
Supported Browsers:
  • Chrome (latest 2 versions)
  • Firefox (latest 2 versions)
  • Safari (latest 2 versions)
  • Edge (latest 2 versions)
Acceptance Criteria:
  • Cross-browser testing in CI (Playwright)
  • Progressive enhancement (core features work without JS)
  • Polyfills for older browsers

NFR-7.3: Mobile Responsiveness

Requirement ID: NFR-USE-003
Priority: P0
Requirements:
  • Responsive design (mobile-first)
  • Touch-friendly UI (min tap target 44x44px)
  • Fast loading on 3G networks (< 5s TTI)
Acceptance Criteria:
  • Mobile Lighthouse score > 85
  • Tested on iOS Safari and Android Chrome
  • No horizontal scrolling

8. Compliance Requirements

NFR-8.1: Data Privacy (GDPR)

Requirement ID: NFR-COMP-001
Priority: P0
Requirements:
  • User consent for data collection
  • Data export API (JSON format)
  • Data deletion API (right to be forgotten)
  • Cookie consent banner
  • Privacy policy and terms of service
Acceptance Criteria:
  • GDPR compliance checklist complete
  • Data processing agreement (DPA) for third-party services
  • Data retention policy documented

NFR-8.2: Payment Compliance (PCI DSS)

Requirement ID: NFR-COMP-002
Priority: P0
Requirements:
  • No credit card data stored in TekTree
  • All payment processing via Polar (PCI-compliant)
  • HTTPS enforced for all payment flows
Acceptance Criteria:
  • PCI compliance via Polar attestation
  • No card data in logs or metrics

Summary: NFR Validation Plan

| Category | Key Metrics | Validation Method | Frequency |
| --- | --- | --- | --- |
| Performance | p95 latency < 200ms | Load testing | Weekly (CI), Monthly (full) |
| Scalability | Support 10K concurrent users | Stress testing | Quarterly |
| Reliability | 99.9% uptime | Monitoring | Continuous |
| Security | OWASP Top 10 compliance | Pen testing | Quarterly |
| Observability | 100% service instrumentation | Code review | Per PR |

Document Status: ✅ Complete
Related Documents: ARCHITECTURE_OVERVIEW.md, OBSERVABILITY_PLAN.md, SECURITY_ARCHITECTURE.md