TekTree Non-Functional Requirements
Version: 1.0.0 Last Updated: 2025-12-16 Status: Foundation (Pre-Implementation)
Document Purpose
This document defines the non-functional requirements (NFRs) for TekTree, including performance, scalability, reliability, security, observability, and operational targets. These requirements are measurable and testable.
1. Performance Requirements
NFR-1.1: API Response Time
Requirement ID: NFR-PERF-001 Priority: P0
Targets:

| Metric | Target | Measurement Method |
|---|---|---|
| p50 (median) | < 100ms | Prometheus histogram |
| p95 | < 200ms | Prometheus histogram |
| p99 | < 500ms | Prometheus histogram |
| p99.9 | < 1000ms | Prometheus histogram |
- At least 95% of requests complete in under 200ms (the p95 target) during normal operation
- Performance degradation alerts trigger at p95 > 300ms
- Load testing validates targets at expected traffic levels
- Database query optimization eliminates N+1 query patterns
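To illustrate how the percentile targets above could be checked in a load-test script, here is a minimal sketch. It works on raw latency samples with a nearest-rank percentile, which is an assumption made for illustration; Prometheus derives quantiles from histogram buckets instead, so production numbers will differ slightly. The names `percentile` and `check_latency_targets` are hypothetical.

```python
import math

# Targets from the table above, in milliseconds (percentile -> upper bound).
TARGETS_MS = {50: 100, 95: 200, 99: 500, 99.9: 1000}


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency_targets(samples_ms, targets_ms=TARGETS_MS):
    """Return {percentile: (observed_ms, within_target)} for each target."""
    results = {}
    for p, limit_ms in targets_ms.items():
        observed = percentile(samples_ms, p)
        results[p] = (observed, observed < limit_ms)
    return results
```

A load-test job could fail the build whenever any `within_target` flag is false.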
NFR-1.2: WebSocket Latency
Requirement ID: NFR-PERF-002 Priority: P0
Targets:

| Metric | Target |
|---|---|
| Message delivery latency (p95) | < 50ms |
| Connection establishment | < 100ms |
| Reconnection time | < 2s |
- Real-time notifications delivered within 50ms of event emission
- Presence updates propagated within 100ms
- WebSocket ping/pong heartbeat every 30s
- Graceful reconnection with exponential backoff
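The reconnection backoff could look roughly like the sketch below. It uses "full jitter" (a uniformly random delay up to an exponentially growing ceiling). The base delay of 0.1s, the cap of 2s, and the attempt count are illustrative assumptions, not values from this document; the cap is chosen only to sit near the reconnection target above.

```python
import random


def backoff_delays(base_s=0.1, cap_s=2.0, attempts=6, rng=None):
    """Yield one delay per reconnection attempt.

    Each delay is drawn uniformly from [0, min(cap_s, base_s * 2**attempt)],
    so the ceiling doubles per attempt until it reaches the cap.
    All default values are assumptions for illustration.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        yield rng.uniform(0, ceiling)
```

A client would sleep for each yielded delay before its next connection attempt and stop iterating once the socket is re-established.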
NFR-1.3: Database Query Performance
Requirement ID: NFR-PERF-003 Priority: P0
Targets:

| Query Type | Target |
|---|---|
| Point reads (by ID) | < 10ms |
| Index scans (paginated list) | < 50ms |
| Full-text search | < 200ms |
| Aggregations | < 500ms |
- All collections have appropriate indexes
- No queries run without an index; any that do are logged and alerted on
- Query explain plans reviewed for slow queries (> 100ms)
- Connection pooling configured (min: 10, max: 100)
NFR-1.4: Cache Hit Rate
Requirement ID: NFR-PERF-004 Priority: P1
Targets:

| Cache Layer | Target Hit Rate |
|---|---|
| User profile cache | > 90% |
| Content cache (hot data) | > 80% |
| Leaderboard cache | > 95% |
- Redis L2 cache with TTL-based expiration
- Cache warming for popular content
- Cache invalidation on domain events
- Cache hit/miss metrics exported to Prometheus
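As a rough illustration of TTL expiry, event-driven invalidation, and hit-rate accounting, the sketch below uses an in-memory dictionary as a stand-in for Redis. The class name `TTLCache` and its methods are hypothetical; a real deployment would rely on Redis key expiry and export the counters to Prometheus.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with TTL expiry and hit/miss counters."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._store = {}     # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        self._store.pop(key, None)  # drop the expired entry, if any
        self.misses += 1
        return None

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl_s)

    def invalidate(self, key):
        """Intended to be called from a domain-event handler when the source data changes."""
        self._store.pop(key, None)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Comparing `hit_rate()` against the per-layer targets in the table gives a simple pass/fail check.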
NFR-1.5: Page Load Time (Frontend)
Requirement ID: NFR-PERF-005 Priority: P1
Targets:

| Metric | Target |
|---|---|
| First Contentful Paint (FCP) | < 1.5s |
| Largest Contentful Paint (LCP) | < 2.5s |
| Time to Interactive (TTI) | < 3.5s |
| Cumulative Layout Shift (CLS) | < 0.1 |
- Lighthouse score > 90 for performance
- Code splitting for route-based bundles
- Image optimization (WebP, lazy loading)
- CDN for static assets
2. Scalability Requirements
NFR-2.1: Horizontal Scaling
Requirement ID: NFR-SCALE-001 Priority: P0
Targets:

| Service | Min Instances | Max Instances | Scale Trigger |
|---|---|---|---|
| API Gateway | 2 | 10 | CPU > 70% or RPS > 1000 |
| User Service | 2 | 5 | CPU > 70% |
| Knowledge Service | 2 | 10 | CPU > 70% or RPS > 500 |
| Gamification Service | 2 | 5 | Event queue > 1000 |
| Payment Service | 2 | 3 | CPU > 70% |
| Real-Time Service | 2 | 10 | Active connections > 5000 |
- All services are stateless (session in Redis)
- Auto-scaling configured in Railway
- Scale-up time < 60s
- Scale-down grace period 300s (connection draining)
- Load balancing with health checks
NFR-2.2: Concurrent User Capacity
Requirement ID: NFR-SCALE-002 Priority: P0
Targets:

| Phase | Concurrent Users | Total Users |
|---|---|---|
| MVP (3 months) | 1,000 | 10,000 |
| Growth (6 months) | 5,000 | 50,000 |
| Scale (12 months) | 10,000 | 100,000 |
- Load testing validates targets with synthetic traffic
- No service degradation at target concurrency
- Database connection pooling prevents exhaustion
- Rate limiting prevents abuse
NFR-2.3: Data Growth
Requirement ID: NFR-SCALE-003 Priority: P1
Projected Growth:

| Metric | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| Users | 100K | 500K | 1M |
| Questions | 500K | 2.5M | 5M |
| Insights | 100K | 500K | 1M |
| Total documents | 2M | 10M | 20M |
| Storage | 100GB | 500GB | 1TB |
- MongoDB sharding plan documented (shard by user_id)
- Archival strategy for old content (> 2 years)
- Data partitioning for analytics queries
- Index size monitoring and optimization
NFR-2.4: Event Bus Throughput
Requirement ID: NFR-SCALE-004 Priority: P0
Targets:

| Metric | Target |
|---|---|
| Events per second (sustained) | 1,000 |
| Events per second (burst) | 5,000 |
| Event processing latency (p95) | < 100ms |
| Event backlog size (max) | 10,000 |
- Redis Streams consumer groups for parallel processing
- Dead letter queue for failed events (retry 3x)
- Event rate limiting per service
- Circuit breaker for downstream failures
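The dead-letter behaviour could be sketched as follows. This is a transport-agnostic illustration, not Redis Streams code: `process_with_dlq` is a hypothetical helper, and "retry 3x" is read here as one initial attempt plus three retries, which is an assumption. Delays between retries are omitted for brevity.

```python
MAX_RETRIES = 3  # "retry 3x" from the requirement above


def process_with_dlq(events, handler, max_retries=MAX_RETRIES):
    """Run handler on each event; dead-letter an event once all attempts fail.

    Returns a list of (event, error_repr) pairs for events that were
    dead-lettered after the initial attempt plus max_retries retries.
    """
    dead_letters = []
    for event in events:
        last_error = None
        for _attempt in range(1 + max_retries):
            try:
                handler(event)
                break  # processed successfully, move to the next event
            except Exception as exc:
                last_error = exc
        else:
            # The loop finished without a successful attempt.
            dead_letters.append((event, repr(last_error)))
    return dead_letters
```

Keeping the error alongside the event makes it easier to triage dead-lettered events later.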
3. Reliability Requirements
NFR-3.1: Availability (Uptime)
Requirement ID: NFR-REL-001 Priority: P0
SLA Targets:

| Tier | Uptime Target | Downtime Budget (Monthly) |
|---|---|---|
| Free | 99% | 7h 18m |
| Pro | 99.5% | 3h 39m |
| Team | 99.9% | 43m 50s |
| Enterprise | 99.95% | 21m 55s |
- Health checks on all services (liveness, readiness)
- Multi-AZ deployment for critical services
- Automated failover for database
- Status page with uptime history (e.g., status.tektree.com)
- Incident response runbook for common failures
- Uptime calculated as: (Total time - Downtime) / Total time * 100
- Downtime = Any period where service returns 5xx errors for > 1 minute
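The uptime formula and the monthly downtime budgets in the table can be reproduced with a few lines of arithmetic. The sketch below assumes an average month of 365.25 / 12 days (730.5 hours), which matches the table's figures to the nearest few seconds; the function names are hypothetical.

```python
from datetime import timedelta

# Assumption: an "average month" of 365.25 / 12 days, i.e. 730.5 hours.
AVG_MONTH = timedelta(days=365.25 / 12)


def uptime_percent(total, downtime):
    """(Total time - Downtime) / Total time * 100, with both arguments as timedeltas."""
    return (total - downtime) / total * 100


def downtime_budget(target_percent, period=AVG_MONTH):
    """Maximum downtime allowed in the period for a given uptime target."""
    return period * (1 - target_percent / 100)
```

For example, `downtime_budget(99.9)` is roughly 43 minutes 50 seconds, in line with the Team tier row.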
NFR-3.2: Fault Tolerance
Requirement ID: NFR-REL-002 Priority: P0
Requirements:
- Circuit Breaker: Wrap all external service calls (Polar API, email service)
  - Open after 5 consecutive failures
  - Half-open after 30s
  - Close after 3 successful requests
- Retry Logic: Exponential backoff with jitter (max 3 retries)
- Graceful Degradation: Core features continue if non-critical services fail
  - Gamification service down → XP events queued, no blocking
  - Email service down → Emails queued for retry
  - Search service down → Fallback to basic filter
- Bulkhead Pattern: Isolate failure domains (separate connection pools per service)
- Chaos engineering tests validate resilience (kill random service, verify recovery)
- No cascading failures
- Circuit breaker metrics exported to Prometheus
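The circuit breaker thresholds above (open after 5 consecutive failures, half-open after 30s, close after 3 successes) map onto a small state machine. The sketch below is a single-threaded illustration under those numbers; the class and exception names are hypothetical, and a production version would need locking and metrics hooks.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_s=30.0,
                 success_threshold=3, clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._success_threshold = success_threshold
        self._clock = clock  # injectable so tests can control time
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self._reset_timeout_s:
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: let trial requests through.
            self.state = "half_open"
            self._successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # Any failure while half-open reopens the circuit immediately.
        if self.state == "half_open" or self._failures >= self._failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
            self._failures = 0

    def _on_success(self):
        if self.state == "half_open":
            self._successes += 1
            if self._successes >= self._success_threshold:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # only consecutive failures count
```

Each external dependency (for example the Polar API and the email service) would get its own breaker instance, which also fits the bulkhead requirement.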
NFR-3.3: Data Durability
Requirement ID: NFR-REL-003 Priority: P0
Targets:
- RPO (Recovery Point Objective): < 1 hour
- RTO (Recovery Time Objective): < 4 hours
- MongoDB replica set with at least 3 nodes
- Automated daily backups to object storage
- Point-in-time recovery (PITR) enabled
- Backup restoration tested quarterly
- Event store append-only log for critical events
- Backup retention: 30 days (daily), 12 months (monthly)
NFR-3.4: Disaster Recovery
Requirement ID: NFR-REL-004 Priority: P1
Scenarios Covered:
- Database corruption or data loss
- Service region outage
- Accidental data deletion
- Security breach requiring rollback
- Disaster recovery runbook documented
- DR drills conducted twice a year
- Backup restoration tested (< 4 hour RTO)
- Cross-region backup replication for Enterprise tier
- Immutable backups (versioned in object storage)
4. Security Requirements
NFR-4.1: Authentication Security
Requirement ID: NFR-SEC-001 Priority: P0
Requirements:
- Password Security:
  - Bcrypt hashing (cost factor 12)
  - Min 8 characters, 1 uppercase, 1 number, 1 special char
  - Password breach detection (Have I Been Pwned API)
  - Password reset with time-limited tokens (1 hour expiry)
- JWT Security:
  - RS256 signing (asymmetric keys)
  - Short-lived access tokens (15 min)
  - Refresh tokens in HTTP-only, Secure, SameSite cookies
  - Token revocation list for logout
- Session Management:
  - Redis-backed sessions
  - Session timeout after 7 days inactivity
  - Concurrent session limit (max 5 devices)
- Security audit passes (OWASP Top 10 compliance)
- Penetration testing validates authentication
- Rate limiting on login attempts (5 attempts, 15 min lockout)
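The password complexity rule (min 8 characters, 1 uppercase, 1 number, 1 special char) could be enforced with a validator like the one below. Treating "special char" as ASCII punctuation is an assumption, and the function name is hypothetical. Hashing and breach checks are out of scope for this sketch.

```python
import string


def password_policy_violations(password):
    """Return a list of human-readable policy violations (empty if the password passes)."""
    violations = []
    if len(password) < 8:
        violations.append("must be at least 8 characters")
    if not any(c.isupper() for c in password):
        violations.append("must contain an uppercase letter")
    if not any(c.isdigit() for c in password):
        violations.append("must contain a number")
    # Assumption: "special char" means any ASCII punctuation character.
    if not any(c in string.punctuation for c in password):
        violations.append("must contain a special character")
    return violations
```

Returning every violation at once lets the signup form show all problems in a single round trip.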
NFR-4.2: Authorization Security
Requirement ID: NFR-SEC-002 Priority: P0
Requirements:
- RBAC (Role-Based Access Control):
  - Roles: User, Moderator, Admin
  - Permissions checked at API Gateway and service level
- Tier-Based Access:
  - Feature flags enforce tier limits
  - Quota validation on resource creation
- Object-Level Authorization:
  - User can only edit their own content
  - Moderators can edit any content
  - Privacy settings respected (public, connections, private)
- Authorization checked on every request
- No privilege escalation vulnerabilities
- Audit log for privileged actions
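The object-level rules above can be expressed as two small predicates. This is a sketch with hypothetical types; letting Admins edit any content, and giving Moderators no special view rights over private content, are assumptions, since the requirements only state the Moderator edit rule.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    USER = "user"
    MODERATOR = "moderator"
    ADMIN = "admin"


@dataclass(frozen=True)
class Actor:
    user_id: str
    role: Role


@dataclass(frozen=True)
class Content:
    owner_id: str
    visibility: str  # "public", "connections", or "private"


def can_edit(actor, content):
    """Owners edit their own content; Moderators (and, by assumption, Admins) edit any."""
    if actor.role in (Role.MODERATOR, Role.ADMIN):
        return True
    return actor.user_id == content.owner_id


def can_view(actor, content, connections_of_owner):
    """Apply the privacy setting; connections_of_owner is a set of user IDs."""
    if actor.user_id == content.owner_id or content.visibility == "public":
        return True
    if content.visibility == "connections":
        return actor.user_id in connections_of_owner
    return False  # private content is visible to the owner only
```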
NFR-4.3: Data Protection
Requirement ID: NFR-SEC-003 Priority: P0
Requirements:
- Encryption at Rest:
  - MongoDB encryption with AES-256
  - Redis encryption enabled
  - Encrypted backups
- Encryption in Transit:
  - TLS 1.3 for all external communication
  - Certificate auto-renewal (Let’s Encrypt)
  - HSTS headers enforced
- PII Protection:
  - Email addresses hashed for analytics
  - User data export API (GDPR compliance)
  - Data deletion API (right to be forgotten)
- SSL Labs rating A+ for HTTPS configuration
- No PII in logs or metrics
- Data anonymization for deleted users
NFR-4.4: API Security
Requirement ID: NFR-SEC-004 Priority: P0
Requirements:
- Input Validation:
  - JSON schema validation at API Gateway
  - Max request body size: 10MB
  - SQL/NoSQL injection prevention (parameterized queries)
  - XSS prevention (content sanitization)
- Rate Limiting:
  - Per-user rate limits by tier (see Functional Requirements)
  - IP-based rate limiting for unauthenticated endpoints (100 req/hour)
  - Retry-After header on rate-limited responses so clients can back off
- CORS:
  - Whitelist allowed origins (no wildcard in production)
  - Credentials allowed only for whitelisted origins
- OWASP ZAP security scan passes
- Fuzz testing validates input handling
- Rate limit bypass attempts logged and alerted
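For the IP-based limit of 100 requests per hour, a fixed-window counter is one of the simplest options. The sketch below is an in-memory illustration with a hypothetical `FixedWindowLimiter` class; the fixed-window algorithm itself is an assumption (the requirements do not name one), and a multi-instance deployment would keep the counters in Redis. The second return value is the number of seconds a `Retry-After` header could carry.

```python
import math
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key within each `window_s`-second window."""

    def __init__(self, limit=100, window_s=3600, clock=time.time):
        self._limit = limit
        self._window_s = window_s
        self._clock = clock  # injectable so tests can control time
        self._windows = {}   # key -> (window_start, request_count)

    def check(self, key):
        """Return (allowed, retry_after_s); retry_after_s is 0 when allowed."""
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self._window_s:
            start, count = now, 0  # the previous window has expired
        if count >= self._limit:
            return False, math.ceil(start + self._window_s - now)
        self._windows[key] = (start, count + 1)
        return True, 0
```

A gateway handler would respond with HTTP 429 and set `Retry-After` to the returned value whenever `allowed` is false.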
NFR-4.5: Secret Management
Requirement ID: NFR-SEC-005 Priority: P0
Requirements:
- All secrets stored in Railway environment variables
- No secrets in code, logs, or version control
- Secret rotation policy (90 days for API keys)
- Separate secrets per environment (dev, staging, prod)
- GitHub secret scanning enabled
- Pre-commit hooks prevent secret commits
- Secret access audit log
5. Observability Requirements
NFR-5.1: Logging
Requirement ID: NFR-OBS-001 Priority: P0
Requirements:
- Structured Logging:
  - JSON format for all logs
  - Required fields: timestamp, level, service, trace_id, message
  - Log levels: DEBUG, INFO, WARN, ERROR, FATAL
- Log Aggregation:
  - Centralized log storage (Railway logs or external service)
  - Searchable and filterable by service, level, trace_id
  - Retention: 7 days (INFO), 30 days (ERROR)
- Security Logging:
  - All authentication events
  - Authorization failures
  - Data access (PII reads)
  - Configuration changes
- No PII in logs
- Correlation IDs link related logs across services
- Log sampling for high-volume endpoints (1% sample rate)
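A structured log line with the required fields (timestamp, level, service, trace_id, message) could be produced with a custom formatter for Python's standard `logging` module, as sketched below. The `JsonFormatter` class is hypothetical, and mapping Python's `WARNING` and `CRITICAL` level names to `WARN` and `FATAL` is an assumption made to match the level list above.

```python
import json
import logging
from datetime import datetime, timezone

# Assumption: rename Python's level names to match the levels listed above.
LEVEL_NAMES = {"WARNING": "WARN", "CRITICAL": "FATAL"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the required fields."""

    def __init__(self, service):
        super().__init__()
        self._service = service

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            "service": self._service,
            # trace_id is expected to be passed via logging's `extra` argument.
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)
```

A caller would attach the formatter to a handler and log with `logger.info("cache miss", extra={"trace_id": trace_id})`.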
NFR-5.2: Metrics
Requirement ID: NFR-OBS-002 Priority: P0
Required Metrics:
RED Metrics (Rate, Errors, Duration):
- Request rate (requests per second)
- Error rate (errors per second, by type)
- Duration (latency histogram)
Resource Metrics:
- CPU utilization (%)
- Memory utilization (%)
- Disk I/O (IOPS, throughput)
- Network I/O (bytes in/out)
Business Metrics:
- Active users (DAU, MAU)
- Content creation rate (questions, insights per hour)
- XP earned per user per day
- Subscription conversions (free → paid)
- Revenue (MRR, ARR)
- Prometheus metrics endpoint on all services (:8080/metrics)
- Metrics scraped every 15s
- Metrics retention: 15 days (raw), 90 days (aggregated)
- Grafana dashboards for each service
NFR-5.3: Distributed Tracing
Requirement ID: NFR-OBS-003 Priority: P1
Requirements:
- OpenTelemetry SDK integrated in all services
- Trace context propagated via HTTP headers (traceparent)
- Spans created for:
  - HTTP requests
  - Database queries
  - Cache operations
  - Event publishing
  - External API calls
- Trace sampling: 1% in production, 100% in development
- Jaeger or Zipkin backend for trace visualization
- Traces link to logs via trace_id
- Slow trace detection (p95 > 200ms)
- Trace retention: 7 days
NFR-5.4: Alerting
Requirement ID: NFR-OBS-004 Priority: P0
Alert Definitions:
Critical Alerts (P0 - Page On-Call):
- Service down (health check failing for > 1 min)
- Error rate > 5% (5xx responses)
- p95 latency > 500ms
- Database connection failures
- Disk usage > 90%
Warning Alerts:
- p95 latency > 300ms
- Error rate > 1%
- Cache hit rate < 70%
- Event queue backlog > 5000
- CPU usage > 80% sustained for 5 min
Informational Alerts:
- Slow queries (> 100ms)
- Rate limit exceeded
- Webhook failures
- Alertmanager configured with routing rules
- Alerts sent to Slack/PagerDuty
- Alert runbooks linked in alert description
- Alert escalation policy defined
NFR-5.5: Dashboards
Requirement ID: NFR-OBS-005 Priority: P1
Required Dashboards:
- System Health Dashboard: CPU, memory, disk, network across all services
- API Performance Dashboard: Request rate, error rate, latency by endpoint
- Database Dashboard: Query performance, connection pool, cache hit rate
- Gamification Dashboard: XP earned, achievements unlocked, leaderboard activity
- Business Metrics Dashboard: DAU/MAU, content creation, subscription conversions
- Real-Time Dashboard: Active WebSocket connections, message latency
- Grafana dashboards with auto-refresh (30s)
- Dashboards accessible to all engineers
- Dashboard as code (JSON in version control)
6. Operational Requirements
NFR-6.1: Deployment
Requirement ID: NFR-OPS-001 Priority: P0
Requirements:
- CI/CD Pipeline:
  - Automated build on commit to main branch
  - Run tests (unit, integration) before deploy
  - Deploy to staging automatically
  - Manual approval for production deploy
- Deployment Strategy:
  - Rolling deployment (25% instances at a time)
  - Health check before promoting next batch
  - Automatic rollback on health check failure
- Deployment Frequency:
  - Target: 3-5 deploys per week
  - Hotfixes deployed within 1 hour
- Zero-downtime deployments
- Deployment time < 10 minutes
- Rollback time < 5 minutes
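The rolling strategy (25% of instances per batch, health check before the next batch, automatic rollback on failure) can be sketched as a small orchestration loop. The `deploy`, `is_healthy`, and `rollback` callables are hypothetical hooks supplied by the caller; how Railway actually performs rollouts is not assumed here.

```python
import math


def rolling_batches(instances, fraction=0.25):
    """Split instances into batches of ceil(len * fraction), at least one per batch."""
    size = max(1, math.ceil(len(instances) * fraction))
    return [instances[i:i + size] for i in range(0, len(instances), size)]


def rolling_deploy(instances, deploy, is_healthy, rollback, fraction=0.25):
    """Deploy batch by batch; roll back everything deployed so far if a batch is unhealthy.

    Returns True if every batch passed its health check, False after a rollback.
    """
    deployed = []
    for batch in rolling_batches(instances, fraction):
        for instance in batch:
            deploy(instance)
            deployed.append(instance)
        if not all(is_healthy(instance) for instance in batch):
            for instance in reversed(deployed):
                rollback(instance)
            return False
    return True
```

With only two instances, each batch contains a single instance, so one instance always remains on the previous version while the other is updated.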
NFR-6.2: Database Migrations
Requirement ID: NFR-OPS-002 Priority: P0
Requirements:
- Migration Strategy:
  - Backward-compatible migrations (add before remove)
  - Schema versioning in code
  - Automated migration on deploy
  - Migration rollback script for each migration
- Migration Testing:
  - Test migrations on staging before production
  - Performance test migrations (< 1 minute for < 1M documents)
- No data loss during migrations
- Migrations logged and audited
- Failed migrations trigger alerts and block deploy
NFR-6.3: Configuration Management
Requirement ID: NFR-OPS-003 Priority: P1
Requirements:
- Configuration Sources:
  - Railway environment variables for secrets
  - Config files (YAML) for feature flags
  - Database for dynamic config (rate limits, feature toggles)
- Configuration Changes:
  - No service restart required for dynamic config
  - Configuration changes logged and audited
  - Rollback capability for config changes
NFR-6.4: Documentation
Requirement ID: NFR-OPS-004 Priority: P1
Required Documentation:
- Architecture diagrams (up-to-date)
- API documentation (OpenAPI spec, auto-generated)
- Runbooks for common operations (deploy, rollback, scale)
- Incident response playbooks
- Onboarding guide for new engineers
- Documentation reviewed quarterly
- Documentation searchable and accessible
- Code comments for complex logic
7. Usability Requirements
NFR-7.1: Accessibility
Requirement ID: NFR-USE-001 Priority: P1
Requirements:
- WCAG 2.1 Level AA compliance
- Keyboard navigation support
- Screen reader compatibility (ARIA labels)
- Color contrast ratio of at least 4.5:1 for normal text
- Focus indicators visible
- Lighthouse accessibility score > 90
- Automated accessibility tests in CI
- Manual accessibility audit with assistive technologies
NFR-7.2: Browser Compatibility
Requirement ID: NFR-USE-002 Priority: P1
Supported Browsers:
- Chrome (latest 2 versions)
- Firefox (latest 2 versions)
- Safari (latest 2 versions)
- Edge (latest 2 versions)
- Cross-browser testing in CI (Playwright)
- Progressive enhancement (core features work without JS)
- Polyfills for older browsers
NFR-7.3: Mobile Responsiveness
Requirement ID: NFR-USE-003 Priority: P0
Requirements:
- Responsive design (mobile-first)
- Touch-friendly UI (min tap target 44x44px)
- Fast loading on 3G networks (< 5s TTI)
- Mobile Lighthouse score > 85
- Tested on iOS Safari and Android Chrome
- No horizontal scrolling
8. Compliance Requirements
NFR-8.1: Data Privacy (GDPR)
Requirement ID: NFR-COMP-001 Priority: P0
Requirements:
- User consent for data collection
- Data export API (JSON format)
- Data deletion API (right to be forgotten)
- Cookie consent banner
- Privacy policy and terms of service
- GDPR compliance checklist complete
- Data processing agreement (DPA) for third-party services
- Data retention policy documented
NFR-8.2: Payment Compliance (PCI DSS)
Requirement ID: NFR-COMP-002 Priority: P0
Requirements:
- No credit card data stored in TekTree
- All payment processing via Polar (PCI-compliant)
- HTTPS enforced for all payment flows
- PCI compliance via Polar attestation
- No card data in logs or metrics
Summary: NFR Validation Plan
| Category | Key Metrics | Validation Method | Frequency |
|---|---|---|---|
| Performance | p95 latency < 200ms | Load testing | Weekly (CI), Monthly (full) |
| Scalability | Support 10K concurrent users | Stress testing | Quarterly |
| Reliability | 99.9% uptime | Monitoring | Continuous |
| Security | OWASP Top 10 compliance | Pen testing | Quarterly |
| Observability | 100% service instrumentation | Code review | Per PR |
Document Status: ✅ Complete
Related Documents: ARCHITECTURE_OVERVIEW.md, OBSERVABILITY_PLAN.md, SECURITY_ARCHITECTURE.md