What Happened

A routine attempt to deploy the Traceo web frontend to Vercel uncovered that the monorepo was not deployment-ready. A structured 4-stream audit (files, SQL, Python, config) identified 15 critical issues that would have caused authentication failures, database policy rejections, and service crashes in production. All 15 were fixed across 7 gated tasksets before any further deployment attempt.

Business Impact

Prevented a broken launch. The auth provider split (Supabase references in config/SQL vs BetterAuth in application code) meant every authenticated request would have failed in production. Users would see login succeed but every subsequent action fail, the worst possible first impression.

Security exposure closed. An embedded GitHub access token in git configuration was transmitting credentials with every git operation. This was invisible to code review because it lived in .git/config, not in tracked files. Discovered only through the systematic audit.

Deployment confidence established. The 7-taskset structure with explicit gating (each taskset requires confirmation before execution) created an auditable trail of exactly what changed and why. This gives confidence that the codebase is in a known-good state for beta deployment.
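
The embedded-token finding lends itself to an automated check. The following is a minimal sketch, not the audit's actual tooling: it assumes the token sat in the userinfo part of an HTTP(S) remote URL, and the function name, the sample token, and the sample repository paths are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches "url = ..." and "pushurl = ..." lines as they appear in .git/config.
URL_LINE = re.compile(r"^\s*(?:url|pushurl)\s*=\s*(\S+)\s*$", re.IGNORECASE)


def find_embedded_credentials(config_text: str) -> list[str]:
    """Return redacted HTTP(S) remote URLs that carry a username or password."""
    findings = []
    for line in config_text.splitlines():
        match = URL_LINE.match(line)
        if not match:
            continue
        parts = urlsplit(match.group(1))
        # Only HTTP(S) remotes can embed a token as userinfo; scp-style
        # SSH remotes (git@host:org/repo.git) have no URL scheme and are skipped.
        if parts.scheme in ("http", "https") and (parts.username or parts.password):
            findings.append(f"{parts.scheme}://***@{parts.hostname}{parts.path}")
    return findings


# Hypothetical config contents; the token and repository names are made up.
SAMPLE = """[remote "origin"]
\turl = https://ghp_EXAMPLETOKEN@github.com/example-org/example-repo.git
\tfetch = +refs/heads/*:refs/remotes/origin/*
[remote "mirror"]
\turl = git@github.com:example-org/example-repo.git
"""

print(find_embedded_credentials(SAMPLE))
```

In practice the input would be the contents of .git/config read from disk. The sketch redacts the credential in its output so that the check itself does not leak the token into logs.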

Operational Takeaways

  1. Always audit before first deployment. Codebases that evolved through rapid prototyping accumulate contradictions invisible during development (where everything runs locally with the original developer’s env). A deployment attempt is the worst time to discover them.
  2. Auth migrations are cross-cutting. Switching auth providers (Supabase → BetterAuth) touched 23 files across SQL migrations, Python config, Docker compose, Kubernetes secrets, and TypeScript types. Budget auth changes as infrastructure work, not feature work.
  3. Config flags require wiring verification. The RBAC “disable” flag existed in config but was never wired to the enforcement points. Feature flags that aren’t integration-tested are worse than no flag at all — they create false confidence.
  4. Gated tasksets limit blast radius. Breaking the cleanup into 7 numbered tasksets with explicit scope boundaries prevented cascading errors. Each taskset was independently verifiable before proceeding.
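
Takeaway 3 can be turned into a test. The sketch below is illustrative only: Settings, load_settings, require_role, and flag_is_wired are hypothetical names rather than Traceo's actual code, and defaulting to enforcement when ENABLE_RBAC is unset is an assumption. The point is that the test exercises the enforcement path with the flag in both positions, so a flag that is parsed but never consulted fails the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    enable_rbac: bool = True


def parse_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: dict[str, str]) -> Settings:
    # Assumption: enforcement stays on when the variable is missing.
    return Settings(enable_rbac=parse_bool(env.get("ENABLE_RBAC", "true")))


class PermissionDenied(Exception):
    pass


def require_role(settings: Settings, user_roles: set[str], needed: str) -> None:
    """Hypothetical enforcement point. It must consult the flag itself."""
    if not settings.enable_rbac:
        return
    if needed not in user_roles:
        raise PermissionDenied(f"missing role: {needed}")


def is_allowed(settings: Settings, user_roles: set[str], needed: str) -> bool:
    try:
        require_role(settings, user_roles, needed)
    except PermissionDenied:
        return False
    return True


def flag_is_wired() -> bool:
    """True only if toggling ENABLE_RBAC changes the enforcement outcome."""
    viewer = {"viewer"}
    blocked_when_on = not is_allowed(load_settings({"ENABLE_RBAC": "true"}), viewer, "admin")
    allowed_when_off = is_allowed(load_settings({"ENABLE_RBAC": "false"}), viewer, "admin")
    return blocked_when_on and allowed_when_off
```

If require_role ignored settings.enable_rbac, the second half of flag_is_wired would come back False, which is the failure mode described in takeaway 3.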

Service | Platform | Notes
Web Client (Next.js 14) | Vercel | BetterAuth, Drizzle ORM
MCP Server (FastMCP) | Railway | Port 8000
Engine (FastAPI) | Railway | Port 8001
PostgreSQL | Neon or Supabase DB-only | Shared migrations

Action Items

  • Deploy web client to Vercel (auth provider is now consistently BetterAuth)
  • Deploy MCP server + engine to Railway
  • Provision production PostgreSQL and run migrations 001-006
  • Set ENABLE_RBAC=false for beta (enable after role assignment UI ships)
  • Rewrite routes/webhooks.py from Supabase webhook handler to BetterAuth event hooks (deferred from this session)
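
Before deploying, a quick scan can confirm where references to the old auth provider remain. This is a rough sketch under assumptions: the function name and the set of scanned file suffixes are hypothetical and would need adjusting to the monorepo's layout. Hits are candidates for review rather than errors, since routes/webhooks.py is deliberately deferred and Supabase may still be the database host.

```python
from pathlib import Path

# Assumed file types, based on the areas the auth migration touched.
SCANNED_SUFFIXES = {".sql", ".py", ".ts", ".tsx", ".yml", ".yaml"}
SKIPPED_DIRS = {".git", "node_modules"}


def find_references(root: Path, needle: str = "supabase") -> dict[str, list[int]]:
    """Map each relative file path to the 1-based line numbers mentioning needle."""
    hits: dict[str, list[int]] = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        if SKIPPED_DIRS.intersection(rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        lines = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if needle in line.lower()
        ]
        if lines:
            hits[rel.as_posix()] = lines
    return hits
```

Calling find_references(Path(".")) from the repository root returns a mapping that can be reviewed file by file. The match is case-insensitive on the line text, so the needle should be passed in lowercase.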

Artifacts

  • Technical details: see cross-referenced so1-content finding
  • Production readiness plan: traceo-ai/traceo-mcp-server/TRACEO-PRODUCTION-READINESS-PLAN.md