Railway’s olly-volume hit 97% capacity, with a 4.4GB LLM sitting in a 5GB volume. Instead of upgrading, we applied a four-layer optimisation: model swap (7B→3B, saving 2.4GB), trace rotation (7-day auto-prune with SQLite VACUUM), Docker hardening (a .dockerignore cut the build context from 4.5GB to 232KB), and bot traffic filtering.

Key learning: Resource constraints are architectural forcing functions. The 3B model is adequate for code tasks, the trace rotation prevents unbounded growth, and the .dockerignore should have existed from day one. The “crisis” produced a healthier service than the unconstrained version.

Applicable pattern: When hitting platform limits, audit from the largest consumer down. Model files, build contexts, and unrotated databases are the usual suspects. Fix all layers: the compound effect matters more than any single optimisation.
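As an illustration of the trace rotation layer, here is a minimal sketch of a 7-day prune followed by a SQLite VACUUM. The post does not show the actual schema or job, so the table name `traces`, the `created_at` column (Unix seconds), and the function name `prune_traces` are all assumptions made for this example.

```python
import sqlite3
import time
from typing import Optional

RETENTION_DAYS = 7  # retention window mentioned in the post


def prune_traces(db_path: str, now: Optional[float] = None) -> int:
    """Delete rows older than the retention window, then VACUUM.

    Assumes a hypothetical table `traces` with a `created_at` column
    holding Unix timestamps in seconds. Returns the number of rows deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - RETENTION_DAYS * 86400

    conn = sqlite3.connect(db_path)
    try:
        # The context manager commits the DELETE on success.
        with conn:
            cur = conn.execute(
                "DELETE FROM traces WHERE created_at < ?", (cutoff,)
            )
            deleted = cur.rowcount
        # VACUUM must run outside a transaction. It rebuilds the file so
        # the freed pages are actually returned to the volume.
        conn.execute("VACUUM")
    finally:
        conn.close()
    return deleted
```

Deleting rows alone does not shrink a SQLite file, because freed pages are kept for reuse, which is why the VACUUM step matters on a small volume. Note that VACUUM temporarily needs extra free space while it rebuilds the database, so a job like this is best scheduled before the volume is nearly full.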