Multi-tenant RLS: the illusion vs the reality
What happened
KAHN Cloud’s schema had Row-Level Security policies on every multi-tenant table. Policies referenced a per-request session variable. Handlers set the variable before every query. Tests passed. Schema review passed. On the day of rollout, a cross-tenant probe returned every tenant’s rows.
Why
Railway’s default Postgres role has both rolsuper=t and rolbypassrls=t. Superusers and BYPASSRLS roles skip every RLS policy, unconditionally. The policies we’d written were inert for the entire life of the service: defense in depth on paper, but the “app” was also the “superuser”. Even FORCE ROW LEVEL SECURITY doesn’t help, because that clause binds table owners, not superusers. The only fix is to connect as a role that is neither.
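The role attributes described above can be inspected directly. The queries below are a minimal sketch, assuming they are run over the same connection string the app uses and that the multi-tenant tables live in the `public` schema (an assumption, not something stated here).

```sql
-- Run while connected with the application's own credentials.
-- If either flag is true, every RLS policy is skipped for this role.
SELECT rolname, rolsuper, rolbypassrls
FROM pg_roles
WHERE rolname = current_user;

-- Per-table RLS state: relrowsecurity = enabled, relforcerowsecurity = forced.
-- The 'public' schema is an assumption for illustration.
SELECT c.relname, c.relrowsecurity, c.relforcerowsecurity
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
ORDER BY c.relname;
```

The second query only shows whether policies are switched on. It says nothing about whether the connected role obeys them, which is what the first query answers.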
The fix (applied same-session)
Three migrations plus an env rotation:
- FORCE ROW LEVEL SECURITY on all multi-tenant tables (closes the owner-gap for future role changes).
- Create a dedicated app role with NOSUPERUSER NOBYPASSRLS, grant minimum needed privileges, extend every policy with WITH CHECK (required for non-superuser INSERT).
- Expose the pre-tenant-context api_keys and tenants lookups as narrow SECURITY DEFINER functions: the one intentional cross-tenant hole.
- Rotate DATABASE_URL on the app service to the new credentials.
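The three migrations might look roughly like the sketch below. It is illustrative only, not the actual KAHN Cloud migrations. The table `documents`, the columns `tenant_id` and `key_hash`, the session variable `app.tenant_id`, the role `app_user`, and the function `lookup_tenant_by_api_key` are all hypothetical names chosen for the example.

```sql
-- Hypothetical names throughout: documents, tenant_id, key_hash,
-- app.tenant_id, app_user, lookup_tenant_by_api_key.

-- Migration 1: enable RLS and force it for the table owner as well.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

-- Migration 2: a dedicated role that cannot bypass policies,
-- with only the privileges the app needs.
CREATE ROLE app_user LOGIN NOSUPERUSER NOBYPASSRLS PASSWORD 'change-me';
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO app_user;

-- Policy with an explicit WITH CHECK so writes are constrained too.
-- current_setting(..., true) returns NULL when the variable is unset,
-- so an unset tenant context matches no rows.
DROP POLICY IF EXISTS tenant_isolation ON documents;
CREATE POLICY tenant_isolation ON documents
    FOR ALL
    TO app_user
    USING (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::uuid)
    WITH CHECK (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::uuid);

-- Migration 3: a narrow lookup that runs before any tenant context exists.
-- SECURITY DEFINER runs with the function owner's privileges, so the owner
-- must be a role that is allowed to read api_keys across tenants.
CREATE FUNCTION lookup_tenant_by_api_key(p_key_hash text)
RETURNS uuid
LANGUAGE sql
STABLE
SECURITY DEFINER
SET search_path = pg_catalog, public
AS $$
    SELECT tenant_id FROM public.api_keys WHERE key_hash = p_key_hash;
$$;

-- Functions are executable by PUBLIC by default, so lock that down.
REVOKE ALL ON FUNCTION lookup_tenant_by_api_key(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION lookup_tenant_by_api_key(text) TO app_user;
```

Pinning `search_path` on the SECURITY DEFINER function and returning a single column rather than whole rows are two ways to keep the intentional hole as narrow as possible.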
The systemic lesson
“Does RLS work?” is two questions, not one:
- Are the policies correct? This is what schema review catches.
- Is the app’s DB role one that obeys policies? This is not caught by schema review, tests, or code review. Only a cross-tenant probe executed as the app role catches it.
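A cross-tenant probe of the kind described above could be sketched as follows. It assumes a hypothetical `documents` table with a `tenant_id uuid` column, a hypothetical session variable `app.tenant_id`, and a placeholder tenant UUID. It must be run over the app's own connection, and the table must already hold rows for at least one other tenant, otherwise a clean result proves nothing.

```sql
-- Run as the application role, not as an admin or migration role.
BEGIN;

-- Record who is probing and whether that role can bypass RLS at all.
SELECT current_user,
       (SELECT rolsuper OR rolbypassrls
        FROM pg_roles
        WHERE rolname = current_user) AS bypasses_rls;

-- Pretend to be tenant A (placeholder UUID) for this transaction only.
SET LOCAL app.tenant_id = '00000000-0000-0000-0000-00000000000a';

-- Fail loudly if any row belonging to another tenant is visible.
DO $$
DECLARE
    leaked bigint;
BEGIN
    SELECT count(*) INTO leaked
    FROM documents
    WHERE tenant_id <> current_setting('app.tenant_id')::uuid;

    IF leaked > 0 THEN
        RAISE EXCEPTION 'cross-tenant leak: % foreign rows visible to %',
            leaked, current_user;
    END IF;
END
$$;

ROLLBACK;
```

Because the check raises an exception rather than printing a count, it can gate a rollout script: a leak aborts the run instead of relying on someone reading the output.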
What we institutionalised
- A doctrine that names the trap and the fix verbatim.
- A SECURITY DEFINER doctrine with the four-point audit checklist for every narrow cross-tenant hole.
- A production-rollout prompt-op whose Taskset 19 exit criteria embed the cross-tenant probe as non-skippable.