Our Journey to Zero-Downtime Deploys
Rolling deploys, health checks that actually check health, and the database migration rule that saved us.
"Zero downtime" is mostly about ordering. Ship code that tolerates both the old and new database shapes, then migrate.
The expand/contract rule
- Expand: add the new column, nullable. Deploy code that writes both.
- Migrate: backfill the new column for existing rows.
- Contract: stop writing the old column, then drop it — in a later deploy.
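To make the ordering concrete, the three phases can be modeled in memory. This is a sketch, not our migration tooling: the `UserRow` type and the `writeName`, `backfill`, and `readyToContract` helpers are hypothetical, with `name` as the old column and `full_name` as the new one. It assumes the backfill runs only after every pod is already dual-writing, which is what the order of the list above implies.

```typescript
// Hypothetical row shape while both columns exist: `name` is the old column,
// `full_name` is the new nullable column added in the expand step.
interface UserRow {
  id: number;
  name: string;
  full_name: string | null;
}

// Expand: once deployed, every write goes to both columns, so pods running
// the old code (which read `name`) and the new code (which prefers
// `full_name`) see the same value.
function writeName(row: UserRow, value: string): void {
  row.name = value;
  row.full_name = value;
}

// Migrate: fill `full_name` for rows written before dual-writing began.
// Returns the number of rows updated.
function backfill(rows: UserRow[]): number {
  let updated = 0;
  for (const row of rows) {
    if (row.full_name === null) {
      row.full_name = row.name;
      updated++;
    }
  }
  return updated;
}

// Contract precondition (hypothetical helper): every row has `full_name`,
// so code can stop writing `name`; the column is dropped in a later deploy.
function readyToContract(rows: UserRow[]): boolean {
  return rows.every((row) => row.full_name !== null);
}
```

In production the backfill would more likely be a batched `UPDATE ... WHERE full_name IS NULL` run against the database; the model only shows the order: dual-write everywhere first, backfill second, contract last.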
// During expand, read prefers new but falls back to old
const name = row.full_name ?? row.name;
Health checks
Our old /health returned 200 if the process was up. It lied. The new one checks the database and the queue before reporting ready, so the load balancer never routes to a pod that can't serve.
A health check that always returns 200 is just an uptime decoration.
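Here is a minimal sketch of a readiness check in that spirit, written without any web framework. The `Probe` type, the `checkReadiness` and `withTimeout` helpers, the probe names `database` and `queue`, and the 1-second default timeout are all assumptions for illustration, not our actual endpoint; wiring the result into the /health route is left out.

```typescript
// Hypothetical probe: resolves if the dependency can serve, rejects otherwise.
type Probe = () => Promise<void>;

interface Readiness {
  status: 200 | 503;
  failures: string[]; // names of the probes that failed or timed out
}

// Reject if `p` has not settled within `ms` milliseconds.
function withTimeout(p: Promise<void>, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      () => {
        clearTimeout(timer);
        resolve();
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Run all probes in parallel; report ready (200) only if every one passes.
async function checkReadiness(
  probes: Record<string, Probe>,
  timeoutMs = 1000, // assumed default, tune to the load balancer's probe timeout
): Promise<Readiness> {
  const names = Object.keys(probes);
  const outcomes = await Promise.all(
    names.map((name) =>
      // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
      withTimeout(Promise.resolve().then(() => probes[name]()), timeoutMs).then(
        () => null,
        () => name,
      ),
    ),
  );
  const failures = outcomes.filter((n): n is string => n !== null);
  return { status: failures.length === 0 ? 200 : 503, failures };
}
```

Running the probes in parallel with a timeout means a hung database or queue makes the pod report not ready within a bounded time, instead of stalling the health request itself until the load balancer gives up on it.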