Northwind

Our Journey to Zero-Downtime Deploys

Rolling deploys, health checks that actually check health, and the database migration rule that saved us.

Raj Patel, Tom Becker · 1 min read

"Zero downtime" is mostly about ordering. Ship code that tolerates both the old and new database shape, then migrate.

The expand/contract rule

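  1. Expand: add the new column, nullable. Deploy code that writes both the old and the new column.
  2. Migrate: backfill the new column for existing rows.
  3. Contract: once the backfill is complete, read only the new column and stop writing the old one, then drop the old column in a later deploy.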

// During expand, reads prefer the new column and fall back to the old one
const name = row.full_name ?? row.name;
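
The write side and the backfill can be sketched the same way. This is a minimal illustration, not our production code: the `UserRow` type, the helper names, and the in-memory `table` are assumptions made up for the example, and a real backfill would run batched `UPDATE` statements against the database instead of looping over an array.

```typescript
// Hypothetical row shape while both columns exist: `name` is the old column,
// `full_name` is the new nullable one.
type UserRow = { id: number; name: string | null; full_name: string | null };

// Expand: every write sets both columns, so old and new code see the same value.
function writeName(row: UserRow, value: string): void {
  row.name = value;
  row.full_name = value;
}

// Read path during expand: prefer the new column, fall back to the old one.
function readName(row: UserRow): string | null {
  return row.full_name ?? row.name;
}

// Migrate: copy old values into the new column in small batches.
// Only rows that are still empty are touched, so rerunning is safe and
// values already written by the dual-write path are never overwritten.
// Returns the number of rows updated; call it until it returns 0.
function backfillBatch(rows: UserRow[], batchSize: number): number {
  const pending = rows
    .filter((r) => r.full_name === null && r.name !== null)
    .slice(0, batchSize);
  for (const r of pending) {
    r.full_name = r.name;
  }
  return pending.length;
}

// In-memory stand-in for the table (illustrative data only).
const ada: UserRow = { id: 1, name: "Ada", full_name: null };
const table: UserRow[] = [
  ada,
  { id: 2, name: "Grace", full_name: null },
  { id: 3, name: "Linus", full_name: null },
];

writeName(ada, "Ada Lovelace"); // new code dual-writes
while (backfillBatch(table, 2) > 0) {
  // keep going until no row is left to backfill
}
```

Small batches keep each step short, which matters on a live table where one long-running update can hold locks that block normal traffic.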

Health checks

Our old /health returned 200 if the process was up. It lied. The new one checks the database and the queue before reporting ready, so the load balancer never routes to a pod that can't serve.
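
A readiness check along these lines might look like the sketch below. The `Probe` type, `checkReadiness`, the one-second timeout, and the 503 status for "not ready" are all assumptions for illustration; the probes themselves are stubs standing in for something like a `SELECT 1` against the database and a ping to the queue.

```typescript
// A probe resolves if the dependency is usable and rejects (or hangs) if not.
type Probe = () => Promise<void>;
type CheckResult = "ok" | "fail";
type Readiness = { status: 200 | 503; checks: Record<string, CheckResult> };

// Fail a probe that does not answer within `ms`, so a hung dependency
// cannot hang the health endpoint itself.
function withTimeout(probe: Probe, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("probe timed out")), ms);
    Promise.resolve()
      .then(probe)
      .then(
        () => { clearTimeout(timer); resolve(); },
        (err) => { clearTimeout(timer); reject(err); },
      );
  });
}

// Run all probes in parallel; report 200 only if every one of them passes.
async function checkReadiness(
  probes: Record<string, Probe>,
  timeoutMs = 1000,
): Promise<Readiness> {
  const results = await Promise.all(
    Object.entries(probes).map(
      async ([name, probe]): Promise<[string, CheckResult]> => {
        try {
          await withTimeout(probe, timeoutMs);
          return [name, "ok"];
        } catch {
          return [name, "fail"];
        }
      },
    ),
  );
  const checks: Record<string, CheckResult> = {};
  let allOk = true;
  for (const [name, result] of results) {
    checks[name] = result;
    if (result === "fail") allOk = false;
  }
  return { status: allOk ? 200 : 503, checks };
}

// Hypothetical stub probes for demonstration.
const dbUp: Probe = async () => {};
const queueDown: Probe = async () => {
  throw new Error("queue unreachable");
};
```

An HTTP handler for the health route would call `checkReadiness({ database: dbUp, queue: queueDown })` and respond with the returned `status`. Including the per-dependency `checks` in the response body makes it obvious which dependency took the pod out of rotation.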

A health check that always returns 200 is just an uptime decoration.