Why Migrations Are Dangerous
Database migrations are the only deployment step that cannot be instantly rolled back. You can roll back application code in seconds with a container swap, but a migration that drops a column, truncates data, or acquires an exclusive lock can cause irreversible damage or hours of downtime.
The core problem: during deployment, the OLD version of your application and the NEW version must both work with the database schema simultaneously. If a migration adds a NOT NULL column, the old code crashes because it does not include that column in INSERT statements.
Zero-downtime migrations solve this with the Expand-Contract pattern: first, expand the schema to support both old and new code. Deploy the new code. Then, contract (clean up the old columns/tables). Never do both in a single deployment.
Key Takeaways
The Expand-Contract Pattern
The Expand-Contract (also called Parallel Change) pattern splits dangerous migrations into safe, incremental steps. Instead of renaming a column in one migration (which breaks old code instantly), you add a new column, sync data between both, deploy code that uses the new column, then drop the old column.
This pattern applies to: column renames, column type changes, table renames, NOT NULL additions, and any change that would break existing code.
-- EXAMPLE: Rename column "name" → "full_name" safely -- (Direct rename would break all existing code instantly!) -- Step 1: EXPAND — Add new column alongside old one ALTER TABLE users ADD COLUMN full_name TEXT; -- Backfill existing data UPDATE users SET full_name = name WHERE full_name IS NULL; -- Add trigger to keep both columns in sync during transition CREATE OR REPLACE FUNCTION sync_user_name() RETURNS TRIGGER AS $$ BEGIN IF NEW.name IS DISTINCT FROM OLD.name THEN NEW.full_name = NEW.name; END IF; IF NEW.full_name IS DISTINCT FROM OLD.full_name THEN NEW.name = NEW.full_name; END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER trg_sync_name BEFORE UPDATE ON users FOR EACH ROW EXECUTE FUNCTION sync_user_name(); -- Step 2: Deploy new application code that reads/writes "full_name" -- Old code still works because "name" column exists and syncs -- Step 3: CONTRACT — After confirming new code works, clean up DROP TRIGGER trg_sync_name ON users; DROP FUNCTION sync_user_name(); ALTER TABLE users DROP COLUMN name;
Key Takeaways
Safe Migration Operations
Some operations are always safe (additive changes), some are always dangerous (destructive changes), and some depend on the execution strategy. Knowing which is which prevents outages.
Always safe: ADD COLUMN without NOT NULL default, CREATE TABLE, CREATE INDEX CONCURRENTLY, ADD CONSTRAINT with NOT VALID.
Always dangerous: DROP COLUMN, DROP TABLE, RENAME COLUMN, ALTER TYPE on populated columns, ADD COLUMN with NOT NULL without default.
Conditionally safe: ADD INDEX (use CONCURRENTLY), ADD NOT NULL (add with default first, then apply constraint), ALTER TYPE (expand-contract).
-- ✅ SAFE: Add nullable column (no lock, instant) ALTER TABLE posts ADD COLUMN subtitle TEXT; -- ✅ SAFE: Add column with default (PG 11+ — instant, no rewrite) ALTER TABLE posts ADD COLUMN views INTEGER DEFAULT 0; -- ❌ DANGEROUS: Add NOT NULL without default (fails for existing rows) ALTER TABLE posts ADD COLUMN category TEXT NOT NULL; -- ✅ SAFE: Add NOT NULL in two steps ALTER TABLE posts ADD COLUMN category TEXT DEFAULT 'general'; -- After backfilling existing rows: ALTER TABLE posts ALTER COLUMN category SET NOT NULL; -- ✅ SAFE: Create index without blocking writes CREATE INDEX CONCURRENTLY idx_posts_category ON posts(category); -- ❌ DANGEROUS: Create index without CONCURRENTLY (locks table!) CREATE INDEX idx_posts_category ON posts(category); -- ✅ SAFE: Add foreign key without blocking writes ALTER TABLE posts ADD CONSTRAINT fk_author FOREIGN KEY (author_id) REFERENCES users(id) NOT VALID; -- Skip checking existing rows -- Validate in separate step (doesn't lock table) ALTER TABLE posts VALIDATE CONSTRAINT fk_author;
Migration Tools: Flyway vs Prisma Migrate
Flyway is the industry standard for SQL-based migrations. It uses versioned SQL files (V1__create_users.sql, V2__add_posts.sql) and tracks applied migrations in a metadata table. It is database-agnostic and works with any SQL language.
Prisma Migrate is the modern TypeScript-first alternative. You define your schema in Prisma's DSL, and it generates SQL migrations automatically. It provides type-safe database access through the Prisma Client. Better DX but less control over the SQL.
For Spring Boot applications (like InkAndHorizon), Flyway is the standard choice — it integrates directly with Spring Boot's startup lifecycle.
-- Flyway: V1__create_users_table.sql CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email VARCHAR(255) NOT NULL UNIQUE, name VARCHAR(100) NOT NULL, role VARCHAR(20) DEFAULT 'user', created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW() ); CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_users_role ON users(role); -- V2__add_posts_table.sql CREATE TABLE posts ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), title VARCHAR(200) NOT NULL, content TEXT NOT NULL, author_id UUID NOT NULL REFERENCES users(id), published BOOLEAN DEFAULT false, created_at TIMESTAMPTZ DEFAULT NOW() ); CREATE INDEX CONCURRENTLY idx_posts_author ON posts(author_id); -- Flyway configuration (application.yml for Spring Boot) -- spring: -- flyway: -- enabled: true -- baseline-on-migrate: true -- locations: classpath:db/migration -- validate-on-migrate: true
Key Takeaways
Rollback Strategies
The best rollback strategy is: do not need one. If you follow expand-contract, every migration is additive and safe. Old code and new code both work with the expanded schema, so rolling back the application code requires no schema changes.
For emergencies, maintain a "down" migration for every "up" migration. The down migration should reverse the schema change. But in practice, down migrations are dangerous because they run in reverse order and may fail on real data.
The nuclear option: restore from a point-in-time backup. PostgreSQL's WAL-based PITR (Point-in-Time Recovery) can restore to any moment within your retention window. This is your last resort — test it regularly.
Key Takeaways
Key Takeaways
Zero-downtime database migrations are a discipline, not a tool. The expand-contract pattern works with any migration tool (Flyway, Prisma, Liquibase, Alembic) and any database. Master the pattern, and migrations become the safest part of your deployment.
The rules: never combine schema changes and code changes in one deployment, always use CREATE INDEX CONCURRENTLY, add NOT NULL columns in two steps (add with default, then set constraint), and test every migration in staging first.
For interviews: explain expand-contract, list safe vs dangerous operations, discuss Flyway vs Prisma trade-offs, and describe how you would rename a column on a table with 100M rows without downtime.