The Performance Diagnosis Framework
Before optimizing anything, you must diagnose. PostgreSQL performance issues fall into four categories: slow queries (missing indexes, bad joins), configuration problems (default settings are for development, not production), connection exhaustion (too many clients, no pooler), and hardware bottlenecks (I/O, memory, CPU).
The diagnosis order: (1) Run pg_stat_statements to find your slowest queries, (2) EXPLAIN ANALYZE those queries to understand execution plans, (3) Add indexes based on the plan analysis, (4) Tune postgresql.conf for your workload.
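Before diving into individual queries, one quick sanity check worth running is the buffer cache hit ratio from pg_stat_database (a standard catalog view, not specific to any workload). A sustained value well below 99% on an OLTP system suggests the working set does not fit in shared_buffers:

```sql
-- Buffer cache hit ratio across all databases
-- (the 99% threshold is a rule of thumb, not a hard limit)
SELECT round(
         sum(blks_hit) * 100.0 / nullif(sum(blks_hit) + sum(blks_read), 0),
         2
       ) AS cache_hit_pct
FROM pg_stat_database;
```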
PostgreSQL 17 adds several performance features: faster B-Tree lookups for multi-value (IN-list) searches, a redesigned vacuum memory structure that uses far less memory, streaming I/O for sequential scans and ANALYZE, and improved logical replication (including failover control for replication slots).
EXPLAIN ANALYZE: Reading Execution Plans
EXPLAIN ANALYZE runs the query and shows both the planned and the actual execution strategy (note that it really executes the statement, so wrap data-modifying queries in a transaction you roll back). The key things to look for: Sequential Scans on large tables (missing index), Nested Loops over large result sets (consider a Hash Join), a large gap between estimated rows and actual rows (stale statistics — run ANALYZE), and "Sort Method: external merge" (insufficient work_mem).
The cost numbers (e.g., cost=0.00..123.45) are the planner's abstract cost units, calibrated so that 1.0 equals one sequential page read — lower is better, but they are estimates. The actual time in milliseconds is what matters for real performance.
-- Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find top 10 slowest queries by total execution time
SELECT calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
       substring(query, 1, 100) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- EXPLAIN ANALYZE: See actual execution plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.id
ORDER BY post_count DESC
LIMIT 20;

-- Sample output:
-- Limit (actual time=245.3..245.4 rows=20)
--   -> Sort (actual time=245.3..245.3 rows=20)
--        Sort Key: (count(p.id)) DESC
--        -> HashAggregate (actual time=243.1..244.2 rows=5000)
--             -> Hash Left Join (actual time=12.5..180.3 rows=45000)
--                  -> Seq Scan on users u (actual time=0.1..8.2 rows=5000)  ← Needs index!
--                       Filter: (created_at > '2026-01-01')
--                       Rows Removed by Filter: 95000
--                  -> Hash (actual time=3.2..3.2 rows=45000)
--                       -> Seq Scan on posts p (actual time=0.1..2.1 rows=45000)
Indexing Strategies: B-Tree, GIN, BRIN & Partial
Indexes are the most powerful performance tool, but they are not free. Each index slows down INSERT/UPDATE/DELETE operations and consumes disk space. The goal is the minimum set of indexes that covers your critical query patterns.
B-Tree (default): best for equality and range queries (WHERE email = ?, WHERE age > 30). GIN: best for full-text search, JSONB containment, and array overlap. BRIN: best for naturally ordered large tables (time-series data). Partial: indexes only rows matching a condition (WHERE deleted = false).
PostgreSQL can also exploit multi-column B-Tree indexes via incremental sort (available since version 13): when a query's ORDER BY extends a sort order the index already provides, the planner reuses the presorted prefix and sorts only the remaining columns within each group.
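A sketch of what that looks like in practice — the index and column names here are illustrative, and whether the planner picks this plan depends on table size and statistics:

```sql
-- Given an index whose column list is a prefix of the ORDER BY:
CREATE INDEX idx_posts_author_date ON posts(author_id, created_at);

EXPLAIN
SELECT id, title
FROM posts
ORDER BY author_id, created_at, title
LIMIT 50;
-- If chosen, the plan shows an "Incremental Sort" node with
-- "Presorted Key: author_id, created_at" above the index scan,
-- sorting only the 'title' ties within each presorted group.
```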
-- B-Tree: Equality and range queries (most common)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
CREATE INDEX CONCURRENTLY idx_posts_created ON posts(created_at DESC);

-- Composite index: WHERE author_id = ? AND published = true ORDER BY created_at
CREATE INDEX CONCURRENTLY idx_posts_author_published
  ON posts(author_id, published, created_at DESC);

-- Partial index: Only index active users (skips 90% of rows)
CREATE INDEX CONCURRENTLY idx_active_users ON users(email)
  WHERE deleted_at IS NULL;

-- GIN: Full-text search
CREATE INDEX CONCURRENTLY idx_posts_search
  ON posts USING gin(to_tsvector('english', title || ' ' || content));

-- GIN: JSONB containment (@> operator)
CREATE INDEX CONCURRENTLY idx_posts_tags ON posts USING gin(tags);
-- Enables: WHERE tags @> '["react", "performance"]'

-- BRIN: Time-series data (very compact, good for 100M+ row tables)
CREATE INDEX CONCURRENTLY idx_logs_created
  ON audit_logs USING brin(created_at) WITH (pages_per_range = 32);

-- Covering index: includes extra columns to avoid table lookup
CREATE INDEX CONCURRENTLY idx_users_email_covering
  ON users(email) INCLUDE (name, role);
-- SELECT name, role FROM users WHERE email = ? → index-only scan!
Query Optimization Techniques
Beyond indexes, query-level optimizations can deliver order-of-magnitude gains. The most impactful: avoid SELECT * (select only the columns you need), use EXISTS instead of IN for large subqueries, avoid functions on indexed columns (WHERE LOWER(email) = ... bypasses a plain index on email), and use CTEs judiciously (before PostgreSQL 12 every CTE was an optimization fence; since 12, single-reference CTEs are inlined unless marked MATERIALIZED).
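To illustrate the CTE point (table names follow the article's running example): since PostgreSQL 12, a non-recursive CTE referenced once is inlined into the outer query by default, so predicates can be pushed down into it; writing MATERIALIZED restores the old fencing behavior.

```sql
-- Inlined by default (PostgreSQL 12+): the planner can push author_id = 123
-- inside the CTE and use an index on posts(author_id)
WITH recent AS (
  SELECT * FROM posts WHERE created_at > now() - interval '7 days'
)
SELECT id, title FROM recent WHERE author_id = 123;

-- Forces the optimization fence: 'recent' is computed in full first
WITH recent AS MATERIALIZED (
  SELECT * FROM posts WHERE created_at > now() - interval '7 days'
)
SELECT id, title FROM recent WHERE author_id = 123;
```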
For pagination, use keyset pagination (WHERE id > last_id) instead of OFFSET. OFFSET scans and discards rows — OFFSET 100000 means reading 100,000 rows just to skip them. Keyset pagination costs the same on any page: a single index seek, regardless of depth.
-- ❌ BAD: OFFSET pagination (slow for deep pages)
SELECT * FROM posts
ORDER BY created_at DESC
OFFSET 100000 LIMIT 20;  -- Scans 100,020 rows to return 20!

-- ✅ GOOD: Keyset pagination (constant speed on any page)
SELECT id, title, created_at
FROM posts
WHERE created_at < '2026-03-15T10:30:00Z'  -- Last item from previous page
ORDER BY created_at DESC
LIMIT 20;

-- ❌ BAD: Function on indexed column (kills the index)
SELECT * FROM users WHERE LOWER(email) = 'ashutosh@ih.com';

-- ✅ GOOD: Use a functional index if you must
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- ❌ BAD: SELECT * (fetches all columns including large text/jsonb)
SELECT * FROM posts WHERE author_id = 123;

-- ✅ GOOD: Select only needed columns
SELECT id, title, excerpt, created_at FROM posts WHERE author_id = 123;

-- ❌ BAD: IN with large subquery
SELECT * FROM orders WHERE user_id IN (SELECT id FROM users WHERE active = true);

-- ✅ GOOD: EXISTS (stops at first match)
SELECT * FROM orders o
WHERE EXISTS (
  SELECT 1 FROM users u WHERE u.id = o.user_id AND u.active = true
);

-- ✅ Batch operations: INSERT multiple rows in one statement
INSERT INTO posts (title, content, author_id) VALUES
  ('Post 1', 'Content 1', 1),
  ('Post 2', 'Content 2', 1),
  ('Post 3', 'Content 3', 2);
-- 1 network roundtrip instead of 3
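One refinement to the keyset pattern: created_at values may not be unique, so a page boundary can drop or duplicate rows. Paging on a (created_at, id) row-value comparison fixes that — the literal values below are illustrative:

```sql
-- Keyset with a tie-breaker: page on (created_at, id) so ordering is total
SELECT id, title, created_at
FROM posts
WHERE (created_at, id) < ('2026-03-15T10:30:00Z', 48210)  -- last row of previous page
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Backed by a matching composite index:
CREATE INDEX idx_posts_created_id ON posts(created_at DESC, id DESC);
```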
Connection Pooling with PgBouncer
Each PostgreSQL connection is a separate server process consuming roughly 10MB of memory. The default max_connections is 100, meaning ~1GB of RAM just for connections. In a high-traffic web app with 1,000 concurrent users, you will hit "FATAL: sorry, too many clients already" errors instantly.
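A quick way to see how close you are to that limit, using the standard pg_stat_activity view (nothing here is app-specific):

```sql
-- Current connection usage vs. the configured ceiling
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'idle') AS idle,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;
```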
PgBouncer sits between your application and PostgreSQL, multiplexing hundreds of application connections into a small pool of actual database connections. Transaction pooling (pool_mode = transaction) is recommended: a server connection is returned to the pool after each transaction, rather than held for the client's entire session. The trade-off: session state (SET, LISTEN/NOTIFY, temporary tables) does not survive across transactions in this mode.
Supabase, Railway, and Neon all include built-in connection poolers. If you self-host PostgreSQL, PgBouncer or pgcat is essential.
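A minimal pgbouncer.ini sketch for the transaction-pooling setup described above — the host, paths, and pool sizes are illustrative and need adjusting to your environment:

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432                ; apps connect here instead of 5432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction           ; release server connection after each transaction
max_client_conn = 1000            ; client-side connections PgBouncer will accept
default_pool_size = 20            ; real PostgreSQL connections per (db, user) pair
```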
Production Configuration (postgresql.conf)
PostgreSQL ships with conservative defaults optimized for a laptop. Production servers need tuning based on available RAM, CPU cores, and workload type (OLTP vs analytics).
The most impactful settings: shared_buffers (~25% of RAM), effective_cache_size (~75% of RAM — a planner hint, not an allocation), work_mem (per sort/hash operation, so a single complex query can use several multiples of it), maintenance_work_mem (for VACUUM and index creation), and wal_buffers (for write-heavy workloads).
# postgresql.conf — Production tuning for 16GB RAM, 4 CPU cores

# Memory
shared_buffers = 4GB                    # 25% of RAM
effective_cache_size = 12GB             # 75% of RAM (includes OS cache)
work_mem = 64MB                         # Per-operation sort/hash memory
maintenance_work_mem = 1GB              # For VACUUM and CREATE INDEX

# WAL (Write-Ahead Log)
wal_buffers = 64MB
max_wal_size = 4GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9

# Query planner
random_page_cost = 1.1                  # SSD (default 4.0 is for spinning disks!)
effective_io_concurrency = 200          # SSD parallel reads

# Parallelism
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4

# Autovacuum (prevent table bloat)
autovacuum_vacuum_scale_factor = 0.05   # Vacuum more frequently
autovacuum_analyze_scale_factor = 0.02  # Update stats more frequently

# Logging (for pg_stat_statements)
shared_preload_libraries = 'pg_stat_statements'
log_min_duration_statement = 200        # Log queries slower than 200ms
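Most of these can also be changed without hand-editing the file, via ALTER SYSTEM (which writes postgresql.auto.conf) followed by a reload. Note that shared_buffers and shared_preload_libraries still require a full server restart:

```sql
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();  -- reload-able settings take effect immediately

-- See which changed settings are still waiting on a restart
SELECT name, setting FROM pg_settings WHERE pending_restart;
```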
Key Takeaways
PostgreSQL performance tuning follows a clear path: diagnose with pg_stat_statements, analyze with EXPLAIN ANALYZE, fix with indexes, optimize queries, and tune configuration. Always measure before and after — gut feelings are wrong.
The highest-impact techniques: keyset pagination (not OFFSET), partial indexes (only index what you query), covering indexes (avoid table lookups), connection pooling (PgBouncer), and production-appropriate postgresql.conf settings.
For interviews: explain B-Tree vs GIN vs BRIN index selection, demonstrate EXPLAIN ANALYZE reading, discuss keyset vs offset pagination, and know the top postgresql.conf settings and why they matter.