Intro
PostgreSQL is often underestimated in early-stage architectures. Many teams assume it will stop scaling beyond a certain point and prematurely reach for sharding or NoSQL.
In reality, PostgreSQL can handle significantly more load than expected — if the system around it is designed correctly.
Bottlenecks
Our scaling issues did not start with storage limits. They started with query design and access patterns:
- inefficient indexes under mixed workloads
- long-running transactions blocking critical paths
- connection exhaustion under traffic spikes
- read/write contention during peak usage
The database itself was not failing — it was being misused under load.
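The long-running transactions above are visible in PostgreSQL's pg_stat_activity view (the view and its pid/xact_start/query columns are standard PostgreSQL; the helper function, threshold, and names below are an illustrative sketch, not the monitoring we actually ran):

```python
from datetime import timedelta

# Standard query shape against pg_stat_activity; the 5-minute
# threshold is an illustrative value, not a recommendation.
LONG_TXN_SQL = """
SELECT pid, xact_start, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
"""

def find_long_transactions(rows, now, max_age=timedelta(minutes=5)):
    """Filter (pid, xact_start, query) rows older than max_age.

    `rows` would normally come from executing a query like
    LONG_TXN_SQL through any PostgreSQL driver; filtering in
    Python as well makes the threshold easy to test and tune.
    """
    return [
        (pid, now - started, query)
        for pid, started, query in rows
        if started is not None and now - started > max_age
    ]
```

Polling this on a schedule and alerting on the result is enough to catch the transactions that were blocking our critical paths.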
Migration strategy
Instead of immediately introducing sharding, we focused on:
- query optimization and index tuning
- separating read-heavy and write-heavy workloads
- adding read replicas to absorb read traffic
- implementing connection pooling (PgBouncer)
This delayed the need for architectural complexity while solving real performance bottlenecks.
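The effect of a bounded pool like PgBouncer can be sketched in-process: requests share a fixed set of connections instead of each opening its own. This is a minimal illustration of the bounding behavior, not PgBouncer's actual implementation; `BoundedPool` and the `connect` callback are hypothetical names:

```python
import queue
from contextlib import contextmanager

class BoundedPool:
    """Minimal connection pool: at most `size` live connections.

    Callers block until a connection is free, so a traffic spike
    queues at the pool instead of exhausting server backends.
    """

    def __init__(self, connect, size):
        self._conns = queue.Queue()
        for _ in range(size):
            self._conns.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._conns.get(timeout=timeout)  # blocks when pool is empty
        try:
            yield conn
        finally:
            self._conns.put(conn)  # always returned, never leaked
```

With a pool of, say, 20 connections, a spike of thousands of concurrent requests waits briefly at the pool rather than opening thousands of PostgreSQL backends, which is the failure mode behind the connection-exhaustion issues above.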
Event system
We also moved reporting and analytics workloads out of the transactional path using async processing.
Instead of blocking user operations:
User request → DB write → analytics query
We moved to:
User request → DB write → event → async analytics pipeline
This alone removed significant load from the primary database.
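The shape of that change can be sketched with an in-process queue standing in for the event transport (the source does not name one; Kafka, Redis streams, or an outbox table are all common choices). The request handler returns after the write and the enqueue; analytics drain the queue on a worker without touching the primary:

```python
import queue
import threading

events = queue.Queue()
analytics_log = []   # stand-in for the async analytics pipeline's sink

def handle_request(db, payload):
    """Transactional path: write, emit an event, return immediately."""
    db.append(payload)  # stand-in for the primary DB write
    events.put({"type": "user_action", "payload": payload})

def analytics_worker():
    """Async path: process events off the user-facing critical path."""
    while True:
        event = events.get()
        if event is None:          # shutdown sentinel
            break
        analytics_log.append(event)  # stand-in for heavy analytics work
        events.task_done()

db = []
worker = threading.Thread(target=analytics_worker)
worker.start()
handle_request(db, {"user": 1})
handle_request(db, {"user": 2})
events.join()   # in production the pipeline lags; here we wait for the test
events.put(None)
worker.join()
```

The point of the sketch is the decoupling: `handle_request` never waits on analytics, so analytics load can no longer stretch the latency of user writes.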
Infrastructure
Key improvements included:
- strict connection pooling limits
- caching layer for hot reads
- query time monitoring
- slow query logging with actionable alerts
- separation of OLTP vs OLAP workloads
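Application-side slow-query logging can be sketched as a timing wrapper; the threshold value and the alert sink are illustrative, and in practice this complements rather than replaces PostgreSQL's own log_min_duration_statement setting:

```python
import time
from functools import wraps

SLOW_QUERY_THRESHOLD = 0.5   # seconds; illustrative value
slow_queries = []            # stand-in for an alerting channel

def timed_query(fn):
    """Record any query call exceeding the threshold for alerting."""
    @wraps(fn)
    def wrapper(sql, *args, **kwargs):
        start = time.monotonic()
        try:
            return fn(sql, *args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            if elapsed > SLOW_QUERY_THRESHOLD:
                slow_queries.append((sql, elapsed))
    return wrapper

@timed_query
def run_query(sql, delay=0.0):
    time.sleep(delay)        # stand-in for an actual DB round trip
    return "ok"
```

Because the wrapper records in a `finally` block, queries that time out or raise are captured too, which is exactly where "actionable alerts" tend to matter.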
Results
- query latency reduced significantly under peak load
- DB CPU usage stabilized instead of spiking unpredictably
- connection saturation issues disappeared
- system became easier to reason about under stress
Lessons
The key insight:
Most “scaling problems” in databases are actually query design problems.
Sharding should be a last step, not a first reaction.