When 10,000 Users Rush the Exits: Understanding the Thundering Herd Problem
10,000 people are inside a concert venue. While the show is on (cache is warm), the exits are irrelevant — nobody needs them. The moment the show ends (cache TTL fires), all 10,000 people head for the same 3 exits simultaneously. The exits get crushed. The venue wasn't the problem. The synchronization was.
The Thundering Herd Problem is a distributed systems failure mode in which a large number of processes or threads simultaneously attempt to access a shared, constrained resource, overwhelming it to the point of collapse.
In the context of web systems, it most commonly appears as a Cache Stampede: when a cached value expires (or is evicted), every process that was relying on that cache simultaneously discovers the miss and independently races to regenerate the data — usually from a database or expensive computation.
The critical word is simultaneously. If 1,000 requests hit your application per second and your cache misses, those 1,000 requests don't politely queue up. They all fire the same expensive DB query at the same time. Your database — which was comfortably handling 10 queries/second — suddenly receives 1,000. It crumbles.
The Thundering Herd most often strikes your most popular, most critical data: the homepage feed, the trending products, the live scoreboard. The more important the data, the more processes are waiting for it, and the harder the stampede.
Why does it happen?
1. TTL-Based Expiry Is Binary
When a cache key's TTL (Time-To-Live) reaches zero, the key stops being served to every reader at the same instant. At T=299.999s, 1,000 requests get a cache hit. At T=300.001s, every single one of those same requests gets a cache miss. There is no gradual transition; it's a cliff edge.
2. Processes Are Stateless and Uncoordinated
Each web server or application process is unaware of what other processes are doing. When process #1 detects a cache miss, it has no way to know that processes #2 through #1000 are about to make the exact same discovery. Each process independently concludes: "I need this data, the cache doesn't have it, I'll fetch it from the DB." This is the correct, rational decision for a single process — it's catastrophic at scale.
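The uncoordinated behaviour described above can be reproduced in a few lines. This is a minimal sketch, not production code: `simulate_stampede`, `slow_query`, and `handle_request` are hypothetical names, a plain dict stands in for the cache, and a `threading.Barrier` models a query slow enough that every request is still waiting on the database when the last one arrives.

```python
import threading


def simulate_stampede(n_requests):
    """Run n_requests naive cache-aside lookups against an empty cache.

    Returns how many times the (fake) database was queried.
    """
    cache = {}                      # the key has just expired, so the cache is empty
    db_calls = 0
    counter_lock = threading.Lock()
    # Models a slow query: no query returns until all requests have started one.
    barrier = threading.Barrier(n_requests)

    def slow_query(key):
        nonlocal db_calls
        with counter_lock:
            db_calls += 1
        barrier.wait(timeout=5)
        return f"fresh:{key}"

    def handle_request(key):
        # Naive cache-aside: each request decides on its own, with no coordination.
        value = cache.get(key)
        if value is None:
            value = slow_query(key)
            cache[key] = value
        return value

    threads = [
        threading.Thread(target=handle_request, args=("homepage_feed",))
        for _ in range(n_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_calls


if __name__ == "__main__":
    print("database queries for 20 requests:", simulate_stampede(20))
```

Every request makes the locally correct decision, and the database still ends up answering the same query once per request.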
3. Cache Regeneration Is Expensive
Data is cached in the first place because fetching it is expensive: slow SQL joins, complex aggregations, calls to multiple services, heavy computation. If regenerating the cache took 1ms, the problem would be trivial. It's precisely because it takes 500ms–5s that the simultaneous stampede causes sustained DB overload.
Once the database becomes overloaded, response times increase. This means the cache regeneration takes even longer. More requests pile up waiting. Some requests time out and retry — adding even more load. The retry storm makes the original stampede worse, not better. Systems can oscillate between overload and false-recovery for minutes.
Real-world triggers
E-Commerce: Flash Sale Launch
A retailer caches the product catalogue with a 10-minute TTL. The flash sale is announced to start at 12:00:00. At 11:59:58, the cache entry expires (it was set 10 minutes ago when the last person loaded the page). At exactly 12:00:00, 50,000 users click "Shop Now". All 50,000 get a cache miss. The product database receives 50,000 simultaneous queries. The sale crashes before it starts.
Social Media: Viral Content
A tweet goes viral and is shared 200,000 times in 10 minutes. The tweet data is cached with a 5-minute TTL. When the TTL fires, 40,000 concurrent viewers all miss the cache and race to the database. Twitter's "Fail Whale" outages are often cited as examples of this kind of overload.
Scheduled Deployments / Cache Flushes
An engineering team deploys a new version of the application at 2 AM to minimize user impact. The deployment process clears the entire Redis cache (to prevent serving stale data). At 9 AM, when office workers arrive and traffic ramps up, every single request is a cache miss. The database, sized to serve a warm-cache system, takes the full cold-start load all at once. This is a "slow-burn stampede" that can take hours to recover from.
Why it’s dangerous in Distributed Systems
In a distributed setup, this is rarely isolated. It causes Cascading Failures:
Database Overload: The DB stops responding.
App Server Exhaustion: Your App servers' worker threads are all blocked waiting for the DB. They can’t even handle other requests that have nothing to do with the cache.
The Retry Storm: Users get frustrated and hit "Refresh." This sends more requests into the already dying system.
The "Death Spiral": The Load Balancer thinks the servers are dead and removes them from the pool. The remaining servers get even more traffic and die faster.
Normal Spike vs. Thundering Herd
It's important to know the difference for interviews:
Normal Spike: You get more users (e.g., it’s 7 PM and people are logging off work). Your system load increases gradually as user count grows. You scale by adding more servers.
Thundering Herd: Your user count might be stable, but a single internal event (like a cache expiry) triggers a massive internal load spike. Scaling doesn't always help because the bottleneck is the shared resource (the DB), not the App servers.
How to prevent this?
Mutex / Distributed Lock
The "only one recomputes" strategy
When a cache miss is detected, instead of every process immediately querying the database, they first compete to acquire a distributed lock. Only the winner recomputes the data. All losers either wait for the winner to populate the cache, or they serve stale data in the meantime.
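The "only one recomputes" idea can be sketched inside a single process. This is an illustration under stated assumptions: `SingleFlightCache` and its `loader` callback are hypothetical names, and a per-key `threading.Lock` stands in for the distributed lock; across multiple servers the lock would have to live in a shared store instead. Losers here wait for the winner rather than serving stale data.

```python
import threading
import time


class SingleFlightCache:
    """Cache-aside where only one caller per key recomputes on a miss."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader             # expensive function: key -> value
        self._ttl = ttl_seconds
        self._data = {}                   # key -> (value, expires_at)
        self._locks = {}                  # key -> lock guarding recomputation
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh_entry(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh_entry(key)
        if entry is not None:
            return entry[0]               # fast path: cache hit, no locking
        with self._lock_for(key):         # losers block here until the winner is done
            entry = self._fresh_entry(key)
            if entry is not None:
                return entry[0]           # the winner already refilled the cache
            value = self._loader(key)     # only the winner reaches the loader
            self._data[key] = (value, time.monotonic() + self._ttl)
            return value
```

The second check inside the lock is what prevents the herd: by the time a waiting caller acquires the lock, the value is usually already there.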
TTL Jitter / Randomization
The "spread the expiry" strategy
The Thundering Herd is caused by synchronized expiry. If 10,000 cache keys all expire at exactly T + 300s, you get 10,000 simultaneous misses. Add random noise to the TTL so keys expire at different times, spreading the recomputation load across time.
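A minimal sketch of TTL jitter follows. The helper name `jittered_ttl` and the 10% default spread are assumptions chosen for illustration; the right spread depends on how much staleness variation the data can tolerate.

```python
import random


def jittered_ttl(base_seconds, jitter_fraction=0.1, rng=random):
    """Return base_seconds plus or minus up to jitter_fraction of it.

    rng is anything with a uniform(a, b) method; it defaults to the
    random module and can be a seeded random.Random for reproducibility.
    """
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # With a 300s base and 10% jitter, keys expire somewhere in 270s..330s.
    for _ in range(5):
        print(round(jittered_ttl(300), 1))
```

The value returned would be passed as the expiry whenever a key is written, so keys written in the same instant no longer expire in the same instant.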
XFetch — Probabilistic Early Expiry
The "proactive recomputation" strategy
Instead of waiting for the cache to expire and then stampeding, individual requests proactively decide to recompute the cache while it's still valid — with a probability that increases as expiry approaches. One lucky request refreshes the cache before the TTL fires. The herd never forms.
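The early-refresh decision can be sketched as a single check. This follows the commonly published form of XFetch, `now - delta * beta * ln(rand) >= expiry`, where `delta` is how long the last recomputation took and `beta` is a tuning knob (values above 1 refresh earlier). The function name, parameter names, and the injectable `now`/`rng` arguments are assumptions made for illustration and testing.

```python
import math
import random
import time


def should_refresh_early(expiry, delta, beta=1.0, now=None, rng=random.random):
    """Decide whether this request should recompute the value before it expires.

    expiry: absolute time (seconds) at which the cached value expires
    delta:  seconds the last recomputation took
    beta:   > 1 favours earlier refreshes, < 1 favours later ones
    """
    now = time.time() if now is None else now
    u = 1.0 - rng()                 # in (0, 1], so log(u) is always defined
    # log(u) <= 0, so this adds a random, non-negative head start to "now".
    return now - delta * beta * math.log(u) >= expiry
```

Far from expiry the random head start almost never reaches `expiry`, so nearly every request just serves the cached value. As expiry approaches, the chance that some request crosses the threshold rises, and that request refreshes the value while everyone else keeps getting hits.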
Background Refresh / Async Warm
The "never let it expire" strategy
Cache entries never truly expire from the application's perspective. A background process continuously refreshes popular keys before they become stale. When a request comes in, it always gets the last-known value immediately — even if it's slightly stale. Recomputation is entirely decoupled from request handling.
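A rough sketch of this decoupling, assuming the set of hot keys is known up front. `BackgroundRefreshedCache`, `loader`, and `hot_keys` are hypothetical names, and a daemon thread stands in for whatever scheduler or worker would do the refreshing in a real deployment.

```python
import threading


class BackgroundRefreshedCache:
    """Serves last-known values; only the background loop calls the loader."""

    def __init__(self, loader, hot_keys):
        self._loader = loader
        self._hot_keys = list(hot_keys)
        self._values = {}
        self._stop = threading.Event()
        self.refresh_once()               # warm the cache before serving traffic

    def refresh_once(self):
        for key in self._hot_keys:
            try:
                self._values[key] = self._loader(key)
            except Exception:
                # On failure keep the last-known (stale) value instead of dropping it.
                pass

    def get(self, key):
        # Request path: never calls the loader. Raises KeyError for keys
        # that are not in hot_keys or have never loaded successfully.
        return self._values[key]

    def start(self, interval_seconds):
        def loop():
            # Event.wait returns False on timeout, True once stop() is called.
            while not self._stop.wait(interval_seconds):
                self.refresh_once()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

However many requests arrive, the loader runs once per key per interval, which is the property that keeps a herd from forming.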
How to Detect a Stampede in Production
Thundering Herd incidents have a very recognizable signature in your metrics. Here's what to look for:
Metric Signatures
1. DB CONNECTION SPIKE WITH UNIFORM QUERY PATTERN
Your DB monitoring shows a sudden vertical spike in active connections — and the "top queries" view shows hundreds of instances of the exact same query. In a normal DB overload, you'd see many different queries. In a stampede, you see the same one hundreds of times.
2. CACHE MISS RATE SPIKE PRECEDING DB SPIKE
In your Redis metrics, the keyspace_misses counter (or your app-level cache miss rate) will show a sudden spike just before the DB overload begins. The miss spike precedes the DB overload by a few milliseconds, and that sequence is the smoking gun.
3. SAWTOOTH RECOVERY PATTERN
DB load spikes to 100%, partially recovers as some queries succeed and populate the cache, then spikes again as the accumulated request backlog causes secondary misses. This "sawtooth" pattern on your DB CPU/connections dashboard is almost uniquely characteristic of a stampede.
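The cache-miss signature (item 2 above) can be checked by comparing two snapshots of the Redis `keyspace_hits` and `keyspace_misses` counters. The sketch below takes the snapshots as plain dicts so it stays independent of any client library. The function names, the `factor` multiplier, and the `min_misses` floor are hypothetical values chosen for illustration, not recommended thresholds.

```python
def miss_ratio(prev, curr):
    """Fraction of lookups that missed between two counter snapshots."""
    hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    misses = curr["keyspace_misses"] - prev["keyspace_misses"]
    total = hits + misses
    return misses / total if total > 0 else 0.0


def looks_like_stampede(prev, curr, baseline_ratio, factor=5.0, min_misses=100):
    """Flag an interval whose miss ratio jumps well above the usual baseline.

    min_misses filters out quiet intervals where a handful of misses
    would otherwise produce a misleadingly high ratio.
    """
    misses = curr["keyspace_misses"] - prev["keyspace_misses"]
    return misses >= min_misses and miss_ratio(prev, curr) >= baseline_ratio * factor


if __name__ == "__main__":
    before = {"keyspace_hits": 1000, "keyspace_misses": 10}
    after = {"keyspace_hits": 1100, "keyspace_misses": 910}
    print(looks_like_stampede(before, after, baseline_ratio=0.01))
```

A flag from this check is only a hint; it becomes convincing when it lines up with the uniform-query spike on the database side.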
Conclusion: It’s a Coordination Problem, Not a Capacity Problem
If there’s one thing you should take away from this, it’s that scaling up your database is rarely the answer to a Thundering Herd. You could have the most powerful Postgres instance in the world, and a perfectly synchronized stampede of 100,000 requests will still bring it to its knees.