dev · 2026-05-01
Quantify the monthly cost of cutting p99 latency from the baseline to the target: compute scaling, caching, edge deployment, and faster instance tiers.
| Input | Value |
| --- | --- |
| Current p99 latency (ms) | 850 |
| Target p99 latency (ms) | 250 |
| Current infra spend / mo | $12,000 |
| Compute scaling factor | 250% |
| Caching layer (Redis) $/mo | $800 |
| Edge deployment $/mo | $1,500 |

| Result | Value |
| --- | --- |
| Annual additional cost | $281,718 |
| p99 improvement | 70.6% |
| Cost per 100ms improvement / mo | $3,913 |
Cutting p99 latency from 850ms to 250ms is achievable but typically requires 2-3x more compute, plus caching, plus edge deployment. The math reveals whether the latency improvement is worth the spend.
To cut p99 latency this deeply, the typical mix is more compute, a caching layer, and edge deployment.
Cutting 850ms → 250ms (a 70% reduction) with a 250% compute scaling factor plus caching and edge works out to roughly $23,476/mo in additional spend ($281,718/yr).
That's about $3,913 per 100ms of reduction per month — only worth it if user-facing latency drives meaningful conversion or revenue.
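The per-100ms math, as a quick sketch. The annual total is taken as given from the table; only the derived figures are computed:

```python
# Reproduce the derived figures from the table (annual cost taken as given).
annual_additional = 281_718          # $/yr, from the table
current_p99, target_p99 = 850, 250   # ms

improvement_ms = current_p99 - target_p99              # 600 ms
improvement_pct = improvement_ms / current_p99 * 100   # ~70.6%
monthly_additional = annual_additional / 12            # ~$23,476.50/mo
per_100ms = monthly_additional / (improvement_ms / 100)

print(f"{improvement_pct:.1f}% improvement, ${per_100ms:,.0f} per 100ms/mo")
# → 70.6% improvement, $3,913 per 100ms/mo
```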
RAIL model (Google): <100ms feels instant; 100-300ms feels responsive; 300ms-1s noticeable delay; >1s users start to disengage; >3s users leave. Optimize for under 300ms p99 for interactive experiences. Beyond that, diminishing returns.
Optimize p99, not the median. Median is OK in most well-architected systems; p99 is where users feel pain. A 250ms median with a 5s p99 means 1% of users wait 5 seconds. That 1% churns. Optimize the worst case, not the best.
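A synthetic sample (hypothetical numbers) showing how a healthy median can hide a painful tail:

```python
import statistics

# Hypothetical distribution: 99% of requests at 250ms, a 1% tail at 5s.
latencies_ms = [250] * 990 + [5000] * 10

p50 = statistics.median(latencies_ms)
p99 = statistics.quantiles(latencies_ms, n=100)[98]  # 99th percentile

print(f"p50={p50}ms, p99={p99:.0f}ms")  # median looks fine; p99 is ~5s
```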
Caching is almost always cheaper. A Redis cache at $800/mo can handle 80% of requests at sub-10ms latency, leaving compute to handle the slow 20%. Evaluate caching before vertical or horizontal scaling; its cost per ms of improvement is typically an order of magnitude better.
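The caching-first argument as arithmetic, using the hit rate and latencies assumed in the paragraph above:

```python
hit_rate, miss_rate = 0.80, 0.20  # 80% served from Redis (assumed from the text)
cache_ms = 10                     # sub-10ms cache hit
origin_ms = 850                   # current p99 for a cache miss

effective_ms = hit_rate * cache_ms + miss_rate * origin_ms
print(effective_ms)  # → 178.0
```

Caveat: this is a blended average, not a p99 — with a 20% miss rate, p99 stays near the origin's latency, so caching helps typical latency immediately but the tail still needs the compute and edge work.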