dev · 2026-05-01
Project how many SQS / Kafka / RabbitMQ workers you need to hit a target throughput, given average message processing time and a target utilization.
| Parameter | Value |
| --- | --- |
| Peak messages/sec | 250 |
| Avg processing time (ms) | 180 |
| Target worker utilization | 70% |
| Worker cost ($/month) | $65 |
| Messages/worker/sec at target util | 3.9 |
| Workers needed at 100% util (theoretical floor) | 45 |
| Workers needed at 70% target util | 65 |
| Monthly cost (65 × $65) | $4,225 |
Workers handle messages serially in their main loop. Per-worker throughput = 1000 ÷ avg processing time (ms). Total throughput = workers × per-worker throughput. Reverse the equation to size your fleet.
per-worker throughput at 100% util = 1000 ÷ avg ms per message
target per-worker throughput = per-worker throughput at 100% util × utilization target
workers needed = peak msg/sec ÷ target per-worker throughput
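The three formulas above can be folded into a small sizing helper. This is an illustrative sketch (function and field names are mine, not from any library):

```python
import math

def size_fleet(peak_msg_per_sec: float, avg_ms_per_msg: float,
               target_util: float, worker_cost_per_month: float) -> dict:
    """Reverse the throughput equation to get a worker count and cost."""
    # per-worker throughput at 100% util = 1000 / avg ms per message
    full_util_tput = 1000.0 / avg_ms_per_msg
    # target per-worker throughput = 100% throughput x utilization target
    target_tput = full_util_tput * target_util
    # workers needed = peak msg/sec / target per-worker throughput
    workers = math.ceil(peak_msg_per_sec / target_tput)
    return {
        "per_worker_msgs_per_sec": round(target_tput, 1),
        "workers": workers,
        "monthly_cost": workers * worker_cost_per_month,
    }

print(size_fleet(250, 180, 0.70, 65))
# {'per_worker_msgs_per_sec': 3.9, 'workers': 65, 'monthly_cost': 4225}
```

The `math.ceil` matters: 250 ÷ 3.9 = 64.3, and you can't run three-tenths of a worker.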
Default scenario: 250 msg/sec peak, 180 ms avg, 70% target util. Worked through: 1000 ÷ 180 = 5.6 msg/sec at 100% util; × 0.70 = 3.9 msg/sec per worker; 250 ÷ 3.9 = 64.3 → 65 workers; 65 × $65 = $4,225/month.
Above 80% utilization, queues grow during normal variance (worker GC pauses, slow upstream, brief network blips). Above 90%, a single worker failure cascades into a growing backlog. A 70% target absorbs typical operational variance.
Default autoscaling rule: scale up when queue depth exceeds 100 messages × current worker count; scale back down to 70% of the peak fleet after 30 minutes of idle.
60-75% for batch workloads. 50-65% for latency-sensitive. 80%+ only if you have aggressive autoscaling that adds capacity in <60 seconds. Above 80% sustained = fragile system.
Many thin workers: better fault isolation (one dies, the others keep handling traffic), faster autoscaling, smaller blast radius. Single fat worker: simpler ops, lower per-instance overhead, less coordination cost. Default to many-thin unless you have a specific reason (GPU, large in-memory state) for fat.
Two patterns: (1) separate queue per priority with a dedicated worker pool, (2) single queue with a priority field and workers that prefer high-priority messages. Pattern 1 is simpler, scales each priority independently, and is easier to reason about. Pattern 2 is more efficient but harder to operate. Most teams should start with pattern 1.
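Pattern 1 can be sketched in a few lines. This is a toy model, not a production consumer: `queue.Queue` and threads stand in for real broker clients (SQS/Kafka/RabbitMQ) and real processes, and all names are illustrative:

```python
import queue
import threading

# Stand-ins for one broker queue per priority.
queues = {"high": queue.Queue(), "low": queue.Queue()}

def worker(q: queue.Queue, handler) -> None:
    # Serial main loop: one message at a time, as described above.
    while True:
        msg = q.get()
        if msg is None:  # shutdown sentinel
            break
        handler(msg)
        q.task_done()

def start_pool(priority: str, size: int, handler) -> list:
    # Pattern 1: each priority gets its own independently sized pool.
    pool = [threading.Thread(target=worker,
                             args=(queues[priority], handler),
                             daemon=True)
            for _ in range(size)]
    for t in pool:
        t.start()
    return pool

# Size each pool with the fleet formula, e.g. more workers for "high".
start_pool("high", size=4, handler=print)
start_pool("low", size=1, handler=print)
```

Because each pool only ever reads its own queue, a backlog on "low" can never starve "high", which is the main operational win of pattern 1.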