Stress Test Report
Gemma 4 12B IT · CUDA 12.8 · up to 32 concurrent requests
Extended sweep from 1 to 32 parallel requests to find the pre-saturation ceiling.
Gemma 4 12B IT · FP8 · 2026-06-11 · CUDA 12.8
Concurrency performance matrix
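All times are in milliseconds and throughput is in tokens per second. TTFT is time to first token, ITL is inter-token latency, and TPS is per-request tokens per second. Success rate is the percentage of requests that completed without error.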
| Concurrency | Success | TTFT avg (ms) | TTFT P50 (ms) | TTFT P90 (ms) | TTFT P99 (ms) | TPS (tokens/s) | ITL avg (ms) | ITL P95 (ms) |
|---|---|---|---|---|---|---|---|---|
| 1 | 100% | 64.2 | 62.1 | 68.9 | 74.5 | 38.2 | 25.7 | 26.8 |
| 2 | 100% | 65.2 | 65.2 | 66.0 | 66.1 | 38.3 | 25.6 | 26.6 |
| 4 | 100% | 81.3 | 83.4 | 97.0 | 97.1 | 38.1 | 25.8 | 26.6 |
| 8 | 100% | 101.6 | 105.0 | 105.8 | 106.2 | 37.7 | 26.0 | 26.9 |
| 12 | 100% | 105.1 | 107.3 | 108.1 | 108.2 | 37.5 | 26.2 | 27.5 |
| 16 | 100% | 109.0 | 118.2 | 119.2 | 119.6 | 37.5 | 26.2 | 27.1 |
| 18 | 100% | 135.9 | 137.4 | 138.6 | 138.7 | 38.6 | 25.4 | 26.5 |
| 20 | 100% | 130.9 | 138.2 | 139.4 | 140.1 | 38.0 | 25.8 | 26.8 |
| 22 | 100% | 150.4 | 151.5 | 152.8 | 153.1 | 38.1 | 25.7 | 26.9 |
| 24 | 100% | 158.7 | 159.5 | 161.2 | 161.6 | 38.0 | 25.8 | 26.9 |
| 26 | 100% | 158.8 | 163.6 | 164.7 | 165.8 | 38.4 | 25.5 | 26.5 |
| 28 | 100% | 179.6 | 180.8 | 181.7 | 182.3 | 38.0 | 25.8 | 27.2 |
| 30 | 100% | 161.6 | 167.4 | 169.3 | 169.8 | 37.4 | 26.2 | 27.2 |
| 32 | 100% | 166.5 | 172.7 | 174.1 | 175.2 | 37.5 | 26.1 | 27.2 |
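The report does not say how the per-step statistics were computed. As a minimal sketch, assuming each concurrency step produces one TTFT sample per successful request and that percentiles use the nearest-rank method (both are assumptions, and `percentile` and `summarize` are hypothetical helper names), one row of the matrix could be derived like this:

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(ttft_ms, errors):
    """Summarize one concurrency step.

    ttft_ms: TTFT in milliseconds for each successful request.
    errors:  number of requests that failed.
    """
    total = len(ttft_ms) + errors
    return {
        "success_pct": 100.0 * len(ttft_ms) / total,
        "ttft_avg": round(mean(ttft_ms), 1),
        "p50": percentile(ttft_ms, 50),
        "p90": percentile(ttft_ms, 90),
        "p99": percentile(ttft_ms, 99),
    }


# Hypothetical samples for illustration only; not taken from the report.
samples = [61.0, 62.0, 63.0, 64.0, 70.0]
summary = summarize(samples, errors=0)
```

With only a handful of samples per step, P90 and P99 collapse onto the slowest request, which may be why those two columns sit so close together in several rows.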
Key takeaways
Safe OpenRouter concurrency limit of 32 · P99 TTFT stayed under 185 ms (peak 182.3 ms at concurrency 28), with no saturation and a 100% success rate.
- Zero request failures. Every concurrency step completed with a 100% success rate; no OOM or service interruption was observed.
- Stable token generation. Average inter-token latency stayed between 25.4 and 26.2 ms and per-request throughput between 37.4 and 38.6 tokens per second across the whole sweep, even as concurrency scaled.
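The "narrow band" claim can be checked directly against the matrix. The sketch below copies the TPS and average ITL columns from the table and measures their spread. It also shows two rough derived figures that the report itself does not state: the throughput implied by ITL alone (1000 / ITL, which ignores TTFT and so slightly overestimates), and an aggregate throughput estimate that assumes the TPS column is per-request and that all requests decode in parallel. The helper names are hypothetical.

```python
# (concurrency, per-request TPS, ITL avg in ms), copied from the matrix above.
rows = [
    (1, 38.2, 25.7), (2, 38.3, 25.6), (4, 38.1, 25.8), (8, 37.7, 26.0),
    (12, 37.5, 26.2), (16, 37.5, 26.2), (18, 38.6, 25.4), (20, 38.0, 25.8),
    (22, 38.1, 25.7), (24, 38.0, 25.8), (26, 38.4, 25.5), (28, 38.0, 25.8),
    (30, 37.4, 26.2), (32, 37.5, 26.1),
]

tps = [r[1] for r in rows]
itl = [r[2] for r in rows]

# Spread of each metric, as a percentage of its minimum value.
tps_spread_pct = 100 * (max(tps) - min(tps)) / min(tps)
itl_spread_pct = 100 * (max(itl) - min(itl)) / min(itl)


def implied_tps(itl_ms):
    """Decode-only throughput implied by an inter-token latency in milliseconds."""
    return 1000.0 / itl_ms


def aggregate_tps(concurrency, per_request_tps):
    """Estimated total tokens/s, ASSUMING the TPS column is per-request."""
    return concurrency * per_request_tps


print(f"TPS spread: {tps_spread_pct:.1f}%  ITL spread: {itl_spread_pct:.1f}%")
print(f"Estimated aggregate at 32: {aggregate_tps(32, 37.5):.0f} tokens/s")
```

Both spreads come out at roughly 3%. The aggregate figure is an estimate under the stated assumptions, not a measured result.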