Transparency
Stress Test Reports
Raw concurrency sweeps run against single-instance nodes. We publish the numbers we measured — including the ones that show where the hardware stops being comfortable.
Gemma 4 12B IT · CUDA 12.8 Baseline
Warm-cache concurrency sweep from 1 to 16 parallel requests.
Gemma 4 12B IT · CUDA 12.8 · Up to 32 Concurrent Requests
Extended sweep from 1 to 32 parallel requests to find the pre-saturation ceiling.
Gemma 4 12B IT · CUDA 12.8 · Up to 64 Concurrent Requests
Full-range sweep from 1 to 64 parallel requests to measure saturated behavior.
Gemma 4 26B MoE IT · CUDA 13.0
Sweep of the MoE FP8 node from 32 to 64 parallel requests, with prefix caching and CPU KV offloading.
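For readers unfamiliar with the term, a concurrency sweep can be sketched roughly as follows. This is a minimal illustration, not the harness used to produce the reports above: `fake_request`, `run_level`, and `sweep` are hypothetical names, and the request is simulated with a short sleep rather than a call to a real inference endpoint.

```python
import asyncio
import time


async def fake_request(delay_s: float = 0.01) -> None:
    """Hypothetical stand-in for one inference request; swap in a real client call."""
    await asyncio.sleep(delay_s)


async def run_level(concurrency: int, requests_per_worker: int = 5) -> dict:
    """Run `concurrency` workers in parallel and collect per-request latencies."""
    latencies: list[float] = []

    async def worker() -> None:
        # Each worker issues its requests back to back.
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            await fake_request()
            latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(concurrency)))
    wall = time.perf_counter() - wall_start

    latencies.sort()
    return {
        "concurrency": concurrency,
        "requests": len(latencies),
        "throughput_rps": len(latencies) / wall,
        "p50_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


async def sweep(levels: list[int]) -> list[dict]:
    """Measure each concurrency level in order, one level at a time."""
    return [await run_level(c) for c in levels]


if __name__ == "__main__":
    for row in asyncio.run(sweep([1, 2, 4, 8, 16])):
        print(row)
```

Against a real node, throughput typically stops rising and latency starts climbing once the sweep passes the saturation point, which is the region the extended and full-range reports are meant to expose.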