Building the foundation for running extra-large language models
Cloudflare / Apr 16, 2026
- Prefill–decode disaggregation cut tail latencies and improved intertoken throughput
- x-session-affinity prompt caching raised cache hit ratio from ~60% to ~80%
- Infire: multi-GPU support, lower memory overhead, and <20s cold starts
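
To see why session affinity can raise a prompt cache hit ratio, here is a minimal sketch, not Cloudflare's routing code. It assumes x-session-affinity is a per-session value sent with each request, and it uses rendezvous hashing (our choice for illustration) so that the same session always lands on the same replica, where its prompt prefix is likely still cached. The function name `pick_replica` and the replica names are hypothetical.

```python
import hashlib


def pick_replica(session_key: str, replicas: list[str]) -> str:
    """Pick a replica for a session using rendezvous (highest-random-weight) hashing.

    Hypothetical helper: the same session_key always maps to the same replica
    as long as that replica stays in the pool, so a prompt prefix cached there
    can be reused by later requests in the session.
    """
    if not replicas:
        raise ValueError("replicas must not be empty")

    def score(replica: str) -> int:
        # Hash the (session, replica) pair and read the first 8 bytes as an integer.
        digest = hashlib.sha256(f"{session_key}|{replica}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The replica with the highest score wins for this session.
    return max(replicas, key=score)


# Hypothetical replica pool and session value, for illustration only.
replicas = ["gpu-node-a", "gpu-node-b", "gpu-node-c"]
chosen = pick_replica("session-123", replicas)
```

A useful property of rendezvous hashing is that removing a replica only remaps the sessions that were assigned to it. Every other session keeps its replica, and with it whatever cached prompt state lives there.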