Moonshot AI Kimi K2.5 Model Now Available on Cloudflare Workers AI
Key Points
- First frontier-scale open-source model on Workers AI platform
- 256k context window with multi-turn tool calling and vision support
- New asynchronous batch API with pull-based processing system
Summary
Cloudflare Workers AI now supports Moonshot AI's Kimi K2.5 (@cf/moonshotai/kimi-k2.5), the first frontier-scale open-source model on Cloudflare's AI inference platform. This large language model offers a 256k context window, multi-turn tool calling, vision inputs, and structured outputs.
Key Features
- 256,000-token context window - Large enough to hold full conversation history, tool definitions, and entire codebases
- Multi-turn tool calling - Enables complex agent workflows across conversation turns
- Vision inputs - Processes images alongside text
- Structured outputs - JSON mode and JSON Schema support for reliable parsing
- Function calling - Integration with external tools and APIs
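To make the tool calling and structured output features more concrete, here is a minimal sketch of how a request payload might be assembled. Only the model ID comes from the announcement. The OpenAI-style field names (messages, tools, response_format), the lookup_issue tool, and the answer schema are assumptions for illustration and should be checked against the model's published input schema. Whether tools and a JSON Schema can be combined in a single request is also not stated here.

```typescript
// Model ID taken from the announcement.
const MODEL = "@cf/moonshotai/kimi-k2.5";

// Assumed OpenAI-style shapes; verify against the model's input schema.
type Role = "system" | "user" | "assistant" | "tool";
interface Message { role: Role; content: string }

interface ToolDefinition {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

interface ChatRequest {
  messages: Message[];
  tools: ToolDefinition[];
  response_format?: { type: "json_schema"; json_schema: Record<string, unknown> };
}

// Hypothetical tool, for illustration only.
const lookupIssue: ToolDefinition = {
  type: "function",
  function: {
    name: "lookup_issue",
    description: "Fetch an issue from a bug tracker by ID",
    parameters: {
      type: "object",
      properties: { id: { type: "string" } },
      required: ["id"],
    },
  },
};

// Hypothetical JSON Schema describing the final answer.
const answerSchema: Record<string, unknown> = {
  type: "object",
  properties: { summary: { type: "string" }, severity: { type: "string" } },
  required: ["summary", "severity"],
};

// Append the new user turn to the prior history and attach the tool list.
// Resending the full history on every turn is what makes tool calling multi-turn.
function buildChatRequest(history: Message[], userText: string): ChatRequest {
  return {
    messages: [...history, { role: "user", content: userText }],
    tools: [lookupIssue],
  };
}

// Return a copy of the request that also asks for schema-constrained JSON output.
function withJsonSchema(request: ChatRequest, schema: Record<string, unknown>): ChatRequest {
  return { ...request, response_format: { type: "json_schema", json_schema: schema } };
}
```

In a Worker with an AI binding (assumed here to be named AI), the payload would be passed as env.AI.run(MODEL, request).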
Performance Optimizations
- Prefix caching - Avoids reprocessing shared context, improving Time to First Token (TTFT) and Tokens Per Second (TPS)
- Session affinity - Maintains context across requests using the x-session-affinity header
- Discounted pricing for cached tokens compared to input tokens
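As a rough illustration of session affinity, the sketch below prepares a REST request that carries the x-session-affinity header. The header name and model ID come from the announcement. The assumptions are that the header value is an opaque per-conversation identifier chosen by the caller, and that the call goes through the standard Workers AI REST endpoint. The accountId, apiToken, and sessionId parameters are placeholders.

```typescript
const MODEL = "@cf/moonshotai/kimi-k2.5";

interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a request whose x-session-affinity header stays constant for one
// conversation, so follow-up turns can reuse the cached prefix.
// Assumption: the header value is any stable string picked by the caller.
function buildAffinityRequest(
  accountId: string,
  apiToken: string,
  sessionId: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiToken}`,
      "Content-Type": "application/json",
      "x-session-affinity": sessionId,
    },
    body: JSON.stringify({ messages }),
  };
}
```

The prepared object can be sent with fetch(prepared.url, { method: prepared.method, headers: prepared.headers, body: prepared.body }). Reusing the same sessionId for every turn of a conversation is the point of the exercise. A fresh value on each request would defeat the affinity.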
New Asynchronous API
- Redesigned pull-based batch processing system
- Handles high-volume requests that exceed synchronous rate limits
- In internal testing, queued requests typically executed within 5 minutes
- Ideal for non-real-time use cases like code scanning or research agents
- Use the queueRequest: true parameter to queue batch requests
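A minimal sketch of queueing a batch might look like the following. The queueRequest: true parameter and the model ID come from the announcement. Passing it in the options argument of the binding's run method, and wrapping the individual payloads in a requests array, are assumptions based on how Workers AI batch requests have been documented previously. The AiBinding interface is a hypothetical stand-in for the real binding type, and how results are later retrieved is not covered here.

```typescript
const MODEL = "@cf/moonshotai/kimi-k2.5";

// Hypothetical minimal view of the Workers AI binding, for illustration only.
interface AiBinding {
  run(
    model: string,
    inputs: Record<string, unknown>,
    options?: Record<string, unknown>,
  ): Promise<unknown>;
}

// Turn a list of prompts into one batch payload.
// Assumption: a batch is expressed as { requests: [...] }.
function buildBatchInputs(prompts: string[]): Record<string, unknown> {
  return {
    requests: prompts.map((prompt) => ({
      messages: [{ role: "user", content: prompt }],
    })),
  };
}

// Queue the batch instead of running it synchronously.
// Assumption: queueRequest is passed in the options argument.
async function queueBatch(ai: AiBinding, prompts: string[]): Promise<unknown> {
  return ai.run(MODEL, buildBatchInputs(prompts), { queueRequest: true });
}
```

In a Worker, this would be called as queueBatch(env.AI, prompts), assuming the binding is named AI. Because the work is queued, the immediate response should be treated as an acknowledgement, not as the model output.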