Workers AI - Moonshot AI Kimi K2.6 now available on Workers AI
Summary
Moonshot AI Kimi K2.6 is now available on Workers AI (Day 0 support from Moonshot AI). Kimi K2.6 is a natively multimodal, agentic model built on a Mixture-of-Experts (MoE) architecture with 1T total parameters, of which 32B are active per token. It is designed for long-horizon coding, coding-driven design, proactive autonomous execution, swarm-based task orchestration, and multimodal (vision + text) workflows. It pairs efficient inference with competitive benchmark results (e.g., BrowseComp 83.2, SWE-Bench Verified 80.2, Terminal-Bench 2.0 66.7).
Key Points
- Architecture: MoE with 1T total parameters and ~32B active parameters per token for efficient, frontier-scale inference.
- Massive context: 262.1k-token context window that can hold full conversation history, tool definitions, and codebases across long-running agent sessions.
- Long-horizon coding: Improved end-to-end coding across Rust, Go, Python and other languages.
- Coding-driven design: Converts prompts and visual inputs into production-ready interfaces and full-stack workflows.
- Agent orchestration: Swarm orchestration of up to 300 sub-agents executing ~4,000 coordinated steps for complex autonomous tasks.
- Multimodal: Vision inputs supported alongside text; multi-turn tool calling and configurable "thinking" reasoning depth.
Migration notes
- API change: use chat_template_kwargs.thinking (replaces chat_template_kwargs.enable_thinking).
- Reasoning output: the reasoning field replaces reasoning_content.
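The two renames above can be sketched as a pair of small helpers. This is a minimal illustration, not an official migration tool: only the field names chat_template_kwargs.thinking, chat_template_kwargs.enable_thinking, reasoning, and reasoning_content come from the notes. The ChatRequest and AssistantMessage shapes, the helper names, and the assumption that the old value can be carried over unchanged are hypothetical, so check the model page for the accepted values.

```typescript
// Hypothetical request shape; only the chat_template_kwargs key names
// come from the migration notes.
interface ChatRequest {
  messages: { role: string; content: string }[];
  chat_template_kwargs?: Record<string, unknown>;
}

// Rename chat_template_kwargs.enable_thinking to chat_template_kwargs.thinking.
// Assumption: the old value is still valid under the new key.
function migrateRequest(req: ChatRequest): ChatRequest {
  const kwargs = req.chat_template_kwargs;
  if (!kwargs || !("enable_thinking" in kwargs)) return req;
  const { enable_thinking, ...rest } = kwargs;
  // If the caller already set `thinking`, the spread of `rest` keeps that value.
  return { ...req, chat_template_kwargs: { thinking: enable_thinking, ...rest } };
}

// Hypothetical message shape; where the reasoning field sits in the
// response is an assumption.
interface AssistantMessage {
  content?: string;
  reasoning?: string;
  reasoning_content?: string;
}

// Prefer the new `reasoning` field and fall back to the old name so that
// code handling responses from older models keeps working.
function readReasoning(msg: AssistantMessage): string | undefined {
  return msg.reasoning ?? msg.reasoning_content;
}
```

Running existing payloads through migrateRequest before sending them, and reading reasoning output only through readReasoning, keeps the rename in one place while other call sites are updated.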
Getting started
- Workers AI binding: env.AI.run()
- REST API: POST /ai/run
- OpenAI-compatible: POST /v1/chat/completions
- AI Gateway can proxy any of the above endpoints. See the Kimi K2.6 model page and pricing for details.
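As a rough sketch of the binding route listed above, the helper below wraps env.AI.run() behind a small interface. Several pieces are assumptions made for illustration: the AiBinding interface is a minimal stand-in for the real binding type, the messages-style input and the helper names are hypothetical, and the model identifier is left as a parameter because it is not given here (take it from the Kimi K2.6 model page).

```typescript
// Minimal stand-in for the Workers AI binding; only run() is named in the text.
interface AiBinding {
  run(model: string, inputs: unknown): Promise<unknown>;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatInputs {
  messages: ChatMessage[];
  chat_template_kwargs?: { thinking: unknown };
}

// Build a single-turn chat input. `thinking` is passed through untouched
// because its accepted values are not specified here (assumption).
function buildInputs(prompt: string, thinking?: unknown): ChatInputs {
  const inputs: ChatInputs = { messages: [{ role: "user", content: prompt }] };
  if (thinking !== undefined) inputs.chat_template_kwargs = { thinking };
  return inputs;
}

// Send one prompt through the binding. `model` must be the identifier
// from the model page; no identifier is hard-coded in this sketch.
async function askKimi(ai: AiBinding, model: string, prompt: string): Promise<unknown> {
  return ai.run(model, buildInputs(prompt));
}
```

Inside a Worker's fetch handler this would be called as askKimi(env.AI, modelId, "Summarize this diff"), with modelId read from configuration. The same messages array can be reused for the OpenAI-compatible route by adding a model field to the request body.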