Google Gemma 4 26B A4B Model Now Available on Cloudflare Workers AI
Key Points
- Mixture-of-Experts model with 26B total parameters, only 4B active per forward pass
- 256,000 token context window with built-in reasoning capabilities
- Vision understanding and function calling with multilingual support
Summary
Cloudflare has partnered with Google to bring the Gemma 4 26B A4B model to Workers AI. This Mixture-of-Experts (MoE) model delivers the performance of a 26B-parameter model while activating only 4B parameters per forward pass, providing high-quality results at lower compute cost.
Key Points
- Mixture-of-Experts Architecture: 8 active experts out of 128 total (plus 1 shared expert) for frontier-level performance at reduced compute cost
- Extended Context Window: 256,000 token context for long conversations, tool definitions, and document processing
- Built-in Reasoning: Thinking mode enables step-by-step reasoning for improved accuracy on complex tasks
- Vision Capabilities: Object detection, document/PDF parsing, OCR, and handwriting recognition, with support for variable aspect ratios
- Function Calling: Native structured tool support for agentic workflows and multi-step planning
- Multilingual Support: Out-of-the-box support for 35+ languages, pre-trained on 140+ languages
- Code Generation: Comprehensive coding capabilities including generation, completion, and correction
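
To make the function-calling point above concrete, here is a minimal sketch of a tool definition and request body. It assumes the OpenAI-style `tools` schema used by chat-completions APIs; the `get_weather` tool, the helper names, and the model ID passed in are all hypothetical and are not taken from this announcement.

```typescript
// Shape of one tool in the OpenAI-style "tools" array (assumed schema).
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description?: string }>;
      required: string[];
    };
  };
}

// Hypothetical tool, used only for illustration.
const getWeatherTool: ToolDefinition = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string", description: "City name" } },
      required: ["city"],
    },
  },
};

// Build a chat request body that offers the given tools to the model.
function buildToolRequestBody(model: string, userPrompt: string, tools: ToolDefinition[]) {
  return {
    model,
    messages: [{ role: "user" as const, content: userPrompt }],
    tools,
  };
}

// Tool-call arguments usually arrive as a JSON string; parse them defensively.
// Returns null when the string is not valid JSON or is not a plain object.
function parseToolArguments(raw: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    return null;
  }
}
```

Parsing the arguments defensively matters in agentic loops: a malformed argument string can be reported back to the model as an error instead of crashing the workflow.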
Access Methods
- Workers AI binding (env.AI.run())
- REST API endpoints (/run or /v1/chat/completions)
- OpenAI-compatible endpoint
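
As an illustration of the access methods listed above, the sketch below prepares a request for the OpenAI-compatible `/v1/chat/completions` endpoint. The model ID is an assumption (check the Workers AI model catalog for the exact slug), and `buildChatRequest` is a hypothetical helper, not part of any Cloudflare SDK.

```typescript
// Assumed model ID; verify the exact slug in the Workers AI model catalog.
const MODEL_ID = "@cf/google/gemma-4-26b-a4b-it";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a request for the OpenAI-compatible chat endpoint.
function buildChatRequest(
  accountId: string,
  apiToken: string,
  messages: ChatMessage[],
  model: string = MODEL_ID,
): PreparedRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`. Inside a Worker with an AI binding named `AI`, the equivalent call would be along the lines of `env.AI.run(MODEL_ID, { messages })`, which needs no account ID or API token.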