Gemini 3.1 Flash-Lite: High-Performance AI Model for Scale
Key Points
- 2.5X faster Time to First Answer Token and 45% faster output than 2.5 Flash
- Ultra-low pricing at $0.25/1M input tokens for high-volume workloads
- Configurable thinking levels for adaptive reasoning control
Summary
Google has released Gemini 3.1 Flash-Lite in preview, a new AI model optimized for high-volume developer workloads. It costs significantly less than larger models while running faster than 2.5 Flash, which makes it well suited to large-scale applications that need both speed and intelligence.
Key Points
- Cost-Efficient Pricing: $0.25/1M input tokens and $1.50/1M output tokens
- Performance Improvements: 2.5X faster Time to First Answer Token and 45% increase in output speed compared to 2.5 Flash
- High Quality Scores: Achieves an Elo score of 1432 on the Arena.ai Leaderboard, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro
- Adaptive Intelligence: Features configurable "thinking levels" for task-specific reasoning control
- Use Cases: Translation, content moderation, UI generation, dashboard creation, and simulations
- Availability: In preview for developers via the Gemini API in Google AI Studio, and for enterprises via Vertex AI
- Early Adoption: Companies like Latitude, Cartwheel, and Whering are already using the model
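
To make the pricing above concrete, here is a minimal sketch of a cost estimate for a high-volume workload. The two per-token rates come from the list above. The function name and the workload figures (one million requests per day, 800 input and 200 output tokens each) are hypothetical, chosen only for illustration, and actual billing may differ.

```python
# Preview rates quoted above, in USD per 1 million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for a given number of input and output tokens."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical workload: 1M requests/day, 800 input + 200 output tokens each.
requests_per_day = 1_000_000
daily_cost = estimate_cost_usd(
    input_tokens=requests_per_day * 800,
    output_tokens=requests_per_day * 200,
)
print(f"Estimated daily cost: ${daily_cost:,.2f}")
```

Under these assumed numbers, output tokens account for more of the bill than input tokens even though there are four times fewer of them, because the output rate is six times the input rate.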