Introducing GPT-5.4 mini and nano
Key Points
- GPT‑5.4 mini: fast, near GPT‑5.4 accuracy
- GPT‑5.4 nano: lowest-cost, low-latency for subagents
- mini in API/Codex/ChatGPT; nano API-only
Summary
OpenAI released GPT-5.4 mini and GPT-5.4 nano: smaller, faster variants of GPT-5.4 optimized for low-latency, high-throughput workloads. GPT-5.4 mini brings many GPT-5.4 strengths (coding, reasoning, multimodal understanding, and tool use) while running >2× faster than GPT-5 mini and approaching GPT-5.4 performance on several benchmarks. GPT-5.4 nano is the smallest, cheapest option for speed- and cost-sensitive tasks such as classification, extraction, ranking, and simple coding subagents.
Key Points
- Performance: GPT-5.4 mini outperforms GPT-5 mini across coding, reasoning, multimodal, and tool benchmarks and approaches GPT-5.4 on SWE-Bench Pro, OSWorld-Verified, and other evaluations.
- Latency & cost: mini is optimized for responsive experiences (>2× faster than GPT-5 mini); nano targets the lowest latency and cost for high-volume workloads.
- Recommended use cases:
  - mini: responsive coding assistants, multimodal computer-use tasks (interpreting screenshots), and subagents that require stronger reasoning at low latency.
  - nano: classification, data extraction, ranking, and simple subagents where throughput and cost matter most.
- Features & limits: GPT-5.4 mini supports text+image inputs, tool use, function calling, web/file search, computer use, and skills; it has a 400k token context window.
- Availability & pricing:
  - GPT-5.4 mini: available in the API, Codex, and ChatGPT (Thinking). Pricing: $0.75 per 1M input tokens, $4.50 per 1M output tokens. In Codex, it consumes about 30% of the GPT-5.4 quota, which makes per-task runs cheaper.
  - GPT-5.4 nano: API-only. Pricing: $0.20 per 1M input tokens, $1.25 per 1M output tokens.
- Integration tip: compose systems where a larger model handles planning/coordination and offloads parallel, lower-reasoning subtasks to mini/nano subagents to reduce cost and improve throughput.
- Safeguards: refer to the System Card addendum on the Deployment Safety Hub for model safety details.
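The per-token prices listed above can be turned into a rough cost comparison. The following is a minimal sketch, not an official calculator: the dictionary keys `gpt-5.4-mini` and `gpt-5.4-nano` are illustrative labels (the exact API model identifiers are not given here), and the workload size is a made-up example.

```python
# Prices in USD per 1M tokens, copied from the pricing bullets above.
# The keys are illustrative labels, not confirmed API model identifiers.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts on one model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, each with
    # 2,000 input tokens and 300 output tokens.
    requests = 10_000
    for model in PRICES:
        cost = estimate_cost(model, requests * 2_000, requests * 300)
        print(f"{model}: ${cost:.2f}")
```

For this hypothetical workload (20M input tokens, 3M output tokens) the estimate is about $28.50 on mini and $7.75 on nano. It ignores any caching or batch discounts and any tool-call charges that may apply.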
Actionable guidance
- Choose GPT-5.4 mini when you need a strong balance of capability and low latency for coding or multimodal tasks.
- Choose GPT-5.4 nano when cost and latency matter most and the work consists of simple, parallelizable subagent tasks.
- Benchmark in your own environment: latency and cost depend on real-world tool-call duration, token counts, and runtime factors that may differ from the published simulations.
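The guidance above (route simple subtasks to nano, harder ones to mini, and measure latency yourself) can be sketched as a small dispatcher. This is an illustration under stated assumptions: `pick_model`, `run_subtasks`, and `fake_call` are hypothetical helpers, the model labels are placeholders rather than confirmed API identifiers, and `fake_call` stands in for a real client so the sketch runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Task kinds the guidance above associates with nano; anything else goes to mini.
# This split is an assumption for illustration, not a rule from the announcement.
NANO_TASKS = {"classification", "extraction", "ranking"}


def pick_model(task_kind: str) -> str:
    """Choose a model label (placeholder names) for a subtask kind."""
    return "gpt-5.4-nano" if task_kind in NANO_TASKS else "gpt-5.4-mini"


def run_subtasks(
    subtasks: List[Tuple[str, str]],
    call_model: Callable[[str, str], str],
    max_workers: int = 4,
) -> List[Tuple[str, str, float]]:
    """Run (kind, prompt) subtasks in parallel.

    Returns one (model, result, elapsed_seconds) tuple per subtask,
    in the same order as the input list.
    """

    def run_one(item: Tuple[str, str]) -> Tuple[str, str, float]:
        kind, prompt = item
        model = pick_model(kind)
        start = time.perf_counter()
        result = call_model(model, prompt)
        return model, result, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, subtasks))


def fake_call(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real API call; replace with your client."""
    return f"[{model}] {prompt}"


if __name__ == "__main__":
    tasks = [("classification", "Label this ticket"), ("coding", "Fix this bug")]
    for model, result, seconds in run_subtasks(tasks, fake_call):
        print(f"{model}: {result} ({seconds * 1000:.2f} ms)")
```

Swapping `fake_call` for a real client turns the elapsed times into per-subtask latency measurements taken in your own environment, which is the comparison the last bullet recommends.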