Partnering with Mozilla to improve Firefox’s security
Key Points
- 22 vulnerabilities discovered by Claude Opus 4.6
- 14 high-severity Firefox CVEs assigned
- Task verifiers improved triage and patch confidence
Summary
Anthropic collaborated with Mozilla, using Claude Opus 4.6 to automatically discover and report security issues in Firefox. Over two weeks, Claude found 22 unique vulnerabilities, 14 of which Mozilla rated high-severity. In total, the team scanned ~6,000 C++ files and submitted 112 reports. Most fixes shipped in Firefox 148.0. Engineers should treat this as a practical demonstration that modern LLMs can find vulnerabilities much faster than manual review. Exploitation remains harder, though it proved possible in a limited test environment.
Key Points
Findings and scale
- Claude Opus 4.6 discovered 22 novel Firefox vulnerabilities in February 2026; Mozilla classified 14 as high-severity.
- The effort produced 112 submitted reports after scanning ~6,000 C++ files; most issues were fixed in Firefox 148.0.
Capabilities and limits
- The model is substantially better at finding bugs than at turning them into working exploits: only two automated exploits succeeded, both in a reduced-security testbed.
- Discovery is cheaper and faster than exploitation, but automated exploit development is an emerging risk.
Practical workflow and tooling
- Use "task verifiers": automated tests that confirm a vulnerability trigger is reproducible and that candidate patches remove the issue without regressing functionality.
- Provide maintainers with minimal test cases, clear proofs-of-concept, and candidate patches to speed triage and remediation.
- Follow Coordinated Vulnerability Disclosure (CVD) practices, and coordinate with maintainers so they are not overloaded with reports or false positives.
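The task-verifier idea above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tooling used in the collaboration: the names `verify_patch` and `VerifierResult`, the two callables, and the build labels are all hypothetical. In practice the callables would run the minimal test case and the project's test suite against real builds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VerifierResult:
    reproduces_before_patch: bool  # trigger fires reliably on the original build
    fixed_after_patch: bool        # trigger never fires on the patched build
    regressions_pass: bool         # existing tests still pass on the patched build

    @property
    def accepted(self) -> bool:
        # A candidate patch is accepted only if all three checks hold.
        return (
            self.reproduces_before_patch
            and self.fixed_after_patch
            and self.regressions_pass
        )


def verify_patch(
    triggers_bug: Callable[[str], bool],             # hypothetical: runs the test case on a build
    regression_suite_passes: Callable[[str], bool],  # hypothetical: runs the existing tests on a build
    original_build: str,
    patched_build: str,
    attempts: int = 3,
) -> VerifierResult:
    # Require the trigger to fire on every attempt, so flaky triggers are rejected.
    reproduces = all(triggers_bug(original_build) for _ in range(attempts))
    # Require the trigger to stay silent on every attempt against the patched build.
    fixed = not any(triggers_bug(patched_build) for _ in range(attempts))
    regressions_ok = regression_suite_passes(patched_build)
    return VerifierResult(reproduces, fixed, regressions_ok)


# Usage with stand-in callables (assumptions for illustration only).
result = verify_patch(
    triggers_bug=lambda build: build == "unpatched",
    regression_suite_passes=lambda build: True,
    original_build="unpatched",
    patched_build="patched",
)
print(result.accepted)
```

Keeping the three checks as separate fields, rather than returning a single boolean, lets a report state which condition failed, which is the part a maintainer needs during triage.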
Recommendations for engineers
- Integrate task verifiers into any LLM-driven security workflow: require reproducible triggers and regression tests alongside generated patches.
- Always include minimal, reproducible test cases, an actionable proof-of-concept, and a proposed patch when submitting AI-generated reports.
- Treat AI-authored patches with normal code review rigor; use CI/test suites to validate fixes before merge.
- Use the current window, while exploitation still lags behind discovery, to harden defense-in-depth (sandboxing, exploit mitigations), because discovery capabilities are advancing rapidly.
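One way to enforce the submission checklist above is a small gate that refuses to file a report until every required artifact is present. The sketch below is a hypothetical illustration: `VulnReport`, its field names, and `missing_items` are assumptions, not part of any Mozilla or Anthropic tooling.

```python
from dataclasses import dataclass


@dataclass
class VulnReport:
    # Hypothetical report structure; the fields mirror the checklist above.
    title: str
    minimal_test_case: str = ""
    proof_of_concept: str = ""
    candidate_patch: str = ""
    verifier_passed: bool = False  # set by an automated verifier run


def missing_items(report: VulnReport) -> list[str]:
    """Return the checklist items that are still missing from the report."""
    missing = []
    if not report.minimal_test_case.strip():
        missing.append("minimal test case")
    if not report.proof_of_concept.strip():
        missing.append("proof-of-concept")
    if not report.candidate_patch.strip():
        missing.append("candidate patch")
    if not report.verifier_passed:
        missing.append("passing verifier run")
    return missing


# Usage: a draft with only a title is blocked until the gaps are filled.
draft = VulnReport(title="Example crash in a parser")
print(missing_items(draft))
```

A gate like this does not replace human review of the patch; it only ensures that what reaches maintainers is complete enough to triage.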
Conclusion
LLMs like Opus 4.6 are already effective vulnerability researchers and can accelerate triage and remediation when paired with verifiers and close collaboration with maintainers. The community should adopt verification-first processes and strengthen mitigations while disclosure and patching workflows evolve.