An update on our election safeguards
Key Points
- Bias evals: Opus 4.7 95%, Sonnet 4.6 96%
- Policy compliance: Opus 4.7 100%, Sonnet 4.6 99.8%
- Web-search trigger: Opus 4.7 92%, Sonnet 4.6 95%
Summary
Anthropic updated its technical safeguards and evaluations for Claude ahead of the 2026 US midterms and other elections. The work measures political bias, enforces election-related usage policies, runs automated and simulated-abuse tests (including influence-operation simulations and autonomous campaign planning), and surfaces verified voter resources through election banners and web search. Results and evaluation assets are published, and external reviewers are engaged.
Key Points
- Bias and neutrality: character training + system prompts enforce impartial responses; open evaluation dataset and methodology published for reproducibility.
- Bias scores: Opus 4.7 = 95%, Sonnet 4.6 = 96% on political-engagement neutrality benchmarks.
- Policy compliance tests: 600 prompts (300 harmful, 300 legitimate); Opus 4.7 responded appropriately (complying with legitimate requests, declining harmful ones) 100% of the time, Sonnet 4.6 99.8%.
- Influence-operation resilience: in multi-turn simulations, Opus 4.7 refused appropriately in 94% of cases and Sonnet 4.6 in 90%; in autonomous end-to-end campaign tests, the models largely refused when safeguards were enabled.
- Detection & enforcement: automated classifiers + dedicated threat intelligence team for always-on monitoring and disruption of coordinated abuse.
- Up-to-date information: election banners (e.g., TurboVote for US midterms) point users to verified voter resources, and web search is triggered to retrieve current information; web search triggered in 92% (Opus 4.7) and 95% (Sonnet 4.6) of test prompts.
- External review & transparency: working with academic and industry groups for broader behavioral review; evaluation code and datasets open-sourced.
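The policy compliance figure above can be read as a simple accuracy over labeled prompts. The following is a minimal sketch of that reading, not the published evaluation code: the `Outcome` type, the `compliance_rate` function, and the toy data are all hypothetical, and it assumes a response counts as appropriate when a harmful prompt is declined or a legitimate prompt is answered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One scored prompt (hypothetical structure, for illustration only)."""
    is_harmful: bool  # ground-truth label for the prompt
    declined: bool    # whether the model declined to help


def compliance_rate(outcomes):
    """Fraction of prompts handled appropriately.

    Appropriate means: harmful prompts declined, legitimate prompts answered.
    """
    if not outcomes:
        raise ValueError("no outcomes to score")
    correct = sum(1 for o in outcomes if o.declined == o.is_harmful)
    return correct / len(outcomes)


# Made-up data shaped like the 600-prompt split (300 harmful, 300 legitimate),
# with a single legitimate prompt wrongly declined. This is NOT the real result set.
outcomes = (
    [Outcome(is_harmful=True, declined=True)] * 300
    + [Outcome(is_harmful=False, declined=False)] * 299
    + [Outcome(is_harmful=False, declined=True)]
)

rate = compliance_rate(outcomes)
print(f"{rate:.1%}")  # prints 99.8%
```

Under this assumed scoring, one miss out of 600 prompts gives 599/600, which rounds to 99.8%. The actual evaluation may weight or categorize responses differently.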
Engineering implications
- Continue running the published evaluation suites during model development and deployment cycles.
- Maintain and tune system prompts and classifier thresholds; monitor metrics for drift (bias, compliance, web-search triggering).
- Keep logging and simulation pipelines for multi-turn abuse scenarios and autonomous-capability assessments.
- Use election banners and verified data sources when surfacing voter-related information; ensure web search is triggered for questions about recent events.
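Monitoring for drift, as suggested above, could be sketched as a comparison of each new evaluation run against a stored baseline. This is an illustrative sketch only: the metric names, the `find_drift` helper, and the 0.02 tolerance are assumptions, and the baselines simply reuse the Opus 4.7 figures reported in this summary.

```python
# Baselines taken from the Opus 4.7 figures in this summary.
# The metric names are hypothetical labels, not identifiers from the published suite.
BASELINES = {
    "bias_neutrality": 0.95,
    "policy_compliance": 1.00,
    "web_search_trigger": 0.92,
}


def find_drift(current, baselines=BASELINES, tolerance=0.02):
    """Return metrics whose score fell more than `tolerance` below baseline.

    The result maps each drifted metric name to the size of the drop.
    The 0.02 default tolerance is an arbitrary illustrative threshold.
    """
    drifted = {}
    for name, baseline in baselines.items():
        if name not in current:
            raise KeyError(f"missing metric in current run: {name}")
        drop = baseline - current[name]
        if drop > tolerance:
            drifted[name] = round(drop, 4)
    return drifted


# Made-up scores from a hypothetical later evaluation run.
current_run = {
    "bias_neutrality": 0.94,
    "policy_compliance": 0.97,
    "web_search_trigger": 0.93,
}

alerts = find_drift(current_run)
print(alerts)  # prints {'policy_compliance': 0.03}
```

Only drops are flagged here; a team might also want to flag unexpected increases, since a jump in refusals could signal over-blocking of legitimate requests.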