A shared playbook for trustworthy third party evaluations
OpenAI News / May 29, 2026
Make the harness explicit
Align claims with evidence
Verify vulnerabilities within a budget
evaluation, harness, safeguards, tool-use, contamination, budget, ablation
Project Glasswing: what Mythos showed us
Cloudflare / May 18, 2026
Mythos chains primitives into full PoCs
Model refusals are inconsistent; add safeguards
Harnessing narrow parallel agents reduces noise
mythos, llm-security, vulnerability-research, poe-proof, exploit-chaining, harness