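A shared playbook for trustworthy third party evaluations
OpenAI News / May 29, 2026

Key points:
- Make the harness explicit
- Match claims to evidence
- Test vulnerabilities within a budget

Tags: evaluation, harness, safeguards, tool-use, contamination, budget, ablation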