Where the goblins came from
OpenAI News / Apr 29, 2026
Nerdy reward amplified 'goblin' metaphors
Style propagated via RL and SFT
Fixed by removing reward and filtering data
Tags: reward-modelling, rlhf, sft, dataset-filtering, style-transfer, model-audit, prompting

Improving instruction hierarchy in frontier LLMs
OpenAI News / Mar 10, 2026
IH-Challenge released
Improved safety and injection resistance
Avoids over-refusal
Tags: instruction-hierarchy, rlhf, prompt-injection, safety, dataset, robustness