Helping ChatGPT better recognize context in sensitive conversations
Key Points
- Safety summaries capture cross-conversation risk
- Safe-response rates improved by up to ~52%
- Summaries are ephemeral, narrow, and expert-informed
Summary
OpenAI updated ChatGPT to better detect safety risks that emerge over the course of a conversation or across multiple conversations. The system now creates short, narrowly scoped "safety summaries" (generated by a safety-reasoning model) that capture prior safety-relevant context for a limited time, and it uses them only when they are directly relevant to serious high-risk scenarios (suicide, self-harm, harm-to-others). The models were retrained and evaluated with input from mental-health experts; internal tests show substantial improvements in safe responses without meaningful degradation in everyday conversations.
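The retention and scoping behavior described above can be illustrated with a small sketch. This is not OpenAI's implementation: the class names, the category identifiers, the matching-by-category rule, and the `ttl_seconds` parameter are all hypothetical, chosen only to show how a time-limited, narrowly scoped store might behave.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three scenarios come from the text; the identifiers are hypothetical.
HIGH_RISK_CATEGORIES = frozenset({"suicide", "self_harm", "harm_to_others"})


@dataclass(frozen=True)
class SafetySummary:
    category: str      # expected to be one of HIGH_RISK_CATEGORIES
    text: str          # short factual note
    created_at: float  # seconds on any monotonic clock


class SafetySummaryStore:
    """Toy in-memory store. The retention period is an assumed parameter."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._items: List[SafetySummary] = []

    def add(self, summary: SafetySummary) -> bool:
        # Narrow scope: reject anything outside the high-risk categories.
        if summary.category not in HIGH_RISK_CATEGORIES:
            return False
        self._items.append(summary)
        return True

    def relevant(self, current_risk: Optional[str], now: float) -> List[SafetySummary]:
        # Short retention: drop summaries older than the TTL before any lookup.
        self._items = [s for s in self._items if now - s.created_at < self.ttl_seconds]
        # Everyday conversations (no high-risk signal) never see stored summaries.
        if current_risk not in HIGH_RISK_CATEGORIES:
            return []
        # Simplifying assumption: "directly relevant" means the same category.
        return [s for s in self._items if s.category == current_risk]


store = SafetySummaryStore(ttl_seconds=100.0)
store.add(SafetySummary("self_harm", "User mentioned recent self-harm.", created_at=0.0))
in_window = store.relevant("self_harm", now=50.0)    # summary is still retained
benign = store.relevant(None, now=50.0)              # no risk signal, nothing returned
after_expiry = store.relevant("self_harm", now=150.0)  # summary has expired
```

Filtering at both write time and read time keeps the store from drifting into general personalization: out-of-scope notes are never saved, and saved notes are never surfaced without a current high-risk signal.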
Key Points
- Safety summaries: short, factual notes produced by a dedicated safety-reasoning model, retained only briefly and used solely for serious safety concerns (not long-term personalization).
- Behavioral change: models use earlier context to escalate caution, de-escalate, refuse harmful details, or redirect toward safer alternatives when risk signals accumulate.
- Expert-informed design: developed with psychiatrists and psychologists to set creation triggers, context window, and retention.
- Measured gains: single-conversation safe-response rates improved by ~50% for suicide/self-harm and ~16% for harm-to-others; on GPT-5.5 Instant, cross-conversation gains were ~39% (suicide/self-harm) and ~52% (harm-to-others).
- Safety-summary quality: in more than 4,000 evaluations, summaries scored 4.93/5 on safety relevance and 4.34/5 on factuality.
- Engineering implications: implement narrow scope and short retention for these summaries, add monitoring and evaluation across model updates, ensure privacy safeguards, and test that benign-user experience remains unaffected.
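The monitoring point in the last bullet can be sketched as a simple regression gate that runs on every model update. The function names and the `max_benign_drop` threshold are hypothetical. The source does not say whether its reported percentages are relative or absolute, so the relative-improvement formula below is an assumption.

```python
from typing import Sequence


def safe_response_rate(graded: Sequence[bool]) -> float:
    """Fraction of responses graded as safe (True)."""
    if not graded:
        raise ValueError("no graded responses")
    return sum(graded) / len(graded)


def relative_improvement(baseline: float, candidate: float) -> float:
    """Assumed definition: change in rate divided by the baseline rate."""
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return (candidate - baseline) / baseline


def passes_gate(
    risk_base: Sequence[bool],
    risk_new: Sequence[bool],
    benign_base: Sequence[bool],
    benign_new: Sequence[bool],
    max_benign_drop: float = 0.01,  # hypothetical tolerance
) -> bool:
    """Pass only if high-risk handling improves and benign quality holds."""
    improved = safe_response_rate(risk_new) > safe_response_rate(risk_base)
    benign_drop = safe_response_rate(benign_base) - safe_response_rate(benign_new)
    return improved and benign_drop <= max_benign_drop


# Made-up grades for illustration only, not data from the source.
risk_before = [True, False, False, False]
risk_after = [True, True, False, False]
benign_before = [True] * 10
benign_after = [True] * 10

gain = relative_improvement(
    safe_response_rate(risk_before), safe_response_rate(risk_after)
)
ok = passes_gate(risk_before, risk_after, benign_before, benign_after)
```

Checking the benign set alongside the high-risk set is what makes the gate reflect both halves of the claim: better safe responses, and no meaningful degradation in everyday conversations.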