How ChatGPT learns about the world while protecting privacy
Key Points
- Privacy Filter masks personal data
- Users can opt out of model training
- Temporary Chats are retained for 30 days, then deleted
Summary
OpenAI explains how ChatGPT is trained on public sources, partner data, and user-provided content while applying automated safeguards to reduce personal data exposure. Engineers should understand the Privacy Filter, which is applied at multiple stages before and during training, and the user-facing controls that let people opt out of having their conversations used to improve the model or use Temporary Chats, which stay out of history and training and are deleted after a 30-day safety retention period.
Key Points
- Data sources: public web content, partner datasets, and user/contractor/researcher-provided content; public content must be freely accessible to be used.
- Automated safeguards: OpenAI Privacy Filter identifies and masks personal information and is applied at multiple stages of training; an internal version is used on datasets and opted-in user conversations; the filter is also available to external developers.
- User controls: users can disable “Improve the model for everyone” (Settings → Data Controls) to stop new conversations from being used for training. Temporary Chats do not appear in history, do not create memories, and are not used for training; they are retained for 30 days for safety purposes and then deleted. Memory is optional, and stored memories can be edited or deleted.
- Operational guidance: do not include sensitive personal data in prompts; use Temporary Chats for sensitive debugging or testing; rely on export/delete/privacy-portal flows to handle data subject requests.
- Safety and remediation: ChatGPT is designed to refuse requests for private information but can make mistakes; a privacy request portal exists for reporting and remediation.
Actions for engineers
- To opt out of training data collection: Settings → Data Controls → turn off “Improve the model for everyone.”
- For sensitive work or tests: start a Temporary Chat (click “Temporary” in a new chat) so content is not retained in history or used to improve models.
- If PII appears in outputs: submit a request through the privacy request portal; consider sanitizing prompts and logs before storage or analysis.
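
The advice to sanitize prompts and logs before storage can be sketched in a few lines of Python. This is a minimal illustration only, not OpenAI's Privacy Filter: the `sanitize` function, the two regular expressions, and the `[EMAIL]` and `[PHONE]` placeholders are all assumptions made for this example, and simple patterns like these will miss many kinds of personal data (names, addresses, IDs).

```python
import re

# Hypothetical patterns for illustration; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Phone-like runs: optional "+", a digit, at least 7 digits/separators, a final digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def sanitize(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    prompt = "Contact jane.doe@example.com or +1 (555) 123-4567 today"
    # Sanitize before the prompt is logged or stored for analysis.
    print(sanitize(prompt))
```

Applying the function at the point where prompts are written to logs, rather than at read time, keeps the raw values out of storage entirely. A dedicated PII detection tool would likely be a better fit for production use than hand-written patterns.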
Key takeaway
OpenAI combines automated PII filtering with user controls (opt-out, temporary chats, editable memory) to reduce personal data in training; engineers should avoid sending sensitive information and use available controls and privacy-portal workflows when needed.