Mitigating Prompt Injection via Feature Restriction
OpenAI’s new 'Lockdown Mode' is a targeted security feature designed for users and organizations handling sensitive data. Its primary objective is to shrink the attack surface for prompt injection, a vulnerability in which malicious instructions embedded in external content (such as webpages or files) manipulate the model's behavior or trick it into exfiltrating data.
When enabled, Lockdown Mode imposes strict limitations on ChatGPT’s capabilities to prevent the model from interacting with potentially compromised external sources. Specifically, the mode disables or restricts:
- Live web browsing: Access is restricted to cached content only.
- Image retrieval: The model cannot fetch images from the web.
- Deep research: Advanced research capabilities are deactivated.
- Agent mode: Autonomous agentic behaviors are disabled.
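The restrictions listed above amount to a capability denylist that is applied when the mode is switched on. The following is a minimal sketch of that idea, not OpenAI's implementation: the `Capability` enum, the `LOCKDOWN_DISABLED` set, and the `Session` class are all hypothetical names invented for illustration, and the only facts taken from the article are which features are turned off.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    """Hypothetical capability names; not an actual OpenAI API."""
    LIVE_BROWSING = "live_browsing"
    CACHED_BROWSING = "cached_browsing"
    IMAGE_RETRIEVAL = "image_retrieval"
    DEEP_RESEARCH = "deep_research"
    AGENT_MODE = "agent_mode"
    FILE_UPLOAD = "file_upload"


# Capabilities the article describes as switched off in Lockdown Mode.
LOCKDOWN_DISABLED = frozenset({
    Capability.LIVE_BROWSING,
    Capability.IMAGE_RETRIEVAL,
    Capability.DEEP_RESEARCH,
    Capability.AGENT_MODE,
})


@dataclass(frozen=True)
class Session:
    """Illustrative session object carrying a single lockdown flag."""
    lockdown: bool = False

    def is_allowed(self, capability: Capability) -> bool:
        """Return False only for capabilities blocked while lockdown is on."""
        if self.lockdown and capability in LOCKDOWN_DISABLED:
            return False
        return True


locked = Session(lockdown=True)
print(locked.is_allowed(Capability.LIVE_BROWSING))    # False
print(locked.is_allowed(Capability.CACHED_BROWSING))  # True
```

In this sketch, cached browsing and file uploads stay allowed under lockdown, which mirrors the article's point that only the listed features are restricted.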
A Layered Security Approach
OpenAI explicitly notes that Lockdown Mode is not a silver bullet. Even with these restrictions, the model remains potentially vulnerable to prompt injections originating from cached web content or user-uploaded files, which could still influence response accuracy or behavior. Consequently, the feature is positioned as a specialized tool for high-security environments rather than a general-purpose setting for all users.
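To make the residual risk concrete, the short sketch below enumerates which external content sources could still reach the model once the mode is on. The source names and the `BLOCKED_IN_LOCKDOWN` table are assumptions made for illustration, derived only from the article's description; they do not reflect any published OpenAI interface.

```python
# Hypothetical content sources mapped to whether Lockdown Mode blocks them,
# following the article's description (illustrative only).
BLOCKED_IN_LOCKDOWN = {
    "live_web_page": True,
    "web_image": True,
    "cached_web_page": False,
    "uploaded_file": False,
}


def residual_injection_sources(lockdown: bool) -> list[str]:
    """List the external content sources that can still reach the model."""
    return sorted(
        source
        for source, blocked in BLOCKED_IN_LOCKDOWN.items()
        if not (lockdown and blocked)
    )


print(residual_injection_sources(lockdown=True))
# ['cached_web_page', 'uploaded_file']
```

Any source that remains in this list is still untrusted input, which is why the mode works as one layer of defense and not as a complete fix.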
The rollout currently targets self-serve ChatGPT Business accounts and eligible personal accounts, reflecting a strategy of offering enterprise-grade data protection controls to users who prioritize security over the full breadth of AI-driven automation features.