DMR News

Advancing Digital Conversations

OpenAI Warns Prompt Injection Attacks Will Persist As Atlas AI Browser Expands

ByJolyen

Dec 25, 2025

OpenAI Warns Prompt Injection Attacks Will Persist As Atlas AI Browser Expands

OpenAI says prompt injection attacks remain an unsolved and enduring security risk for AI agents operating on the open web, even as the company rolls out new defenses for its Atlas AI browser. The admission underscores ongoing concerns about how safely autonomous AI systems can browse, read, and act on untrusted content online.

Prompt Injection Seen As A Long-Term Security Problem

OpenAI said in a blog post published Monday that prompt injection attacks are unlikely to ever be fully eliminated. The company compared the threat to scams and social engineering on the web, arguing that complete prevention is unrealistic.

OpenAI acknowledged that Atlas’ “agent mode,” which allows the browser to take actions on a user’s behalf, increases the system’s overall security exposure. The company described prompt injection as a structural challenge that must be managed through ongoing mitigation rather than solved outright.

Early Research Exposed Atlas Vulnerabilities

OpenAI launched the ChatGPT Atlas browser in October. Shortly after release, security researchers demonstrated that indirect prompt injection could alter the browser’s behavior using hidden instructions embedded in content such as Google Docs.

On the same day, Brave published research arguing that indirect prompt injection is a systemic issue affecting AI-powered browsers broadly, including tools such as Perplexity’s Comet.

The risk has also been highlighted by government agencies. Earlier this month, the UK National Cyber Security Centre warned that prompt injection attacks against generative AI systems may never be fully mitigated and advised organizations to focus on reducing impact rather than assuming prevention is possible.

OpenAI’s Defensive Strategy

OpenAI said it is adopting a rapid-response security model focused on continuous testing and faster remediation cycles. The company said this approach is already helping it identify novel attack methods internally before they appear in real-world use.

This strategy aligns with approaches described by other AI developers, including Anthropic and Google, which have emphasized layered defenses and repeated stress testing for agentic systems. Google has said its recent work focuses on architectural and policy-level safeguards.

Automated Attacker Trained With Reinforcement Learning

Where OpenAI diverges is its use of what it calls an “LLM-based automated attacker.” The system is trained with reinforcement learning to behave like a hacker attempting to inject malicious instructions into AI agents.

OpenAI said the attacker operates in simulation, observing how the target AI reasons through an attack and what actions it would take. Based on that insight, the system can iteratively refine its tactics and test them repeatedly.

Because the attacker has visibility into internal reasoning processes that external researchers lack, OpenAI says it can surface weaknesses more quickly than real-world attackers.

The company said its automated attacker was able to induce agents to carry out complex harmful workflows spanning dozens or even hundreds of steps, and that some attack strategies discovered this way had not appeared in human red teaming exercises or external reports.

Demonstrated Attack And Detection

In one example shared by OpenAI, the automated attacker embedded malicious instructions in an email. When the AI agent later scanned the inbox, it followed those instructions and sent a resignation message instead of drafting an out-of-office reply.

OpenAI said that following recent security updates, Atlas’ agent mode successfully detected the prompt injection attempt and alerted the user.

The company declined to disclose whether the updates have reduced the real-world success rate of prompt injection attacks but said it has worked with external partners on Atlas security since before launch.

Risk Trade-Offs For Agentic Browsers

Rami McCarthy, principal security researcher at cybersecurity firm Wiz, said reinforcement learning can help systems adapt to attacker behavior but does not eliminate underlying risk.

He described AI risk as a function of autonomy multiplied by access. According to McCarthy, agentic browsers sit in a high-risk zone because they combine moderate autonomy with extensive access to sensitive data such as email and payment systems.

McCarthy said common safeguards reflect that trade-off, including limiting logged-in access and requiring user confirmation before actions like sending messages or making payments.

User-Level Mitigations

OpenAI echoed those recommendations, saying Atlas is designed to request confirmation before performing sensitive actions. The company also advised users to give agents narrowly scoped instructions instead of broad mandates.

Granting wide latitude, OpenAI said, increases the likelihood that hidden or malicious content can influence agent behavior, even when safeguards are in place.

Open Questions Around Value And Risk

While OpenAI described protecting Atlas users from prompt injection as a priority, McCarthy questioned whether the current benefits of agentic browsers justify their risk profile.

He said that for many everyday tasks, the combination of high access and unresolved security challenges means the trade-offs remain significant, even if that balance may change as defenses mature.


Featured image credits: Wikimedia Commons

For more stories like it, click the +Follow button at the top of this page to follow us.

Jolyen

As a news editor, I bring stories to life through clear, impactful, and authentic writing. I believe every brand has something worth sharing. My job is to make sure it’s heard. With an eye for detail and a heart for storytelling, I shape messages that truly connect.

Leave a Reply

Your email address will not be published. Required fields are marked *