Cybersecurity researchers are warning about a new and dangerously effective technique to bypass the safety mechanisms of large language models (LLMs) such as those developed by OpenAI and Google. The method, dubbed “Echo Chamber”, enables attackers to manipulate these models into producing harmful or policy-violating responses — even when robust safeguards are in place.
According to a report by NeuralTrust shared with The Hacker News, Echo Chamber differs from traditional jailbreaks that rely on obfuscated text or clever wordplay. Instead, it leverages a subtle mix of semantic manipulation, indirect references, and multi-step reasoning to gradually erode a model’s resistance and guide it toward producing inappropriate content.
“Echo Chamber subtly alters the model’s internal reasoning over multiple turns, causing it to violate its own safety policies without ever realizing it,” explains Ahmad Alobaid, a NeuralTrust researcher.
Why This Jailbreak is Different
While LLM developers have implemented layers of defense against prompt injections and adversarial attacks, the Echo Chamber technique exposes a lingering vulnerability: models can still be exploited through context poisoning and conversation steering — often without any advanced technical skills.
Unlike methods like “Crescendo,” where the attacker controls the dialogue from the beginning, Echo Chamber manipulates the model using only its own generated responses. The attacker introduces ambiguous or innocuous prompts early in the conversation. As the model continues responding, those responses are used to subtly redirect it toward violating content guidelines.
“It becomes a feedback loop,” says NeuralTrust. “Early planted inputs influence the model’s next outputs, which are then used to push it further — all while the original attack goal remains hidden.”

Multi-Shot and Multi-Turn Exploits
LLMs with large context windows are particularly vulnerable. By flooding the model with prior examples of harmful behavior (called many-shot jailbreaks) or gradually escalating prompts (multi-turn jailbreaking), attackers can coax the AI into continuing a pattern — often leading to the generation of toxic or illegal content.
In controlled lab tests, the Echo Chamber technique was able to bypass safety measures with an alarming success rate:
- 90%+ on topics involving hate speech, violence, sexism, and explicit content
- Nearly 80% for misinformation and self-harm
These results show a clear blind spot in current LLM alignment strategies, especially as models become more capable of sustained, contextual reasoning.

Real-World Implications: “Living off AI”
The danger doesn’t stop at conversational attacks. In a related finding, Cato Networks demonstrated a proof-of-concept attack targeting Atlassian’s Model Context Protocol (MCP). The attacker submitted a malicious support ticket that, when processed by a human using Jira tools, caused a prompt injection — turning the human into an accidental attack vector.
“The attacker never touched the MCP server,” Cato researchers said. “They used the support engineer as a proxy, exploiting the AI from the outside.”
This method — dubbed “Living off AI” — shows how models connected to external workflows (like helpdesk systems or CRMs) can be silently hijacked if they’re not properly sandboxed or isolated.
Why It Matters for Security Teams
As LLMs continue to power more tools across enterprises, from chatbots to support systems and automation flows, the risk of indirect AI exploitation is rising. Echo Chamber and related techniques reveal how attackers can abuse AI systems without triggering traditional alerts — making them harder to detect and contain.
Source: https://thehackernews.com/2025/06/echo-chamber-jailbreak-tricks-llms-like.html