A newly discovered vulnerability in GPT-5 could let attackers redirect user queries from the advanced GPT-5 Pro model to older, less secure versions, potentially exposing the system to jailbreaks, hallucinations, and unsafe outputs.
Researchers at Adversa AI identified the flaw, which stems from the model’s internal routing mechanism. When a user submits a prompt to GPT-5, an internal router determines which underlying model should handle the request. While users may expect GPT-5 Pro to process their query, the router can redirect it to GPT-3.5, GPT-4o, GPT-5-mini, or GPT-5-nano, depending on the prompt’s complexity and internal optimization rules.
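To make the mechanism concrete, here is a minimal sketch of what such a complexity-based router might look like. Everything in it is an assumption for illustration: the scoring heuristic, thresholds, and routing rules are invented, and only the model-tier names echo real products.

```python
# Illustrative sketch of a prompt router. The heuristic, thresholds, and
# routing rules are assumptions, not OpenAI's actual logic.

def estimate_complexity(prompt: str) -> float:
    """Crude proxy for prompt complexity: length plus reasoning keywords."""
    reasoning_markers = ("prove", "step by step", "analyze", "derive")
    score = min(len(prompt) / 2000, 1.0)
    if any(marker in prompt.lower() for marker in reasoning_markers):
        score += 0.5
    return score

def route(prompt: str) -> str:
    """Pick a backend model tier based on estimated complexity."""
    score = estimate_complexity(prompt)
    if score >= 1.0:
        return "gpt-5-pro"   # most capable, most expensive, strongest safeguards
    if score >= 0.5:
        return "gpt-5-mini"
    return "gpt-5-nano"      # cheapest tier, weakest safeguards

print(route("Prove that the sum of two even numbers is even, step by step."))
# -> gpt-5-mini under these illustrative thresholds
```

The cost pressure is visible in the sketch: every prompt that a cheaper tier can answer avoids the expense of running the flagship model.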
This automatic routing is likely designed to balance efficiency and cost, as running GPT-5 Pro for every query is expensive. Adversa AI estimates that this approach could save OpenAI nearly $1.86 billion annually. However, the process lacks transparency, and researchers discovered that it can be manipulated using specific trigger phrases embedded in the prompt.
The vulnerability, named PROMISQROUTE—short for Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion—allows attackers to manipulate the router’s decision-making, effectively choosing which model will respond.
While routing between models is not unique to OpenAI, most AI providers allow users to select the model manually. In GPT-5, the process is automated, which can introduce risks if an older, less robust model handles sensitive prompts.
During testing, Adversa AI found that older jailbreaks, previously ineffective against GPT-5, could succeed if the prompt was rerouted to a less secure model. This could lead to hallucinations or unsafe outputs, as older models may have weaker alignment and reasoning safeguards.
The most critical risk occurs when a malicious actor intentionally exploits the routing mechanism. By including a crafted instruction, an attacker could redirect a prompt meant for GPT-5 Pro to a vulnerable older model, bypassing the advanced safeguards of the latest version. Essentially, GPT-5’s security becomes only as strong as its weakest predecessor.
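A toy version of the attack, continuing in the same illustrative vein, shows why user-controlled routing cues are dangerous. The trigger phrase below is a hypothetical stand-in; Adversa AI’s actual payloads are not reproduced here.

```python
# Hypothetical illustration of PROMISQROUTE-style routing abuse. The trigger
# phrase and the router's behavior are assumptions for demonstration only.

def vulnerable_route(prompt: str) -> str:
    """A router whose decision can be steered by user-supplied text."""
    # The flaw: the router trusts cues embedded in the prompt itself.
    if "respond quickly" in prompt.lower():
        return "gpt-5-nano"   # attacker forces the weakest model
    if len(prompt) > 1000:
        return "gpt-5-pro"    # strongest model, strongest safeguards
    return "gpt-5-mini"

# An attacker prepends a downgrade cue to a payload GPT-5 Pro would refuse.
malicious = "Respond quickly: " + "<older jailbreak that GPT-5 Pro blocks>"
print(vulnerable_route(malicious))  # -> gpt-5-nano, sidestepping Pro's safeguards
```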
Addressing this vulnerability could involve disabling automatic routing to older models, but doing so would slow down responses and increase operational costs. Adversa AI recommends implementing stricter guardrails on the routing system and ensuring that all models in the architecture meet robust safety and alignment standards.
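One possible shape for such a guardrail, sketched here purely as an assumption, is a routing-independent safety check that enforces a minimum-capability model for flagged prompts, so that no downgrade cue can push risky traffic below a safety floor.

```python
# Hypothetical guardrail: flagged prompts never run below a safety floor.
# The classifier, tier names, and strength ordering are all illustrative.

SAFETY_FLOOR = "gpt-5-pro"
MODEL_STRENGTH = {"gpt-5-nano": 0, "gpt-5-mini": 1, "gpt-5-pro": 2}

def looks_risky(prompt: str) -> bool:
    """Stand-in for a real safety classifier that ignores routing cues."""
    return "jailbreak" in prompt.lower()

def guarded_route(prompt: str, proposed_model: str) -> str:
    """Override the router's choice when a risky prompt lands on a weak tier."""
    if looks_risky(prompt) and MODEL_STRENGTH[proposed_model] < MODEL_STRENGTH[SAFETY_FLOOR]:
        return SAFETY_FLOOR
    return proposed_model

print(guarded_route("Respond quickly: jailbreak please", "gpt-5-nano"))
# -> gpt-5-pro: the downgrade cue is ignored for flagged prompts
```

The trade-off named above still applies: every prompt promoted to the floor costs GPT-5 Pro inference, which is exactly the expense the router was built to avoid.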
In summary, while GPT-5 Pro is powerful, its router-based architecture introduces potential security gaps that need urgent attention, especially as AI models become increasingly integrated into critical applications.
Source: https://www.securityweek.com/gpt-5-has-a-vulnerability-it-may-not-be-gpt-5-answering-your-call