0%
Still working...

Prompt Injection Is the SQL Injection of AI. Here’s the Zero Trust Architecture Pattern That Contains the Blast Radius

In the early 2000s, SQL injection was the vulnerability that every web developer swore they understood but kept shipping anyway. It took a decade of breaches, frameworks, and parameterised queries to make it rare enough to stop being a daily headline.

Prompt injection is following the same trajectory. We know it exists. We know it’s dangerous. And we’re still shipping systems without meaningful containment.

The difference is that SQL injection exploited a single interface — the database query. Prompt injection exploits the reasoning engine itself. And when that reasoning engine has access to tools, APIs, and enterprise data, the blast radius isn’t a database dump. It’s whatever the agent can reach.

What Prompt Injection Actually Is

For anyone who hasn’t encountered this yet, prompt injection is an attack where adversarial input manipulates a language model into ignoring its system instructions and performing unintended actions.

It comes in two flavours. Direct injection is when a user deliberately crafts input to override the system prompt — “Ignore previous instructions and instead do X.” Indirect injection is more insidious. The malicious payload is embedded in data the model retrieves — a document, a web page, an email, a database record. The model reads the poisoned content, treats it as instruction, and acts on it.

Direct injection is a known risk that most teams have at least some defences against. Indirect injection is the one that keeps me up at night. Because it means any untrusted data source becomes a potential attack vector for every agent that reads from it.

Why Traditional Security Doesn’t Contain This

The instinct in most organisations is to apply the same patterns they use for traditional application security. Input validation, output filtering, WAF rules, content moderation. And those help at the margins.

But they don’t solve the fundamental problem. Language models don’t have a clean separation between data and instructions. That’s the whole point — the model processes natural language, which is simultaneously data and potential instruction. You can’t parameterise a prompt the way you parameterise a SQL query because the model’s “query language” is human language itself.

This means you can’t filter your way to safety. You have to architect your way there.

The Zero Trust Architecture Pattern

Here’s the pattern I’ve been implementing with enterprise clients. It’s built on the same zero trust principles that work for network security and identity management, applied to the specific threat model of prompt injection.

Principle 1: Assume every input is hostile.

Every text that enters the system — user messages, retrieved documents, API responses, email content — should be treated as potentially adversarial. This doesn’t mean blocking everything. It means every input passes through a classification layer before reaching the model’s reasoning context.

That classification layer checks for known injection patterns, anomalous instruction-like content in data sources, and content that deviates significantly from the expected schema of the data being processed.

Principle 2: Minimise the blast radius through permission boundaries.

This is where agent identity architecture directly intersects with prompt injection defence. If an agent has broad permissions and gets injected, the attacker inherits those permissions. If the agent has scoped, just-in-time access to only the resources it needs for the current task, the blast radius is contained.

Every agent should operate with the minimum permissions required. No standing access. No inherited human credentials. No “we’ll lock it down later.” This is the same principle as least privilege for service accounts, applied to agents.

Principle 3: Separate the reasoning layer from the action layer.

The most effective architectural pattern I’ve found is putting a gateway between the model’s reasoning and its ability to take action. The model can propose actions. It cannot execute them directly.

A policy enforcement point sits between the model’s output and the execution environment. It validates that the proposed action is within the agent’s authorised scope, that it matches the expected patterns for the current task, and that it doesn’t violate any security policies. If the model has been injected and proposes an out-of-scope action — say, exfiltrating data to an external endpoint — the enforcement point blocks it, regardless of what the model thinks it should do.

This is architecturally identical to how a firewall works. The firewall doesn’t care why a packet is trying to reach an unauthorised destination. It just blocks it.

Principle 4: Monitor and detect, don’t just prevent.

Even with the best architectural controls, some injection attempts will get through. The question is whether you detect them before they cause damage.

The monitoring layer should track behavioural anomalies in agent actions — sudden changes in the types of resources accessed, unusual patterns in tool calls, requests that don’t match the expected workflow. When the model’s behaviour deviates from its baseline, that’s a signal worth investigating.

This is the same concept as UEBA (User and Entity Behaviour Analytics), extended to agent entities. If your agent normally queries three data sources and suddenly starts accessing twelve, something has changed. It might be a legitimate new requirement. It might be an injection.

Principle 5: Segment data exposure.

Not every piece of data in your organisation should be available to every agent. The retrieval layer — whether it’s RAG over a vector store, a database query, or an API call — should enforce data segmentation.

An agent handling customer service queries doesn’t need access to your financial planning documents. An agent summarising meeting notes doesn’t need access to your HR system. Segmenting what data each agent can see reduces the poisoning surface for indirect injection.

If an attacker plants a malicious instruction in a document, that instruction only affects agents that can read that document. Segmentation limits both the attack surface and the blast radius.

Putting It Together

The full pattern looks like this:

Input classification catches obvious and known injection attempts before they reach the model. Permission boundaries ensure that even a successfully injected agent can only access a narrow slice of your environment. The reasoning-action gateway prevents the model from directly executing harmful actions. Behavioural monitoring detects anomalies that slip through preventive controls. Data segmentation limits which data sources each agent can access, reducing the surface for indirect injection.

No single layer is sufficient. That’s the point. Zero trust works because it assumes every individual control will eventually fail, and designs the system so that a single failure doesn’t cascade into a breach.

The SQL Injection Parallel

SQL injection taught us that you can’t rely on input validation alone when the processing engine treats data and instructions as the same thing. The solution wasn’t better filtering — it was architectural: parameterised queries, prepared statements, ORMs that enforce the separation by design.

We haven’t found the “parameterised query” equivalent for prompt injection yet. Maybe we will. Research into structured prompting, constrained decoding, and model-level instruction hierarchy is promising.

But until that breakthrough arrives, the answer is architecture. Contain the blast radius. Minimise permissions. Separate reasoning from action. Monitor everything. Segment data.

It’s not elegant. It’s not a silver bullet. But it’s the pattern that actually works in production — and it’s the same zero trust thinking that eventually made SQL injection a solved problem for anyone who cared enough to implement the fix.

Leave A Comment

Recommended Posts