Prompt Injection
Prompt injection is an attack against AI systems where malicious instructions are embedded in input data to manipulate a model’s behaviour, bypass safety controls, or cause it to take unintended actions.
Last updated: March 3, 2026
Prompt injection is an attack technique that targets large language models (LLMs) and AI-powered systems. An attacker crafts malicious text—embedded in user input, a document, a web page, or any data the model processes—to override or subvert the model’s original instructions.
There are two main variants. Direct prompt injection occurs when the attacker controls the user-facing input directly. Indirect prompt injection (also called stored prompt injection) places the malicious payload in external content the model retrieves—such as a webpage, file, or tool output—so it executes without the user’s knowledge.
As AI agents gain the ability to take real-world actions—browsing the web, executing code, or calling APIs—prompt injection becomes a critical attack surface. Mitigations include input sanitisation, privilege separation, sandboxing agent capabilities, and human-in-the-loop approval for sensitive actions.