New: Real-time hallucination alerts are live. Learn more →

LLM Metrix logoLLM Metrix
Back to Glossary
Definition

Prompt Injection

A security issue where malicious instructions hidden in content (or user input) manipulate an AI system into ignoring its original instructions. Relevant to brand safety because untrusted web content an AI retrieves could attempt to alter how a brand is represented.

Prompt injection is a security issue where malicious instructions hidden in content or user input manipulate an AI system into ignoring its original instructions and behaving in unintended ways. Because many AI systems read untrusted web content, that content can attempt to smuggle in instructions.

Why it matters for brand safety

If an AI engine retrieves a page that contains hidden adversarial instructions, those instructions could attempt to alter how the engine summarizes or represents information — including potentially how a brand is described. Understanding prompt injection is part of thinking about AI brand safety in a retrieval-driven world.

What it means in practice

  • Untrusted content is a risk surface. Anything an AI retrieves could carry injected instructions.
  • Defense is mostly the platform’s job, but brands should monitor how AI engines represent them and flag anomalies.
  • It’s distinct from hallucination — injection is adversarial manipulation, not the model simply being wrong.

Compare with hallucination, and see system prompt and prompt for the mechanics involved.

Ready to improve your AI visibility?

Put your knowledge into practice with step-by-step tutorials.