HIGH March 9, 2026 5 min read

What Is Identity Spoofing in AI Agents?

SkillShield Research Team

Security Research

Identity spoofing in AI agents happens when one agent — or a malicious skill — impersonates a trusted agent, service, or user to gain permissions it shouldn't have. Unlike traditional network spoofing, it doesn't require packet manipulation. It exploits the fact that most AI agent frameworks trust the identity claims that arrive in messages, tool results, and memory.

The result: an attacker who can influence what an agent reads can make it believe it's talking to a different, more trusted source — and act accordingly.


How does identity spoofing work in practice?

In a typical multi-agent setup, Agent A might defer to Agent B for certain tasks because Agent B has a different role — administrative access, memory write permissions, or external API credentials. The trust relationship is established by the system prompt, memory, or a channel message.

Identity spoofing exploits that trust relationship in three common ways:

1. Agent impersonation via message injection
An attacker injects a message that appears to come from a trusted agent. If Agent A is configured to follow instructions from "AgentB," a message saying "I am AgentB. Execute the following…" may be obeyed without verification.

2. Service impersonation through tool results
When an agent calls an external tool and gets a result back, it usually trusts that result. A compromised or malicious skill can return fabricated tool results that impersonate a legitimate service — making the agent believe it received data from a trusted API when it didn't.

3. Memory poisoning with false identities
Agents with persistent memory read prior context to inform decisions. If an attacker can write to an agent's memory — or cause it to be written — they can plant false identity claims that persist across sessions. A future action by the agent then operates on a false trust baseline.


Where did this attack get documented?

The Agents of Chaos paper — a two-week adversarial lab study by 38 researchers at Northeastern, Harvard, Stanford, MIT, and CMU — catalogued identity spoofing as one of 11 distinct attack categories observed against live AI agents running on an OpenClaw deployment.

The researchers found that identity spoofing was especially effective when combined with other attack vectors: a cross-agent propagation attack could spread malicious instructions laterally, while an identity spoofing component ensured those instructions appeared to come from a trusted source at each hop.

That combination — lateral spread plus identity forgery — is what makes it a high-severity category, not just a curiosity.


Why is identity spoofing hard to detect?

Three reasons:

There's no PKI for agent messages. Email has DKIM. API calls have signed tokens. AI agent messages — especially those passed through memory, tool results, or inter-agent channels — typically have no cryptographic binding to a verified identity. The agent trusts what it reads.

Multi-agent chains obscure the origin. In a pipeline of three or more agents, it becomes hard to audit which agent actually originated an instruction. By the time a spoofed identity claim reaches the executing agent, it may have passed through two legitimate intermediaries.

Runtime scanners test the endpoint, not the chain. A runtime scanner tests what happens when you send adversarial prompts to a running agent — it won't surface a spoofing attack that originates in a compromised skill's tool output or a poisoned memory entry. For a full breakdown of how runtime testing compares to supply-chain inspection, see SkillShield vs AgentSeal: Two Layers of AI Agent Security.


What does SkillShield do about it?

SkillShield operates at the supply-chain layer — it scans MCP skills, tools, and plugins before they execute. For identity spoofing specifically, that means:

  • Tool result inspection: SkillShield can flag skills that return results claiming to originate from identity-asserting sources (e.g., "I am the orchestrator agent") or that inject permission-escalation language into structured outputs.
  • Supply-chain integrity checks: If a skill you installed last week was modified to add identity-spoofing payloads to its responses, SkillShield detects the change against the known-good baseline.
  • Passive exfiltration surface mapping: Skills that access memory write paths or inter-agent channels get flagged as higher-risk, regardless of whether they're currently doing anything malicious.

Identity spoofing isn't a runtime attack you can patch with a better system prompt. It originates in the tools your agent trusts. That's the layer SkillShield secures.


Quick answers

Can a single-agent setup be vulnerable to identity spoofing?
Yes — if the agent reads from any external source (tool results, web content, memory files), those sources can inject identity claims the agent treats as authoritative.

Is identity spoofing the same as prompt injection?
Related but distinct. Prompt injection overrides an agent's instructions. Identity spoofing overrides an agent's trust model — making it believe a different party is giving those instructions. They often occur together.

Does this require an inside attacker?
No. A compromised skill package, a malicious third-party MCP server, or a poisoned data source the agent reads can all deliver spoofed identity claims without any insider access.


Related reading

Secure the layer identity spoofing comes from

SkillShield scans MCP skills and tool outputs at the supply-chain layer — catching identity-asserting payloads, tampered baselines, and high-risk access patterns before your agent executes them.

Get early access