ANALYSIS March 9, 2026 5 min read

Two Threat Models for Local AI Agents: Filesystem Sandboxing vs Skill Vetting

SkillShield Research Team

Security Research

The "Agent Safehouse" project hit the front page of Hacker News recently (717+ points), and for good reason. It solves a real problem: AI agents running with unrestricted access to your local filesystem can read your SSH keys, AWS credentials, browser cookies, and sensitive documents — all by default.

But filesystem sandboxing is only half the security story. There's a second threat model that gets less attention but is equally critical: what happens when the agent runs code you didn't write?

Understanding both threat models — and how they differ — is the foundation of any serious AI agent security posture. For a comprehensive view of all the ways agents can be attacked, see The 11 AI Agent Attack Types Every Developer Should Know.


Threat Model 1: Runtime Filesystem Access

The risk

Your AI agent can read any file your user account can access: SSH keys in ~/.ssh, AWS credentials in ~/.aws/credentials, API keys in environment variables, browser session cookies, and private documents.

The solution: filesystem sandboxing

  • Read-only access to specific directories
  • Network egress filtering
  • Credential isolation
  • Process monitoring

What it catches: Accidental data exposure, compromised dependencies exfiltrating files, prompt injection attacks that try to read sensitive data.
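To make the credential-isolation idea concrete, here is a minimal sketch of a path allowlist check. The directory names and the helper are hypothetical illustrations, not Agent Safehouse's actual mechanism (which operates at the OS sandbox level, not in application code):

```python
from pathlib import Path

# Hypothetical policy: the agent may only read inside these roots...
ALLOWED_READ = [Path("/workspace"), Path("/tmp/agent")]
# ...and never inside credential directories, even if nested under an allowed root.
BLOCKED = [Path.home() / ".ssh", Path.home() / ".aws"]

def is_read_allowed(path: str) -> bool:
    """Return True only if the resolved path sits inside an allowed root
    and outside every blocked credential directory."""
    p = Path(path).resolve()
    if any(p.is_relative_to(b) for b in BLOCKED):
        return False
    return any(p.is_relative_to(root) for root in ALLOWED_READ)
```

A real sandbox enforces this at the kernel or container boundary rather than trusting the agent to call a checker, but the policy shape is the same: deny by default, allow narrow roots, block credential paths outright.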


Threat Model 2: Supply Chain & Skill Vetting

The risk

AI agents increasingly run third-party skills — small code packages that extend capabilities. These skills execute shell commands, access environment variables, make network requests, and often come from unverified community sources. The Agents of Chaos paper documented exactly how compromised skills propagate through live agent environments.

The problem sandboxing misses

A skill can be completely "safe" from a filesystem perspective but still:

  • Exfiltrate data to external domains via network calls
  • Execute hidden malicious commands
  • Contain prompt injection payloads
  • Include Base64-obfuscated payloads or hidden Unicode injections
  • Pull in compromised npm or Python dependencies
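To illustrate what pre-execution scanning can flag that sandboxing cannot, here is a toy pattern scanner. The pattern names and regexes are illustrative assumptions only; a production scanner (SkillShield included) would rely on deeper analysis and LLM classification, not regexes alone:

```python
import re

# Illustrative signatures for the risks listed above -- not SkillShield's rules.
SUSPICIOUS = {
    "network_call": re.compile(r"requests\.(get|post)|urllib|fetch\(|curl\s"),
    "env_access": re.compile(r"os\.environ|process\.env|getenv"),
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{60,}={0,2}"),  # long encoded payloads
    "shell_exec": re.compile(r"subprocess|os\.system|child_process|eval\("),
}

def scan_source(source: str) -> list[str]:
    """Return the names of every suspicious pattern found in a skill's source."""
    return [name for name, pattern in SUSPICIOUS.items() if pattern.search(source)]
```

Note that every one of these behaviors can occur inside a perfectly permitted filesystem footprint, which is exactly why the sandbox never sees them.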

The solution: SkillShield static analysis + LLM classification

  • Scans skill files before execution
  • Detects data exfiltration patterns
  • Identifies privilege escalation attempts
  • Finds obfuscated payloads and hidden injections
  • Scores risk so you can make informed decisions
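The last bullet, risk scoring, can be thought of as weighting findings by severity and bucketing the total. The weights and thresholds below are hypothetical, chosen only to illustrate the shape of the decision:

```python
# Hypothetical severity weights -- not SkillShield's actual scoring model.
WEIGHTS = {"network_call": 3, "env_access": 2, "obfuscated_payload": 4, "shell_exec": 3}

def risk_score(findings: list[str]) -> str:
    """Bucket a list of finding names into a coarse risk level."""
    score = sum(WEIGHTS.get(name, 1) for name in findings)
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```

The point of a coarse label is to support an informed install/skip decision without requiring the user to read every line of every skill.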

Why you need both

  • Filesystem sandboxing
      Catches: unauthorized file reads, credential access via the filesystem
      Misses: malicious network calls, data exfiltration via APIs, compromised skill logic
  • Skill vetting
      Catches: malicious code patterns, supply chain attacks, obfuscated payloads
      Misses: legitimate code that accidentally accesses sensitive files

Filesystem sandboxing constrains what the agent can access.
Skill vetting constrains what the agent will run.
They're complementary, not competitive.


Why SkillShield focuses on the second threat model

Static analysis catches what runtime can't. A skill that passes all sandbox permissions can still be malicious. Static analysis reads the code before it runs.

Classification scales. Manual review doesn't work at scale. LLM-based classification with human oversight can scan thousands of skills. SkillShield has scanned 33,746 skills across 6 marketplaces — see how that compares to other scanners in 2026.

The supply chain is the attack surface. Recent incidents (ClawHavoc, CVE-2025-6514) show that compromised skills are the vector, not filesystem breaches. Of the 11 documented AI agent attack types, six are addressable at the pre-install layer.


Real-world example

Cisco recently found an OpenClaw skill silently exfiltrating data. Filesystem sandboxing wouldn't have caught it — the skill had legitimate file access permissions. Static analysis did catch it: the skill contained a hidden network call to an external domain with no legitimate purpose.

That's the gap SkillShield fills.


Integration: SkillShield + sandboxing

The most secure setups use both:

  1. SkillShield vets skills before they're installed or executed — catching supply chain attacks, obfuscated payloads, and malicious tool definitions.
  2. Agent Safehouse (or similar) constrains runtime filesystem access — limiting blast radius if a skill does something unexpected.
  3. Network egress filtering blocks unauthorized outbound connections from both layers.

This defense-in-depth approach covers both threat models: supply chain attacks via malicious skills and accidental data exposure via unrestricted filesystem access.
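Step 3 above, egress filtering, reduces at its simplest to a host allowlist for outbound connections. A minimal sketch, where the allowed host is a made-up example and real deployments would enforce this at the firewall or proxy layer:

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the agent may contact.
ALLOWED_HOSTS = {"api.openclaw.example"}

def egress_allowed(url: str) -> bool:
    """Permit an outbound request only if its host is explicitly allowlisted."""
    return urlparse(url).hostname in ALLOWED_HOSTS
```

Because both the vetting layer and the runtime layer sit behind the same egress policy, a skill that slips past one control still has to get its data out through a connection the other control can block.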


Get started

# Scan your skills before installing
npx skillshield scan https://clawhub.com/skills/example

# Or scan a local skills directory
npx skillshield scan ./skills/

Step 1 of the defense-in-depth setup starts here.

SkillShield covers the supply chain threat model — scanning 33,746 skills with static analysis and LLM classification before your agent runs anything.

Get early access