ANALYSIS March 9, 2026 5 min read

Two Threat Models for Local AI Agents: Filesystem Sandboxing vs Skill Vetting

SkillShield Research Team

Security Research

The "Agent Safehouse" project hit the front page of Hacker News recently (717+ points), and for good reason. It solves a real problem: AI agents running with unrestricted access to your local filesystem can read your SSH keys, AWS credentials, browser cookies, and sensitive documents — all by default.

But filesystem sandboxing is only half the security story. There's a second threat model that gets less attention but is equally critical: what happens when the agent runs code you didn't write?

Understanding both threat models — and how they differ — is the foundation of any serious AI agent security posture. For a comprehensive view of all the ways agents can be attacked, see The 11 AI Agent Attack Types Every Developer Should Know.


Threat Model 1: Runtime Filesystem Access

The risk

Your AI agent can read any file your user account can access: SSH keys in ~/.ssh, AWS credentials in ~/.aws/credentials, API keys in environment variables, browser session cookies, and private documents.

The solution: filesystem sandboxing

  • Read-only access to specific directories
  • Network egress filtering
  • Credential isolation
  • Process monitoring

What it catches: Accidental data exposure, compromised dependencies exfiltrating files, prompt injection attacks that try to read sensitive data.
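To make the credential-isolation idea concrete, here is a minimal sketch of a path allowlist check. The directory names and the helper are hypothetical illustrations, not Agent Safehouse's actual mechanism (which operates at the OS sandbox level, not in application code):

```python
from pathlib import Path

# Hypothetical policy: the agent may only read inside these roots...
ALLOWED_READ = [Path("/workspace"), Path("/tmp/agent")]
# ...and never inside credential directories, even if nested under an allowed root.
BLOCKED = [Path.home() / ".ssh", Path.home() / ".aws"]

def is_read_allowed(path: str) -> bool:
    """Return True only if the resolved path sits inside an allowed root
    and outside every blocked credential directory."""
    p = Path(path).resolve()
    if any(p.is_relative_to(b) for b in BLOCKED):
        return False
    return any(p.is_relative_to(root) for root in ALLOWED_READ)
```

A real sandbox enforces this at the kernel or container boundary rather than trusting the agent to call a checker, but the policy shape is the same: deny by default, allow narrow roots, block credential paths outright.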


Threat Model 2: Supply Chain & Skill Vetting

The risk

AI agents increasingly run third-party skills — small code packages that extend capabilities. These skills execute shell commands, access environment variables, make network requests, and often come from unverified community sources. The Agents of Chaos paper documented exactly how compromised skills propagate through live agent environments.

The problem sandboxing misses

A skill can be completely "safe" from a filesystem perspective but still:

  • Exfiltrate data to external domains via network calls
  • Execute hidden malicious commands
  • Contain prompt injection payloads
  • Include Base64-obfuscated payloads or hidden Unicode injections
  • Pull in compromised npm or Python dependencies
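To illustrate what pre-execution scanning can flag that sandboxing cannot, here is a toy pattern scanner. The pattern names and regexes are illustrative assumptions only; a production scanner (SkillShield included) would rely on deeper analysis and LLM classification, not regexes alone:

```python
import re

# Illustrative signatures for the risks listed above -- not SkillShield's rules.
SUSPICIOUS = {
    "network_call": re.compile(r"requests\.(get|post)|urllib|fetch\(|curl\s"),
    "env_access": re.compile(r"os\.environ|process\.env|getenv"),
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{60,}={0,2}"),  # long encoded payloads
    "shell_exec": re.compile(r"subprocess|os\.system|child_process|eval\("),
}

def scan_source(source: str) -> list[str]:
    """Return the names of every suspicious pattern found in a skill's source."""
    return [name for name, pattern in SUSPICIOUS.items() if pattern.search(source)]
```

Note that every one of these behaviors can occur inside a perfectly permitted filesystem footprint, which is exactly why the sandbox never sees them.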

The solution: SkillShield static analysis + LLM classification

  • Scans skill files before execution
  • Detects data exfiltration patterns
  • Identifies privilege escalation attempts
  • Finds obfuscated payloads and hidden injections
  • Scores risk so you can make informed decisions
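The last bullet, risk scoring, can be thought of as weighting findings by severity and bucketing the total. The weights and thresholds below are hypothetical, chosen only to illustrate the shape of the decision:

```python
# Hypothetical severity weights -- not SkillShield's actual scoring model.
WEIGHTS = {"network_call": 3, "env_access": 2, "obfuscated_payload": 4, "shell_exec": 3}

def risk_score(findings: list[str]) -> str:
    """Bucket a list of finding names into a coarse risk level."""
    score = sum(WEIGHTS.get(name, 1) for name in findings)
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```

The point of a coarse label is to support an informed install/skip decision without requiring the user to read every line of every skill.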

Why you need both

  • Filesystem sandboxing
      Catches: unauthorized file reads, credential access via the filesystem
      Misses: malicious network calls, data exfiltration via APIs, compromised skill logic
  • Skill vetting
      Catches: malicious code patterns, supply chain attacks, obfuscated payloads
      Misses: legitimate code that accidentally accesses sensitive files

Filesystem sandboxing constrains what the agent can access.
Skill vetting constrains what the agent will run.
They're complementary, not competitive.


Why SkillShield focuses on the second threat model

Static analysis catches what runtime can't. A skill that passes all sandbox permissions can still be malicious. Static analysis reads the code before it runs.

Classification scales. Manual review doesn't work at scale. LLM-based classification with human oversight can scan thousands of skills. SkillShield has scanned 33,746 skills across 6 marketplaces — see how that compares to other scanners in 2026.

The supply chain is the attack surface. Recent incidents (ClawHavoc, CVE-2025-6514) show that compromised skills are the vector, not filesystem breaches. Of the 11 documented AI agent attack types, six are addressable at the pre-install layer.


Real-world example

Cisco recently found an OpenClaw skill silently exfiltrating data. Filesystem sandboxing wouldn't have caught it — the skill had legitimate file access permissions. Static analysis did catch it: the skill contained a hidden network call to an external domain with no legitimate purpose.

That's the gap SkillShield fills.


Integration: SkillShield + sandboxing

The most secure setups use both:

  1. SkillShield vets skills before they're installed or executed — catching supply chain attacks, obfuscated payloads, and malicious tool definitions.
  2. Agent Safehouse (or similar) constrains runtime filesystem access — limiting blast radius if a skill does something unexpected.
  3. Network egress filtering blocks unauthorized outbound connections from both layers.

This defense-in-depth approach covers both threat models: supply chain attacks via malicious skills and accidental data exposure via unrestricted filesystem access.
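Step 3 above, egress filtering, reduces at its simplest to a host allowlist for outbound connections. A minimal sketch, where the allowed host is a made-up example and real deployments would enforce this at the firewall or proxy layer:

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the agent may contact.
ALLOWED_HOSTS = {"api.openclaw.example"}

def egress_allowed(url: str) -> bool:
    """Permit an outbound request only if its host is explicitly allowlisted."""
    return urlparse(url).hostname in ALLOWED_HOSTS
```

Because both the vetting layer and the runtime layer sit behind the same egress policy, a skill that slips past one control still has to get its data out through a connection the other control can block.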


Get started

# Scan your skills before installing
npx skillshield scan https://clawhub.com/skills/example

# Or scan a local skills directory
npx skillshield scan ./skills/

Step 1 of the defense-in-depth setup starts here.

SkillShield covers the supply chain threat model — scanning 33,746 skills with static analysis and LLM classification before your agent runs anything.

Get early access