The Guardrail Model: Three Layers, Different Boundaries
OpenAI's Agents SDK provides three types of guardrails, each with specific coverage boundaries that aren't immediately obvious from the API surface:
Input guardrails validate user input before (or during) agent execution. They're agent-level configurations that only trigger if that agent is the first in a chain. The key decision here is execution mode: parallel (default) or blocking.
Output guardrails validate the final agent output. They only run if the agent produces the final response in a chain — intermediate outputs from delegated agents skip output guardrails entirely.
Tool guardrails wrap individual function calls, running validation before and/or after execution. These are the most granular but also have the narrowest coverage.
Understanding these boundaries is essential because gaps between them are where security issues emerge in production.
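These boundaries can be made concrete with a small stand-in sketch. This is plain Python, not the SDK's actual classes (the Agent dataclass, TripwireTriggered, and run_chain below are illustrative names): input guardrails consult only the first agent in the chain, and output guardrails see only the final output.

```python
from dataclasses import dataclass, field
from typing import Callable, List

class TripwireTriggered(Exception):
    """Illustrative stand-in for the SDK's tripwire exceptions."""

@dataclass
class Agent:
    name: str
    respond: Callable[[str], str]
    input_guardrails: List[Callable[[str], bool]] = field(default_factory=list)
    output_guardrails: List[Callable[[str], bool]] = field(default_factory=list)

def run_chain(agents: List[Agent], user_input: str) -> str:
    # Input guardrails: only the FIRST agent's list is consulted.
    for check in agents[0].input_guardrails:
        if not check(user_input):
            raise TripwireTriggered("input guardrail tripped")
    text = user_input
    for agent in agents:
        text = agent.respond(text)  # intermediate outputs pass unchecked
    # Output guardrails: only the LAST agent's output is validated.
    for check in agents[-1].output_guardrails:
        if not check(text):
            raise TripwireTriggered("output guardrail tripped")
    return text
```

Note what the sketch implies: a second agent in the chain could carry its own input guardrails, and they would simply never run.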
The Parallel vs Blocking Problem
By default, input guardrails run in parallel with agent execution (run_in_parallel=True). This minimizes latency — both the guardrail and the agent start simultaneously. But this creates a race condition that has confused many developers.
If the guardrail triggers after the agent has already started, the agent may have consumed tokens and even initiated tool calls before the InputGuardrailTripwireTriggered exception halts execution. This isn't a bug — it's documented behaviour — but it's counterintuitive if you expect guardrails to act as a hard gate.
GitHub issue #889 describes exactly this: FileSearchTool running despite a triggered input guardrail. Issue #991 generalizes it: tool execution can continue after the exception is raised because the guardrail and agent are racing.
The solution is the blocking execution mode (run_in_parallel=False), which ensures the guardrail completes before the agent starts. This prevents any token consumption or tool execution if the guardrail triggers. The tradeoff is latency — you wait for the guardrail before any agent work begins.
When to use each mode:
- Parallel (default): Latency-sensitive applications where occasional guardrail race conditions are acceptable
- Blocking: Security-critical inputs, cost-sensitive workloads, or any scenario where tool execution must not occur on blocked inputs
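The race is easy to reproduce with plain asyncio, no SDK required. In this sketch (all function names and timings are illustrative, not the SDK's implementation), the parallel path starts the agent immediately, so a simulated tool call lands before the slower guardrail trips; the blocking path awaits the guardrail first.

```python
import asyncio

async def guardrail(user_input: str) -> bool:
    """Simulated moderation check that takes 50 ms; True means tripped."""
    await asyncio.sleep(0.05)
    return "blocked" in user_input

async def agent(log: list) -> None:
    """Simulated agent that reaches a tool call within 10 ms."""
    await asyncio.sleep(0.01)
    log.append("tool_call")

async def run(user_input: str, run_in_parallel: bool) -> list:
    log: list = []
    if run_in_parallel:
        agent_task = asyncio.create_task(agent(log))  # agent starts now
        if await guardrail(user_input):
            agent_task.cancel()  # too late: the tool call already happened
        else:
            await agent_task
    else:
        if not await guardrail(user_input):
            await agent(log)  # agent only starts on a clean guardrail pass
    return log

# Parallel mode leaves a side effect even on blocked input;
# blocking mode leaves none.
```

Running this with a blocked input returns `["tool_call"]` in parallel mode and `[]` in blocking mode, which is exactly the behaviour reported in the issues above.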
What Tool Guardrails Actually Cover
Tool guardrails are configured on individual tools using the @function_tool decorator. They run on every invocation of that tool, making them useful for enforcing invariants at the call site.
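The mechanism itself — validation hooks wrapped around the call site — can be sketched as a plain decorator. This illustrates the before/after pattern only; it is not the SDK's actual @function_tool signature, and ToolGuardrailTripped and the hook parameter names are made up here.

```python
from functools import wraps

class ToolGuardrailTripped(Exception):
    """Illustrative stand-in for a tool guardrail tripwire."""

def tool_guardrails(pre=None, post=None):
    """Wrap a function with before/after validation hooks (pattern sketch)."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(args, kwargs):
                raise ToolGuardrailTripped(f"{fn.__name__}: input rejected")
            result = fn(*args, **kwargs)
            if post is not None and not post(result):
                raise ToolGuardrailTripped(f"{fn.__name__}: output rejected")
            return result
        return wrapper
    return decorate

@tool_guardrails(
    pre=lambda args, kwargs: "DROP TABLE" not in kwargs.get("query", ""),
    post=lambda result: len(result) < 10_000,
)
def run_query(query: str) -> str:
    return f"rows for {query}"
```

The key property is that the checks run on every invocation of the wrapped function — which is also why anything not routed through that wrapper gets no protection at all.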
But their coverage has significant limitations:
Covered: Custom function tools created with @function_tool
Not covered:
- Hosted tools (WebSearchTool, FileSearchTool, HostedMCPTool, CodeInterpreterTool, ImageGenerationTool)
- Built-in execution tools (ComputerTool, ShellTool, ApplyPatchTool, LocalShellTool)
- Handoffs (which run through a separate pipeline)
- Tools exposed via Agent.as_tool() (which doesn't expose guardrail options)
This means if your agent uses FileSearchTool to retrieve documents, those calls bypass tool guardrails entirely. If you rely on handoffs to delegate work, those handoff calls aren't guardrailed. If you expose an agent as a tool to another agent, that interface has no guardrail integration.
These aren't edge cases — they're common patterns in multi-agent systems. And they're exactly where maliciously or accidentally harmful behaviour can slip through.
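One cheap mitigation is a preflight audit of an agent's tool list against the uncovered categories. The helper below is hypothetical (not part of the SDK); the class names come straight from the lists in this section.

```python
# Tool classes whose calls bypass tool guardrails, per the coverage
# lists above (hosted tools plus built-in execution tools).
UNGUARDED_TOOLS = {
    "WebSearchTool", "FileSearchTool", "HostedMCPTool",
    "CodeInterpreterTool", "ImageGenerationTool",
    "ComputerTool", "ShellTool", "ApplyPatchTool", "LocalShellTool",
}

def audit_tool_coverage(tool_names: list) -> list:
    """Return the tools on this agent that tool guardrails will not cover."""
    return [name for name in tool_names if name in UNGUARDED_TOOLS]
```

Running this in CI against each agent's configuration makes the gaps visible before deployment instead of after an incident.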
The Coverage Gap Map
Here's what the guardrail system handles and where it leaves gaps:
| Scenario | Input Guardrail | Output Guardrail | Tool Guardrail |
|---|---|---|---|
| Direct user input to first agent | ✅ (if blocking) | N/A | N/A |
| User input after handoff | ❌ | N/A | N/A |
| Custom function tool call | N/A | N/A | ✅ |
| Hosted tool call (FileSearch, etc.) | N/A | N/A | ❌ |
| Handoff to another agent | N/A | N/A | ❌ |
| Final agent output | N/A | ✅ | N/A |
| Intermediate agent output | N/A | ❌ | N/A |
When You Still Need Tool Review
Guardrails operate at the orchestration layer — they validate inputs, outputs, and function call boundaries. They do not inspect the internal implementation of tools, the MCP servers they connect to, or the tool descriptions that guide agent behaviour.
This is where SkillShield fits. We scan:
- Tool descriptions for injected instructions and malicious guidance
- MCP server manifests for anomalous permission requests
- Skill code for credential theft patterns, data exfiltration, and runtime risks
Guardrails and SkillShield operate at different layers. Guardrails validate the flow of execution. SkillShield validates what gets executed. Production agent security needs both.