Security in AgentSentinel
AgentSentinel is built for agents with real tool access — shell commands, file writes, API calls. This page describes the threat model, defence layers, and recommended configuration for production deployments, especially for OpenClaw users.
Threat Model
AgentSentinel is designed to protect against the following failure modes common in autonomous AI agents:
⚠ Runaway loops / infinite recursion
An agent that calls the same tool repeatedly without progress exhausts budgets and rate limits. AgentSentinel's per-tool rate limiting and daily/hourly budget caps provide hard circuit-breakers.
⚠ Budget exhaustion attacks
A compromised or manipulated agent that triggers expensive API calls or LLM inferences.
Hard budget limits raise BudgetExceededError before the damage compounds.
⚠ Unintended external actions
An agent that sends emails, POSTs to APIs, or modifies production data without human intent. Approval gates require explicit human sign-off before any such tool executes.
⚠ Prompt injection leading to tool misuse
Malicious content in retrieved documents or user inputs that attempts to hijack tool calls. AgentSentinel enforces policy at the tool call boundary, not at the prompt level — so even a successfully injected prompt cannot bypass a blocked-tool rule or skip an approval gate.
⚠ Credential leakage via audit logs
Tool arguments containing API keys, passwords, or bearer tokens logged to a shared audit sink.
The SecurityConfig.redact_patterns list
scrubs sensitive patterns before any event is written.
Security Architecture
Every tool invocation passes through the following ordered checks inside
AgentGuard.protect():
1. Blocked-tools check — SecurityConfig.blocked_tools is checked first. Any match raises ToolBlockedError immediately. No approval pathway exists; the tool will never run.
2. Budget enforcement — Hourly and daily cost accumulators are checked. Exceeding either raises BudgetExceededError.
3. Rate limiting — Per-tool sliding-window counters are checked. Exceeding the limit raises RateLimitExceededError.
4. Approval gate — Tools in policy.require_approval or SecurityConfig.sensitive_tools invoke the configured ApprovalHandler. Denial raises ApprovalRequiredError.
5. Execution + audit — The tool runs; success or failure is recorded as an AuditEvent with sensitive patterns redacted before writing to any sink.
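The ordering above can be sketched as a toy, self-contained guard. This is not AgentSentinel's implementation — the error names mirror the ones listed on this page, but the class, constructor, and `protect` signature here are illustrative assumptions:

```python
import time

# Toy stand-ins for the errors this page lists (sketch only, not the SDK's classes).
class ToolBlockedError(Exception): pass
class BudgetExceededError(Exception): pass
class RateLimitExceededError(Exception): pass
class ApprovalRequiredError(Exception): pass

class ToyGuard:
    """Illustrates the five-step check ordering; the real
    AgentGuard.protect() will differ in API and detail."""

    def __init__(self, blocked, daily_budget, rate_limit, sensitive, approver):
        self.blocked = set(blocked)
        self.daily_budget = daily_budget   # USD
        self.spent = 0.0
        self.rate_limit = rate_limit       # max calls per tool per 60 s
        self.calls = {}                    # tool name -> recent call timestamps
        self.sensitive = set(sensitive)
        self.approver = approver           # callable(tool_name) -> bool
        self.audit = []                    # (tool, decision) pairs

    def protect(self, tool, fn, cost=0.0):
        # 1. Blocked-tools check: immediate, no approval pathway.
        if tool in self.blocked:
            self.audit.append((tool, "blocked_security"))
            raise ToolBlockedError(tool)
        # 2. Budget enforcement.
        if self.spent + cost > self.daily_budget:
            self.audit.append((tool, "blocked_budget"))
            raise BudgetExceededError(tool)
        # 3. Rate limiting over a sliding 60-second window.
        now = time.monotonic()
        window = [t for t in self.calls.get(tool, []) if now - t < 60]
        if len(window) >= self.rate_limit:
            self.audit.append((tool, "blocked_rate_limit"))
            raise RateLimitExceededError(tool)
        # 4. Approval gate for sensitive tools.
        if tool in self.sensitive and not self.approver(tool):
            self.audit.append((tool, "approval_required"))
            raise ApprovalRequiredError(tool)
        # 5. Execution + audit.
        self.calls[tool] = window + [now]
        self.spent += cost
        result = fn()
        self.audit.append((tool, "allowed"))
        return result

guard = ToyGuard(blocked={"rm_rf"}, daily_budget=1.0, rate_limit=2,
                 sensitive={"send_email"}, approver=lambda tool: False)
```

Note that the blocked-tools check runs before everything else: a blocked tool fails even when budget, rate limit, and approval would all have passed.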
Principle of Least Privilege
The core philosophy: agents only access explicitly approved tools. Everything else is default-deny.
- Wrap only the tools your agent legitimately needs with @guard.protect.
- Use SecurityConfig.blocked_tools as a hard kill-list for catastrophic operations (e.g. rm_rf, drop_database).
- Use SecurityConfig.sensitive_tools for tools that could be used legitimately but require human oversight.
- Prefer explicit tool names over wildcards. Use wildcards (e.g. delete_*) only when the intent is clear.
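Under these guidelines a minimal setup might look like the following sketch. The field names come from this page, but the import path, constructor signature, and decorator form are assumptions — check the published API:

```python
from agentsentinel import AgentGuard, SecurityConfig  # import path is an assumption

config = SecurityConfig(
    blocked_tools=["rm_rf", "drop_database"],      # hard kill-list: never runs
    sensitive_tools=["send_email", "write_file"],  # legitimate, but human-gated
)
guard = AgentGuard(config)

@guard.protect  # wrap only the tools the agent legitimately needs
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()
```

Everything not wrapped with @guard.protect simply is not reachable by the agent, which is the default-deny posture described above.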
Defence in Depth
AgentSentinel's controls are layered so that no single failure can cause catastrophic damage. Even if an attacker bypasses the approval gate (e.g. via a compromised approval handler), the budget ceiling and rate limit still apply.
| Layer | Mechanism | Error raised |
|---|---|---|
| Catastrophic block | SecurityConfig.blocked_tools | ToolBlockedError |
| Spend cap | daily_budget / hourly_budget | BudgetExceededError |
| Frequency cap | rate_limits | RateLimitExceededError |
| Human oversight | require_approval / sensitive_tools | ApprovalRequiredError |
| Observability | Audit sink (every decision recorded) | — |
Secrets Protection
By default, AgentSentinel redacts common credential patterns from all audit output before
writing to any AuditSink.
This means API keys, passwords, and bearer tokens that appear in tool arguments or error messages
are replaced with [REDACTED].
(Python: redact_patterns · TypeScript: redactPatterns)
Default redaction patterns
Adding custom patterns
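Redaction patterns are regular expressions matched against audit output. The sketch below is self-contained and shows the kind of scrubbing the redaction layer performs; the two patterns are hypothetical examples, and the comment about where they plug in reflects this page's description of SecurityConfig.redact_patterns, not a verified API:

```python
import re

# Hypothetical custom patterns; per this page, these would be added to
# SecurityConfig.redact_patterns (exact API shape is an assumption).
redact_patterns = [
    r"sk-[A-Za-z0-9]{20,}",          # OpenAI-style API keys
    r"Bearer\s+[A-Za-z0-9._\-]+",    # bearer tokens in headers
]

def redact(text: str) -> str:
    """Replace every match of every pattern with [REDACTED]."""
    for pattern in redact_patterns:
        text = re.sub(pattern, "[REDACTED]", text)
    return text
```

For example, `redact("key sk-abcdefghijklmnopqrstu")` yields `"key [REDACTED]"`. Because redaction runs before any sink sees the event, a misconfigured or overly broad sink never receives the raw secret.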
Set log_full_params=True only in secure,
controlled environments where the audit sink is already protected. This disables all redaction.
Audit Logging
Every tool invocation — allowed or blocked — produces an
AuditEvent containing:
- timestamp — Unix epoch seconds
- tool_name — The name passed to @guard.protect
- status — success | blocked | error | approved
- decision — allowed | blocked_security | blocked_budget | blocked_rate_limit | approval_required | approved | error
- cost — Estimated USD cost (0.0 for blocked calls)
- metadata — Redacted reason / error message
Use the InMemoryAuditSink during development
and replace with a persistent sink before going to production.
Implement the AuditSink abstract class
to write events to any storage (file, database, SIEM platform).
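As a sketch, a persistent sink that appends each event as one JSON line might look like the class below. It is self-contained for illustration; in real code you would subclass AgentSentinel's AuditSink, and the `write` method name and event shape here are assumptions based on the fields listed above:

```python
import json

class JsonlAuditSink:
    """Illustrative sketch of a persistent sink: one JSON object per line.
    In real code, subclass agentsentinel's AuditSink instead."""

    def __init__(self, path: str):
        self.path = path

    def write(self, event: dict) -> None:
        # Events arrive already redacted; persist them verbatim.
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")

sink = JsonlAuditSink("audit.jsonl")
sink.write({"timestamp": 1700000000, "tool_name": "send_email",
            "status": "blocked", "decision": "approval_required",
            "cost": 0.0, "metadata": "approval denied"})
```

JSON lines append cheaply, survive crashes mid-run, and load directly into most log pipelines and SIEM platforms.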
Agent Isolation
Each AgentGuard instance is fully independent.
Cost accumulators, rate-limit buckets, and audit sinks are per-instance;
there is no shared global state between guard instances.
- Create one AgentGuard per agent or per session for complete isolation.
- Agents in the same process cannot exhaust each other's budgets.
- Call guard.reset_costs() between sessions to restart the spending counter.
- For true process-level isolation, run each agent in a separate process or container.
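A per-agent setup under this model might look like the fragment below. The method and field names come from this page; the import path and constructor signature are assumptions:

```python
from agentsentinel import AgentGuard, SecurityConfig  # import path is an assumption

# One guard per agent: cost accumulators, rate-limit buckets, and sinks
# are all per-instance, so these two agents cannot affect each other.
research_guard = AgentGuard(SecurityConfig(daily_budget=5.0))
email_guard = AgentGuard(SecurityConfig(daily_budget=1.0,
                                        sensitive_tools=["send_email"]))

# Between sessions, restart the spending counter rather than reusing state.
research_guard.reset_costs()
```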
Best Practices for OpenClaw Users
OpenClaw agents typically have access to powerful real-world tools. The following practices are strongly recommended before deploying to production:
Enable sandbox_mode=True. Sandbox mode enforces that all sensitive_tools require approval
regardless of any other configuration. Disable it only after thorough testing in a controlled environment.
Gate every tool so that it executes in a predictable, known pattern,
and reserve blocked_tools for operations that must never run
under any circumstances.
Start with a daily budget of $1–5 and a tight hourly limit. Raise limits only after reviewing the first few days of audit logs to understand real cost patterns.
Export AuditEvent objects to a file or database and review
decision="approval_required" events —
they indicate attempts that may warrant investigation.
Use InMemoryApprover in your test suite to verify that
sensitive tools are gated correctly. Replace with a real approval handler (webhook, Slack, human-in-the-loop)
before shipping.
Recommended OpenClaw Configuration
The following configuration is a good starting point for an OpenClaw agent with shell and file-system access. See the full runnable example on GitHub.
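As a sketch only (field names are taken from this page, but the import path, constructor signature, and the shape of rate_limits are assumptions — adjust to the published API), a conservative starting configuration might look like:

```python
from agentsentinel import AgentGuard, SecurityConfig  # import path is an assumption

config = SecurityConfig(
    sandbox_mode=True,                  # every sensitive tool requires approval
    blocked_tools=["rm_rf", "drop_database"],
    sensitive_tools=["run_shell", "write_file", "send_email"],
    daily_budget=5.0,                   # USD; start small, raise after log review
    hourly_budget=1.0,
    rate_limits={"run_shell": 10},      # per-tool cap (assumed dict shape)
)
guard = AgentGuard(config)
```

The intent is to start maximally restrictive and loosen limits only after the first few days of audit logs show the agent's real cost and call patterns.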
Reporting Security Issues
🔒 Responsible Disclosure
If you discover a security vulnerability in AgentSentinel, please report it privately so we can address it before public disclosure.
security@agentsentinel.dev

What to include in a report
- Description of the vulnerability and its potential impact
- Steps to reproduce (proof-of-concept code if available)
- Affected SDK version(s) and language (Python / TypeScript)
- Your preferred contact method for follow-up
Our commitments
- Acknowledge receipt within 48 hours
- Provide a status update within 7 days
- Credit reporters in the release notes (unless anonymity is requested)
- Not pursue legal action against good-faith security researchers