
Security in AgentSentinel

AgentSentinel is built for agents with real tool access — shell commands, file writes, API calls. This page describes the threat model, defence layers, and recommended configuration for production deployments, especially for OpenClaw users.

Threat Model

AgentSentinel is designed to protect against the following failure modes common in autonomous AI agents:

Runaway loops / infinite recursion

An agent that calls the same tool repeatedly without making progress can exhaust budgets and burn through rate limits. AgentSentinel's per-tool rate limiting and daily/hourly budget caps act as hard circuit-breakers.

Budget exhaustion attacks

A compromised or manipulated agent can trigger expensive API calls or LLM inferences. Hard budget limits raise BudgetExceededError before the damage compounds.

Unintended external actions

An agent may send emails, POST to APIs, or modify production data without human intent. Approval gates require explicit human sign-off before any such tool executes.

Prompt injection leading to tool misuse

Malicious content in retrieved documents or user inputs may attempt to hijack tool calls. AgentSentinel enforces policy at the tool call boundary, not at the prompt level — so even a successfully injected prompt cannot bypass a blocked-tool rule or skip an approval gate.

Credential leakage via audit logs

Tool arguments containing API keys, passwords, or bearer tokens may be logged to a shared audit sink. The SecurityConfig.redact_patterns list scrubs sensitive patterns before any event is written.

Security Architecture

Every tool invocation passes through the following ordered checks inside AgentGuard.protect():

  1. Blocked-tools check — SecurityConfig.blocked_tools is checked first. Any match raises ToolBlockedError immediately. No approval pathway exists; the tool will never run.
  2. Budget enforcement — Hourly and daily cost accumulators are checked. Exceeding either raises BudgetExceededError.
  3. Rate limiting — Per-tool sliding-window counters are checked. Exceeding the limit raises RateLimitExceededError.
  4. Approval gate — Tools in policy.require_approval or SecurityConfig.sensitive_tools invoke the configured ApprovalHandler. Denial raises ApprovalRequiredError.
  5. Execution + audit — The tool runs; success or failure is recorded as an AuditEvent with sensitive patterns redacted before writing to any sink.
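The ordered checks can be sketched as a single gate function. This is a simplified, self-contained illustration — the error names follow this page, but the function signature and state handling are hypothetical, not AgentSentinel's API:

```python
class ToolBlockedError(Exception): pass
class BudgetExceededError(Exception): pass
class RateLimitExceededError(Exception): pass
class ApprovalRequiredError(Exception): pass

def check_tool_call(tool, cost, *, blocked, spent, budget,
                    calls, limit, sensitive, approve):
    """Run the ordered pre-execution checks; raise on the first failure."""
    # 1. Blocked-tools check: no approval pathway exists.
    if tool in blocked:
        raise ToolBlockedError(tool)
    # 2. Budget enforcement.
    if spent + cost > budget:
        raise BudgetExceededError(tool)
    # 3. Rate limiting.
    if calls >= limit:
        raise RateLimitExceededError(tool)
    # 4. Approval gate for sensitive tools.
    if tool in sensitive and not approve(tool):
        raise ApprovalRequiredError(tool)
    # 5. Caller may now execute the tool and record an audit event.
```

Note the ordering: a blocked tool fails before any budget or approval logic runs, so no configuration can resurrect it.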

Principle of Least Privilege

The core philosophy: agents only access explicitly approved tools. Everything else is default-deny.

  • Wrap only the tools your agent legitimately needs with @guard.protect.
  • Use SecurityConfig.blocked_tools as a hard kill-list for catastrophic operations (e.g. rm_rf, drop_database).
  • Use SecurityConfig.sensitive_tools for tools that could be used legitimately but require human oversight.
  • Prefer explicit tool names over wildcards. Use wildcards (e.g. delete_*) only when the intent is clear.
```python
from agentsentinel import AgentPolicy, SecurityConfig

security = SecurityConfig(
    blocked_tools=["rm_rf", "format_disk", "drop_database"],
    sensitive_tools=["execute_shell", "write_file", "delete_*", "send_*"],
)
policy = AgentPolicy(security=security)
```
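The wildcard entries above (delete_*, send_*) suggest shell-style glob matching. A minimal sketch of how such patterns can be resolved — using fnmatch as an assumption; AgentSentinel's actual matcher may differ:

```python
from fnmatch import fnmatch

def is_sensitive(tool_name: str, patterns: list) -> bool:
    """Return True if tool_name matches any exact name or glob pattern."""
    return any(fnmatch(tool_name, pattern) for pattern in patterns)
```

For example, is_sensitive("delete_user", ["execute_shell", "delete_*"]) matches via the glob, while an unlisted tool like read_file does not.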

Defence in Depth

AgentSentinel's controls are layered so that no single failure can cause catastrophic damage. Even if an attacker bypasses the approval gate (e.g. via a compromised approval handler), the budget ceiling and rate limit still apply.

| Layer | Mechanism | Error raised |
| --- | --- | --- |
| Catastrophic block | SecurityConfig.blocked_tools | ToolBlockedError |
| Spend cap | daily_budget / hourly_budget | BudgetExceededError |
| Frequency cap | rate_limits | RateLimitExceededError |
| Human oversight | require_approval / sensitive_tools | ApprovalRequiredError |
| Observability | Audit sink (every decision recorded) | (none) |

Secrets Protection

By default, AgentSentinel redacts common credential patterns from all audit output before writing to any AuditSink. This means API keys, passwords, and bearer tokens that appear in tool arguments or error messages are replaced with [REDACTED].
(Python: redact_patterns · TypeScript: redactPatterns)

Default redaction patterns

```python
# Patterns applied by default in SecurityConfig
redact_patterns = [
    r'api[_-]?key["\']?\s*[:=]\s*["\']?[\w-]+',
    r'password["\']?\s*[:=]\s*["\']?[^\s,}]+',
    r'secret["\']?\s*[:=]\s*["\']?[\w-]+',
    r'token["\']?\s*[:=]\s*["\']?[\w-]+',
    r'bearer\s+[\w-]+',
]
```
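Applying patterns like these is essentially a re.sub pass over the serialized arguments. A self-contained sketch of how redaction could work — the case-insensitive flag is an assumption, not documented behaviour:

```python
import re

DEFAULT_PATTERNS = [
    r'api[_-]?key["\']?\s*[:=]\s*["\']?[\w-]+',
    r'password["\']?\s*[:=]\s*["\']?[^\s,}]+',
    r'secret["\']?\s*[:=]\s*["\']?[\w-]+',
    r'token["\']?\s*[:=]\s*["\']?[\w-]+',
    r'bearer\s+[\w-]+',
]

def redact(text: str, patterns=DEFAULT_PATTERNS) -> str:
    """Replace any credential-shaped substring with [REDACTED]."""
    for pattern in patterns:
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```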

Adding custom patterns

```python
security = SecurityConfig(
    redact_patterns=[
        *SecurityConfig().redact_patterns,  # keep defaults
        r'OPENAI_API_KEY=[\w-]+',
        r'ANTHROPIC_API_KEY=[\w-]+',
        r'sk-[A-Za-z0-9]+',  # OpenAI key format
    ]
)
```

Set log_full_params=True only in secure, controlled environments where the audit sink is already protected. This disables all redaction.

Audit Logging

Every tool invocation — allowed or blocked — produces an AuditEvent containing:

  • timestamp — Unix epoch seconds
  • tool_name — The name passed to @guard.protect
  • status — success | blocked | error | approved
  • decision — allowed | blocked_security | blocked_budget | blocked_rate_limit | approval_required | approved | error
  • cost — Estimated USD cost (0.0 for blocked calls)
  • metadata — Redacted reason / error message
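The fields above map naturally onto a simple record type. A hypothetical sketch — the real AuditEvent may carry additional fields:

```python
import time
from dataclasses import dataclass, field

@dataclass
class AuditEvent:
    tool_name: str
    status: str        # success | blocked | error | approved
    decision: str      # allowed | blocked_security | ... | error
    cost: float = 0.0  # estimated USD; 0.0 for blocked calls
    timestamp: float = field(default_factory=time.time)  # Unix epoch seconds
    metadata: dict = field(default_factory=dict)         # redacted reason / error
```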

Use the InMemoryAuditSink during development and replace with a persistent sink before going to production. Implement the AuditSink abstract class to write events to any storage (file, database, SIEM platform).

```python
import json

from agentsentinel import AgentGuard, AuditEvent, AuditLogger, AuditSink

class FileSink(AuditSink):
    """Append each audit event as one JSON line."""

    def record(self, event: AuditEvent) -> None:
        with open("audit.jsonl", "a") as fh:
            fh.write(json.dumps(event.__dict__) + "\n")

logger = AuditLogger(sinks=[FileSink()])
guard = AgentGuard(policy=policy, audit_logger=logger)
```

Agent Isolation

Each AgentGuard instance is fully independent. Cost accumulators, rate-limit buckets, and audit sinks are per-instance; there is no shared global state between guard instances.

  • Create one AgentGuard per agent or per session for complete isolation.
  • Agents in the same process cannot exhaust each other's budgets.
  • Call guard.reset_costs() between sessions to restart the spending counter.
  • For true process-level isolation, run each agent in a separate process or container.
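Per-instance isolation simply means state lives on the guard object, never at module level. A minimal sketch of the idea (this CostTracker class is illustrative, not part of the SDK):

```python
class CostTracker:
    """Each instance keeps its own accumulator — no shared global state."""

    def __init__(self) -> None:
        self._spent = 0.0

    def add(self, cost: float) -> None:
        self._spent += cost

    @property
    def spent(self) -> float:
        return self._spent

    def reset(self) -> None:
        """Analogous to guard.reset_costs() between sessions."""
        self._spent = 0.0
```

Two instances never see each other's spending, which is why one agent in a process cannot drain another's budget.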

Best Practices for OpenClaw Users

OpenClaw agents typically have access to powerful real-world tools. The following practices are strongly recommended before deploying to production:

1. Start with sandbox_mode=True

Sandbox mode enforces that all sensitive_tools require approval regardless of any other configuration. Disable it only after thorough testing in a controlled environment.

2. Use explicit allow-lists, not block-lists, for approval

Put every tool that should execute in a predictable, known pattern. Reserve blocked_tools for operations that must never run under any circumstances.

3. Set conservative budgets initially

Start with a daily budget of $1–5 and a tight hourly limit. Raise limits only after reviewing the first few days of audit logs to understand real cost patterns.

4. Review audit logs regularly

Export AuditEvent objects to a file or database and review decision="approval_required" events — they indicate attempts that may warrant investigation.
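If events are exported as JSON lines (as in the FileSink example above), flagging these decisions is a short script. A sketch, assuming one JSON object per line with a decision field:

```python
import json

def approval_required_events(path: str) -> list:
    """Return audit events whose decision was approval_required."""
    events = []
    with open(path) as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("decision") == "approval_required":
                events.append(event)
    return events
```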

5. Test approval flows before production

Use InMemoryApprover in your test suite to verify that sensitive tools are gated correctly. Replace with a real approval handler (webhook, Slack, human-in-the-loop) before shipping.

Recommended OpenClaw Configuration

The following configuration is a good starting point for an OpenClaw agent with shell and file-system access. See the full runnable example on GitHub.

```python
from agentsentinel import AgentPolicy, AgentGuard, SecurityConfig

security = SecurityConfig(
    # Permanently blocked — no approval pathway
    blocked_tools=["rm_rf", "format_disk", "drop_database"],
    # Always require approval
    sensitive_tools=[
        "execute_shell",
        "write_file",
        "delete_*",
        "send_*",
        "post_*",
    ],
    # Scrub credentials from audit output
    redact_patterns=[
        r'OPENAI_API_KEY=[\w-]+',
        r'ANTHROPIC_API_KEY=[\w-]+',
        r'password=\S+',
        r'api[_-]?key["\']?\s*[:=]\s*["\']?[\w-]+',
        r'bearer\s+[\w-]+',
    ],
)

policy = AgentPolicy(
    daily_budget=5.00,
    hourly_budget=1.00,
    require_approval=[
        "execute_shell",
        "write_file",
        "delete_*",
        "http_post",
        "send_email",
    ],
    rate_limits={
        "execute_shell": "5/min",
        "read_file": "30/min",
        "http_get": "20/min",
        "*": "100/hour",
    },
    security=security,
    sandbox_mode=True,
)

guard = AgentGuard(policy=policy)
```

Reporting Security Issues

🔒 Responsible Disclosure

If you discover a security vulnerability in AgentSentinel, please report it privately so we can address it before public disclosure.

security@agentsentinel.dev

What to include in a report

  • Description of the vulnerability and its potential impact
  • Steps to reproduce (proof-of-concept code if available)
  • Affected SDK version(s) and language (Python / TypeScript)
  • Your preferred contact method for follow-up

Our commitments

  • Acknowledge receipt within 48 hours
  • Provide a status update within 7 days
  • Credit reporters in the release notes (unless anonymity is requested)
  • Not pursue legal action against good-faith security researchers