
OpenClaw Emergency Response Playbook: Incident Response for Compromise, High Bills, and Erratic Behavior

OpenClaw Experts
10 min read

Why You Need an Incident Response Plan

If your OpenClaw agent starts misbehaving, you may have only minutes to stop the damage. Having a playbook means you react instead of panicking.

Common incidents:

  • API bill spike to 10x normal
  • Agent making requests to suspicious domains
  • Repeated failed authentication attempts
  • Agent ignoring SOUL.md boundaries
  • Gateway not responding

Incident Response Playbook

Incident Type 1: Unexpected API Bill Spike

Symptoms:

  • Daily API spend went from $5 to $50+
  • Token usage drastically increased
  • You notice it in the dashboard or provider email

Immediate Actions (within 5 minutes):

  1. STOP the gateway immediately: openclaw stop
  2. Check your provider dashboards:
    • Moonshot (Kimi): https://console.moonshot.cn/api-keys
    • Anthropic: https://console.anthropic.com/account/usage
    • OpenAI (if using): https://platform.openai.com/account/usage
  3. Identify the spike: Was it in input tokens, output tokens, or API calls?
  4. Check recent logs: openclaw logs --last 1h | grep -i error

Investigation (next 15-30 minutes):

  1. Review session logs for loops:
    
    # Look for repetitive patterns
    openclaw logs --last 1h | grep -E "API call|error|retry"
    
    # Check for specific patterns
    - Same API call repeated 1000+ times
    - Error → retry → error loop
    - Exponential backoff not working
    
  2. Check for injection or compromise:
    • Were recent messages unusual?
    • Are there API calls to unexpected endpoints?
    • Did the agent ignore tool policy restrictions?
  3. Determine root cause:
    • Legitimate workflow change?
    • Prompt injection or compromise?
    • Runaway retry loop?
    • Bug in a recently added skill?
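The repetitive-pattern check in step 1 can be condensed into a pipeline that surfaces the most-repeated log lines. This is a sketch: the log path and the message format (a leading date and time, then the message) are assumptions about your setup.

```shell
# Surface the most-repeated messages in the session log. A single API call
# appearing hundreds or thousands of times is the signature of a retry loop.
# The sed strips the first two fields (assumed: date and time) so identical
# messages with different timestamps group together.
grep -E "API call|error|retry" ~/.openclaw/logs/session.log \
  | sed -E 's/^[^ ]+ [^ ]+ //' \
  | sort | uniq -c | sort -rn | head -5
```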

Recovery Actions:

  1. Lower spending limits:
    
    openclaw config set moonshot.daily_limit '$5'
    openclaw config set anthropic.daily_limit '$10'
    
  2. Disable problematic skills: If a new skill caused the spike, disable it
  3. Restart with caution:
    
    openclaw start --monitor
    # Watch logs for 30 minutes before trusting it
    

Incident Type 2: Suspected Compromise or Prompt Injection

Symptoms:

  • Agent ignoring SOUL.md boundaries
  • Unexpected attempts to access blocked tools
  • Sudden changes in behavior after a user message
  • Agent trying to exfiltrate data or credentials

Immediate Actions (within 2 minutes):

  1. STOP the gateway immediately: openclaw stop
  2. Do NOT acknowledge the compromise to users yet — investigate first
  3. Take a snapshot of logs and state:
    
    # Preserve evidence in one timestamped directory
    INCIDENT_DIR=~/incident-$(date +%s)
    mkdir -p "$INCIDENT_DIR"
    cp -r ~/.openclaw/logs "$INCIDENT_DIR/"
    cp ~/.openclaw/config.yml "$INCIDENT_DIR/"
    docker ps -a > "$INCIDENT_DIR/containers.txt"
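To make a snapshot like this tamper-evident, you can add a checksum manifest. A sketch, assuming the snapshot directory already exists (the path below is illustrative):

```shell
# Record a SHA-256 checksum for every file in the evidence directory,
# excluding the manifest itself, so the snapshot can be verified later.
INCIDENT_DIR="$HOME/incident-example"   # illustrative: your real snapshot dir
cd "$INCIDENT_DIR"
find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256

# Later: confirm no evidence file has changed since capture.
sha256sum -c --quiet MANIFEST.sha256 && echo "snapshot intact"
```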
    

Investigation (next 30-60 minutes):

  1. Review the suspicious message:
    • Does it contain prompt injection patterns?
    • Was it sent by a trusted user?
    • What was the agent's response?
  2. Check execution logs:
    
    # What tools did the agent try to use?
    grep "tool_execution" ~/.openclaw/logs/session.log
    
    # What tool policies were violated?
    grep "policy_violation" ~/.openclaw/logs/session.log
    
  3. Determine severity:
    • Low: Agent attempted blocked tool but was stopped by policy
    • Medium: Agent executed blocked tool, but limited data access
    • High: Agent exfiltrated credentials or sensitive data
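The severity call can be seeded with a quick count of policy events. A sketch: the log path and the `policy_violation`/`tool_execution` strings follow the grep examples above and may differ in your logs.

```shell
# Count policy events to seed the severity assessment. Nonzero blocked
# attempts with no suspicious executions suggests LOW severity (the policy
# held); review the executed tools by hand before calling it higher.
LOG="$HOME/.openclaw/logs/session.log"
blocked=$(grep -c "policy_violation" "$LOG" 2>/dev/null)
executed=$(grep -c "tool_execution" "$LOG" 2>/dev/null)
echo "blocked attempts: ${blocked:-0}"
echo "tool executions to review: ${executed:-0}"
```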

Recovery Actions:

  1. Revoke credentials (for High and Critical severity):
    
    # Revoke compromised credentials
    - API keys (Moonshot, Anthropic, OpenAI)
    - Telegram bot token
    - Matrix access tokens
    - OAuth tokens
    
  2. Tighten SOUL.md boundaries: Add explicit rules about injection attempts
  3. Tighten tool policies: Reduce allowed tools or hosts
  4. Remove suspicious skills: If a skill was the vector, remove it
  5. Restart and monitor:
    
    openclaw start --monitor
    # Watch for 2-3 hours of normal operation before trusting it
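The watch window can be scripted rather than monitored by hand. A sketch, assuming the same log path and `policy_violation` message string used elsewhere in this playbook:

```shell
# Poll the session log after restart and bail out at the first policy
# violation. Run it in a spare terminal for the 2-3 hour watch window.
LOG="$HOME/.openclaw/logs/session.log"
END=$(( $(date +%s) + 2 * 3600 ))   # watch for 2 hours

saw_violation() { grep -q "policy_violation" "$1" 2>/dev/null; }

while [ "$(date +%s)" -lt "$END" ]; do
  if saw_violation "$LOG"; then
    echo "policy violation after restart: stop the gateway and re-investigate"
    # openclaw stop
    break
  fi
  sleep 30
done
```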
    

Incident Type 3: Erratic Behavior (Agent Acting Weird)

Symptoms:

  • Agent responses are incoherent or contradictory
  • Agent forgets context mid-conversation
  • Agent makes logical errors it normally doesn't
  • Agent is slower than usual

Immediate Actions:

  1. Check system resources:
    
    # Is the machine running low on CPU, memory, or disk?
    top
    df -h
    iostat
    
  2. Check gateway logs:
    
    openclaw logs --filter error | head -20
    
  3. Check model connectivity:
    
    # Are API calls succeeding? (-E enables the | alternation)
    grep -E "api_error|timeout|rate_limit" ~/.openclaw/logs/*.log
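The disk check from step 1 can be made alert-ready with a short awk filter over portable `df -P` output; the 90% threshold is a suggestion, not an OpenClaw requirement.

```shell
# Flag any filesystem at or above 90% capacity; full disks are a common
# cause of erratic agent behavior. In `df -P` output, field 5 is the
# capacity percentage and field 6 is the mount point.
df -P | awk 'NR > 1 { gsub(/%/, "", $5); if ($5 + 0 >= 90) print "LOW DISK:", $6, $5 "%" }'
```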
    

Common Causes & Solutions:

  • Model is down/rate-limited: Check provider status page, wait 1-5 minutes
  • Low disk space: Clean up logs, restart
  • Memory leak in agent: Restart the gateway
  • Network connectivity issue: Check firewall, routing, DNS

Incident Type 4: Gateway Won't Start

Symptoms:

  • openclaw start hangs or immediately exits
  • Port already in use
  • Config file errors

Troubleshooting:

  1. Check logs:
    
    openclaw logs --follow
    # Look for specific error messages
    
  2. Verify config syntax:
    
    openclaw validate-config
    
  3. Check port availability:
    
    lsof -i :3000
    # If a stale process holds the port, terminate it (try plain kill
    # first; use kill -9 only if it ignores SIGTERM)
    kill <PID>
    
  4. Try a clean start:
    
    openclaw reset  # Warning: clears session history
    openclaw start
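If `lsof` is not installed, bash can probe the port directly. A sketch using bash's `/dev/tcp` pseudo-device (the port number matches the example above):

```shell
# Return success if nothing is accepting connections on the given port.
# Requires bash: the /dev/tcp pseudo-device is a bash feature, not POSIX.
port_free() {
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_free 3000; then
  echo "port 3000 is free -- safe to start the gateway"
else
  echo "port 3000 is busy -- find the owner with: lsof -i :3000"
fi
```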
    

Incident Severity Matrix

  • Critical (respond immediately, < 2 min): credentials exfiltrated or agent uncontrollable. STOP the gateway, revoke credentials, investigate.
  • High (respond within 5-10 minutes): successful tool policy violation, or an injection attempt was executed. STOP, investigate, tighten policies, restart.
  • Medium (respond within 15-30 minutes): API bill spike, erratic behavior, failed injection attempts. Investigate the root cause, lower limits, monitor.
  • Low (respond within an hour or more): minor errors, degraded performance. Monitor, diagnose, plan remediation.

Post-Incident Checklist

After any incident, follow this checklist:

  1. Document what happened: Timeline, root cause, impact
  2. Update SOUL.md and tool policies: Prevent the same incident
  3. Review security posture: Was defense-in-depth effective?
  4. Improve monitoring: Would earlier detection have helped?
  5. Test fixes: Verify changes prevent the issue
  6. Share learnings: Communicate to your team
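For step 1, a minimal report template keeps write-ups consistent across incidents. The fields below are suggestions, not an OpenClaw requirement:

```
Incident Report: <YYYY-MM-DD> <short title>
Severity: Critical / High / Medium / Low
Detected by: alert | dashboard | provider email | user report
Timeline:
  HH:MM  first symptom observed
  HH:MM  gateway stopped
  HH:MM  root cause identified
  HH:MM  service restored
Root cause:
Impact: (spend, data exposure, downtime)
Remediation: (config, SOUL.md, and tool-policy changes made)
Follow-ups: (monitoring gaps, drills, owner and due date)
```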

Key Takeaways

  1. Speed matters — Stop the gateway before damage scales
  2. Preserve evidence — Save logs and state before restarting
  3. Investigate root cause — Don't just restart and hope
  4. Tighten controls after incidents — Use each incident to improve defenses
  5. Test your playbook — Run drills quarterly so you know what to do

Resources & Automation

Consider automating incident response:

  • Alerting: Set up alerts for bill spikes, tool policy violations, and error rates
  • Auto-mitigation: Automatically lower spending limits if bill spikes
  • Incident recording: Automatically save logs when incidents occur
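The auto-mitigation idea can be sketched as a cron-friendly script. `get_daily_spend` is a stub you would replace with a query to your provider's usage API; the threshold and the commented `openclaw` command are illustrative.

```shell
# Stop the gateway automatically when daily spend crosses a threshold.
# awk handles the comparison because shell arithmetic is integer-only.
THRESHOLD="10.00"

get_daily_spend() {
  echo "3.75"   # stub: replace with a provider usage-API query
}

spend=$(get_daily_spend)
if awk -v s="$spend" -v t="$THRESHOLD" 'BEGIN { exit !(s > t) }'; then
  echo "ALERT: daily spend \$${spend} exceeds \$${THRESHOLD}"
  # openclaw stop
fi
```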