OpenClaw's default configuration sends every request to Claude Opus, the most expensive model, which drives up API costs unnecessarily. This guide shows you how to implement model routing, configure token budgets, tune compaction settings, and add automation guardrails and cost alerts to cut that spend without sacrificing quality.
Why This Is Hard to Do Yourself
These are the common pitfalls that trip people up.
"Opus for everything"
Default OpenClaw sends every request to Claude Opus, the most expensive model, even for simple tasks
No usage visibility
No built-in dashboard shows which skills or conversations are burning tokens
Hidden compaction costs
Long conversations trigger automatic compaction, which spends expensive model calls just to summarize context
Runaway automations
A single misconfigured automation loop can burn through $500+ in tokens overnight
Step-by-Step Guide
Audit your current token usage
Identify where tokens are going.
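If the per-request token log that OpenClaw writes is comma-separated, a short script can total usage per model instead of only listing the largest requests. The sketch below assumes a hypothetical `timestamp,model,tokens` layout; this guide only implies that the third column is a number, so adjust `key_col` and `token_col` to match your actual log. The function name is illustrative and not part of OpenClaw.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumed (hypothetical) log layout: timestamp,model,tokens
# Only the numeric third column is implied by this guide.
LOG_PATH = Path.home() / ".openclaw" / "logs" / "tokens.log"


def tokens_by_model(lines, key_col=1, token_col=2):
    """Sum the token column per value of key_col, largest total first."""
    totals = Counter()
    for row in csv.reader(lines):
        if len(row) <= max(key_col, token_col):
            continue  # skip blank or short lines
        try:
            totals[row[key_col].strip()] += int(row[token_col])
        except ValueError:
            continue  # skip header rows or malformed counts
    return totals.most_common()


if __name__ == "__main__":
    if LOG_PATH.exists():
        with LOG_PATH.open(newline="") as fh:
            for model, total in tokens_by_model(fh):
                print(f"{model:40s} {total:>12,d}")
```

Pointing `key_col` at a conversation or skill column, if your log has one, gives the same breakdown per conversation or per skill.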
# Check your API provider dashboard for:
# - Total tokens per day/week/month
# - Breakdown by model (Opus vs Sonnet vs Haiku)
# - Top conversations by token count
# OpenClaw logs token usage per request:
tail -n 100 ~/.openclaw/logs/tokens.log | sort -t',' -k3,3nr | head -n 20
Set up OpenRouter for model routing
Route simple tasks to cheaper models.
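To see what the pattern rules in this step would do to a given request, it can help to model them in a few lines. The sketch below copies the two patterns and the default model from this step's config; the matching semantics (case-insensitive substring search, first matching rule wins) are an assumption, not documented OpenClaw behavior, and `pick_model` is a hypothetical helper.

```python
import re

# Patterns and models copied from this step's routing config.
# Assumption: rules are tried in order and the first match wins.
DEFAULT_MODEL = "anthropic/claude-3.5-sonnet"
RULES = [
    (re.compile(r"summarize|format|translate", re.IGNORECASE),
     "anthropic/claude-3-haiku"),
    (re.compile(r"analyze|reason|code-review", re.IGNORECASE),
     "anthropic/claude-3-opus"),
]


def pick_model(request_text: str) -> str:
    """Return the model of the first rule whose pattern matches."""
    for pattern, model in RULES:
        if pattern.search(request_text):
            return model
    return DEFAULT_MODEL  # no rule matched


if __name__ == "__main__":
    for text in ("Summarize this email", "Analyze the logs", "Say hello"):
        print(f"{text!r} -> {pick_model(text)}")
```

Under these assumed semantics, broad patterns catch more than intended: "What information do you need?" routes to Haiku because "information" contains "format". If your router matches substrings, word boundaries such as `\bformat\b` are worth considering.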
# In .env:
OPENROUTER_API_KEY=sk-or-...
# In config/models.yaml:
routing:
  default: anthropic/claude-3.5-sonnet
  complex_tasks: anthropic/claude-3-opus
  simple_tasks: anthropic/claude-3-haiku
  rules:
    - pattern: "summarize|format|translate"
      model: anthropic/claude-3-haiku
    - pattern: "analyze|reason|code-review"
      model: anthropic/claude-3-opus
Configure token budgets
Set per-conversation and global limits.
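The per-conversation limits in this step amount to a running counter with a soft and a hard threshold. A minimal sketch of that logic follows, using this step's `max_tokens` and `warning_at` values as defaults; the class and its return values are hypothetical, and how OpenClaw itself enforces budgets may differ.

```python
from dataclasses import dataclass


@dataclass
class ConversationBudget:
    """Hypothetical tracker; defaults mirror this step's per_conversation values."""
    max_tokens: int = 50_000
    warning_at: int = 40_000
    used: int = 0

    def record(self, tokens: int) -> str:
        """Add usage and return 'ok', 'warn', or 'blocked'."""
        if self.used + tokens > self.max_tokens:
            return "blocked"  # refuse the call; usage is not recorded
        self.used += tokens
        return "warn" if self.used >= self.warning_at else "ok"


if __name__ == "__main__":
    budget = ConversationBudget()
    for request_tokens in (30_000, 10_000, 15_000):
        print(request_tokens, budget.record(request_tokens), budget.used)
```

Checking the budget before the request is sent, rather than after, is the design choice that keeps a single oversized call from overshooting the limit.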
# In config/budgets.yaml:
global:
  daily_limit: 500000  # tokens
  monthly_limit: 10000000
per_conversation:
  max_tokens: 50000
  warning_at: 40000
per_skill:
  web-scraper: 100000
  code-review: 200000
Tune compaction settings
Reduce expensive context summarization.
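Two settings in this step do most of the work: `trigger_threshold` decides when compaction fires, and `preserve_recent_messages` decides what is left out of the summary. The sketch below illustrates both under stated assumptions: the threshold is taken to be a fraction of the model's context window, the 200,000-token window is only an example figure, and both function names are hypothetical.

```python
def should_compact(context_tokens: int, context_window: int,
                   trigger_threshold: float = 0.8) -> bool:
    """True once the conversation fills the given fraction of the window."""
    return context_tokens >= trigger_threshold * context_window


def split_for_compaction(messages, preserve_recent_messages: int = 10):
    """Return (messages_to_summarize, messages_kept_verbatim)."""
    if preserve_recent_messages <= 0:
        return list(messages), []
    return (list(messages[:-preserve_recent_messages]),
            list(messages[-preserve_recent_messages:]))


if __name__ == "__main__":
    window = 200_000  # example context window size, an assumption
    print(should_compact(130_000, window, 0.6))  # default threshold
    print(should_compact(130_000, window, 0.8))  # tuned threshold
    old, recent = split_for_compaction(list(range(25)))
    print(len(old), len(recent))
```

With a 200,000-token window, raising the threshold from 0.6 to 0.8 moves the trigger from 120,000 to 160,000 tokens, so shorter conversations may never pay for a compaction call at all.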
# In config/compaction.yaml:
compaction:
  trigger_threshold: 0.8  # Start at 80% of the context window (default: 0.6)
  model: anthropic/claude-3-haiku  # Use cheap model for compaction
  max_summary_tokens: 2000
  preserve_recent_messages: 10
Add automation guardrails
Prevent runaway loops.
# In config/automations.yaml:
guardrails:
  max_iterations: 50
  max_tokens_per_run: 100000
  timeout_minutes: 30
  require_confirmation_above: 10000  # tokens
Warning: Without guardrails, a single automation error can generate thousands of API calls. Always set limits before enabling any automation.
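To make the limits concrete, here is a rough sketch of how an automation loop could enforce three of them: `max_iterations`, `max_tokens_per_run`, and `timeout_minutes`. It is an illustration, not OpenClaw's implementation. `step` is a hypothetical callable that performs one iteration and returns `(tokens_used, done)`, and `require_confirmation_above` is left out because it needs a human in the loop.

```python
import time


class GuardrailExceeded(RuntimeError):
    """Raised when a run hits one of its configured limits."""


def run_with_guardrails(step, max_iterations=50, max_tokens_per_run=100_000,
                        timeout_minutes=30, clock=time.monotonic):
    """Call step() until it reports done or a limit is hit.

    Returns (iterations, total_tokens) on success.
    """
    deadline = clock() + timeout_minutes * 60
    total_tokens = 0
    for iteration in range(1, max_iterations + 1):
        if clock() > deadline:
            raise GuardrailExceeded(f"timeout_minutes={timeout_minutes} exceeded")
        tokens_used, done = step()
        total_tokens += tokens_used
        if total_tokens > max_tokens_per_run:
            raise GuardrailExceeded(
                f"max_tokens_per_run={max_tokens_per_run} exceeded")
        if done:
            return iteration, total_tokens
    raise GuardrailExceeded(f"max_iterations={max_iterations} reached")
```

Note that this sketch checks the timeout only between steps, so it cannot interrupt a step that is already running. A step that never reports `done` is stopped by whichever limit it reaches first.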
Set up cost alerts
Get notified before bills spike.
# In config/alerts.yaml:
alerts:
  - type: daily_spend
    threshold: 20.00  # USD
    channel: email
  - type: hourly_spike
    threshold: 500  # % above normal
    channel: slack_webhook
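The two alert types reduce to simple arithmetic. The sketch below shows one plausible reading, with the thresholds from this step as defaults: `daily_spend` fires once the day's spend reaches the USD threshold, and `hourly_spike` fires when the current hour's spend is at least 500% above a baseline, meaning six times the normal hourly figure. The function names and the way the baseline is supplied are assumptions.

```python
def percent_above_normal(current: float, baseline: float) -> float:
    """How far current exceeds baseline, as a percentage of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline * 100.0


def triggered_alerts(daily_spend_usd: float, hourly_spend_usd: float,
                     normal_hourly_usd: float,
                     daily_threshold_usd: float = 20.00,
                     spike_threshold_pct: float = 500.0) -> list:
    """Return the names of the alert types that would fire."""
    alerts = []
    if daily_spend_usd >= daily_threshold_usd:
        alerts.append("daily_spend")
    spike = percent_above_normal(hourly_spend_usd, normal_hourly_usd)
    if spike >= spike_threshold_pct:
        alerts.append("hourly_spike")
    return alerts


if __name__ == "__main__":
    print(triggered_alerts(12.0, 0.5, 0.5))
    print(triggered_alerts(21.5, 3.0, 0.5))
```

How "normal" is computed matters more than the threshold itself. A trailing average over recent hours is one option, but check what your alerting setup actually uses before relying on the 500% figure.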