OpenClaw works with dozens of LLM providers — Claude, GPT-4, Gemini, Llama, Mistral, and more. Each model has different strengths: some excel at coding, others at analysis, some are blazing fast, others are dirt cheap. Choosing the wrong model means you either overpay for capabilities you don't need, or under-deliver on quality for tasks that demand precision. This guide helps you match models to your actual needs.
Why This Is Hard to Do Yourself
These are the common pitfalls that trip people up.
Too many model options
Anthropic has 4+ Claude models, OpenAI has 6+ GPT variants, Google has Gemini Pro/Flash/Ultra, plus dozens of open-source options. How do you choose?
Cost vs quality tradeoff
The best models (Claude Opus, GPT-4) cost 20-60x more than budget models (Haiku, GPT-3.5) at the per-token rates covered later in this guide. Is the quality difference worth it for your use case?
Different models for different tasks
No single model is best at everything. You need fast, cheap models for simple queries; powerful, expensive models for complex reasoning; and specialized models for code or analysis.
Step-by-Step Guide
Understand what models OpenClaw supports
OpenClaw integrates with all major LLM providers.
# Supported providers (via OpenClaw config):
providers:
  - anthropic      # Claude 3.5 Sonnet, Opus, Haiku
  - openai         # GPT-4o, GPT-4 Turbo, GPT-3.5
  - google         # Gemini 1.5 Pro, Flash
  - azure-openai   # Azure-hosted GPT models
  - aws-bedrock    # Claude, Llama, Mistral via AWS
  - ollama         # Local open-source models
  - openrouter     # Access to 100+ models via one API

# Example multi-provider config:
# config/models.yaml
models:
  primary: anthropic/claude-3-5-sonnet-20241022
  fallback: openai/gpt-4o
  budget: anthropic/claude-3-haiku-20240307
  local: ollama/llama3.1:8b

Warning: Start with one provider (Anthropic or OpenAI) to keep things simple, then add others as you identify specific needs for cheaper or specialized models.
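If it helps to see how those provider/model strings resolve, here is a minimal Python sketch of the parsing involved. parse_model_ref is a hypothetical helper for illustration, not part of OpenClaw's actual API:

# Hypothetical sketch: how "provider/model" strings like the ones in
# config/models.yaml split into a provider and a model ID.
def parse_model_ref(ref: str) -> tuple[str, str]:
    provider, _, model = ref.partition("/")
    if not model:
        raise ValueError(f"expected 'provider/model', got {ref!r}")
    return provider, model

print(parse_model_ref("anthropic/claude-3-5-sonnet-20241022"))
# -> ('anthropic', 'claude-3-5-sonnet-20241022')
print(parse_model_ref("ollama/llama3.1:8b"))
# -> ('ollama', 'llama3.1:8b')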
Compare model capabilities
Know what each model is actually good at.
# Model capability comparison (early 2025):
# Top tier (best quality, highest cost):
Claude Opus 4.5 ⭐⭐⭐⭐⭐ Best overall reasoning, analysis, writing
GPT-4o ⭐⭐⭐⭐⭐ Best for structured output, tool use, speed
Gemini 1.5 Pro ⭐⭐⭐⭐ Great for long context (1M+ tokens)
# Mid tier (balanced cost/quality):
Claude Sonnet 3.5 ⭐⭐⭐⭐ Best coding, great all-around, fast
GPT-4 Turbo ⭐⭐⭐⭐ Solid reasoning, good for most tasks
Gemini 1.5 Flash ⭐⭐⭐ Very fast, good for simple tasks
# Budget tier (cheapest):
Claude Haiku ⭐⭐⭐ Fast, cheap, good for simple queries
GPT-3.5 Turbo ⭐⭐ Very cheap, but outdated capabilities
# Specialized:
Codestral (Mistral) ⭐⭐⭐⭐ Code generation specialist
Llama 3.1 (local) ⭐⭐⭐ Free (self-hosted), privacy-focused
# Use case recommendations:
- Code review, generation: Claude Sonnet, Codestral
- Data analysis, research: Claude Opus, Gemini Pro
- Customer support (simple): Claude Haiku, Gemini Flash
- Complex reasoning: Claude Opus, GPT-4o
- Budget-conscious: Claude Haiku, GPT-3.5
- Privacy-critical: Llama 3.1 (local)

Warning: Don't default to the most expensive model for everything. Most queries don't need Opus-level reasoning and work fine on Sonnet or Haiku.
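One way to make these recommendations actionable is to encode them as a lookup table that your routing layer (configured in later steps) can consult. A minimal sketch, assuming hypothetical use-case labels; the model IDs follow the provider/model convention from step 1:

# Hypothetical mapping of use cases to candidate models, in preference order.
RECOMMENDED = {
    "code":              ["anthropic/claude-3-5-sonnet", "mistral/codestral"],
    "analysis":          ["anthropic/claude-opus-4-5", "google/gemini-1.5-pro"],
    "simple_support":    ["anthropic/claude-3-haiku", "google/gemini-1.5-flash"],
    "complex_reasoning": ["anthropic/claude-opus-4-5", "openai/gpt-4o"],
    "privacy_critical":  ["ollama/llama3.1:8b"],
}

def candidates(use_case: str) -> list[str]:
    # Fall back to a strong all-rounder when the use case is unknown.
    return RECOMMENDED.get(use_case, ["anthropic/claude-3-5-sonnet"])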
Compare pricing tiers
Know what you're paying per query.
# Pricing comparison (per 1M tokens, early 2025):
# Anthropic:
Claude Opus 4.5: $15 input / $75 output (most expensive)
Claude Sonnet 3.5: $3 input / $15 output (best value)
Claude Haiku: $0.25 input / $1.25 output (cheapest)
# OpenAI:
GPT-4o: $2.50 input / $10 output
GPT-4 Turbo: $10 input / $30 output
GPT-3.5 Turbo: $0.50 input / $1.50 output
# Google:
Gemini 1.5 Pro: $1.25 input / $5 output
Gemini 1.5 Flash: $0.075 input / $0.30 output (very cheap)
# Example cost per 1,000 conversations (5k input, 1.5k output each):
Claude Opus: $187.50 (premium quality)
Claude Sonnet: $37.50 (best balance)
Claude Haiku: $3.13 (budget option)
GPT-4o: $27.50
Gemini Flash: $0.83 (ultra-budget)
# Cost difference matters at scale:
# 100,000 conversations/month:
# - Gemini Flash: $83/month
# - Claude Sonnet: $3,750/month
# - Claude Opus: $18,750/month

Warning: For most production use cases, Claude Sonnet offers the best quality-to-cost ratio. Reserve Opus for the 10% of queries that truly need it.
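The per-conversation arithmetic above is worth being able to reproduce for your own token volumes. Here is a small Python sketch using the per-million-token rates listed in this step (the model keys are shorthand labels, not API identifiers):

# Cost per conversation = in_tokens/1M * input_rate + out_tokens/1M * output_rate
PRICES = {  # (input $/1M tokens, output $/1M tokens), from the table above
    "claude-opus":   (15.00, 75.00),
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku":  (0.25, 1.25),
    "gpt-4o":        (2.50, 10.00),
    "gemini-flash":  (0.075, 0.30),
}

def conversation_cost(model: str, in_tokens: int = 5_000, out_tokens: int = 1_500) -> float:
    in_rate, out_rate = PRICES[model]
    return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

for model in PRICES:
    print(f"{model}: ${conversation_cost(model) * 1_000:,.2f} per 1,000 conversations")
# claude-opus comes out to $187.50, claude-sonnet to $37.50, and so on.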
Match models to use cases
Route different tasks to different models.
# Smart model routing strategy:
# Use Claude Opus / GPT-4o for:
- Complex multi-step reasoning
- Critical business decisions
- Legal/financial analysis
- Creative writing (marketing copy, etc.)
- Research synthesis from multiple sources
# Use Claude Sonnet / GPT-4 Turbo for:
- Code generation and review
- General Q&A and assistance
- Document summarization
- Data transformation
- Most everyday tasks (80% of queries)
# Use Claude Haiku / Gemini Flash for:
- Simple factual queries
- Classification tasks
- Sentiment analysis
- Routing/triage (which expert to ask?)
- High-volume automated tasks
# Use local models (Llama, Mistral) for:
- Privacy-sensitive data (medical, legal, etc.)
- Offline/air-gapped environments
- Extremely high volume (millions of queries)
- When API costs would exceed GPU hosting costs
# Implementation in OpenClaw:
# config/routing.yaml
routing:
  rules:
    - if: query.complexity == "high"
      use: anthropic/claude-opus-4-5
    - if: query.type == "code"
      use: anthropic/claude-3-5-sonnet
    - if: query.type == "simple_qa"
      use: anthropic/claude-3-haiku
  default: anthropic/claude-3-5-sonnet

Warning: Over-routing to expensive models wastes money. Under-routing to cheap models frustrates users with poor quality. Start conservative (use Sonnet for most things) and optimize over time.
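In code, a first-match-wins rule list like the config above might look like the following sketch. The Query fields and rule shapes are illustrative; OpenClaw's actual rule engine may differ:

# First matching rule wins; otherwise fall through to the default model.
from dataclasses import dataclass

@dataclass
class Query:
    complexity: str   # e.g. "low" or "high"
    type: str         # e.g. "code", "simple_qa", "chat"

RULES = [
    (lambda q: q.complexity == "high", "anthropic/claude-opus-4-5"),
    (lambda q: q.type == "code",       "anthropic/claude-3-5-sonnet"),
    (lambda q: q.type == "simple_qa",  "anthropic/claude-3-haiku"),
]
DEFAULT_MODEL = "anthropic/claude-3-5-sonnet"

def route(query: Query) -> str:
    for matches, model in RULES:
        if matches(query):
            return model
    return DEFAULT_MODEL

print(route(Query(complexity="low", type="code")))  # anthropic/claude-3-5-sonnet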
Configure model routing
Set up automatic model selection.
# OpenClaw model routing config:
# config/models.yaml
models:
  # Primary model for most queries:
  default: anthropic/claude-3-5-sonnet-20241022

# Route by query type:
routing:
  by_complexity:
    simple:
      model: anthropic/claude-3-haiku-20240307
      triggers:
        - query_length < 100 tokens
        - query_type in ["factual", "simple_qa"]
    complex:
      model: anthropic/claude-opus-4-5
      triggers:
        - query_length > 2000 tokens
        - requires_reasoning == true
        - user_tier == "premium"
  by_task_type:
    code:
      model: anthropic/claude-3-5-sonnet-20241022
    analysis:
      model: anthropic/claude-opus-4-5
    chat:
      model: anthropic/claude-3-haiku-20240307

# Fallback chain (if primary fails):
fallbacks:
  - openai/gpt-4o
  - google/gemini-1.5-pro

Warning: Monitor which routes are actually being used. If the "simple" route is rarely triggered, your triggers are too conservative; loosen them to save money.
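The fallback chain deserves a note: on a provider outage or rate limit, the request should retry down the list rather than fail outright. A hedged sketch of that behavior, where call_model is a placeholder for the actual provider call, not an OpenClaw API:

# Try the primary model, then each fallback in order.
def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError  # placeholder: real code would call the provider here

def complete_with_fallbacks(prompt: str) -> str:
    chain = [
        "anthropic/claude-3-5-sonnet-20241022",  # primary
        "openai/gpt-4o",                         # fallback 1
        "google/gemini-1.5-pro",                 # fallback 2
    ]
    last_error = None
    for model in chain:
        try:
            return call_model(model, prompt)
        except Exception as err:  # outage, rate limit, timeout, etc.
            last_error = err
    raise RuntimeError("all models in the fallback chain failed") from last_error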
Monitor and adjust
Track quality vs cost and optimize.
# Metrics to track per model:
# 1. Cost per query:
# - Average tokens consumed
# - Average $ spent
# - Total monthly spend by model
# 2. Quality metrics:
# - User satisfaction (thumbs up/down)
# - Task completion rate
# - Error rate / retry rate
# 3. Performance:
# - Average latency (time to first token)
# - Throughput (queries/second)
# Example monitoring dashboard queries:
# "Which model has best cost-per-successful-query?"
# "Are Haiku queries being retried with Sonnet more than 20%?" (sign Haiku is underperforming)
# "What % of queries actually need Opus vs Sonnet?" (am I over-routing?)
# Optimization loop (monthly):
# 1. Review past month's usage by model
# 2. Identify expensive queries (high token count, complex routing)
# 3. Test if cheaper model produces acceptable results
# 4. Adjust routing rules
# 5. Measure impact on cost and quality
# Example adjustment:
# Before: 60% of queries routed to Sonnet, 40% to Opus
# After analysis: 80% could use Sonnet with no quality loss
# New routing: 80% Sonnet, 15% Opus, 5% Haiku
# Result: ~40% cost reduction, same user satisfaction

Warning: Don't optimize purely for cost; users will notice quality degradation. Aim for the lowest cost that maintains acceptable quality for each use case.
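The key metric in that loop is cost per successful query, since a cheap model that forces retries can cost more than it saves. A minimal sketch of the computation, assuming a usage log with illustrative field names:

# Cost per successful query = total spend on a model / number of successful queries.
usage_log = [
    {"model": "claude-haiku",  "cost": 0.0031, "success": True},
    {"model": "claude-haiku",  "cost": 0.0031, "success": False},  # retried elsewhere
    {"model": "claude-sonnet", "cost": 0.0375, "success": True},
]

def cost_per_successful_query(log: list, model: str) -> float:
    rows = [r for r in log if r["model"] == model]
    total = sum(r["cost"] for r in rows)
    successes = sum(1 for r in rows if r["success"])
    return total / successes if successes else float("inf")

print(f"haiku: ${cost_per_successful_query(usage_log, 'claude-haiku'):.4f}")  # $0.0062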
Choosing Between Models?
Generic benchmarks don't reflect your actual workload. Our experts benchmark Claude, GPT-4, Gemini, and open-source models against your real queries, measure quality-vs-cost tradeoffs for your specific use cases, and recommend the optimal model mix. We'll configure routing rules and monitor performance to ensure you're getting the best value.
Get matched with a specialist who can help.
Sign Up for Expert Help →