📉 Cost Optimization

How to Choose the Right LLM for OpenClaw

Intermediate · 20 minutes · Updated 2025-03-01

OpenClaw works with dozens of LLM providers — Claude, GPT-4, Gemini, Llama, Mistral, and more. Each model has different strengths: some excel at coding, others at analysis, some are blazing fast, others are dirt cheap. Choosing the wrong model means you either overpay for capabilities you don't need, or under-deliver on quality for tasks that demand precision. This guide helps you match models to your actual needs.

Why This Is Hard to Do Yourself

These are the common pitfalls that trip people up.

🤯

Too many model options

Anthropic has 4+ Claude models, OpenAI has 6+ GPT variants, Google has Gemini Pro/Flash/Ultra, plus dozens of open-source options. How do you choose?

⚖️

Cost vs quality tradeoff

The best model (Claude Opus, GPT-4) costs 10-20x more than budget models (Haiku, GPT-3.5). Is the quality difference worth it for your use case?

🎯

Different models for different tasks

No single model is best for everything. You need fast cheap models for simple queries, powerful expensive models for complex reasoning, and specialized models for code or analysis.

Step-by-Step Guide

Step 1

Understand what models OpenClaw supports

OpenClaw integrates with all major LLM providers.

# Supported providers (via OpenClaw config):
providers:
  - anthropic        # Claude 3.5 Sonnet, Opus, Haiku
  - openai           # GPT-4o, GPT-4 Turbo, GPT-3.5
  - google           # Gemini 1.5 Pro, Flash
  - azure-openai     # Azure-hosted GPT models
  - aws-bedrock      # Claude, Llama, Mistral via AWS
  - ollama           # Local open-source models
  - openrouter       # Access to 100+ models via one API

# Example multi-provider config:
# config/models.yaml
models:
  primary: anthropic/claude-3-5-sonnet-20241022
  fallback: openai/gpt-4o
  budget: anthropic/claude-3-haiku-20240307
  local: ollama/llama3.1:8b

Warning: Start with one provider (Anthropic or OpenAI) to keep things simple, then add others as you identify specific needs for cheaper or specialized models.
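If you adopt a multi-provider layout like the one above, it helps to catch typos in model identifiers before OpenClaw ever reads the file. The sketch below is a standalone Python check, not an OpenClaw feature: it assumes the config/models.yaml structure from this step and the provider names listed earlier.

# validate_models.py — sanity-check config/models.yaml (illustrative sketch, not an OpenClaw tool)
import yaml  # pip install pyyaml

KNOWN_PROVIDERS = {"anthropic", "openai", "google", "azure-openai",
                   "aws-bedrock", "ollama", "openrouter"}

def validate(path="config/models.yaml"):
    with open(path) as f:
        config = yaml.safe_load(f)

    # Each alias (primary, fallback, budget, local, ...) should point at "provider/model"
    for alias, model_id in config["models"].items():
        provider, _, model = model_id.partition("/")
        if provider not in KNOWN_PROVIDERS:
            print(f"{alias}: unknown provider '{provider}'")
        elif not model:
            print(f"{alias}: missing model name after '{provider}/'")
        else:
            print(f"{alias}: OK ({model_id})")

if __name__ == "__main__":
    validate()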

Step 2

Compare model capabilities

Know what each model is actually good at.

# Model capability comparison (early 2025):

# Top tier (best quality, highest cost):
Claude Opus 4.5          ⭐⭐⭐⭐⭐  Best overall reasoning, analysis, writing
GPT-4o                   ⭐⭐⭐⭐⭐  Best for structured output, tool use, speed
Gemini 1.5 Pro           ⭐⭐⭐⭐    Great for long context (1M+ tokens)

# Mid tier (balanced cost/quality):
Claude Sonnet 3.5        ⭐⭐⭐⭐    Best coding, great all-around, fast
GPT-4 Turbo              ⭐⭐⭐⭐    Solid reasoning, good for most tasks
Gemini 1.5 Flash         ⭐⭐⭐      Very fast, good for simple tasks

# Budget tier (cheapest):
Claude Haiku             ⭐⭐⭐      Fast, cheap, good for simple queries
GPT-3.5 Turbo            ⭐⭐        Cheapest OpenAI option, but outdated capabilities

# Specialized:
Codestral (Mistral)      ⭐⭐⭐⭐    Code generation specialist
Llama 3.1 (local)        ⭐⭐⭐      Free (self-hosted), privacy-focused

# Use case recommendations:
- Code review, generation: Claude Sonnet, Codestral
- Data analysis, research: Claude Opus, Gemini Pro
- Customer support (simple): Claude Haiku, Gemini Flash
- Complex reasoning: Claude Opus, GPT-4o
- Budget-conscious: Claude Haiku, GPT-3.5
- Privacy-critical: Llama 3.1 (local)

Warning: Don't default to the most expensive model for everything. Most queries don't need Opus-level reasoning and work fine on Sonnet or Haiku.
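One practical way to act on these recommendations is to hard-code them as a lookup your application consults before sending a request. The mapping below is purely illustrative: the task labels are made up for the example, and the model IDs are the ones used elsewhere in this guide.

# Task-to-model lookup based on the tiers above (illustrative; adjust to your workload)
MODEL_FOR_TASK = {
    "code_review":      "anthropic/claude-3-5-sonnet-20241022",
    "code_generation":  "anthropic/claude-3-5-sonnet-20241022",
    "data_analysis":    "anthropic/claude-opus-4-5",
    "research":         "google/gemini-1.5-pro",
    "support_simple":   "anthropic/claude-3-haiku-20240307",
    "classification":   "google/gemini-1.5-flash",
    "privacy_critical": "ollama/llama3.1:8b",
}

DEFAULT_MODEL = "anthropic/claude-3-5-sonnet-20241022"  # safe middle ground

def model_for(task: str) -> str:
    """Return the model to use for a task label, falling back to the default."""
    return MODEL_FOR_TASK.get(task, DEFAULT_MODEL)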

Step 3

Compare pricing tiers

Know what you're paying per query.

# Pricing comparison (per 1M tokens, early 2025):

# Anthropic:
Claude Opus 4.5:     $15 input / $75 output   (most expensive)
Claude Sonnet 3.5:   $3 input / $15 output    (best value)
Claude Haiku:        $0.25 input / $1.25 output (cheapest)

# OpenAI:
GPT-4o:              $2.50 input / $10 output
GPT-4 Turbo:         $10 input / $30 output
GPT-3.5 Turbo:       $0.50 input / $1.50 output

# Google:
Gemini 1.5 Pro:      $1.25 input / $5 output
Gemini 1.5 Flash:    $0.075 input / $0.30 output (very cheap)

# Example cost per 1,000 conversations (5k input, 1.5k output each):
Claude Opus:   $187.50  (premium quality)
Claude Sonnet: $37.50   (best balance)
Claude Haiku:  $3.13    (budget option)
GPT-4o:        $27.50
Gemini Flash:  $0.83    (ultra-budget)

# Cost difference matters at scale:
# 100,000 conversations/month:
# - Gemini Flash: $83/month
# - Claude Sonnet: $3,750/month
# - Claude Opus: $18,750/month

Warning: For most production use cases, Claude Sonnet offers the best quality-to-cost ratio. Reserve Opus for the 10% of queries that truly need it.
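To rerun the per-conversation math above with your own token counts (or updated prices), a few lines of Python suffice. The prices below are copied from the table in this step; treat them as a snapshot, not current list prices.

# Estimate per-conversation and monthly spend (prices are $ per 1M tokens, from the table above)
PRICES = {  # (input $/1M, output $/1M)
    "claude-opus":   (15.00, 75.00),
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku":  (0.25, 1.25),
    "gpt-4o":        (2.50, 10.00),
    "gemini-flash":  (0.075, 0.30),
}

def cost_per_conversation(model, input_tokens=5_000, output_tokens=1_500):
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for model in PRICES:
    per_conv = cost_per_conversation(model)
    print(f"{model:14s} ${per_conv:.5f}/conv   ${per_conv * 100_000:,.0f} per 100k conversations/month")
# claude-opus works out to ~$0.1875 per conversation, i.e. ~$18,750 for 100k conversations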

Step 4

Match models to use cases

Route different tasks to different models.

# Smart model routing strategy:

# Use Claude Opus / GPT-4o for:
- Complex multi-step reasoning
- Critical business decisions
- Legal/financial analysis
- Creative writing (marketing copy, etc.)
- Research synthesis from multiple sources

# Use Claude Sonnet / GPT-4 Turbo for:
- Code generation and review
- General Q&A and assistance
- Document summarization
- Data transformation
- Most everyday tasks (80% of queries)

# Use Claude Haiku / Gemini Flash for:
- Simple factual queries
- Classification tasks
- Sentiment analysis
- Routing/triage (which expert to ask?)
- High-volume automated tasks

# Use local models (Llama, Mistral) for:
- Privacy-sensitive data (medical, legal, etc.)
- Offline/air-gapped environments
- Extremely high volume (millions of queries)
- When API costs would exceed GPU hosting costs

# Implementation in OpenClaw:
# config/routing.yaml
routing:
  rules:
    - if: query.complexity == "high"
      use: anthropic/claude-opus-4-5
    - if: query.type == "code"
      use: anthropic/claude-3-5-sonnet
    - if: query.type == "simple_qa"
      use: anthropic/claude-3-haiku
    - default: anthropic/claude-3-5-sonnet

Warning: Over-routing to expensive models wastes money. Under-routing to cheap models frustrates users with poor quality. Start conservative (use Sonnet for most things) and optimize over time.
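If you want to experiment with this tiering outside of OpenClaw's routing config, the same logic fits in a small function. The thresholds and task labels below are guesses for illustration, and the model IDs follow the earlier examples; treat it as a starting point to tune against real traffic, not a recommended policy.

# Rough pre-routing heuristic mirroring the strategy above (sketch; thresholds are assumptions)
def pick_model(query: str, task_type: str = "chat", privacy_sensitive: bool = False) -> str:
    if privacy_sensitive:
        return "ollama/llama3.1:8b"            # keep sensitive data on local hardware
    if task_type == "code":
        return "anthropic/claude-3-5-sonnet-20241022"
    if task_type in ("analysis", "research") or len(query.split()) > 1500:
        return "anthropic/claude-opus-4-5"     # long or complex work goes to the top tier
    if task_type in ("chat", "simple_qa") and len(query.split()) < 75:
        return "anthropic/claude-3-haiku-20240307"
    return "anthropic/claude-3-5-sonnet-20241022"  # conservative default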

Step 5

Configure model routing

Set up automatic model selection.

# OpenClaw model routing config:
# config/models.yaml

models:
  # Primary model for most queries:
  default: anthropic/claude-3-5-sonnet-20241022

  # Route by query type:
  routing:
    by_complexity:
      simple:
        model: anthropic/claude-3-haiku-20240307
        triggers:
          - query_length < 100 tokens
          - query_type in ["factual", "simple_qa"]

      complex:
        model: anthropic/claude-opus-4-5
        triggers:
          - query_length > 2000 tokens
          - requires_reasoning == true
          - user_tier == "premium"

    by_task_type:
      code:
        model: anthropic/claude-3-5-sonnet-20241022

      analysis:
        model: anthropic/claude-opus-4-5

      chat:
        model: anthropic/claude-3-haiku-20240307

  # Fallback chain (if primary fails):
  fallbacks:
    - openai/gpt-4o
    - google/gemini-1.5-pro

Warning: Monitor which routes are actually being used. If the "simple" route is rarely triggered, your triggers are too conservative; loosen them to save money.
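The fallback chain can also be reproduced in application code if you call providers directly rather than through OpenClaw. In the sketch below, call_model is a hypothetical placeholder for whatever client function you use; the chain order mirrors the fallbacks in the config above.

# Walk a fallback chain until one provider answers (call_model is a placeholder you supply)
FALLBACK_CHAIN = [
    "anthropic/claude-3-5-sonnet-20241022",  # primary
    "openai/gpt-4o",                         # first fallback
    "google/gemini-1.5-pro",                 # second fallback
]

def complete_with_fallback(prompt: str) -> str:
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            return call_model(model, prompt)   # your provider client goes here
        except Exception as exc:               # rate limits, outages, timeouts, ...
            last_error = exc
            print(f"{model} failed ({exc}); trying next model")
    raise RuntimeError("all models in the fallback chain failed") from last_error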

Step 6

Monitor and adjust

Track quality vs cost and optimize.

# Metrics to track per model:

# 1. Cost per query:
# - Average tokens consumed
# - Average $ spent
# - Total monthly spend by model

# 2. Quality metrics:
# - User satisfaction (thumbs up/down)
# - Task completion rate
# - Error rate / retry rate

# 3. Performance:
# - Average latency (time to first token)
# - Throughput (queries/second)

# Example monitoring dashboard queries:
# "Which model has best cost-per-successful-query?"
# "Are Haiku queries being retried with Sonnet more than 20%?" (sign Haiku is underperforming)
# "What % of queries actually need Opus vs Sonnet?" (am I over-routing?)

# Optimization loop (monthly):
# 1. Review past month's usage by model
# 2. Identify expensive queries (high token count, complex routing)
# 3. Test if cheaper model produces acceptable results
# 4. Adjust routing rules
# 5. Measure impact on cost and quality

# Example adjustment:
# Before: 60% of queries routed to Sonnet, 40% to Opus
# After analysis: 80% could use Sonnet with no quality loss
# New routing: 80% Sonnet, 15% Opus, 5% Haiku
# Result: 35% cost reduction, same user satisfaction

Warning: Don't optimize purely for cost — users will notice quality degradation. Aim for lowest cost that maintains acceptable quality for each use case.
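If your gateway or OpenClaw logs per-request usage, the "cost per successful query" comparison reduces to a small aggregation. The record fields below are hypothetical; rename them to match whatever your logging actually emits.

# Cost per successful query from usage logs (record and price field names are hypothetical)
from collections import defaultdict

def cost_per_success(records, prices):
    """records: iterable of dicts like
       {"model": "claude-sonnet", "input_tokens": 4200, "output_tokens": 900, "success": True}
       prices:  {"claude-sonnet": (3.00, 15.00), ...}  # $ per 1M tokens (input, output)"""
    spend = defaultdict(float)
    successes = defaultdict(int)
    for r in records:
        in_price, out_price = prices[r["model"]]
        spend[r["model"]] += (r["input_tokens"] * in_price +
                              r["output_tokens"] * out_price) / 1_000_000
        if r["success"]:
            successes[r["model"]] += 1
    return {m: spend[m] / successes[m] for m in spend if successes[m]}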

Choosing Between Models?

Generic benchmarks don't reflect your actual workload. Our experts benchmark Claude, GPT-4, Gemini, and open-source models against your real queries, measure quality-vs-cost tradeoffs for your specific use cases, and recommend the optimal model mix. We'll configure routing rules and monitor performance to ensure you're getting the best value.

Get matched with a specialist who can help.

Sign Up for Expert Help →

Frequently Asked Questions