Anthropic ships three Claude models: Opus (the most capable), Sonnet (the daily driver), and Haiku (the fast, cheap one). At the prices below, adjacent tiers sit roughly 4-5x apart in cost per token, and Opus costs about 19x what Haiku does. Most teams pick one and use it for everything. That’s the wrong move: different workloads have different quality bars, and over-spending on Opus for jobs Haiku could handle eats your AI budget without producing better output. Here’s the per-workload framework.
Current pricing (May 2026)
- Claude Opus 4.7: $15 / $75 per million tokens (input / output).
- Claude Sonnet 4.6: $3 / $15 per million tokens.
- Claude Haiku 4.5: $0.80 / $4 per million tokens.
Opus is ~5x Sonnet and ~19x Haiku on both input and output cost; Sonnet is ~4x Haiku. The gap is real and matters at scale.
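To make the gap concrete, here is a minimal sketch that turns the listed prices into a daily bill. The workload shape (10,000 calls per day, 2,000 input and 300 output tokens per call) is a made-up example, and `callCostUsd` is a helper name chosen for illustration.

```typescript
// Prices in USD per million tokens, copied from the list above.
type Model = "opus" | "sonnet" | "haiku";

const PRICE_PER_MTOK: Record<Model, { input: number; output: number }> = {
  opus: { input: 15, output: 75 },
  sonnet: { input: 3, output: 15 },
  haiku: { input: 0.8, output: 4 },
};

// Cost in USD of a single call, given its token counts.
function callCostUsd(model: Model, inputTokens: number, outputTokens: number): number {
  const p = PRICE_PER_MTOK[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1000000;
}

// Hypothetical workload: 10,000 calls/day, 2,000 input + 300 output tokens each.
const CALLS_PER_DAY = 10000;
for (const model of ["opus", "sonnet", "haiku"] as const) {
  const daily = CALLS_PER_DAY * callCostUsd(model, 2000, 300);
  console.log(`${model}: $${daily.toFixed(2)} per day`);
}
```

For that assumed workload the sketch works out to about $525 per day on Opus, $105 on Sonnet, and $28 on Haiku.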
What each model is good at
- Opus 4.7: complex reasoning, multi-step problem solving, code generation in unfamiliar codebases, dense analytical work, anything where the answer needs to be exactly right.
- Sonnet 4.6: most daily work, such as drafting, summarization, structured extraction, code edits in familiar codebases, and customer-facing chat that needs polish.
- Haiku 4.5: high-volume bulk work, such as classification, simple transformations, fast Q&A from a small context, lead scoring, and embedding-decisioning.
The FH workload-to-model mapping
- Blog post drafts (long-form, voice-sensitive): Opus.
- Location/service page drafts (templated bulk): Sonnet with prompt caching.
- Image alt-text generation: Haiku.
- Lead scoring: Haiku.
- Customer chat (RAG-backed Q&A): Sonnet.
- Code review of complex changes: Opus.
- Boilerplate code generation: Sonnet.
- PR-summary generation: Haiku.
- Image curation rubric scoring: Sonnet (with vision).
- Bulk classification (e.g. ‘is this email spam?’): Haiku.
The decision framework
- Is the output going to a customer or end-user (vs. internal)? If yes, never below Sonnet.
- Does the task require multi-step reasoning across multiple inputs? If yes, Opus.
- Is the task volume high (1000+/day) and the output simple? Haiku.
- Is the task voice-sensitive (FH brand voice, customer support tone)? Sonnet or Opus, depending on stakes.
- Default for everything else: Sonnet.
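The questions above can be sketched as a small routing function. This is one possible encoding, not a canonical one: the `Task` fields, the `highStakes` flag, and the order in which the rules are checked are assumptions made for illustration. The customer-facing rule is treated as a floor, so Haiku is only ever chosen for internal work.

```typescript
type Model = "opus" | "sonnet" | "haiku";

// Hypothetical task description; field names are illustrative.
interface Task {
  customerFacing: boolean;     // output goes to a customer or end-user
  multiStepReasoning: boolean; // needs reasoning across multiple inputs
  callsPerDay: number;         // expected daily volume
  simpleOutput: boolean;       // label, score, or short transformation
  voiceSensitive: boolean;     // brand voice or support tone matters
  highStakes: boolean;         // assumed flag for "depending on stakes"
}

function chooseModel(t: Task): Model {
  // Multi-step reasoning wins over every other consideration.
  if (t.multiStepReasoning) return "opus";
  // Voice-sensitive work never drops below Sonnet.
  if (t.voiceSensitive) return t.highStakes ? "opus" : "sonnet";
  // Haiku only for high-volume, simple, internal tasks.
  if (t.callsPerDay >= 1000 && t.simpleOutput && !t.customerFacing) return "haiku";
  // Default for everything else.
  return "sonnet";
}

// Example: internal lead scoring at high volume.
const leadScoring: Task = {
  customerFacing: false,
  multiStepReasoning: false,
  callsPerDay: 5000,
  simpleOutput: true,
  voiceSensitive: false,
  highStakes: false,
};
console.log(chooseModel(leadScoring)); // "haiku"
```

Encoding the rules as code forces you to decide what happens when two of them conflict, which a bullet list leaves open.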
The cost trap: defaulting to Opus
Most new Claude users default to Opus because it’s ‘the best model.’ At the prices above, that means paying 5x for many tasks Sonnet handles equally well. Run a small evaluation: take 20 representative tasks, run them through both Opus and Sonnet, and blind-score the outputs. On most tasks Sonnet wins or ties, so you can move them to Sonnet at a fifth of the cost. Where Opus genuinely wins, keep Opus and save on the rest.
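A rough sketch of the bookkeeping for that evaluation follows. It assumes a human grader sees each pair as unlabeled outputs "a" and "b"; the helper names (`blindPair`, `unblind`, `sonnetGoodEnoughRate`) are invented for this example, and generating the model outputs is left out.

```typescript
type Verdict = "opus" | "sonnet" | "tie";

interface BlindPair<T> {
  a: T;
  b: T;
  aIsOpus: boolean; // kept away from the grader until scoring is done
}

// Randomize which model's output is shown as "a" so position gives nothing away.
function blindPair<T>(opusOut: T, sonnetOut: T, rng: () => number = Math.random): BlindPair<T> {
  const aIsOpus = rng() < 0.5;
  return aIsOpus
    ? { a: opusOut, b: sonnetOut, aIsOpus }
    : { a: sonnetOut, b: opusOut, aIsOpus };
}

// Translate the grader's pick back into a model verdict.
function unblind(pick: "a" | "b" | "tie", aIsOpus: boolean): Verdict {
  if (pick === "tie") return "tie";
  return (pick === "a") === aIsOpus ? "opus" : "sonnet";
}

// Fraction of tasks where Sonnet won or tied, i.e. where Opus was not needed.
function sonnetGoodEnoughRate(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0;
  const ok = verdicts.filter((v) => v !== "opus").length;
  return ok / verdicts.length;
}
```

The tasks where the verdict is "opus" are the ones to keep on Opus; everything else is a candidate for Sonnet.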
Mixing models in a single workflow
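Common pattern: Haiku triages the request, Opus answers if Haiku flags it as complex. We use this in customer support chat: Haiku reads the user’s message and either answers from a small FAQ set (cheap, fast) or escalates to a Sonnet/Opus-backed agent (more capable, slower). The code here is a simplified variant: Haiku only classifies, and the answer comes from Sonnet (simple) or Opus (complex), which keeps customer-facing output at Sonnet or above.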
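```typescript
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment by default.
const claude = new Anthropic();

// Ask Haiku for a one-word difficulty label; cheap enough to run on every message.
async function triage(userMessage: string): Promise<"simple" | "complex"> {
  const res = await claude.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 20,
    system: "Classify the user message as 'simple' (one-shot answer fits) or 'complex' (needs reasoning). Reply with exactly one word.",
    messages: [{ role: "user", content: userMessage }],
  });
  const first = res.content[0];
  const text = first?.type === "text" ? first.text : "";
  // Anything that is not clearly "complex" falls back to the cheaper path.
  return text.toLowerCase().includes("complex") ? "complex" : "simple";
}

// Route to Opus only when triage says the message needs it.
async function answer(userMessage: string) {
  const level = await triage(userMessage);
  const model = level === "complex" ? "claude-opus-4-7" : "claude-sonnet-4-6";
  return claude.messages.create({
    model,
    max_tokens: 1024,
    messages: [{ role: "user", content: userMessage }],
  });
}
```
When Haiku is actually too small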
- Tasks needing more than ~30k tokens of context. Haiku accepts longer inputs, but in practice its effective context window is smaller than the nominal one.
- Tasks requiring nuanced understanding of brand voice. Haiku tends generic.
- Tasks with tool use that requires careful reasoning about when to call which tool.
- Anything customer-facing where a slightly-off response damages trust.
When Opus is overkill
- Anything classification-flavored (yes/no, category-A/B/C).
- Anything where the input is structured and the output is short.
- Anything that runs 100+ times per day per user.
Batch API for non-realtime workloads
Anthropic’s Batch API processes requests asynchronously at 50% of the per-token cost of the regular API. Use it for nightly content generation, periodic re-indexing, weekly digest emails, and anything else that doesn’t need a real-time response. Halving costs on batch workloads compounds quickly.
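As a sketch of what a nightly job might prepare, the helper below turns a list of prompts into batch entries. The `custom_id` plus `params` shape follows our understanding of the Message Batches request format, where `params` mirrors a normal Messages call; `buildBatchRequests` and the example IDs are hypothetical, and the submit step (in the TypeScript SDK, we believe `messages.batches.create({ requests })`) is left out so the snippet runs without an API key. Check the current API reference before relying on it.

```typescript
// One entry in a batch: an ID you choose, plus ordinary Messages parameters.
interface BatchRequest {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user"; content: string }[];
  };
}

// Hypothetical helper: build batch entries from (id, prompt) pairs.
function buildBatchRequests(
  items: { id: string; prompt: string }[],
  model: string,
  maxTokens: number = 1024,
): BatchRequest[] {
  return items.map((item): BatchRequest => ({
    custom_id: item.id, // used later to match each result to its input
    params: {
      model,
      max_tokens: maxTokens,
      messages: [{ role: "user", content: item.prompt }],
    },
  }));
}

// Example: two alt-text jobs queued for the nightly run.
const requests = buildBatchRequests(
  [
    { id: "alt-text-001", prompt: "Write alt text for: storefront at dusk." },
    { id: "alt-text-002", prompt: "Write alt text for: team photo, five people." },
  ],
  "claude-haiku-4-5-20251001",
  200,
);
console.log(requests.length); // 2
```

Because batch results come back asynchronously, a stable `custom_id` per item is what lets you join outputs back to the records that produced them.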
Measuring the savings
Track per-workload cost in your observability stack. Tag every Claude call with `workload_id`. Roll up monthly: Opus tokens × Opus rate, Sonnet tokens × Sonnet rate, Haiku tokens × Haiku rate, with input and output tokens priced separately. Compare the total to ‘what if we used Opus for everything’; the difference is your savings number. We track this at FH and report it monthly.
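The roll-up can be sketched in a few lines. The `CallRecord` shape and the function names are assumptions about what your logging captures; the rates are the per-million-token prices listed earlier.

```typescript
type Model = "opus" | "sonnet" | "haiku";

// Assumed log record: one row per Claude call, tagged with its workload.
interface CallRecord {
  workload_id: string;
  model: Model;
  inputTokens: number;
  outputTokens: number;
}

interface WorkloadCost {
  actualUsd: number;  // what the calls cost on the model actually used
  allOpusUsd: number; // what the same tokens would have cost on Opus
}

// USD per million tokens, from the pricing list earlier in the post.
const RATES: Record<Model, { input: number; output: number }> = {
  opus: { input: 15, output: 75 },
  sonnet: { input: 3, output: 15 },
  haiku: { input: 0.8, output: 4 },
};

function costUsd(model: Model, inputTokens: number, outputTokens: number): number {
  const r = RATES[model];
  return (inputTokens * r.input + outputTokens * r.output) / 1000000;
}

// Group calls by workload_id and total both the actual and the all-Opus cost.
function rollUp(calls: CallRecord[]): Record<string, WorkloadCost> {
  const out: Record<string, WorkloadCost> = {};
  for (const c of calls) {
    const row = out[c.workload_id] ?? { actualUsd: 0, allOpusUsd: 0 };
    row.actualUsd += costUsd(c.model, c.inputTokens, c.outputTokens);
    row.allOpusUsd += costUsd("opus", c.inputTokens, c.outputTokens);
    out[c.workload_id] = row;
  }
  return out;
}

// Savings versus the "Opus for everything" baseline, summed over all workloads.
function totalSavingsUsd(calls: CallRecord[]): number {
  let savings = 0;
  for (const row of Object.values(rollUp(calls))) {
    savings += row.allOpusUsd - row.actualUsd;
  }
  return savings;
}
```

The baseline assumes token counts would stay the same on Opus, which is only approximately true, so treat the savings figure as an estimate.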
How this lands across FH client work
Every FH AI workflow has an explicit model choice documented. Lead scoring: Haiku. Customer chat: Sonnet (sometimes upgrading to Opus mid-conversation). Image curation: Sonnet (vision). Blog drafts: Opus with prompt caching. Total monthly AI spend across the FH client book sits well under $500 despite running tens of thousands of inference calls. If your AI bill feels too high and you’re running everything on Opus, book a consultation — the model-routing setup is a one-week engagement with immediate compounding savings.