Anthropic ships three Claude models at very different prices. In a small company one person picks one and moves on. In a mid-market company, a hundred to a thousand staff, you do not have one person picking. You have marketing, support, engineering, sales ops, and analytics each running their own AI workloads, each defaulting to whatever model they tried first, and a finance function that gets a single invoice at month end with no idea which team or which task drove it. That is how the AI bill runs away. The fix is not a smarter engineer. It is a written model-selection policy that sets the tiers, the budgets, and the routing once, so no team has to guess and no team can quietly spend the org into a bad number. This is how to write that policy, monitor it, and defend it to finance.
Why mid-market is where model choice becomes a governance problem
The single-user version of this decision is a preference. The mid-market version is a control problem. Three things change at your size. First, the number of independent AI adopters. When five teams run AI, they will make five different model choices, and at least three of them will default to the most expensive model because it is the one they read about. Second, the money is real. Ten teams each running thousands of calls a day turns a rounding error into a line item finance asks about. Third, nobody owns the aggregate. Each team owns its own workload and sees its own output quality, but no single person sees the total spend against the total value, so the number drifts up and no one is accountable for it.
The underlying model framework is the same one any operator uses: the most capable model for hard reasoning, the daily-driver for most work, the fast cheap model for high-volume simple tasks, and roughly a fivefold to nineteenfold cost gap between the top and bottom tiers. The full per-workload logic lives in the model-selection guide. What changes for you is that you cannot apply it as a judgment call per engineer. At your scale a judgment call is a policy gap. You have to write the choice down, make it the default, and hold teams to it the same way you hold them to any other spend control.
The written model-selection policy, in four parts
A policy that lives in one person's head is not a policy. Write it as a short document that any team lead, any engineer, and anyone in finance can read in five minutes. Four parts carry the whole thing.
- Tiering standard. A named tier for each model, the workloads that belong in each tier, and the rule for when a workload is allowed to move up a tier.
- Budgets and ownership. A monthly AI budget per team, with each team owning its own line and a single named owner for the org total.
- Chargeback. Spend tagged by team and workload so the invoice splits back to the teams that caused it, instead of landing in one shared bucket nobody defends.
- Routing. A thin technical layer that applies the tier rule automatically, so teams do not each re-decide the model on every call.
The rest of this piece walks each part. The order matters: the tiering standard defines the rule, budgets and chargeback make teams feel their own spend, and routing enforces the rule in code so compliance does not depend on everyone remembering it.
1. The tiering standard
Name three tiers and map every workload class to one. The top tier is for complex reasoning, multi-step analysis, code in unfamiliar systems, and any output that has to be exactly right. The middle tier is the daily driver: drafting, summarization, structured extraction, edits in familiar code, and customer-facing chat that needs polish. The bottom tier is for high volume and simple output: classification, short transformations, fast question-answering over a small context, lead scoring. The standard is not a suggestion. It states the default tier for each workload and a short list of the workloads that are exempt and allowed to run in the top tier.
Write the move-up rule explicitly, because that is where money leaks. A workload only earns the top tier when a blind evaluation shows the cheaper tier fails the quality bar for that specific task. No team gets to declare its own work important enough for the top model by assertion. The evaluation is the receipt. Twenty representative tasks, run through both tiers, scored blind. Most of the time the cheaper tier wins or ties, and the workload stays down a tier. That evaluation is also what you hand finance when they ask why a workload sits where it sits.
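The move-up rule can be written as a small function so the receipt is reproducible. The following is a minimal sketch, not a prescribed method: the `BlindResult` shape, the numeric score scale, and the `requiredPassRate` threshold are all assumptions made for illustration, and your own quality bar may be defined differently.

```typescript
// Hypothetical shape for one blind-scored task: the same task run through
// both tiers, each output scored without the grader knowing which tier wrote it.
interface BlindResult {
  taskId: string;
  cheaperTierScore: number; // assumed rubric score, higher is better
  topTierScore: number;
}

interface MoveUpDecision {
  cheaperPassRate: number; // share of tasks where the cheaper tier met the bar
  topPassRate: number; // share of tasks where the top tier met the bar
  earnsTopTier: boolean;
}

// Assumed rule: a workload moves up only when the cheaper tier misses the
// required pass rate AND the top tier actually clears it.
function decideMoveUp(
  results: BlindResult[],
  qualityBar: number,
  requiredPassRate: number,
): MoveUpDecision {
  if (results.length === 0) {
    throw new Error("need at least one evaluated task");
  }
  const passRate = (scores: number[]): number =>
    scores.filter((s) => s >= qualityBar).length / scores.length;

  const cheaperPassRate = passRate(results.map((r) => r.cheaperTierScore));
  const topPassRate = passRate(results.map((r) => r.topTierScore));

  return {
    cheaperPassRate,
    topPassRate,
    earnsTopTier:
      cheaperPassRate < requiredPassRate && topPassRate >= requiredPassRate,
  };
}
```

Requiring the top tier to clear the bar as well is a design choice in this sketch: if neither tier passes, the problem is the prompt or the task definition, and moving up only raises the cost.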
2. Budgets and ownership
Give every team a monthly AI budget and a named owner for that line. Give one person, usually in ops or finance, ownership of the org total. This is the move that turns AI spend from an invisible shared cost into a number each team feels. A team that sees its own AI line next to its other costs will right-size its own model choices without being told. A team that only ever sees a shared org invoice has no reason to care what it spends.
- Per-team monthly budget. A real number each team plans against, not a soft target. When a team wants more, it makes the case with its own value math.
- A named owner per team line. One person accountable for that team's AI spend, so there is always someone to ask.
- A single owner of the org total. The person who sees the aggregate against the aggregate value and answers to finance for it.
- A monthly review. Fifteen minutes where the owners look at spend against budget and against the counterfactual of running everything in the top tier. That comparison is the headline number.
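The spend-against-budget half of that review is simple enough to sketch in code. This is an illustrative example under stated assumptions: the `TeamBudget` and `BudgetLine` shapes are hypothetical, and the per-team spend is assumed to come from whatever attribution you already have.

```typescript
// Hypothetical budget record: one line per team, with a named owner.
interface TeamBudget {
  team: string;
  owner: string;
  monthlyBudgetUsd: number;
}

interface BudgetLine {
  team: string;
  owner: string;
  budgetUsd: number;
  spendUsd: number;
  remainingUsd: number; // negative when the team is over budget
  overBudget: boolean;
}

// Join each team's budget to its attributed spend for the month.
// Teams with no recorded spend are treated as having spent zero.
function reviewBudgets(
  budgets: TeamBudget[],
  spendByTeam: Record<string, number>,
): BudgetLine[] {
  return budgets.map((b) => {
    const spendUsd = spendByTeam[b.team] ?? 0;
    return {
      team: b.team,
      owner: b.owner,
      budgetUsd: b.monthlyBudgetUsd,
      spendUsd,
      remainingUsd: b.monthlyBudgetUsd - spendUsd,
      overBudget: spendUsd > b.monthlyBudgetUsd,
    };
  });
}
```

Every flagged line carries its owner, so the review always has someone to ask.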
3. Chargeback: split the invoice back to the teams that caused it
The invoice arrives as one number. Chargeback splits it back. Tag every AI call with the team and the workload that made it, roll the tags up monthly, and each team gets its own slice of the bill instead of the cost disappearing into a shared line. Chargeback is what makes the budgets real. A budget with no mechanism to attribute actual spend is a wish. Once a team sees the bill it actually generated, the model-selection policy enforces itself, because now overspending shows up on the team's own line and someone has to explain it.
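The monthly rollup described above can be sketched as a pure function over logged usage. Treat this as an illustration under assumptions: the `UsageRecord` shape is hypothetical, and the rates in `RATE_PER_MTOK` are placeholder numbers, not Anthropic's prices. Substitute the current rates from the vendor's pricing page.

```typescript
type Tier = "top" | "daily" | "bulk";

// Hypothetical log record: one row per call, written by your own code
// alongside the token counts the API returns in its usage field.
interface UsageRecord {
  team: string;
  workload: string;
  tier: Tier;
  inputTokens: number;
  outputTokens: number;
}

// PLACEHOLDER rates in USD per million tokens. Illustrative only.
const RATE_PER_MTOK: Record<Tier, { input: number; output: number }> = {
  top: { input: 10, output: 50 },
  daily: { input: 2, output: 10 },
  bulk: { input: 1, output: 5 },
};

// Price one call at its tier's rate.
function callCostUsd(r: UsageRecord): number {
  const rate = RATE_PER_MTOK[r.tier];
  return (r.inputTokens * rate.input + r.outputTokens * rate.output) / 1e6;
}

// Sum the cost of every call onto the team that made it.
function chargebackByTeam(records: UsageRecord[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const r of records) {
    totals[r.team] = (totals[r.team] ?? 0) + callCostUsd(r);
  }
  return totals;
}
```

Grouping by `r.team + ":" + r.workload` instead of `r.team` gives the per-workload table from the same records.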
This is the same discipline the content-heavy version of the problem needs, covered in governing AI content spend at scale. Model selection and caching are the two levers, and both need the same attribution layer underneath them: you cannot control what you cannot see, and at your scale you cannot see it without tagging.
```typescript
// Tag every Claude call so the invoice splits back by team + workload.
// metadata.user_id is the attribution key rolled up for chargeback.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callClaude(opts: {
  team: string; // "support" | "marketing" | "analytics" | ...
  workload: string; // "lead-scoring" | "chat" | "blog-draft" | ...
  model: string; // resolved by the routing layer, not by the caller
  prompt: string;
}) {
  const res = await client.messages.create({
    model: opts.model,
    max_tokens: 1024,
    metadata: { user_id: opts.team + ":" + opts.workload },
    messages: [{ role: "user", content: opts.prompt }],
  });
  // res.usage carries input/output tokens: multiply by the tier rate,
  // group by metadata.user_id, and the monthly rollup is the chargeback table.
  return res;
}
```
4. The routing layer, so teams do not each default to the top model
The tiering standard is a document. The routing layer is the code that applies it, so no engineer re-decides the model on every call and no team can quietly upgrade its whole workload to the top tier. Route by the workload id, not by the caller's preference. The workload name maps to a tier, the tier maps to a model, and the model is resolved centrally. A team asks for "lead scoring" and gets the bottom tier whether or not the engineer who wrote the call knew that was the policy. This is what makes the standard real instead of aspirational.
```typescript
// One place resolves workload -> tier -> model. Teams pass a workload name,
// never a model. Changing the policy is a one-line edit here, org-wide.
type Tier = "top" | "daily" | "bulk";

const WORKLOAD_TIER: Record<string, Tier> = {
  "blog-draft": "top", // voice-critical, exempted by evaluation
  "complex-code-review": "top",
  "customer-chat": "daily",
  "page-draft": "daily",
  "lead-scoring": "bulk",
  "alt-text": "bulk",
  "classification": "bulk",
};

const TIER_MODEL: Record<Tier, string> = {
  top: "claude-opus-4-7",
  daily: "claude-sonnet-4-6",
  bulk: "claude-haiku-4-5-20251001",
};

export function modelFor(workload: string): string {
  const tier = WORKLOAD_TIER[workload] ?? "daily"; // unknown work defaults to daily, never top
  return TIER_MODEL[tier];
}
```
The routing layer also carries the escalation pattern for workloads that genuinely need it. The cheap tier can triage a request and hand the hard cases up to a more capable tier, so you pay the top rate only on the fraction of traffic that earns it. Support chat is the classic case: the bottom tier reads the message and either answers a simple question or escalates a complex one up a tier. Most messages never leave the bottom tier, and the ones that do are the ones worth the cost.
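The escalation pattern can be sketched without committing to any particular client. In this minimal example the three callbacks in `EscalationDeps` are hypothetical stand-ins: in practice each would wrap a model call at the named tier, and how the triage step reaches a "simple" or "complex" verdict is an assumption left to you.

```typescript
type TriageVerdict = "simple" | "complex";

// Hypothetical dependencies, injected so the control flow is testable
// without a network call.
interface EscalationDeps {
  triage: (message: string) => Promise<TriageVerdict>; // runs on the bottom tier
  answerCheap: (message: string) => Promise<string>; // runs on the bottom tier
  answerCapable: (message: string) => Promise<string>; // runs one tier up
}

interface EscalationResult {
  escalated: boolean;
  reply: string;
}

// The cheap tier reads every message; only "complex" verdicts pay the higher rate.
async function answerWithEscalation(
  message: string,
  deps: EscalationDeps,
): Promise<EscalationResult> {
  const verdict = await deps.triage(message);
  if (verdict === "simple") {
    return { escalated: false, reply: await deps.answerCheap(message) };
  }
  return { escalated: true, reply: await deps.answerCapable(message) };
}
```

Logging the `escalated` flag per call gives you the escalation rate, which tells you whether the triage step is earning its keep.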
Monitoring cost-per-workload as an ops metric
A policy you do not measure decays. Make cost-per-workload a standing ops metric, watched on the same cadence as any other operational number. You are watching for two things: the aggregate trending against budget, and any single workload's unit cost drifting up. A workload whose total spend climbs is getting more traffic. A workload whose cost-per-call climbs is either sending longer prompts or has quietly been bumped to a higher tier by someone who did not follow the standard. All three are things you want to see the week they happen, not at quarter end.
- Total spend against total budget, per team and org-wide. The headline. If a team is over, someone owns the conversation.
- Cost per workload. Spend divided by call count for each workload class. This is the drift detector; a rising unit cost points to longer prompts or a policy leak.
- Tier mix. What share of calls ran in each tier this month. If top-tier share is creeping up, a workload jumped its lane without a receipt.
- The counterfactual. What the month would have cost with every workload in the top tier. That gap is the dollar value your policy produced, and it is the number you take to finance.
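Three of these metrics fall out of the same per-call log. The sketch below is illustrative and assumes a hypothetical `CallRecord` in which each call's actual cost and its cost at the top-tier rate have already been computed upstream from token counts.

```typescript
type Tier = "top" | "daily" | "bulk";

// Hypothetical per-call record produced by your own logging.
interface CallRecord {
  workload: string;
  tier: Tier;
  costUsd: number; // what the call actually cost
  topTierCostUsd: number; // the same tokens priced at the top-tier rate
}

interface MonthlyMetrics {
  costPerCall: Record<string, number>; // unit cost per workload
  tierShare: Record<Tier, number>; // share of calls that ran in each tier
  actualUsd: number;
  counterfactualUsd: number; // everything priced at the top tier
  savingUsd: number; // counterfactual minus actual
}

function monthlyMetrics(calls: CallRecord[]): MonthlyMetrics {
  const spend: Record<string, number> = {};
  const count: Record<string, number> = {};
  const tierCount: Record<Tier, number> = { top: 0, daily: 0, bulk: 0 };
  let actualUsd = 0;
  let counterfactualUsd = 0;

  for (const c of calls) {
    spend[c.workload] = (spend[c.workload] ?? 0) + c.costUsd;
    count[c.workload] = (count[c.workload] ?? 0) + 1;
    tierCount[c.tier] += 1;
    actualUsd += c.costUsd;
    counterfactualUsd += c.topTierCostUsd;
  }

  const costPerCall: Record<string, number> = {};
  for (const w of Object.keys(spend)) {
    costPerCall[w] = (spend[w] ?? 0) / (count[w] ?? 1);
  }

  const total = calls.length || 1; // avoid dividing by zero on an empty month
  return {
    costPerCall,
    tierShare: {
      top: tierCount.top / total,
      daily: tierCount.daily / total,
      bulk: tierCount.bulk / total,
    },
    actualUsd,
    counterfactualUsd,
    savingUsd: counterfactualUsd - actualUsd,
  };
}
```

Comparing `costPerCall` and `tierShare` against last month's values is what turns these numbers into a drift detector. A single month on its own shows only a level.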
Vendor and model governance
Model selection is one axis of a larger governance question, and mid-market is where the larger question starts to matter. Three things to write down alongside the tiering standard.
- Model version pinning. Pin the exact model version each tier uses, and review it on a schedule. A model that silently updates under you can change both output and cost. Pin it, test the new version against your evaluation set before you adopt it, then move the whole tier at once. Do not let individual teams float their own versions.
- Batch and off-peak routing. Anthropic's batch processing runs non-realtime work at half the per-token cost. Nightly generation, periodic re-indexing, and scheduled reports do not need a live response, so they belong on the batch path. Make that part of the standard, not a per-team discovery.
- A single procurement relationship. One contract, one billing account, one place the spend is visible, instead of each team spinning up its own API key on a personal card. Fragmented procurement is how spend hides. Consolidate it so the chargeback layer has one source to attribute from.
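The batch rule can live in the same kind of central table as the tier rule. This is a sketch under assumptions: the workload names in `WORKLOAD_PATH` are hypothetical, the 0.5 multiplier simply encodes the half-cost figure stated above, and the code only decides the path and estimates cost. It does not submit anything to a batch endpoint.

```typescript
type Path = "realtime" | "batch";

// Hypothetical workload names; anything that can wait goes on the batch path.
const WORKLOAD_PATH: Record<string, Path> = {
  "customer-chat": "realtime",
  "nightly-generation": "batch",
  "re-indexing": "batch",
  "scheduled-report": "batch",
};

// Encodes the half-price figure from the text; check current vendor pricing.
const BATCH_COST_MULTIPLIER = 0.5;

// Unknown workloads stay on the live path (an assumed safe default).
function pathFor(workload: string): Path {
  return WORKLOAD_PATH[workload] ?? "realtime";
}

// Estimate what a workload costs once its path is taken into account.
function effectiveCostUsd(workload: string, realtimeCostUsd: number): number {
  return pathFor(workload) === "batch"
    ? realtimeCostUsd * BATCH_COST_MULTIPLIER
    : realtimeCostUsd;
}
```

Defaulting unknown work to the live path trades some cost for safety: a mislabeled interactive workload on the batch path would leave a user waiting.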
Keep the vendor's own pricing and model documentation in the loop when you review the standard, because both move. The current tiers and rates live at Anthropic's pricing page, and the model behavior and version details are in the Anthropic documentation. Review both when you re-run your evaluation set, so the policy tracks reality instead of the numbers you wrote down a quarter ago.
Defending the policy to finance and leadership
A cost-control policy that finance does not understand is a policy finance will eventually cut. Bring them the instrument, not the internals. They do not need the model names or the routing code. They need three numbers, reported monthly, in language that maps to how they already think about spend.
- Actual AI spend, split by team via chargeback. This is the number they already wanted and never had. It maps AI cost onto the cost centers they already manage.
- Spend against budget, per team. Green where teams are within budget, flagged where they are over, with the owner named. This is a normal budget conversation now, not a mystery invoice.
- The counterfactual saving. Actual spend next to the everything-in-the-top-tier number. This is the line that reframes the whole program from cost to control.
Leadership cares about a fourth thing: that the policy does not slow teams down or degrade output. Answer it before they ask. The routing layer means no team spends time deciding models. The evaluation-gated move-up rule means a workload only sits in a cheaper tier when it passed the quality bar there, so nobody is shipping worse work to save money. Say that plainly and you close the only real objection to a tiering standard, which is the fear that cost control means worse output. It does not; it means paying the top price only where the top price buys something.
A 30-day rollout for the policy
You do not roll this out with a memo. You roll it out in a month, in the order that makes each step earn the next.
- Week 1: measure the baseline. Tag existing AI calls by team and workload and get one month of real attribution. You cannot set budgets or write a standard without knowing where the money currently goes.
- Week 2: write the tiering standard. Map every current workload to a tier. Run the blind evaluation on the workloads teams claim need the top model. Most will not; those move down a tier and that is your first saving.
- Week 3: ship the routing layer and the chargeback rollup. Route by workload, resolve tier centrally, and produce the first per-team chargeback table. Give each team its own line.
- Week 4: set budgets and stand up the monthly review. Assign a budget and an owner per team, name the owner of the org total, and run the first review with the counterfactual on the slide. That review is the durable thing; the rest was setup.
Where mid-market teams get model governance wrong
- No written default. Without a sanctioned default tier, caution routes everything to the top model and no one is technically breaking a rule. The default has to be written down and it has to be the middle tier.
- Budgets with no chargeback. A per-team budget means nothing if the actual spend never splits back to the team. The attribution layer is what makes the budget bite.
- Letting teams pass their own model. If a caller can name the model, the standard is optional. Route by workload so the tier is resolved centrally and the policy is not a suggestion.
- Measuring total spend but not unit cost. The total can look fine while a single workload's cost-per-call quietly triples. Watch cost-per-workload, not just the sum.
- Treating it as one-and-done. New teams, new workloads, and new model versions arrive constantly. The evaluation set and the standard are living documents reviewed on a cadence, not a project you finish.
Run the policy, or run the platform underneath it
You can build the tagging, the routing layer, the chargeback rollup, and the monitoring yourself. The four-part policy above is the full playbook and it is not exotic engineering. The harder part is the operating discipline: the monthly review, the evaluation gate on tier moves, the version-pinning cadence. That is where a governance policy usually dies, not in the code. That is also where Frontend Horizon's platform layer fits: your teams own their AI work and their output quality, and the platform handles the attribution, the routing, and the cost reporting underneath so finance gets a real chargeback table without your engineers building one. Either way the strategic call stays yours, because knowing which of your workloads genuinely earn the top tier is the part that does not templatize. See where the platform fits across the full solution set.
Questions mid-market teams ask us about model-selection policy
How is this different from the small-company version of the decision?
In a small company one person makes the call and lives with it. In a mid-market company the call is made independently by many teams, the aggregate spend is material, and no single person sees the total. So the answer is not a better judgment call; it is a written standard, a routing layer that enforces it, and chargeback that makes each team feel its own spend. The same three-tier model framework the SME version uses applies; what scales up is the governance around it.
Won't a cheaper default tier hurt output quality?
Only if you skip the evaluation. The move-up rule exists precisely so a workload stays in a cheaper tier only when a blind evaluation shows the cheaper tier meets the quality bar for that task. Where the top tier genuinely wins, the workload is exempted and runs there. You are not trading quality for cost; you are paying the top price only where it buys better output, and paying the daily or bulk price everywhere it does not.
Does chargeback create political friction between teams?
Less than the alternative. A shared invoice with no attribution creates the friction, because when finance asks who spent the money, no one can answer and everyone gets scrutinized. Chargeback ends that: each team owns its own line, and the conversation becomes a normal budget conversation about a team's own number. Teams push back on a mystery bill, not on a bill that reflects work they chose to run.
Model selection is not a per-engineer preference at your scale. It is a spend control, and it needs the same treatment as any other: a written standard, an enforcement layer, attribution back to the teams that cause the cost, and a monthly number you can defend to finance. The underlying model framework is in the model-selection guide, and the same shift retold for other operators is in the agencies, micro business, and SME versions. The content-spend twin, governing the same cost through caching, is governing AI content spend at scale. Current rates and model details are at Anthropic's pricing page and in the Anthropic documentation.
Want a model-selection policy your finance team can actually defend, without building the routing and chargeback layer yourselves? Run the estimator and we will show you the tiering standard, the attribution setup, and the monthly reporting. Or talk to us about the platform layer that runs it underneath your teams.