AI Agents Economics

What an Agent-Hour Actually Costs

The claim going around says Fable 5 bills $43/hour, more than the average developer. I metered ten weeks of my own AI usage to check: $37,000 of work that would have cost $210,000 done badly, and $109,000 on the wrong model. The bill doesn't measure the model. It measures the operator.

12 min read

TL;DR: A claim is going around that Anthropic’s Fable 5 “officially charges $43/hour” and now out-bills the average software developer. There is no hourly price - AI models bill per token, like a utility meters kilowatt-hours. I pulled ten weeks of my own usage data to check what an agent-hour really costs me: between $8 and $35, and the 4x spread had nothing to do with the model’s rate card. Caching cut my quarter’s bill from $210,000 to $37,000. Routing work to the right model for each task would have been the difference between $37,000 and $109,000. The expensive AI agent is mostly a badly run one.


An hour of frontier AI now bills like a consultant. That sentence is doing the rounds, usually dressed up with a number: Fable 5, Anthropic’s most capable model, “officially charges $43/hour” - more than the average software developer, the equivalent of a McKinsey partner doing your filing. And, the argument goes, AI pricing is splitting K-shaped: top models for the rich, scraps for everyone else.

In March I wrote about what 1,900 AI sessions in 76 days does to your brain. This is the companion question: what does it do to your wallet? The brain-fry piece measured the volume of working this way. This one prices it - because my tooling logs every token, I can put a real number on claims that are usually just vibes with a dollar sign.

First, the boring correction. There is no $43/hour. Anthropic bills per token, the way a utility bills per kilowatt-hour. A token is roughly three-quarters of a word, and you pay for what the model reads and what it writes - on Fable 5, $10 per million tokens read and $50 per million written. The only actual hourly prices on the entire platform are 8 cents an hour for hosted agent runtime and 5 cents an hour for a code sandbox. The $43 figure is somebody’s estimate of a hard-working agent, dressed up as a price list.

But here’s the thing: the estimate isn’t crazy. My own worst week ran $35 per agent-hour. The claim has roughly the right magnitude and exactly the wrong lesson.

A Quarter on the Meter

From April 1 to June 11 - ten weeks - my agents wrote 200 million tokens of output and read 46 billion tokens of context. That’s roughly 150 million words written (about 25 novels a day) and the equivalent of a few hundred thousand books read. The reading number is the one to stare at: agents re-read constantly. The project files, the conversation so far, their instructions - over and over, every working minute.

At Anthropic’s public API prices, those ten weeks meter out at about $37,000.

The same work, run naively, would have metered at about $210,000.

The difference is one mechanism: prompt caching. Caching means the model remembers what it just read instead of being re-billed full price to read it again, the way a colleague remembers yesterday’s briefing instead of needing it repeated every morning. Cached reading costs a tenth of fresh reading, and 46 of my 48 billion input tokens came out of cache. That single mechanism did about $170,000 of silent work in ten weeks.

(And I paid neither number. Like most heavy users, I’m on a flat subscription. The metered figures are what the work would cost at list prices - useful as a measure, not a bill. For continuity: the 76-day period I covered in the brain-fry piece meters out at an estimated $9,000 the same way. The volume has roughly quadrupled since. My brain is fine. Thanks for asking.)

So when a post claims an agent burns $43/hour, the first question isn’t whether the model is overpriced. It’s whether anyone set up the cache.

What $43 an Hour Would Actually Take

Run the arithmetic backwards and the claim gets stranger. At $50 per million output tokens, billing $43 in one hour on writing alone means producing about 860,000 tokens - roughly 600,000 words. The agent would have to write six novels an hour, every hour, flat out.

Nobody’s agent is doing that. The realistic way to hit $43/hour is on the reading side: about 4.3 million tokens of fresh, uncached input per hour. And that only happens one way - feeding the model the same sprawling, messy context over and over without caching, letting it re-read the equivalent of a thirty-book shelf every hour because nobody scoped the task.

Which is the part the critics accidentally get right. Garbage in is expensive. But it’s expensive in tokens, not hours, and unlike the consultant, a cleaner brief actually brings the bill down.

The Hour Is the Wrong Unit

Even taking the hourly frame at face value, it cuts the other way.

A consultant’s billed hour includes the travel, the lunch, the twenty minutes of re-finding where they left off in the work package. The agent’s meter only runs while tokens are flowing. There is no idle time on the invoice, because idle time doesn’t generate tokens.

And agent-hours stack. In my biggest week this quarter, my agents logged 512 working hours - twelve and a half full-time weeks - inside 61 hours of wall-clock time, because sessions run in parallel and keep working overnight. That week cost $8 per agent-hour, my cheapest of the quarter. No human hourly rate survives that comparison, in either direction - which is exactly why the comparison is broken. The honest unit isn’t the hour. It’s the deliverable.

If I’d Run It All on One Model

Here’s the experiment the hot takes skip. Take my actual ten weeks of token volumes and re-price them as if every task had run on a single model:

ScenarioWith cachingWithout caching
My actual mix$37,000$210,000
Everything on Fable 5$109,000$639,000
Everything on Opus 4.8$42,000$246,000
Everything on Sonnet 4.6$25,000$148,000
Everything on Haiku 4.5$8,400$49,000

Two things jump out.

The spread between the corners is 76x. Same work, same token volumes: $8,400 if everything ran cached on the cheapest model, $639,000 if everything ran uncached on the most expensive one. The model rate card explains 13x of that. Operation - caching and routing - explains the rest.

And my actual mix beats running everything on Opus, the model that did most of my heavy lifting. That’s the sub-agent effect: less than 1% of my output tokens came from Fable 5, the expensive model everyone is posting about. The mechanical work - searching files, scanning documents, summarizing - runs on models that cost a tenth as much, delegated automatically by the orchestrating session. The frontier model gets what only the frontier model can do: the hardest reasoning and planning. Running every task through the biggest model is hiring the McKinsey partner to do the photocopying - and then complaining about his rate.

(The Haiku row is illustrative, not a strategy: the cheap model couldn’t actually do the hardest work in my queue, and the Fable row charges Fable’s heavier token accounting on top of its price. The point isn’t that you should run everything on the cheapest model. It’s that “what does AI cost per hour” has no answer until you know how the work is routed.)

The Three Levers

So the meter says where the money actually goes. Three levers, in order of force.

Caching. $37,000 versus $210,000 for identical work. This is infrastructure, not effort - set up once, saves continuously. Most of the horror-story bills floating around are this lever, left unpulled.

Model mix. $37,000 versus $109,000 for the same work pushed through the biggest model. The fix isn’t discipline so much as delegation: let the orchestrating agent hand search and grunt work to cheap models and keep the frontier model for the calls that need it.

Clean briefs. My weekly cost per agent-hour ranged from $8 to $35 this quarter. Same models, same kind of work. The expensive weeks were the sloppy ones: vaguely scoped tasks, contexts left to go cold and re-read from scratch. The cheap weeks were the most parallel and most precisely briefed ones. A 4x swing, entirely self-inflicted.

Notice what’s not on the list: the model’s price.

About That K-Shape

The other claim going around - that AI pricing will split into top models for the rich and scraps for the poor, and only get worse - has the price history exactly backwards.

Claude 3 Opus, the frontier model of March 2024, cost $15 per million tokens read and $75 per million written. Fable 5, the most capable model money can buy today, costs $10 and $50. The top of the K is cheaper than the top of the K was two years ago. Meanwhile the cheapest current model costs $1 and $5 - and outperforms that 2024 frontier model at fifteen times less. Yesterday’s best-money-can-buy is today’s commodity tier. It’s also the tier my sub-agents run on, which is exactly how a falling floor turns into a falling bill.

What did explode is what you can spend. Agentic loops, parallel sessions, models that think harder when you ask them to - the ceiling went from “a chat subscription” to “unbounded.” That’s the kernel of truth in the K-shape. But it’s a divergence in spend, not in access. Everyone can buy the same tokens at the same falling prices.

If a K-shape is forming, it isn’t in the price list. It’s in operator skill: the gap between teams that run agents with warm caches, the right model for each task, and well-scoped briefs - and teams that paste a vague request into the most expensive model and screenshot the bill. That’s the same gap I wrote about in the brain-fry piece. There it showed up as fatigue versus flow. Here it shows up as a 76x cost spread. Same cause both times: fluency compounds, and nobody budgets time to build it.

A Live Test: a $3.29 Engine and an $18.38 One

The reprice above is arithmetic on my own logs. Two weeks later I ran the real thing - the same feature, built twice, on two different stacks.

The task was self-contained: a padel tournament’s handicap-scoring engine, the kind of pure logic that lives or dies on a unit test. I specced it once with Spec Kit, then handed the identical spec to two builders. One was Claude in my normal setup. The other was GLM-5.2 - a capable open-weight model out of China, the falling floor from the last section made concrete - running through the Goose CLI on a completely fresh machine: no memory, no plugins, no house rules.

The scoreboard:

GLM-5.2 (fresh)Claude (my rig)
Billed cost$3.29$18.38
Wall-clock17 min20 min
Own test suitefails one of its own tests38 of 38 pass
Edge cases correctnoyes

GLM was 5.6x cheaper and a touch faster. It also shipped broken: its handicap formula returns zero points for the biggest underdog in the field - the exact player a handicap exists to lift - and its match generator failed a test GLM itself had written minutes earlier. Cheaper per token, cheaper per task, and wrong on the one thing the feature was for. (In fairness, GLM got a thinner brief than Claude did; some of the gap is the brief, not the model. But a model flunking its own test isn’t a brief problem - and whether the rest of it was the model or the cheap harness around it turned out to be its own question, which I take apart in The Harness or the Model?.)

So far, on-message: the cheap option hid its cost in the debugging you’d do later. The more interesting number is the $18.38, because almost none of it was the work.

Where the $18.38 wentCost
Re-reading my standing context, every turn~$12.25
Writing the actual code~$3.73
Loading that context into cache~$1.45
Everything else~$0.10

The code - the thing I wanted - cost $3.73, about the same as GLM’s entire bill. The other fourteen dollars was context handling, and two thirds of it was a single line item: re-reading, on every turn, a standing environment the task never touched. My setup carries around 150 tool definitions, seventy-odd skills, a memory briefing, and the machinery to spawn sub-agents - and 87% of this run came from sub-agent-heavy work, where each helper reloads much of that payload again. GLM ran naked and paid none of it.

Strip my rig down to what this one task needed - no plugins, no memory, no sub-agents - and Claude’s cost falls toward GLM’s three dollars. The 5.6x gap was mostly my operating environment, not the model’s price and not its intelligence.

That’s not a case for running naked. The same machinery that cost me twelve dollars in re-reads is what lets the orchestrator hand grunt work to models a tenth the price - the sub-agent effect that made my actual quarter cheaper than running everything on Opus. It’s overhead you pay on every task whether it needs it or not, and a one-shot engine build is exactly the task that doesn’t. The fix isn’t to tear out the rig. It’s knowing when to leave it at home.

Three runs at one lesson, from three sides: GLM’s low price hid a correctness cost, Claude’s high price was inflated by how I run rather than what I ran, and most of that price was re-reading an environment this job never used. Which brings me back to where the meter kept pointing all along.

The Bill Measures the Operator

The viral math isn’t entirely wrong - a frontier agent run carelessly really can out-bill a developer. My own worst week proves the magnitude is real. But the conclusion is backwards. The cost of an agent-hour isn’t a fact about the model. It’s a readout of how well the operation around it is run: what gets cached, what gets delegated down-tier, and how clearly the work is specified before the meter starts.

The same was true of every expensive resource before this one. Nobody concluded cloud computing was a scam because someone left a server cluster running over the weekend.

So when the next post tells you what AI “charges per hour,” ask the question the meter actually answers: not “what does the model cost?” but “what does this team’s way of working cost?” Those are different numbers. Mine differ by 5x on caching alone, and 76x corner to corner. The gap is the job.


Method notes: usage figures are pulled from my own Claude Code session logs, including sub-agent sessions, April 1 to June 11, 2026. May 12 onward is measured directly; April 1 to May 11 token categories are estimated from daily per-model output counts scaled by ratios measured in the directly-logged window. Prices are Anthropic’s public API list rates ($10/$50 per million tokens for Fable 5, $5/$25 for Opus 4.8 and 4.7, $3/$15 for Sonnet 4.6, $1/$5 for Haiku 4.5; cached reads at one-tenth of input price, cache writes at 1.25x). The all-Fable scenario includes the ~30% higher token counts of Fable’s tokenizer; the all-Haiku scenario ignores its smaller context window. Historical pricing for Claude 3 Opus ($15/$75, March 2024) per Anthropic’s published pricing. The live head-to-head (June 24, 2026) built the same Spec Kit-specified handicap-scoring engine twice - GLM-5.2 via the Goose CLI on a clean machine, Claude via Claude Code in my normal setup; reported costs are each provider’s own billing (z.ai account balance for GLM, Claude Code’s session meter for Claude), and the $18.38 breakdown applies the list rates above to Claude Code’s reported token counts (20.9k input, 149.1k output, 24.5M cache reads, 232k cache writes). What those runs reveal about the model versus the harness around it - and whether you can trust an agent’s “it’s done” - is the companion piece, The Harness or the Model?. The other companion, on working at this volume without frying your brain, is here.