Anthropic Split the Bill on June 15. Here's What It Actually Means If You Live Inside Claude Code.
Anthropic announced today that starting June 15, every Max 5x subscription will come with a separate $100/month credit earmarked for the Claude Agent SDK and the headless `claude -p` print mode. It is bundled on top of the existing $100 sub — same price, two buckets.
I read the announcement this morning, did the math against my last month of usage, and then sat with it for a few hours. The headline is generous. The interesting part is what Anthropic is quietly drawing a line around, and whether anything I do every day crosses it.
I write Claude Code plugins. A lot of them. Skills, slash commands, subagents, hook scripts that shell out, the occasional standalone Agent SDK script for things that need to run without me at the keyboard. So this announcement is not abstract — it is a re-shaping of the bill I have been quietly running up.
Here is what I worked out.
The number I actually care about: turns per $100
You can read the Sonnet 4.6 pricing page and it tells you input is $3.00 per million tokens and output is $15.00. Those are the numbers everyone quotes when they talk about the cost of running an agent. They are also the numbers that, in practice, almost nobody pays — because the line that matters is the next one down:
Cache read: $0.30 per million tokens.
That is a flat 10× discount on any input that matches a prior request's prefix. Once you internalize that, the rest of the math falls out. A typical agent turn looks like this:
- ~80K tokens of prior conversation, system prompt, and tool definitions (cached on every turn after the first)
- ~2K new input
- ~2K output
Run that through the price sheet and you get ~$0.06 per turn with caching, ~$0.27 per turn without. The $100 credit, in turn-units:
| Setup | Effective cost / turn | Turns per $100 |
|---|---|---|
| Agent SDK, caching on by default | ~$0.04–0.06 | ~1,700–2,500 |
| Claude Code on an API key | ~$0.06 | ~1,600 |
| Third-party app, proxy strips cache headers | ~$0.27 | ~370 |
The 4–5× gap between the first two rows and the last row is not about model price. It is about whether your client forwards `cache_control` properly. Once you've watched a usage block come back with `cache_read_input_tokens: 0` after you swore you'd set everything up right, you stop trusting that the cheap path is automatic.
The composition of a single cached turn explains why caching swings the math so hard.
The 80K of context — the thing that should dominate the bill — costs less than the 2K of output. Without caching, that same 80K would cost $0.24 by itself. That is the entire shape of the agent cost curve in one number.
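If you want to check the arithmetic yourself, here it is as a runnable sketch. The per-MTok rates are the published Sonnet numbers quoted above; the 80K/2K/2K turn shape is the assumption from the list, not a measurement.

```typescript
// Per-MTok rates quoted above; the ~80K/2K/2K turn shape is the assumption
// from the list, not a measurement.
const RATES = { input: 3.0, cacheRead: 0.3, output: 15.0 }; // $ per million tokens
const MTOK = 1_000_000;

function turnCost(context: number, fresh: number, output: number, cached: boolean): number {
  const contextRate = cached ? RATES.cacheRead : RATES.input;
  return (context * contextRate + fresh * RATES.input + output * RATES.output) / MTOK;
}

const withCache = turnCost(80_000, 2_000, 2_000, true);  // 0.024 + 0.006 + 0.030 ≈ $0.06
const noCache = turnCost(80_000, 2_000, 2_000, false);   // 0.240 + 0.006 + 0.030 ≈ $0.28
console.log(`cached:   $${withCache.toFixed(3)}/turn ≈ ${Math.floor(100 / withCache)} turns per $100`);
console.log(`uncached: $${noCache.toFixed(3)}/turn ≈ ${Math.floor(100 / noCache)} turns per $100`);
```

One simplification: writing a new prefix into the cache bills at a premium over base input, but that only hits turns that create cache entries, so it barely moves the per-turn average.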
The `cache_control` trap I keep walking into
I have rewritten the same caching mistake three times in three different projects, so I am going to name it here for myself as much as for anyone else.
`cache_control` is prefix-based. Anthropic hashes your request from the start — tools → system → messages — up to each cache breakpoint, and looks for a byte-identical match in the cache. If it finds one, you pay $0.30/MTok for that prefix. If it doesn't, you pay $3.00/MTok for the whole thing again.
The "byte-identical" is doing a lot of work in that sentence. Things that have silently broken the cache for me:
- Injecting `Current date: 2026-05-14T21:37:00+05:30` into the system prompt. Felt useful. Defeated caching on every single turn. (The fix is sketched after this list.)
- Conditionally including a tool definition based on conversation state. Reordering the tool array, even with the same content, also kills it.
- A proxy I was using that reserialized JSON with a different key order between the request and the cached copy.
- An "OpenAI-compatible" shim that translated to Anthropic format and dropped `cache_control` entirely.
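The fix for that first item generalizes: anything volatile belongs in the final, uncached user turn, never in the cached prefix. A minimal before/after sketch, assuming plain Messages API request fragments (the agent persona text is invented):

```typescript
// BROKEN: a fresh timestamp mutates the prefix bytes on every request,
// so the hash never matches and every turn pays full input price.
const brokenSystem = [{
  type: "text",
  text: `You are a refactoring agent.\nCurrent date: ${new Date().toISOString()}`,
  cache_control: { type: "ephemeral" },
}];

// FIXED: the system prompt is byte-stable across turns; the volatile
// timestamp rides along in the new (uncached) user message instead.
const stableSystem = [{
  type: "text",
  text: "You are a refactoring agent.",
  cache_control: { type: "ephemeral" },
}];
const newUserTurn = {
  role: "user",
  content: `Current date: ${new Date().toISOString()}\n\nContinue the refactor.`,
};
```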
The way I verify now is the only way I trust: look at the usage block on the response.
"usage": {
"input_tokens": 312,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 78422,
"output_tokens": 540
}
If `cache_read_input_tokens` is at zero across multiple turns, something in the request transformation pipeline is broken regardless of how the code reads. I have a small helper that logs the cache hit rate after every call in development. I should have written it years earlier.
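The helper is nothing clever. Mine looks roughly like this; the `Usage` shape mirrors the response field above, and `logCacheHitRate` is just my name for it:

```typescript
// Minimal dev-time logger: call it with the `usage` object off every response.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

function logCacheHitRate(usage: Usage): void {
  const totalInput =
    usage.input_tokens + usage.cache_creation_input_tokens + usage.cache_read_input_tokens;
  const hitRate = totalInput > 0 ? usage.cache_read_input_tokens / totalInput : 0;
  console.error(
    `[cache] read=${usage.cache_read_input_tokens} ` +
    `create=${usage.cache_creation_input_tokens} ` +
    `fresh=${usage.input_tokens} hit=${(hitRate * 100).toFixed(1)}%`
  );
  // Near-zero after turn one means the prefix is being mutated upstream.
  if (hitRate === 0) console.error("[cache] WARNING: no cache reads; check the pipeline");
}
```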
Practical breakpoint layout that has held up for me:
```
[ tools: 8K tokens ]              ← cache_control here (changes rarely)
[ system prompt: 4K tokens ]      ← cache_control here (changes rarely)
[ messages[0..n-1]: 70K tokens ]  ← cache_control here (the stable conversation prefix)
[ messages[n]: 2K — new user turn, no cache_control ]
```
Four breakpoints is the cap. Save one for documents you inject mid-conversation (RAG context, file contents). Don't waste one on a 200-token trailing message.
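In request terms, that layout means three `cache_control` markers: on the last tool, the last system block, and the last message of the stable prefix. A sketch of the shape, where the model id and all the placeholder values are illustrative, not from the announcement:

```typescript
// Three breakpoints mirror the layout above; the fourth stays in reserve
// for a document injected mid-conversation. All names here are placeholders.
const SYSTEM_PROMPT = "You are a refactoring agent."; // stands in for the 4K prompt
const priorTurns = [
  { role: "user", content: "Start the refactor." }, // stable prefix, byte-identical
];

const body = {
  model: "claude-sonnet-4-6", // illustrative id, inferred from the Sonnet 4.6 name
  max_tokens: 4096,
  tools: [
    // ...stable tool definitions, always in the same order...
    {
      name: "read_file",
      description: "Read a file from disk",
      input_schema: { type: "object" },
      cache_control: { type: "ephemeral" }, // breakpoint 1: end of tools
    },
  ],
  system: [
    {
      type: "text",
      text: SYSTEM_PROMPT,
      cache_control: { type: "ephemeral" }, // breakpoint 2: end of system
    },
  ],
  messages: [
    ...priorTurns,
    {
      role: "assistant",
      content: [
        {
          type: "text",
          text: "Refactor plan drafted.",
          cache_control: { type: "ephemeral" }, // breakpoint 3: end of stable prefix
        },
      ],
    },
    { role: "user", content: "Now rename the config module." }, // no marker: fresh turn
  ],
};
```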
The plugin question, which is the one I actually live in
The other thing I had to settle before I could relax about the new credit: do my plugins burn the SDK credit or the sub cap?
Answer: sub cap. Always. None of the things I write as a Claude Code plugin author touch the SDK credit, even when they look "agentic" enough that you'd think they should.
The line Anthropic is drawing is not "is automation involved." It is which process makes the API call. And every plugin I write — skills, slash commands, subagents invoked via the Agent tool, hook scripts that don't shell out — runs inside the same `claude` process the human launched. Same auth token, same seat, same line on the bill.
```mermaid
flowchart TD
    START(["🟠 Any Claude API call"]) --> Q1{"Human pressed Enter<br/>and is watching output?"}
    Q1 -->|YES| Q2{"Which client<br/>is making the call?"}
    Q1 -->|NO| Q3{"Which client<br/>is making the call?"}
    Q2 -->|"Interactive claude CLI<br/>plugins · skills · subagents · hooks"| SUB["💺 Sub cap<br/>$100 interactive bucket"]
    Q2 -->|"Custom code with API key"| API["💳 Metered API<br/>Pay-as-you-go"]
    Q3 -->|"claude -p headless<br/>or code importing the Agent SDK"| SDK["🤖 SDK credit<br/>New $100 bucket → overage"]
    Q3 -->|"Cron · server agent · 3rd-party app<br/>calling /v1/messages directly"| API
    classDef startNode fill:#e6d5b8,stroke:#b8703a,color:#2a1a10,stroke-width:1.5px
    classDef qNode fill:#f4e6cf,stroke:#a07840,color:#2a1a10,stroke-width:1.5px
    classDef subNode fill:#f4d9b8,stroke:#b8703a,color:#3a2410,stroke-width:1.5px
    classDef sdkNode fill:#d8c4e4,stroke:#5a3a7a,color:#2a1a3a,stroke-width:1.5px
    classDef apiNode fill:#c4d8e4,stroke:#3a5a7a,color:#1a2a3a,stroke-width:1.5px
    class START startNode
    class Q1,Q2,Q3 qNode
    class SUB subNode
    class SDK sdkNode
    class API apiNode
```
A few edge cases I had to think through for my own plugins:
- Subagents launched via the `Agent` tool. Same process. Sub cap.
- Hook scripts that just process tool input/output. Sub cap — they don't make their own API calls; they're stdin/stdout shims.
- A hook script that shells out to `claude -p` for something. That's the SDK credit. The moment the call leaves the interactive process, the line crosses.
- A skill that `npm install`s and imports `@anthropic-ai/claude-agent-sdk` to do something independent. SDK credit. You've left the interactive auth path.
- `/loop`, `/schedule`, scheduled remote agents. Headless by definition. SDK credit.
The clean rule of thumb I am keeping in my head: if I pressed Enter and I'm watching the output, it's sub-cap. If it runs because cron fired or a hook fired or my server started a job, it's SDK credit.
This is genuinely good news for anyone writing plugins as their primary mode of working. The plugin is one of the cheapest things you can do on a Max 5x seat — you're using the cap you've already paid for, multiplied by whatever leverage the plugin gives you. I have plugins that do half a day of work in a single conversation turn against the same $100 I was paying when all I did was ask Claude questions in a chat window.
What Anthropic was actually fixing
The reason the split exists at all is more interesting than "we want to charge developers more."
Before June 15, the rules around SDK and headless usage on a Max sub were ambiguous, and that ambiguity was being exploited at the long tail in two specific ways:
- 24/7 autonomous loops on a sub seat. Someone pointing the Agent SDK at a $100 seat and running headless work around the clock could burn thousands of dollars of API-equivalent compute. Anthropic ate the delta.
- Reselling sub capacity. Third-party SaaS products calling Claude through a customer's Claude Code session — effectively reselling Anthropic compute at a markup neither Anthropic nor the customer's sub plan agreed to.
Both of these break the implicit deal of a flat-rate sub, which is priced on human-bounded usage — you can only type and read so fast. Once a machine is doing the typing, the unit economics collapse.
The clean response would have been to hard-cap SDK calls on the sub or push them to metered API at full freight. Anthropic did neither. They gave the SDK its own $100 bucket — sized to comfortably cover the median developer's monthly headless usage — and offered opt-in overage past it. For most of us that is a free upgrade. For the small number of people running $500–$5,000/month of headless compute on a $100 seat, it is the end of a free lunch, with a clear graduation path to metered API.
The strategic logic, if you back up: Anthropic is doing what every platform eventually does once the developer ecosystem gets real. Separate the "human seat" SKU from the "machine compute" SKU. GitHub did it (Copilot vs. Copilot Business). OpenAI did it (ChatGPT Plus vs. API). AWS does it with everything. The $100 credit is the soft version of that move — the bundled, generous version where the signal is "we want the SDK to grow" rather than "please stop using the sub for that."
What I am actually going to do differently
Practical changes I've made to my own setup since reading the announcement:
- My `/loop` and `/schedule` cron stuff is now running on the SDK credit by default. I had not been thinking about which bucket those landed in. Now I check.
- Anything I write that uses `@anthropic-ai/claude-agent-sdk` directly gets a comment at the top of the file noting which bucket it bills against (sketched after this list), so I don't accidentally find out three weeks later when an overage charge shows up.
- I'm doubling down on plugins. They are the cheapest leverage in this pricing model. Every workflow I can move from "drive Claude Code manually for 20 minutes" to "invoke a plugin that runs the same workflow in one turn" is direct ROI on the sub I am already paying for.
- I have a usage-block logger in every Agent SDK script I write. Cache hit rate visible from the first run. If `cache_read_input_tokens` is at zero after the second turn, something is wrong upstream and I want to know now, not at the end of the month.
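The bucket comment itself is just two lines at the top of the file. A sketch, using the SDK's `query` entry point; the comment wording is my convention, not anything Anthropic prescribes:

```typescript
// BILLING: Claude Agent SDK credit ($100/mo bucket, opt-in overage past it).
// NOT the interactive sub cap; this script runs headless.
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({ prompt: "Summarize open TODOs in this repo" })) {
  if (message.type === "result") console.log(message.result);
}
```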
The headline number is $100. The number that actually matters is the fraction of your input tokens that are getting cached. Anthropic just gave most of us more budget for the headless work we were probably already doing. Whether it goes far enough depends almost entirely on whether your client knows how to ask for the discount.