Anthropic Split the Bill on June 15. Here's What It Actually Means If You Live Inside Claude Code.
Anthropic announced today that starting June 15, every Max 5x subscription will come with a separate $100/month credit earmarked for the Claude Agent SDK and the headless `claude -p` print mode. It is bundled on top of the existing $100 sub — same price, two buckets.
I read the announcement this morning, did the math against my last month of usage, and then sat with it for a few hours. The headline is generous. The interesting part is what Anthropic is quietly drawing a line around, and whether anything I do every day crosses it.
I write Claude Code plugins. A lot of them. Skills, slash commands, subagents, hook scripts that shell out, the occasional standalone Agent SDK script for things that need to run without me at the keyboard. So this announcement is not abstract — it is a re-shaping of the bill I have been quietly running up.
Here is what I worked out.
The number I actually care about: turns per $100
You can read the Sonnet 4.6 pricing page and it tells you input is $3.00 per million tokens and output is $15.00. Those are the numbers everyone quotes when they talk about the cost of running an agent. They are also the numbers that, in practice, almost nobody pays — because the line that matters is the next one down:
Cache read: $0.30 per million tokens.
That is a flat 10× discount on any input that matches a prior request's prefix. Once you internalize that, the rest of the math falls out. A typical agent turn looks like this:
- ~80K tokens of prior conversation, system prompt, and tool definitions (cached on every turn after the first)
- ~2K new input
- ~2K output
Run that through the price sheet and you get ~$0.06 per turn with caching, ~$0.27 per turn without. The $100 credit, in turn-units:
| Setup | Effective cost / turn | Turns per $100 |
|---|---|---|
| Agent SDK, caching on by default | ~$0.04–0.06 | ~1,700–2,500 |
| Claude Code on an API key | ~$0.06 | ~1,600 |
| Third-party app, proxy strips cache headers | ~$0.27 | ~370 |
The 4–5× gap between the first two rows and the last row is not about model price. It is about whether your client forwards `cache_control` properly. Once you've watched a usage block come back with `cache_read_input_tokens: 0` after you swore you'd set everything up right, you stop trusting that the cheap path is automatic.
The composition of a single cached turn explains why caching swings the math so hard.
The 80K of context — the thing that should dominate the bill — costs less than the 2K of output. Without caching, that same 80K would cost $0.24 by itself. That is the entire shape of the agent cost curve in one number.
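If you want to check the arithmetic yourself, here it is as a runnable sketch. The per-MTok rates are the published Sonnet numbers quoted above; the 80K/2K/2K turn shape is the assumption from the list, not a measurement.

```typescript
// Per-MTok rates quoted above; the ~80K/2K/2K turn shape is the assumption
// from the list, not a measurement.
const RATES = { input: 3.0, cacheRead: 0.3, output: 15.0 }; // $ per million tokens
const MTOK = 1_000_000;

function turnCost(context: number, fresh: number, output: number, cached: boolean): number {
  const contextRate = cached ? RATES.cacheRead : RATES.input;
  return (context * contextRate + fresh * RATES.input + output * RATES.output) / MTOK;
}

const withCache = turnCost(80_000, 2_000, 2_000, true);  // 0.024 + 0.006 + 0.030 ≈ $0.06
const noCache = turnCost(80_000, 2_000, 2_000, false);   // 0.240 + 0.006 + 0.030 ≈ $0.28
console.log(`cached:   $${withCache.toFixed(3)}/turn ≈ ${Math.floor(100 / withCache)} turns per $100`);
console.log(`uncached: $${noCache.toFixed(3)}/turn ≈ ${Math.floor(100 / noCache)} turns per $100`);
```

One simplification: writing a new prefix into the cache bills at a premium over base input, but that only hits turns that create cache entries, so it barely moves the per-turn average.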
The `cache_control` trap I keep walking into
I have rewritten the same caching mistake three times in three different projects, so I am going to name it here for myself as much as for anyone else.
`cache_control` is prefix-based. Anthropic hashes your request from the start — tools → system → messages — up to each cache breakpoint, and looks for a byte-identical match in the cache. If it finds one, you pay $0.30/MTok for that prefix. If it doesn't, you pay $3.00/MTok for the whole thing again.
The "byte-identical" is doing a lot of work in that sentence. Things that have silently broken the cache for me:
- Injecting `Current date: 2026-05-14T21:37:00+05:30` into the system prompt. Felt useful. Defeated caching on every single turn. (The fix is sketched after this list.)
- Conditionally including a tool definition based on conversation state. Reordering the tool array, even with the same content, also kills it.
- A proxy I was using that reserialized JSON with a different key order between the request and the cached copy.
- An "OpenAI-compatible" shim that translated to Anthropic format and dropped `cache_control` entirely.
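The fix for that first item generalizes: anything volatile belongs in the final, uncached user turn, never in the cached prefix. A minimal before/after sketch, assuming plain Messages API request fragments (the agent persona text is invented):

```typescript
// BROKEN: a fresh timestamp mutates the prefix bytes on every request,
// so the hash never matches and every turn pays full input price.
const brokenSystem = [{
  type: "text",
  text: `You are a refactoring agent.\nCurrent date: ${new Date().toISOString()}`,
  cache_control: { type: "ephemeral" },
}];

// FIXED: the system prompt is byte-stable across turns; the volatile
// timestamp rides along in the new (uncached) user message instead.
const stableSystem = [{
  type: "text",
  text: "You are a refactoring agent.",
  cache_control: { type: "ephemeral" },
}];
const newUserTurn = {
  role: "user",
  content: `Current date: ${new Date().toISOString()}\n\nContinue the refactor.`,
};
```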
The way I verify now is the only way I trust: look at the usage block on the response.
"usage": {
"input_tokens": 312,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 78422,
"output_tokens": 540
}
If `cache_read_input_tokens` is at zero across multiple turns, something in the request transformation pipeline is broken regardless of how the code reads. I have a small helper that logs the cache hit rate after every call in development. I should have written it years earlier.
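The helper is nothing clever. Mine looks roughly like this; the `Usage` shape mirrors the response field above, and `logCacheHitRate` is just my name for it:

```typescript
// Minimal dev-time logger: call it with the `usage` object off every response.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

function logCacheHitRate(usage: Usage): void {
  const totalInput =
    usage.input_tokens + usage.cache_creation_input_tokens + usage.cache_read_input_tokens;
  const hitRate = totalInput > 0 ? usage.cache_read_input_tokens / totalInput : 0;
  console.error(
    `[cache] read=${usage.cache_read_input_tokens} ` +
    `create=${usage.cache_creation_input_tokens} ` +
    `fresh=${usage.input_tokens} hit=${(hitRate * 100).toFixed(1)}%`
  );
  // Near-zero after turn one means the prefix is being mutated upstream.
  if (hitRate === 0) console.error("[cache] WARNING: no cache reads; check the pipeline");
}
```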
Practical breakpoint layout that has held up for me:
```
[ tools: 8K tokens ]              ← cache_control here (changes rarely)
[ system prompt: 4K tokens ]      ← cache_control here (changes rarely)
[ messages[0..n-1]: 70K tokens ]  ← cache_control here (the stable conversation prefix)
[ messages[n]: 2K — new user turn, no cache_control ]
```
Four breakpoints is the cap. Save one for documents you inject mid-conversation (RAG context, file contents). Don't waste one on a 200-token trailing message.
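In request terms, that layout means three `cache_control` markers: on the last tool, the last system block, and the last message of the stable prefix. A sketch of the shape, where the model id and all the placeholder values are illustrative, not from the announcement:

```typescript
// Three breakpoints mirror the layout above; the fourth stays in reserve
// for a document injected mid-conversation. All names here are placeholders.
const SYSTEM_PROMPT = "You are a refactoring agent."; // stands in for the 4K prompt
const priorTurns = [
  { role: "user", content: "Start the refactor." }, // stable prefix, byte-identical
];

const body = {
  model: "claude-sonnet-4-6", // illustrative id, inferred from the Sonnet 4.6 name
  max_tokens: 4096,
  tools: [
    // ...stable tool definitions, always in the same order...
    {
      name: "read_file",
      description: "Read a file from disk",
      input_schema: { type: "object" },
      cache_control: { type: "ephemeral" }, // breakpoint 1: end of tools
    },
  ],
  system: [
    {
      type: "text",
      text: SYSTEM_PROMPT,
      cache_control: { type: "ephemeral" }, // breakpoint 2: end of system
    },
  ],
  messages: [
    ...priorTurns,
    {
      role: "assistant",
      content: [
        {
          type: "text",
          text: "Refactor plan drafted.",
          cache_control: { type: "ephemeral" }, // breakpoint 3: end of stable prefix
        },
      ],
    },
    { role: "user", content: "Now rename the config module." }, // no marker: fresh turn
  ],
};
```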
The plugin question, which is the one I actually live in
The other thing I had to settle before I could relax about the new credit: do my plugins burn the SDK credit or the sub cap?
Answer: sub cap. Always. None of the things I write as a Claude Code plugin author touch the SDK credit, even when they look "agentic" enough that you'd think they should.
The line Anthropic is drawing is not "is automation involved." It is which process makes the API call. And every plugin I write — skills, slash commands, subagents invoked via the Agent tool, hook scripts that don't shell out — runs inside the same `claude` process the human launched. Same auth token, same seat, same line on the bill.
```mermaid
flowchart TD
    START(["🟠 Any Claude API call"]) --> Q1{"Human pressed Enter<br/>and is watching output?"}
    Q1 -->|YES| Q2{"Which client<br/>is making the call?"}
    Q1 -->|NO| Q3{"Which client<br/>is making the call?"}
    Q2 -->|"Interactive claude CLI<br/>plugins · skills · subagents · hooks"| SUB["💺 Sub cap<br/>$100 interactive bucket"]
    Q2 -->|"Custom code with API key"| API["💳 Metered API<br/>Pay-as-you-go"]
    Q3 -->|"claude -p headless<br/>or code importing the Agent SDK"| SDK["🤖 SDK credit<br/>New $100 bucket → overage"]
    Q3 -->|"Cron · server agent · 3rd-party app<br/>calling /v1/messages directly"| API
    classDef startNode fill:#e6d5b8,stroke:#b8703a,color:#2a1a10,stroke-width:1.5px
    classDef qNode fill:#f4e6cf,stroke:#a07840,color:#2a1a10,stroke-width:1.5px
    classDef subNode fill:#f4d9b8,stroke:#b8703a,color:#3a2410,stroke-width:1.5px
    classDef sdkNode fill:#d8c4e4,stroke:#5a3a7a,color:#2a1a3a,stroke-width:1.5px
    classDef apiNode fill:#c4d8e4,stroke:#3a5a7a,color:#1a2a3a,stroke-width:1.5px
    class START startNode
    class Q1,Q2,Q3 qNode
    class SUB subNode
    class SDK sdkNode
    class API apiNode
```
A few edge cases I had to think through for my own plugins:
- Subagents launched via the `Agent` tool. Same process. Sub cap.
- Hook scripts that just process tool input/output. Sub cap — they don't make their own API calls; they're stdin/stdout shims.
- A hook script that shells out to `claude -p` for something. That's the SDK credit. The moment the call leaves the interactive process, the line crosses.
- A skill that `npm install`s and imports `@anthropic-ai/claude-agent-sdk` to do something independent. SDK credit. You've left the interactive auth path.
- `/loop`, `/schedule`, scheduled remote agents. Headless by definition. SDK credit.
The clean rule of thumb I am keeping in my head: if I pressed Enter and I'm watching the output, it's sub-cap. If it runs because cron fired or a hook fired or my server started a job, it's SDK credit.
This is genuinely good news for anyone writing plugins as their primary mode of working. The plugin is one of the cheapest things you can do on a Max 5x seat — you're using the cap you've already paid for, multiplied by whatever leverage the plugin gives you. I have plugins that do half a day of work in a single conversation turn against the same $100 I was paying when all I did was ask Claude questions in a chat window.
What Anthropic was actually fixing
The reason the split exists at all is more interesting than "we want to charge developers more."
Before June 15, the rules around SDK and headless usage on a Max sub were ambiguous, and that ambiguity was being exploited at the long tail in two specific ways:
- 24/7 autonomous loops on a sub seat. Someone pointing the Agent SDK at a $100 seat and running headless work around the clock could burn thousands of dollars of API-equivalent compute. Anthropic ate the delta.
- Reselling sub capacity. Third-party SaaS products calling Claude through a customer's Claude Code session — effectively reselling Anthropic compute at a markup neither Anthropic nor the customer's sub plan agreed to.
Both of these break the implicit deal of a flat-rate sub, which is priced on human-bounded usage — you can only type and read so fast. Once a machine is doing the typing, the unit economics collapse.
The clean response would have been to hard-cap SDK calls on the sub or push them to metered API at full freight. Anthropic did neither. They gave the SDK its own $100 bucket — sized to comfortably cover the median developer's monthly headless usage — and offered opt-in overage past it. For most of us that is a free upgrade. For the small number of people running $500–$5,000/month of headless compute on a $100 seat, it is the end of a free lunch, with a clear graduation path to metered API.
The strategic logic, if you back up: Anthropic is doing what every platform eventually does once the developer ecosystem gets real. Separate the "human seat" SKU from the "machine compute" SKU. GitHub did it (Copilot vs. Copilot Business). OpenAI did it (ChatGPT Plus vs. API). AWS does it with everything. The $100 credit is the soft version of that move — the bundled, generous version where the signal is "we want the SDK to grow" rather than "please stop using the sub for that."
What I am actually going to do differently
Practical changes I've made to my own setup since reading the announcement:
- My `/loop` and `/schedule` cron stuff is now running on the SDK credit by default. I had not been thinking about which bucket those landed in. Now I check.
- Anything I write that uses `@anthropic-ai/claude-agent-sdk` directly gets a comment at the top of the file noting which bucket it bills against (sketched after this list), so I don't accidentally find out three weeks later when an overage charge shows up.
- I'm doubling down on plugins. They are the cheapest leverage in this pricing model. Every workflow I can move from "drive Claude Code manually for 20 minutes" to "invoke a plugin that runs the same workflow in one turn" is direct ROI on the sub I am already paying for.
- I have a usage-block logger in every Agent SDK script I write. Cache hit rate visible from the first run. If `cache_read_input_tokens` is at zero after the second turn, something is wrong upstream and I want to know now, not at the end of the month.
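The bucket comment itself is just two lines at the top of the file. A sketch, using the SDK's `query` entry point; the comment wording is my convention, not anything Anthropic prescribes:

```typescript
// BILLING: Claude Agent SDK credit ($100/mo bucket, opt-in overage past it).
// NOT the interactive sub cap; this script runs headless.
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({ prompt: "Summarize open TODOs in this repo" })) {
  if (message.type === "result") console.log(message.result);
}
```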
The headline number is $100. The number that actually matters is the fraction of your input tokens that are getting cached. Anthropic just gave most of us more budget for the headless work we were probably already doing. Whether it goes far enough depends almost entirely on whether your client knows how to ask for the discount.