Who Owns the Context Window?
Every AI coding agent has one scarce resource that matters more than its model, its tools, or its prompt: the context window — the decision about what the model actually sees, each turn, in each phase of its work. Tokens are the line item on the invoice. The window is who decides what becomes a token at all.
For the past year I've been forced to answer that question the practical way, by building the same system twice. Forge, the SDLC engine I build at Entelligentsia, runs as a Claude Code plugin and as forge-cli, a standalone harness on the pi coding agent. Along the way I adopted context middleware, wrote a compression library, built a full context governor — and then commissioned a deep research pass that told me the platforms were absorbing the entire layer I'd been building.
This series is that journey, told in order, with the research woven in where the journey earned it. It's written for developers building with agents and for the managers paying for them. No part requires the others, but they compound — the way the problem did.
The seven parts
Part 1 — Same Brain, Two Bodies: Forge as a Plugin, Forge as a Harness
Forge lives in two bodies. The plugin's orchestrators were prose — markdown an LLM executed — until Claude Code GA'd Dynamic Workflows last week and the port was so natural it felt like the platform shipping the part I'd been simulating. The second body, forge-cli, was code from day one. The two are converging on everything except one row of the comparison table: what my agents see, each turn, in each phase. That row is the series.
Part 2 — The Token Bill Arrives: Discovering lean-ctx
A workflow engine reads store queries, entity JSON, and validation reports every turn, and the bill and the context rot arrived together. My first instinct was everyone's in 2025: reach for middleware — lean-ctx, headroom, rtk, a real market built on a measurable problem. It carried me a long way. But generic compression doesn't know that an architect planning and an engineer reviewing need different things to survive in context.
Part 3 — forge-compress: When Boring Heuristics Beat Clever Architecture
I built a compression library expecting the wins to come from the clever parts — entropy-driven compression, tiered budget allocation. The eval data pointed somewhere more humbling: clamping bash output and flattening JSON responses bought more than anything dramatically smarter would have. In hindsight, the design notes read like premonitions of what the market would select for: stable shapes that respect prompt caches, no model dependency, lossy by intent.
Part 4 — The Context Governor: Policy, Not Compression
Compression asks how to make output smaller. Governance asks what this agent, in this phase, deserves to keep. Five mechanisms wired into forge-cli's tool pipeline — dedup, schema-trim, span-clamp, a live budget meter, checkpoint-and-shed — with policies keyed by persona and phase. Also the embarrassing part: the governor shipped dormant, and I'll tell you exactly how. A series that demands honest accounting from vendors has to start with mine.
Part 5 — Then I Asked Whether Any of This Should Exist
Having built the layer three times, I stopped building and ran the research: a 99-agent deep-research workflow, 78 extracted claims, 25 adversarially verified. The verdict was uncomfortable. Anthropic has absorbed nearly the entire middleware surface into server-side primitives; OpenAI commoditized compaction into an endpoint; Google made the question invisible. I wasn't reading a market survey — I was triaging my own roadmap.
Part 6 — Who Audits the Meter?
This one began as my own accusation: you charge me for every token, compress before it hits your infrastructure, and pocket the difference. The charge failed on the facts — billing is post-edit, and every API response carries the proof. But it survived in sharper form: the per-request savings data evaporates before it reaches any dashboard. Cache savings get a first-class billing line; editing savings, you take on faith. Trust-me metering, and the question only a governor-builder knows to ask.
Part 7 — Where I Land: The Governor Belongs in the Harness
The conviction the journey produced. Middleware is transitionary — the platforms ship your roadmap. Platform absorption is real but arrives below the line, as opacity. The durable home for context management is the harness layer: open, pluggable, phase-aware, cache-respecting, audit-trailed. And a watchlist: when the opt-in betas become defaults, and whether anyone builds the cross-provider token accounting no vendor has an incentive to.
Why you might care
If you build with agents: this is a field guide to a layer of your stack that is moving under your feet, written by someone who built on it before and after it moved.
If you manage teams that build with agents: Parts 2, 4, and 6 are about money — where token spend actually goes, what governing it looks like, and why no vendor dashboard will show you the number you most need.
If you build developer tools: Part 5 is the uncomfortable one. Read it before your next roadmap review.
Forge is open source: the Claude Code plugin and forge-cli, built on the pi coding agent. New parts publish weekly.