Who Owns the Context Window? - Series Overview

AI, Claude Code, Agents, Context Engineering, Developer Tools, SDLC

A new model dropped, and I wanted to know if it was actually better for my agentic harness or just newer and pricier. Cost is the easy half; capability is the hard one — and most agent benchmarks fail because the task is too easy to separate the contestants. So I planted a one-token bug in git's version-comparison code, removed the version-control oracle, and had three Claude models find it blind — Opus, Fable, and Sonnet, one in the driver's seat at a time. They all passed; precision and the rate card decided the rest. Plus the methodology twist where I almost reported a regression that wasn't there.

Dynamic Workflows: The Confused Deputy Behind Every agent()

If every side effect in a dynamic workflow is routed through an agent(), then the security of the workflow is the security of agent() — a subagent that can do anything your session permits. A follow-up that probes that surface empirically: the JS lockdown isn't a boundary, the engine adds control-model risk rather than new privilege, and two probabilistic guardrails — model injection resistance and the harness action classifier — both fired but neither is a wall.

Dynamic Workflows: A Deterministic Controller Over LLM Subagents

Claude Code's Workflow tool runs a deterministic JavaScript controller that spawns and awaits LLM subagents. This is the execution model and the sandbox contract — what the script can and can't do, verified by probing the isolate, not just reading the docs: no require, no process, no fetch, a dual-layer determinism guard, and every side effect routed through an agent().

Who Owns the Context Window?

Every AI coding agent has one scarce resource that matters more than its model, its tools, or its prompt: the context window — the decision about what the model actually sees, each turn, in each phase of its work. Tokens are the line item on the invoice. Whoever owns the window decides what becomes a token at all.

For the past year I've been forced to answer that question the practical way, by building the same system twice. Forge, the SDLC engine I build at Entelligentsia, runs as a Claude Code plugin and as forge-cli, a standalone harness on the pi coding agent. Along the way I adopted context middleware, wrote a compression library, built a full context governor — and then commissioned a deep research pass that told me the platforms were absorbing the entire layer I'd been building.

This series is that journey, told in order, with the research woven in where the journey earned it. It's written for developers building with agents and for the managers paying for them. No part requires the others, but they compound — the way the problem did.

The six parts

Part 1 — Same Brain, Two Bodies: Forge as a Plugin, Forge as a Harness

Forge lives in two bodies. The plugin's orchestrators were prose — markdown an LLM executed — until Claude Code GA'd Dynamic Workflows last week and the port was so natural it felt like the platform shipping the part I'd been simulating. The second body, forge-cli, was code from day one. The two are converging on everything except one row of the comparison table: what my agents see, each turn, in each phase. That row is the series.

Part 2 — The Token Bill Arrives: Discovering lean-ctx

The first time I decomposed a month's token bill, the whale wasn't intelligence — it was transit. My first instinct was everyone's: reach for middleware — lean-ctx, rtk, headroom, a real market built in a matter of months on a measurable problem, each with its scalpel at a different layer. And somewhere between the install and the invoice, three questions started nagging that none of the available meters could answer.

Part 3 — The Three Meters Never Agree: Putting lean-ctx, rtk, and headroom on the Bench

Same real task, same harness, same models, one neutral meter — the provider's bill. A pre-registered protocol frozen in git before the runs, right-of-reply issues filed with every maintainer, every transcript public. What the bench said is not what any of the three meters promised — and the differences between the meters became the finding.

Part 4 — Then I Asked Whether Any of This Should Exist

Having built the layer three times, I stopped building and ran the research: a 99-agent deep-research workflow, 78 extracted claims, 25 adversarially verified. The verdict was uncomfortable. Anthropic has absorbed nearly the entire middleware surface into server-side primitives; OpenAI commoditized compaction into an endpoint; Google made the question invisible. I wasn't reading a market survey — I was triaging my own roadmap.

Part 5 — Who Audits the Meter?

This one began as my own accusation: you charge me for every token, compress before it hits your infrastructure, and pocket the difference. The charge failed on the facts — billing is post-edit, and every API response carries the proof. But it survived in sharper form: the per-request savings data evaporates before it reaches any dashboard. Cache savings get a first-class billing line; editing savings, you take on faith. Trust-me metering, and the question only a governor-builder knows to ask.

Part 6 — Where I Land: The Governor Belongs in the Harness

The conviction the journey produced. Middleware is transitionary — the platforms ship your roadmap. Platform absorption is real but arrives below the line, as opacity. The durable home for context management is the harness layer: open, pluggable, phase-aware, cache-respecting, audit-trailed. And a watchlist: when the opt-in betas become defaults, and whether anyone builds the cross-provider token accounting no vendor has an incentive to build.

Why you might care

If you build with agents: this is a field guide to a layer of your stack that is moving under your feet, written by someone who built on it before and after it moved.

If you manage teams that build with agents: Parts 2, 3, and 5 are about money — where token spend actually goes, what an independent benchmark says about saving it, and why no vendor dashboard will show you the number you most need.

If you build developer tools: Parts 3 and 4 are the uncomfortable ones. Read them before your next roadmap review.

Forge is open source: the Claude Code plugin and forge-cli, built on the pi coding agent. New parts publish weekly.

About Boni Gopalan

Elite software architect specializing in AI systems, emotional intelligence, and scalable cloud architectures. Founder of Entelligentsia.
