
Who Owns the Context Window?

Part 0 of 7
Boni Gopalan · June 5, 2026 · 6 min read · AI

Who Owns the Context Window? - Series Overview

AI, Claude Code, Agents, Context Engineering, Developer Tools, SDLC

See Also


Dynamic Workflows: The Confused Deputy Behind Every agent()

If every side effect in a dynamic workflow is routed through an agent(), then the security of the workflow is the security of agent() — a subagent that can do anything your session permits. A follow-up that probes that surface empirically: the JS lockdown isn't a boundary, the engine adds control-model risk rather than new privilege, and two probabilistic guardrails — model injection resistance and the harness action classifier — both fired but neither is a wall.


Dynamic Workflows: A Deterministic Controller Over LLM Subagents

Claude Code's Workflow tool runs a deterministic JavaScript controller that spawns and awaits LLM subagents. This is the execution model and the sandbox contract — what the script can and can't do, verified by probing the isolate, not just reading the docs: no require, no process, no fetch, a dual-layer determinism guard, and every side effect routed through an agent().


Pi-Ralph Is Smaller Than It Looks and Bigger Than It Acts

Entelligentsia's pi-ralph extension fits a Generator-Critique-Judge agent loop in ~900 lines. A balanced look at what the pi harness hands you for free, what the scaffold latently affords — multi-model role assignment and dynamic tool synthesis via bash — and what production users still have to build.


Who Owns the Context Window?

Every AI coding agent has one scarce resource that matters more than its model, its tools, or its prompt: the context window, meaning what the model actually sees, each turn, in each phase of its work. Tokens are the line item on the invoice. Whoever owns the window decides what becomes a token at all.

For the past year I've been forced to answer that question the practical way, by building the same system twice. Forge, the SDLC engine I build at Entelligentsia, runs as a Claude Code plugin and as forge-cli, a standalone harness on the pi coding agent. Along the way I adopted context middleware, wrote a compression library, built a full context governor — and then commissioned a deep research pass that told me the platforms were absorbing the entire layer I'd been building.

This series is that journey, told in order, with the research woven in where the journey earned it. It's written for developers building with agents and for the managers paying for them. No part requires the others, but they compound — the way the problem did.

The seven parts

Part 1 - Same Brain, Two Bodies: Forge as a Plugin, Forge as a Harness
Forge lives in two bodies. The plugin's orchestrators were prose (markdown an LLM executed) until Claude Code GA'd Dynamic Workflows last week and the port was so natural it felt like the platform shipping the part I'd been simulating. The second body, forge-cli, was code from day one. The two are converging on everything except one row of the comparison table: what my agents see, each turn, in each phase. That row is the series.

Part 2 - The Token Bill Arrives: Discovering lean-ctx
A workflow engine reads store queries, entity JSON, and validation reports every turn, and the bill and the context rot arrived together. My first instinct was everyone's in 2025: reach for middleware (lean-ctx, headroom, rtk), a real market built on a measurable problem. It carried me a long way. But generic compression doesn't know that an architect planning and an engineer reviewing need different things to survive in context.

Part 3 - forge-compress: When Boring Heuristics Beat Clever Architecture
I built a compression library expecting the wins to come from the clever parts: entropy-driven compression, tiered budget allocation. The eval data pointed somewhere more humbling: clamping bash output and flattening JSON responses bought more than anything dramatically smarter would have. In hindsight, the design notes read like premonitions of what the market would select for: stable shapes that respect prompt caches, no model dependency, lossy by intent.
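
To make the two "boring" heuristics concrete, here is a minimal Python sketch of what clamping command output and flattening a JSON response can look like. It is an illustration under stated assumptions, not forge-compress's actual code: the function names `clamp_output` and `flatten_json`, the head/tail line counts, and the marker text are all hypothetical.

```python
def clamp_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines; replace the middle with a marker.

    The marker is deterministic, so identical inputs always produce identical
    output (a stable shape, which matters if a prompt cache sits downstream).
    """
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short output passes through untouched
    dropped = len(lines) - head - tail
    marker = f"[... {dropped} lines clamped ...]"
    # Slice from an explicit index so tail=0 keeps no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


def flatten_json(value, prefix: str = "") -> dict:
    """Flatten nested dicts and lists into one level of dotted-path keys."""
    flat = {}
    if isinstance(value, dict) and value:
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_json(child, path))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            flat.update(flatten_json(child, f"{prefix}[{index}]"))
    else:
        # Scalars and empty containers are kept as leaf values.
        flat[prefix] = value
    return flat
```

Both functions are lossy on purpose and need no model call, which matches the design notes quoted above; a real library would presumably count tokens rather than lines.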

Part 4 - The Context Governor: Policy, Not Compression
Compression asks how to make output smaller. Governance asks what this agent, in this phase, deserves to keep. Five mechanisms wired into forge-cli's tool pipeline (dedup, schema-trim, span-clamp, a live budget meter, checkpoint-and-shed), with policies keyed by persona and phase. Also the embarrassing part: the governor shipped dormant, and I'll tell you exactly how. A series that demands honest accounting from vendors has to start with mine.
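
As a rough illustration of "policies keyed by persona and phase", the sketch below wires three of the five mechanisms (dedup, span-clamp, and a budget meter) behind a policy lookup. Everything here is an assumption made for the example: the `Policy` and `Governor` names, the policy values, and the use of character counts as a stand-in for tokens. It is not forge-cli's implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    max_chars_per_result: int  # span-clamp limit for a single tool result
    budget_chars: int          # total allowance tracked by the budget meter


# Hypothetical policy table keyed by (persona, phase).
POLICIES = {
    ("architect", "plan"): Policy(max_chars_per_result=4000, budget_chars=40000),
    ("engineer", "review"): Policy(max_chars_per_result=1500, budget_chars=20000),
}
DEFAULT_POLICY = Policy(max_chars_per_result=2000, budget_chars=20000)


@dataclass
class Governor:
    persona: str
    phase: str
    used_chars: int = 0
    seen: set = field(default_factory=set)

    @property
    def policy(self) -> Policy:
        return POLICIES.get((self.persona, self.phase), DEFAULT_POLICY)

    def admit(self, tool_output: str) -> str:
        """Return what the agent is allowed to keep from one tool result."""
        digest = hashlib.sha256(tool_output.encode("utf-8")).hexdigest()
        if digest in self.seen:
            # Dedup: an identical result was already admitted this session.
            result = f"[duplicate of earlier result {digest[:8]}]"
        else:
            self.seen.add(digest)
            limit = self.policy.max_chars_per_result
            if len(tool_output) <= limit:
                result = tool_output
            else:
                # Span-clamp: keep the head, note how much was cut.
                cut = len(tool_output) - limit
                result = tool_output[:limit] + f"[... {cut} chars clamped ...]"
        self.used_chars += len(result)  # budget meter
        return result

    def over_budget(self) -> bool:
        return self.used_chars > self.policy.budget_chars
```

The point of the shape is that the same tool output yields different context for `Governor("architect", "plan")` and `Governor("engineer", "review")`, which generic compression cannot express.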

Part 5 - Then I Asked Whether Any of This Should Exist
Having built the layer three times, I stopped building and ran the research: a 99-agent deep-research workflow, 78 extracted claims, 25 adversarially verified. The verdict was uncomfortable. Anthropic has absorbed nearly the entire middleware surface into server-side primitives; OpenAI commoditized compaction into an endpoint; Google made the question invisible. I wasn't reading a market survey; I was triaging my own roadmap.

Part 6 - Who Audits the Meter?
This one began as my own accusation: you charge me for every token, compress before it hits your infrastructure, and pocket the difference. The charge failed on the facts: billing is post-edit, and every API response carries the proof. But it survived in sharper form: the per-request savings data evaporates before it reaches any dashboard. Cache savings get a first-class billing line; editing savings, you take on faith. Trust-me metering, and the question only a governor-builder knows to ask.

Part 7 - Where I Land: The Governor Belongs in the Harness
The conviction the journey produced. Middleware is transitionary: the platforms ship your roadmap. Platform absorption is real but arrives below the line, as opacity. The durable home for context management is the harness layer: open, pluggable, phase-aware, cache-respecting, audit-trailed. And a watchlist: when the opt-in betas become defaults, and whether anyone builds the cross-provider token accounting no vendor has an incentive to.

Why you might care

If you build with agents: this is a field guide to a layer of your stack that is moving under your feet, written by someone who built on it before and after it moved.

If you manage teams that build with agents: Parts 2, 4, and 6 are about money — where token spend actually goes, what governing it looks like, and why no vendor dashboard will show you the number you most need.

If you build developer tools: Part 5 is the uncomfortable one. Read it before your next roadmap review.


Forge is open source: the Claude Code plugin and forge-cli, built on the pi coding agent. New parts publish weekly.


About Boni Gopalan

Elite software architect specializing in AI systems, emotional intelligence, and scalable cloud architectures. Founder of Entelligentsia.
