
Who Owns the Context Window?

Part 1 of 7
Boni Gopalan · June 5, 2026 · 12 min read

Same Brain, Two Bodies: Forge as a Plugin, Forge as a Harness

Last week I deleted three of the most carefully written documents of my engineering life, and the overwhelming feeling was relief.

They were orchestrators. Not orchestrators in the Kubernetes sense — orchestrators in the "markdown files that tell an LLM how to run a software development pipeline" sense. For most of a year, the beating heart of Forge, the SDLC engine I build at Entelligentsia, was prose. Paragraphs that said things like: if the review verdict is REJECTED and the iteration count is below three, return to the implement phase. A language model read those paragraphs and, mostly, did what they said.

Then Claude Code shipped Dynamic Workflows to general availability, and within days the prose was gone — replaced by three JavaScript files that do the same job deterministically. The port was so natural it barely felt like migration. It felt like the platform had finally shipped the part I'd been simulating by hand all along.

I want to hold onto that feeling, because this series is going to complicate it. Being absorbed by your platform and being grateful for it — that happened to me last week, and it was genuinely good. But over the past year I've also watched the platforms absorb a different layer of my stack, one where the absorption doesn't hand me a hook to stand on. It hands me a curtain.

This is the first of seven pieces about a question I didn't plan to spend a year on: who owns the context window? Not the model. Not the tokens. The window — the decision about what an AI agent actually sees, each turn, in each phase of its work. I got to that question the slow way, by building the same system twice, and the architecture of those two builds is the best introduction to everything that follows. So let me show you both bodies.

What Forge does

Forge is an SDLC engine for AI agents. You point it at a codebase, run init, and it reads your stack and generates three things: a structured knowledge base specific to your project, a set of agent personas (engineer, supervisor, QA, architect), and a gated pipeline that drives real software tasks end to end:

Plan → Review Plan → Implement → Review Code → Validate → Approve → Writeback → Commit

Each phase produces an artifact — a plan, a review with an extractable verdict, a validation report. Review phases can loop a task back for revision, but only up to a cap, and when the cap is exhausted the system escalates to a human. The design principle behind that cap is the closest thing Forge has to a constitution: continuing past an unresolved problem is never acceptable.
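
The cap is easiest to see once it stops being a sentence. A minimal TypeScript sketch, assuming a cap of three attempts (mirroring the "iteration count is below three" rule quoted earlier) and hypothetical `implement` and `review` callbacks standing in for agent calls. It is synchronous for clarity, where a real orchestrator would await each agent, and none of these names come from Forge's source.

```typescript
// Hypothetical types for illustration; not Forge's actual API.
type Verdict = "APPROVED" | "REJECTED";

type ReviewOutcome =
  | { kind: "approved"; attempts: number }
  | { kind: "escalated"; attempts: number; reason: string };

// Assumed cap, mirroring the "iteration count below three" rule.
const MAX_ATTEMPTS = 3;

// Loops implement -> review until the review approves or the cap is exhausted.
// There is no third exit: an unresolved REJECTED verdict always escalates.
function implementWithReview(
  implement: (attempt: number) => void,
  review: (attempt: number) => Verdict,
): ReviewOutcome {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    implement(attempt);
    if (review(attempt) === "APPROVED") {
      return { kind: "approved", attempts: attempt };
    }
  }
  return {
    kind: "escalated",
    attempts: MAX_ATTEMPTS,
    reason: `review still REJECTED after ${MAX_ATTEMPTS} attempts; handing off to a human`,
  };
}

// Usage: a reviewer that approves on the second attempt.
const outcome = implementWithReview(
  () => {},
  (attempt) => (attempt >= 2 ? "APPROVED" : "REJECTED"),
);
console.log(outcome.kind, outcome.attempts); // approved 2
```

The loop has exactly two exits, approval or escalation, which is the "never continue past an unresolved problem" rule expressed as control flow rather than as an instruction a model might reinterpret.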

Underneath sits a JSON store — sprints, tasks, bugs, features, events — that functions as the single source of truth, plus a writeback phase that feeds what each sprint learned back into the knowledge base. Every sprint sharpens the system that runs the next one.

That's the brain. It has been remarkably stable for a year. What has not been stable is the body it runs in — and Forge has two.

The first body: a guest in Claude Code

Forge began as a Claude Code plugin, and the plugin is still how most people meet it. Architecturally, a Claude Code plugin is a particular kind of creature: your commands are markdown files, your agents are markdown files, and until very recently, your orchestration logic was also markdown — instructions that Claude itself executed.

It's worth pausing on what that actually meant day to day. The orchestrator — the thing deciding which phase runs next, which model handles it, whether a gate passed — was prose interpreted by a language model. Verdict extraction was "read the review and decide." Loop caps were instructions the model usually honored. The bug severity shortcuts (a critical bug skips planning and goes straight to implement-review-commit; a minor one runs the full gauntlet) were paragraphs of conditional logic, executed by something that does not, strictly speaking, execute.
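
Written as data instead of a paragraph, the severity shortcut becomes a lookup that no model has to interpret. A minimal sketch, assuming phase identifiers derived from the pipeline listed above and only the two severities this paragraph mentions; the real fix-bug flow may define more levels or different phase lists, and these identifiers are illustrative.

```typescript
// Phase names follow the eight-phase pipeline listed earlier; identifiers are illustrative.
type Phase =
  | "plan" | "review-plan" | "implement" | "review-code"
  | "validate" | "approve" | "writeback" | "commit";

// Only the two severities described in the text; other levels are omitted.
type Severity = "critical" | "minor";

const FULL_PIPELINE: readonly Phase[] = [
  "plan", "review-plan", "implement", "review-code",
  "validate", "approve", "writeback", "commit",
];

// Critical bugs skip planning and go straight to implement -> review -> commit;
// minor bugs run the full gauntlet.
const PHASES_BY_SEVERITY: Record<Severity, readonly Phase[]> = {
  critical: ["implement", "review-code", "commit"],
  minor: FULL_PIPELINE,
};

function phasesFor(severity: Severity): readonly Phase[] {
  return PHASES_BY_SEVERITY[severity];
}

console.log(phasesFor("critical").join(" -> ")); // implement -> review-code -> commit
```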

And here's the thing developers who haven't built on this substrate don't always believe: it mostly worked. Frontier models are good at following procedural instructions. But "mostly" is a load-bearing word in an SDLC tool. I have shipped fixes for orchestration edge-case bugs that were, in the most literal sense, prompt bugs — debugged not with a stack trace but by rereading my own paragraphs, looking for the sentence a model could plausibly misread. If you've ever debugged a YAML indentation error at midnight, imagine that, except the YAML is English and the parser has moods.

What the plugin got in exchange for this fragility was real: distribution through the plugin marketplace, a polished harness, subagents, and users who already lived in Claude Code. I want to be fair to this body. It is not the villain of the story. It just had a ceiling, and the ceiling was made of someone else's decisions.

Last week, the platform shipped my orchestrator

In late May, Claude Code GA'd Dynamic Workflows: deterministic JavaScript orchestration of subagents as a first-class platform primitive. Loops are loops. Conditionals are conditionals. Agents are function calls that return structured output.

If you've been simulating exactly that with prose for a year, the GA announcement reads less like a feature release and more like a pardon.

The port took days, not weeks. Forge's three orchestrators — run-task, run-sprint, fix-bug — became wfl-run-task.js, wfl-run-sprint.js, and wfl-fix-bug.js. We tracked parity against the prose originals as a numbered checklist and closed the gaps one by one. Verdict extraction became structured output with a schema instead of an act of faith. The revision loop became a for loop with a counter that cannot be talked out of its limit.
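
The gap between "read the review and decide" and structured output is the gap between interpretation and validation. A minimal sketch of a deterministic verdict check in plain TypeScript, assuming a hypothetical JSON payload with `verdict` and `issues` fields. It does not show how Dynamic Workflows attaches a schema to an agent call, only what an orchestrator can do once the verdict arrives as data.

```typescript
// Hypothetical review payload; field names are assumptions for illustration.
interface ReviewResult {
  verdict: "APPROVED" | "REJECTED";
  issues: string[];
}

function isVerdict(v: unknown): v is ReviewResult["verdict"] {
  return v === "APPROVED" || v === "REJECTED";
}

// Validates a reviewer's structured output. Anything that does not match the
// expected shape is an error, never a best guess.
function parseReview(raw: string): ReviewResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("review output is not valid JSON");
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("review output must be a JSON object");
  }
  const obj = data as { verdict?: unknown; issues?: unknown };
  const verdict = obj.verdict;
  if (!isVerdict(verdict)) {
    throw new Error(`unexpected verdict: ${String(verdict)}`);
  }
  const issues = obj.issues;
  if (!Array.isArray(issues) || !issues.every((i) => typeof i === "string")) {
    throw new Error("issues must be an array of strings");
  }
  return { verdict, issues: issues as string[] };
}

const review = parseReview('{"verdict":"REJECTED","issues":["no test for empty input"]}');
console.log(review.verdict, review.issues.length); // REJECTED 1
```

A malformed or unexpected verdict throws instead of being coerced into the nearest plausible answer, so a bad review surfaces as a failure the orchestrator can route rather than as a silent misreading.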

Here's the ported orchestrator running — wfl:run-task dispatching all eight phases inside Claude Code, recorded on a real task and compressed only in wall-clock time. The pipeline ran for about 20 minutes; chapter markers on the seek bar jump to any phase:

[Terminal recording: part1-run-task-claude.cast]

Body one: the Forge plugin driving plan-through-commit via Claude Code's Dynamic Workflows — the orchestrator that was prose until last week, now a deterministic wfl:run-task dispatch.

I wrote about Dynamic Workflows when it landed — including the security surface it creates — so I'll spare you the full tour. What matters for this series is what the port felt like, because the feeling is data.

This is what platform absorption feels like when it goes well: you delete your worst code and say thank you. The platform shipped a primitive, the primitive arrived above the line — as a hook I could stand on — and my hand-rolled simulation of it became unnecessary in the best possible way. Sherlocked, and glad about it.

Remember that feeling. The series will come back to it when absorption points at something I didn't want to lose.

The second body: forge-cli, built on pi

About the time the prose orchestrators were at their most temperamental, I started building the second body: forge-cli — the same Forge brain, rebuilt as a standalone coding harness.

It runs on pi, Mario Zechner's deliberately minimal coding agent. pi is an unusual foundation, and I chose it on purpose. Its system prompt and tool definitions fit in under a thousand tokens — roughly a tenth of what mainstream harnesses carry. It ships four tools: read, write, edit, bash. Its philosophy is that frontier models already know what a coding agent is, so the harness should get out of the way — and, critically, it has an extension API that hands over the keys instead of hiding them.

On that foundation, forge-cli is everything the plugin couldn't be:

  • Orchestration as code, from day one. run-task and run-sprint are TypeScript, with preflight checks and a halt advisor that turns gate failures into structured, recoverable state. The plugin caught up last week; forge-cli started here.
  • A hook on every tool call. A dispatcher sits in the path of every tool_call and tool_result, which means forge-cli can see — and shape — every byte that flows toward the model. Hold that thought.
  • Models routed per phase, across providers. Forge's personas map to model tiers: Heavy for review and sign-off (supervisor, architect, QA), Standard for planning and implementation (engineer), Light for writeback and indexing (collator, librarian). Each tier binds to any provider pi resolves — Anthropic, OpenAI, Ollama, OpenRouter. Your architect can be a frontier model while your librarian is a local one, and your invoice will notice the difference.
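
The tier scheme in the last bullet amounts to two small lookup tables. A minimal sketch, assuming a hypothetical binding shape; the provider names come from the list above, but the model strings are placeholders and the structure is not forge-cli's actual configuration format.

```typescript
// Illustrative only: forge-cli's real config format is not shown in this post.
type Tier = "heavy" | "standard" | "light";
type Persona =
  | "supervisor" | "architect" | "qa"
  | "engineer"
  | "collator" | "librarian";

interface ModelBinding {
  provider: "anthropic" | "openai" | "ollama" | "openrouter";
  model: string; // placeholder names below, not real model IDs
}

// Review and sign-off are Heavy, planning and implementation are Standard,
// writeback and indexing are Light.
const TIER_OF: Record<Persona, Tier> = {
  supervisor: "heavy",
  architect: "heavy",
  qa: "heavy",
  engineer: "standard",
  collator: "light",
  librarian: "light",
};

// Each tier binds to any provider; here Heavy is a hosted frontier model
// and Light is a local one.
const BINDINGS: Record<Tier, ModelBinding> = {
  heavy: { provider: "anthropic", model: "frontier-model" },
  standard: { provider: "openrouter", model: "mid-tier-model" },
  light: { provider: "ollama", model: "local-model" },
};

function modelFor(persona: Persona): ModelBinding {
  return BINDINGS[TIER_OF[persona]];
}

console.log(modelFor("librarian").provider); // ollama
```

Keeping persona-to-tier and tier-to-model as separate maps means moving the Light tier to a different provider is a one-line change that every collator and librarian call picks up.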

Here's the second body running the same pipeline you watched above — same eight phases, same gates, but this time in forge-cli's own dashboard on pi, with the per-phase token meters visible. The original run took 18 minutes; the replay compresses each phase to a few seconds:

[Terminal recording: part1-run-task.cast]

Body two: the same task on forge-cli — 18 minutes of plan-through-commit replayed in 80 seconds. Both replays are real runs; in both, only wall-clock time was compressed.

Same pipeline. Same store. Same personas, gates, and escalation discipline. Same brain. Different body — and the difference is concentrated in exactly one place.

What converged, and what didn't

Put the two bodies side by side and something interesting shows up. Almost every architectural difference between them is shrinking:

| | Plugin, a month ago | Plugin, today | forge-cli |
|---|---|---|---|
| Orchestration | Prose, LLM-executed | JS on Dynamic Workflows ✅ | TypeScript, day one |
| Verdict gates | Hoped-for extraction | Structured output ✅ | Deterministic parse |
| Store discipline | Convention | Convention + workflow guards | Enforced at the tool layer |
| Model per phase | Harness default | Partial, via workflow overrides | Tier-routed, any provider |
| Tool output handling | The harness's business | Still the harness's business | Hooked, trimmed, governed |
| Context window | Opaque | Still opaque | Mine |

Read the top four rows and you're reading a platform doing what good platforms do: noticing what builders hand-roll and shipping it as a primitive. Orchestration converged because Claude Code shipped a hook — Dynamic Workflows — and convergence-by-hook is a gift. I have no complaints. I have the opposite of complaints.

Now read the bottom two rows.

Tool output handling and the context window did not converge, and there is no announced path by which they will. In the plugin, what my agents see each turn — which store fields survive, whether a 40,000-token bash dump gets trimmed, what gets silently compacted when the window fills — is decided by the harness, behind a curtain. There is no hook where Forge can stand and say: this agent is a QA engineer in the validate phase; it needs acceptance criteria and test output; it does not need the traversal trace of a store query it ran four turns ago.

In forge-cli, that sentence is roughly a config entry.
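
To make "roughly a config entry" concrete: a minimal sketch of a per-phase context policy applied to store entities and to every tool result before it reaches the model. All of it is hypothetical, including the `ContextPolicy` shape, the `validate/qa` key, the field names, and the `onToolResult` function. It illustrates the idea of a hook in the tool_result path, not pi's extension API or forge-cli's real policy format.

```typescript
// Hypothetical shapes for illustration; not pi's or forge-cli's actual API.
interface ContextPolicy {
  keepStoreFields: string[];  // store entity fields this phase needs
  maxToolOutputChars: number; // budget per tool result; trimming adds a short marker
}

interface ToolResult {
  tool: string; // e.g. "bash", "read"
  text: string; // raw output headed for the model
}

// "A QA engineer in the validate phase": acceptance criteria and test output,
// not traversal traces.
const POLICIES: Record<string, ContextPolicy> = {
  "validate/qa": {
    keepStoreFields: ["id", "title", "acceptanceCriteria", "testResults"],
    maxToolOutputChars: 4000,
  },
};

// Projects a store entity down to the fields the policy allows.
function projectEntity(
  entity: Record<string, unknown>,
  policy: ContextPolicy,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of policy.keepStoreFields) {
    if (key in entity) out[key] = entity[key];
  }
  return out;
}

// Runs on every tool result before it reaches the model. Oversized output
// keeps its head and tail around a marker recording how much was dropped.
function onToolResult(result: ToolResult, policy: ContextPolicy): ToolResult {
  const { text } = result;
  if (text.length <= policy.maxToolOutputChars) return result;
  const half = Math.max(1, Math.floor(policy.maxToolOutputChars / 2));
  const dropped = text.length - 2 * half;
  return {
    ...result,
    text: `${text.slice(0, half)}\n[... ${dropped} chars trimmed by policy ...]\n${text.slice(-half)}`,
  };
}

const qa = POLICIES["validate/qa"];
const task = {
  id: "T-42",
  title: "Retry failed uploads",
  acceptanceCriteria: ["retries up to three times"],
  traversalTrace: ["step 1", "step 2"],
};
console.log(Object.keys(projectEntity(task, qa)).join(", ")); // id, title, acceptanceCriteria
```

Keeping the head and tail of oversized output is one plausible trimming choice, resting on the assumption that errors and summaries tend to cluster at the ends of tool output.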

Why this is the row that matters

If you manage engineering teams rather than build agent tooling, here is the version of this that lands on your desk: an agentic SDLC pipeline reads its own project state — store queries, entity JSON, validation reports — on every turn, of every phase, of every task, of every sprint. The verbosity compounds quietly, and it compounds in three directions at once.

It compounds into cost. Every unneeded byte that enters the window is a billed token, and an eight-phase pipeline multiplied across a sprint is a lot of turns. The first time I actually decomposed a month's token bill, the dominant line item wasn't model intelligence applied to my problems. It was boilerplate in transit.
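
The compounding is plain multiplication. A back-of-the-envelope sketch in which every input is illustrative rather than measured from Forge's bills: a fixed amount of unneeded tool output per turn, multiplied through turns, phases, and tasks.

```typescript
// All inputs are illustrative assumptions, not measurements.
interface SprintShape {
  tasks: number;               // tasks per sprint
  phasesPerTask: number;       // eight in the Forge pipeline
  turnsPerPhase: number;       // average model turns per phase
  wastedTokensPerTurn: number; // unneeded tool output in the window each turn
}

function wastedTokensPerSprint(s: SprintShape): number {
  return s.tasks * s.phasesPerTask * s.turnsPerPhase * s.wastedTokensPerTurn;
}

const example: SprintShape = {
  tasks: 10,
  phasesPerTask: 8,
  turnsPerPhase: 12,
  wastedTokensPerTurn: 5000,
};

console.log(wastedTokensPerSprint(example)); // 4800000
```

This linear model is conservative: in a typical chat-style agent loop, output that stays in the window is sent again on every later turn of the phase, so the real overhead can grow faster than this.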

It compounds into quality. A context window silted up with stale tool output doesn't just cost more — it makes the model worse. Practitioners call it context rot, and it means you can simultaneously pay more and get less.

It compounds into blindness. When context grows, rots, or gets compacted inside a harness you don't control, you find out from the model's behavior — not from any meter. You cannot govern what you cannot see, and in the plugin body, I could not see.

Those three pains — the bill, the rot, the blindfold — are, in order, the next three stops on this journey. They drove me from adopting off-the-shelf context middleware, to writing a compression library whose evals humbled me, to building a full context governor inside forge-cli. And then they drove me to a harder question: while I was building all this, the platforms were absorbing the same layer — and unlike Dynamic Workflows, that absorption is happening below the line, server-side, behind the API, where no plugin hook will ever reach.

I didn't move Forge to pi because Claude Code is a bad harness. It is an excellent harness, and last week it absorbed my orchestration layer so gracefully that I deleted a year of careful prose with something close to joy. I moved because the question I needed to ask — what should my agents see, each turn, in each phase? — wasn't askable there. It still isn't.

The first hint of how expensive that question would get arrived, as these things do, with the bill. That's Part 2.


This is Part 1 of "Who Owns the Context Window?" — a seven-part series on where context management should live, told through the building of one system that had to answer it. Forge is open source: the Claude Code plugin and forge-cli, which runs on the pi coding agent.

About Boni Gopalan

Elite software architect specializing in AI systems, emotional intelligence, and scalable cloud architectures. Founder of Entelligentsia.
