Dynamic Workflows: A Deterministic Controller Over LLM Subagents
Claude Code's Workflow tool gives you a way to run many subagents from a single program. You write a JavaScript script; the harness runs it in a sandbox; the script spawns and awaits real LLM subagents through an injected agent() function. The design rests on one clean split, and most of what follows is a consequence of it:
- The script is deterministic. Loops, counters, conditionals, fan-out — ordinary control flow. No model reasoning happens here.
- The agents are the model work. Every actual LLM call is an agent() call from inside the script.
So a workflow is deterministic orchestration with the thinking delegated to subagents. The script decides what runs and in what order; the agents decide what to say. That division is why workflows fit jobs where the control flow has to be reliable — a migration across 200 files, a review with verify-then-commit gates, a fan-out research sweep — rather than re-derived turn by turn by a model that might lose the thread on file 140.
Before any of the contract, here is a complete workflow file — small enough to read in one breath, but with every structural part present. It reviews each file in a list across two dimensions, then verifies the findings, and returns only the confirmed ones. Read it once now; the rest of the post is a tour of why each part is shaped the way it is.
// review.workflow.js — a complete, runnable workflow
export const meta = {
name: 'review-files',
description: 'Review each file for bugs and clarity, then verify findings',
phases: [ { title: 'Review' }, { title: 'Verify' } ],
}
// JSON Schemas are the typed boundary between steps.
const FINDINGS = {
type: 'object',
properties: {
findings: {
type: 'array',
items: {
type: 'object',
properties: { file: { type: 'string' }, line: { type: 'number' }, issue: { type: 'string' } },
required: ['file', 'issue'],
},
},
},
required: ['findings'],
}
const VERDICT = {
type: 'object',
properties: { real: { type: 'boolean' }, reason: { type: 'string' } },
required: ['real'],
}
// `args` is whatever the caller passed — here, a list of file paths.
const files = args?.files ?? []
log(`reviewing ${files.length} files`)
// Stage 1 reviews a file; stage 2 verifies each finding it produced.
// pipeline = no barrier: file B can be in Verify while file C is still in Review.
const reviewed = await pipeline(
files,
(file) => agent(
`Read ${file}. Report bugs and unclear code as findings.`,
{ phase: 'Review', schema: FINDINGS, agentType: 'Explore' },
),
(review, file) => parallel((review?.findings ?? []).map((f) => () =>
agent(
`In ${file}, is this a real issue? "${f.issue}" (line ${f.line ?? '?'})`,
{ phase: 'Verify', schema: VERDICT },
).then((v) => ({ ...f, verdict: v })),
)),
)
// Plain JS owns the cross-step bookkeeping: flatten, drop skips, keep the real ones.
const confirmed = reviewed.flat().filter(Boolean).filter((f) => f.verdict?.real)
log(`${confirmed.length} confirmed`)
return { confirmed }
Everything that follows is in here: the static meta literal, the injected globals (args, log, agent, pipeline, parallel), schemas as the boundary between steps, agent() as the only door to file I/O, and ordinary JavaScript owning every bit of state the agents don't. The sections below pull each of these apart.
Most of this post is the sandbox contract: what the script can and can't do, stated as facts rather than as documentation claims, because the interesting parts were confirmed by running probe scripts inside the sandbox and recording the results. If you write workflows, the contract is what keeps you from fighting the runtime.
What a dynamic workflow is, mechanically
The Workflow tool runs in the background: the call returns immediately with a task ID, and a notification fires when the script finishes. Between those two events, the runtime is doing something specific. The script is a controller with suspension points. It runs normally until it hits await agent(...), at which point JavaScript execution suspends, a real subagent runs its own full agent loop to completion, its result resolves the promise, and the script resumes. The diagram below is one such run — a controller that fans out three agents, waits, then continues.
Two useful properties fall straight out of this shape. First, concurrency without thread management: parallel() and pipeline() fire several agent() calls and your script just awaits an array — the runtime runs up to min(16, cores−2) at once and queues the rest. Second, resume by journaling: each agent() call is recorded keyed by its (prompt, opts). On resume, the longest unchanged prefix of calls returns its cached results instantly; only the first changed or new call onward re-runs. That second property is the reason the sandbox bans wall-clock and randomness, which we'll get to.
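The resume rule is easier to see as a pure function. What follows is a minimal model under stated assumptions, not the harness's implementation: the journal shape, the keyOf helper, and the planResume name are all invented for illustration, and calls are treated as a simple ordered list.

```javascript
// Hypothetical model of resume-by-journaling; names and shapes are assumptions.
// A call's identity is its prompt plus its options.
// (JSON.stringify is key-order sensitive, which is fine for a sketch.)
const keyOf = (call) => JSON.stringify([call.prompt, call.opts ?? {}])

// journal: calls recorded by the previous run, in order, each with a result.
// calls: the calls the edited script would make now, in order.
function planResume(journal, calls) {
  let prefix = 0
  while (
    prefix < journal.length &&
    prefix < calls.length &&
    keyOf(journal[prefix]) === keyOf(calls[prefix])
  ) {
    prefix++
  }
  return {
    cached: journal.slice(0, prefix).map((entry) => entry.result),
    rerun: calls.slice(prefix),
  }
}

const journal = [
  { prompt: 'Review a.js', opts: { phase: 'Review' }, result: 'ok' },
  { prompt: 'Review b.js', opts: { phase: 'Review' }, result: '2 bugs' },
  { prompt: 'Review c.js', opts: { phase: 'Review' }, result: 'ok' },
]
// The second prompt was edited, so it and everything after it run again.
const calls = [
  { prompt: 'Review a.js', opts: { phase: 'Review' } },
  { prompt: 'Review b.js carefully', opts: { phase: 'Review' } },
  { prompt: 'Review c.js', opts: { phase: 'Review' } },
]
const plan = planResume(journal, calls)
console.log(plan.cached)       // [ 'ok' ]
console.log(plan.rerun.length) // 2
```

Note that the third call re-runs even though its prompt is unchanged: in this model only the unbroken prefix is reused, which matches the "first changed or new call onward" wording above.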
meta and the injected globals
Read a workflow and two things look like syntax magic. Neither is.
export const meta = {
name: 'review-changes',
description: 'Review changed files across dimensions, verify each finding',
phases: [ { title: 'Review' }, { title: 'Verify' } ],
}
meta is an ordinary ES-module export, but it has to be a pure literal — no variables, function calls, spreads, or interpolation. The reason is timing: the harness reads meta by static analysis before it executes the script, to populate the permission dialog and the /workflows progress tree. It's read by looking at the code, not by running it, so anything that would require running the code is off limits.
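A small contrast may make the "pure literal" rule concrete. The accepted form below is the one from this post; the rejected forms are illustrations of the four categories named above, and the identifiers in them (PREFIX, describe, BASE_PHASES, target) are hypothetical. The export keyword is dropped here only so the snippet runs as a plain script.

```javascript
// Accepted shape: every value is written out as a literal.
const meta = {
  name: 'review-changes',
  description: 'Review changed files across dimensions, verify each finding',
  phases: [{ title: 'Review' }, { title: 'Verify' }],
}

// Forms the rule excludes, shown as comments because each needs evaluation:
//   name: PREFIX + '-changes'      (variable reference)
//   description: describe()        (function call)
//   phases: [...BASE_PHASES]       (spread)
//   name: `review-${target}`       (interpolation)

console.log(meta.phases.map((p) => p.title).join(' > ')) // Review > Verify
```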
The other apparent magic is agent(), phase(), log(), parallel(), pipeline(), workflow(), plus the values args and budget. These are not imports and not decorators — the harness places them into the script's global scope before execution. They're the bridge from the sandbox back into the harness. When you call phase('Review') or await agent(...), you are calling back into Claude Code, not into any library you imported. That matters because it tells you where the line is: everything else in the script is just JavaScript running in a box with no doors except those globals.
What's in the sandbox, and what isn't
The documentation describes "plain JavaScript, no filesystem or Node.js API access." That's accurate but coarse. A probe script — one that inspects typeof on suspected globals, attempts require(), reads process, and accesses the guarded APIs both directly and dynamically — pins down what's actually there. It spawns zero subagents and runs in about three milliseconds, so it's effectively free to run.
| Capability | Result |
|---|---|
| require, module, process | undefined: no CommonJS, no Node process object, no process.env |
| fs and other Node built-ins | unreachable: no require to load them |
| fetch | undefined: the script itself has no network |
| Buffer, crypto, URL, TextEncoder, atob/btoa, structuredClone | undefined: absent even though none of these is strictly "I/O" |
| JSON, Math, Array, Map, Set, Promise, RegExp, BigInt, Intl, Proxy, Reflect, Symbol | present: core ECMAScript |
| setTimeout, globalThis | present: async scheduling works |
| new Date(0) | "1970-01-01T00:00:00.000Z": deterministic date math works |
| new Date() (no arg), Date.now(), Math.random() | throw: they read the clock or entropy |
| agent, parallel, pipeline, phase, log, workflow, budget | present: the injected globals |
A second probe identified the sandbox precisely: a dynamic import('lodash') fails with ERR_VM_DYNAMIC_IMPORT_CALLBACK_MISSING, a Node vm-module error. So the host process is Node, but the script runs inside a vm.Context with a stripped global object and no module loader wired up. "Not Node" is imprecise; the accurate version is that Node hosts a locked-down context from which the script cannot see Node.
The practical takeaway is blunt. The script is a pure deterministic controller. It cannot read a file, run a shell command, hit a network endpoint, or even load a package. Every side effect has to go through an agent() call, because subagents have the real toolbelt and the script has none.
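For readers who want to repeat the first probe, here is a rough sketch of its shape. It is not the original probe script, and the list of names is only a sample. It runs anywhere; under ordinary Node most of these names exist, whereas inside the sandbox the table above says the Node-specific ones come back as 'undefined'.

```javascript
// A sketch of a capability probe; the list of names is illustrative.
function probe(names) {
  const report = {}
  for (const name of names) {
    // typeof on a property lookup does not throw for a missing global.
    report[name] = typeof globalThis[name]
  }
  return report
}

const report = probe(['JSON', 'Map', 'setTimeout', 'process', 'fetch', 'Buffer', 'structuredClone'])
console.log(report)
```

Inside a workflow you would end the script with `return probe([...])` so the report comes back as the run's result; the probe spawns no agents.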
The determinism guard
The ban on the clock and on randomness exists to protect resume. A script that returned different values on replay would invalidate the cached agent() prefix and break journaling. The guard is enforced at two layers, and both are worth knowing because they bite differently.
The first layer is static, before execution. If the literal text Date.now(), Math.random(), or new Date() appears in the source, the harness rejects the script before running it. And the scan is a substring match, not a parser: one probe was rejected purely because the string Math.random appeared inside a comment. The second layer is runtime. Reaching for the same APIs dynamically to dodge the text scan — globalThis['Date']['now']() — still throws at execution time with the determinism error. There's no smuggling them in.
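The comment rejection makes sense once you picture the static layer as a text search. The sketch below is a hypothetical reconstruction of that behavior, not the harness's code: the needle list and the findBannedText name are assumptions, chosen to match the observations above (a bare Math.random in a comment is flagged, new Date(0) is not).

```javascript
// Hypothetical reconstruction of the static layer, not the harness's code.
// Assumption: plain substring search over the raw source text.
const BANNED = ['Date.now', 'Math.random', 'new Date()']

function findBannedText(source) {
  return BANNED.filter((needle) => source.includes(needle))
}

const withComment = [
  '// unlike Math.random, the loop index is replayable',
  'const label = "sample-3"',
].join('\n')

const clean = 'const epoch = new Date(0).toISOString()'

console.log(findBannedText(withComment)) // [ 'Math.random' ]
console.log(findBannedText(clean))       // []
```

A substring search cannot tell code from comments or string literals, which is why vendored source needs a quick text search before you paste it in.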
The error messages tell you exactly what to do instead:
Date.now() / new Date() are unavailable in workflow scripts (breaks resume).
Stamp results after the workflow returns, or pass timestamps via args.
Math.random() is unavailable in workflow scripts (breaks resume).
For N independent samples, include the index in the agent label or prompt.
The new Date(0)-works / new Date()-throws distinction is exact: only the wall-clock-reading forms are blocked, so deterministic date arithmetic on a timestamp you already hold is fine. When you need "now," pass it in through args or stamp the result after the workflow returns in the main loop, which has a clock. When you need randomness or N independent samples, vary by index — weave the loop index into the agent() prompt or label. You get diversity without breaking replay.
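Both substitutions fit in a few lines. In this sketch the args object is a stand-in for the injected global, and startedAtMs is a hypothetical field name that the caller would fill in from a process that does have a clock.

```javascript
// Stand-in for the injected `args`; in a real run the caller supplies it.
// startedAtMs is a hypothetical field name.
const args = { startedAtMs: 1_700_000_000_000, question: 'Is the cache safe to drop?' }

// Deterministic date math on a timestamp the script already holds.
const started = new Date(args.startedAtMs)
const deadline = new Date(args.startedAtMs + 24 * 60 * 60 * 1000)

// N independent samples: the index makes each prompt (and journal key) distinct.
const prompts = Array.from({ length: 3 }, (_, i) =>
  `${args.question}\n(independent sample #${i})`)

console.log(started.toISOString())  // 2023-11-14T22:13:20.000Z
console.log(deadline.toISOString()) // 2023-11-15T22:13:20.000Z
console.log(new Set(prompts).size)  // 3
```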
Writing a correct workflow
The contract above translates into a short list of habits.
All I/O is an agent call. A workflow that "reads a config" is really a workflow whose first agent reads the config and returns it. There is no other way for data to enter the script from the outside.
Keep cross-step state in plain JS. Dedup sets, counters, accumulators, verdict routing — do them in the script. Spawning an agent to "decide which file to handle next" when a for loop would do is wasted tokens and surrendered determinism.
Use schema as the boundary between steps. Without a schema, agent() returns free text you have to parse. With one, the subagent is forced to emit a validated object and you get it ready to use. Narrowing the schema to exactly the fields the next step needs is your context curation between phases — the discipline that keeps a long pipeline from accreting noise.
Gate expensive loops on budget. budget.total, budget.spent(), and budget.remaining() track output tokens across the whole run — a shared pool covering the main loop and every agent. There's no per-agent turn introspection, so budget is the lever you have for bounding cost. It's also a hard ceiling: once spent() reaches total, further agent() calls throw.
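Two of these habits, plain-JS state and budget gating, can be sketched together. The budget and agent below are stand-ins so the snippet runs outside the harness; the token numbers are made up, and the stub returns an object as if a schema had been supplied.

```javascript
// Stand-ins for the injected globals; the real ones come from the runtime.
let spentTokens = 0
const budget = {
  total: 10_000,
  spent: () => spentTokens,
  remaining: () => budget.total - spentTokens,
}
async function agent(prompt, opts) {
  spentTokens += 3_000 // pretend every call costs 3k output tokens
  return { file: prompt.split(' ')[1], ok: true }
}

// Plain JS owns the bookkeeping: a dedup set and a budget gate.
async function checkAll(files, reserve) {
  const seen = new Set()
  const results = []
  for (const file of files) {
    if (seen.has(file)) continue            // dedup without spending tokens
    if (budget.remaining() < reserve) break // stop before hitting the ceiling
    seen.add(file)
    results.push(await agent(`Check ${file} for dead code.`, { phase: 'Check' }))
  }
  return results
}

const done = checkAll(['a.js', 'b.js', 'a.js', 'c.js', 'd.js'], 3_000)
done.then((results) => console.log(results.map((r) => r.file))) // [ 'a.js', 'b.js', 'c.js' ]
```

The loop stops at d.js because only 1,000 tokens remain, less than the reserve. Checking before each call means the script exits cleanly instead of letting agent() throw at the ceiling.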
The capability surface
The sandbox is minimal, but the injected surface is expressive. The whole vocabulary is small.
agent(prompt, opts?) spawns one subagent and returns its final text — or, with schema, a validated object. It returns null if the user skips it mid-run, so filter fan-out results with .filter(Boolean). The opts you'll actually use: schema (force structured output), model ('opus'/'sonnet'/'haiku' per-agent override), label (display name in the progress tree), phase (assign to a progress group explicitly, which you want inside parallel/pipeline to avoid racing the global phase()), agentType (use a registered custom agent like 'Explore' instead of the default), and isolation: 'worktree' (run in a fresh git worktree so parallel agents that mutate files don't collide — expensive, only for parallel writes). The key fact: the subagent has the real toolbelt — Read, Edit, Bash, Grep, WebFetch, MCP tools loaded on demand — while the script has none of it. agent() is the only door to the outside world, and it's a wide one.
parallel(thunks) runs an array of () => agent(...) thunks concurrently and is a barrier: it awaits all of them. A thunk that throws resolves to null rather than rejecting the whole call. Use it when you genuinely need every result together — a vote tally, a dedup pass, an early exit on zero results.
pipeline(items, stage1, stage2, ...) runs each item through all stages independently, with no barrier between stages: item A can be in stage 3 while item B is still in stage 1. Each stage callback receives (prevResult, originalItem, index). This is the default for multi-stage work; wall-clock is roughly the slowest single chain, not the sum of slowest-per-stage. Reach for parallel only when a stage genuinely needs every prior result at once.
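The two call shapes can be compared with toy stand-ins. These are written for illustration only and are assumptions about behavior described in this post: the real parallel() and pipeline() are injected by the harness and also enforce the concurrency cap, which this sketch ignores.

```javascript
// Stand-ins for illustration; not the harness's implementations.
const parallel = (thunks) =>
  Promise.all(thunks.map((thunk) => Promise.resolve().then(thunk).catch(() => null)))

const pipeline = (items, ...stages) =>
  Promise.all(items.map(async (item, index) => {
    let value = item // assumption: the first stage sees the item as its input
    for (const stage of stages) value = await stage(value, item, index)
    return value
  }))

// Fake agent: echoes its prompt, or fails for one input.
const agent = async (prompt) => {
  if (prompt.includes('bad')) throw new Error('agent failed')
  return `done: ${prompt}`
}

// Barrier: one array back, with null where a thunk threw.
const barrier = parallel(['good', 'bad'].map((name) => () => agent(name)))

// Chain per item: each stage receives (prevResult, originalItem, index).
const chained = pipeline(
  ['x', 'y'],
  (item) => agent(item),
  (prev, item, index) => `${index}:${item}:${prev}`,
)

barrier.then((r) => console.log(r)) // [ 'done: good', null ]
chained.then((r) => console.log(r)) // [ '0:x:done: x', '1:y:done: y' ]
```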
workflow(nameOrRef, args?) runs another workflow inline and returns its result. The child shares this run's concurrency cap, agent counter, and token budget. Nesting is one level only (workflow() inside a child throws), so recursion has to be a JS loop over a frontier, not nested calls.
phase(title) and log(message) are pure UX: they shape the progress tree and emit narrator lines, and don't gate execution.
args is whatever you passed to the Workflow tool, verbatim, arriving as real JSON.
A common question is whether you can use a third-party JS library if the library itself touches nothing forbidden. The answer splits cleanly. You cannot import or require a package — there's no resolver and no node_modules. But you can inline the source directly into the script, and vendored pure JS runs fine, as long as it is pure in this sandbox's strict sense: it uses only the present globals (no Buffer, crypto, URL, TextEncoder, structuredClone), it doesn't contain the literal tokens Date.now() / Math.random() / new Date() anywhere — including comments, because of the substring grep — and it doesn't assume a module scope. In practice that admits small math, string, and parsing utilities and excludes most of npm. For anything heavier, or any real I/O, do it inside an agent(): the subagent has Bash and the full ecosystem and can return the result through a schema. The controller stays pure; its agents wield the toolchain.
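As an illustration of what "pure in this sandbox's strict sense" looks like, here are two small helpers of the kind that can be pasted straight into a script. They are examples written for this post rather than code from any particular library, and they touch only core ECMAScript.

```javascript
// Inlined helpers that use only core ECMAScript (Array, Map, String).
// No imports, no Buffer/URL/crypto, and no clock or entropy reads.
function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

function dedupeBy(items, keyFn) {
  const byKey = new Map()
  for (const item of items) {
    const key = keyFn(item)
    if (!byKey.has(key)) byKey.set(key, item) // first occurrence wins
  }
  return [...byKey.values()]
}

const findings = [
  { file: 'a.js', issue: 'unused var' },
  { file: 'a.js', issue: 'unused var' },
  { file: 'b.js', issue: 'off-by-one' },
]
const unique = dedupeBy(findings, (f) => `${f.file}|${f.issue}`)
console.log(unique.length)             // 2
console.log(chunk([1, 2, 3, 4, 5], 2)) // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```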
Two hard limits worth holding in mind: agents are capped at min(16, cores−2) concurrent (you can submit 500 to parallel, but ~10–16 run at a time and the rest queue), and a run is capped at 1000 agents total — a runaway backstop, not a budget. Any unbounded while loop must self-bound on budget.remaining() or an explicit counter.
Three things you can build with it
The pattern under every interesting workflow is the same: a classic algorithm written in deterministic JavaScript, with an agent() dropped into the one slot that needs judgment. The script owns the structure — the queue, the population, the tally — and an agent owns the expensive cognitive step. Three examples, in rough order of how often you'd actually reach for them.
A web crawler with a frontier queue. This is the practical one. You want to gather and summarize pages relevant to a topic, following links breadth-first, without letting it run forever. The script owns the frontier queue, the visited set, and the page-count and budget caps; each agent fetches one page and returns its summary and outbound links. Recursion would be the textbook approach, but workflow() can't recurse, so you express it as a frontier loop instead, which is clearer anyway.
const seen = new Set(), frontier = [args.startUrl], pages = []
while (frontier.length && pages.length < 40 && budget.remaining() > 30_000) {
const batch = [...new Set(frontier.splice(0, 8))].filter(u => !seen.has(u)) // also dedup within the batch
batch.forEach(u => seen.add(u))
const fetched = await parallel(batch.map(u => () =>
agent(`Fetch ${u}. Summarize it and list outbound links relevant to "${args.topic}".`,
{ schema: PAGE, agentType: 'Explore' })))
for (const p of fetched.filter(Boolean)) { pages.push(p.summary); frontier.push(...p.links) }
}
Every line of bookkeeping is deterministic; the only cognition is "fetch and extract relevant links," which is an agent holding a real network tool. The two caps keep it bounded.
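The crawler snippet leaves PAGE undefined. One plausible shape is sketched below: the two field names follow the snippet's use of p.summary and p.links, and everything else is an assumption. The nextBatch helper is a hypothetical way to factor out the frontier bookkeeping so it can be checked without spawning any agents.

```javascript
// A guess at the PAGE schema; field names follow p.summary and p.links.
const PAGE = {
  type: 'object',
  properties: {
    summary: { type: 'string' },
    links: { type: 'array', items: { type: 'string' } },
  },
  required: ['summary', 'links'],
}

// Pure bookkeeping for one round: take up to `size` unseen URLs off the frontier.
function nextBatch(frontier, seen, size) {
  const batch = []
  while (frontier.length && batch.length < size) {
    const url = frontier.shift()
    if (seen.has(url)) continue
    seen.add(url) // marking here also dedups within the batch
    batch.push(url)
  }
  return batch
}

const seen = new Set(['https://example.com/a'])
const frontier = ['https://example.com/a', 'https://example.com/b', 'https://example.com/b']
console.log(nextBatch(frontier, seen, 8)) // [ 'https://example.com/b' ]
```

Requiring links in the schema means the loop can spread p.links without a null check.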
An evolutionary optimizer. When you're tuning something — a prompt, a config, a design — and there's no closed-form score, you can evolve a population across generations. The script owns selection and elitism; agents own mutation and fitness scoring. There's no Math.random() to seed diversity, so you vary by index instead: each seed and each mutation gets a distinct index woven into its prompt, which is deterministic and therefore resumable.
let pop = Array.from({ length: 12 }, (_, i) => seedVariant(i)) // diversity by index
while (budget.total && budget.remaining() > 80_000) {
const scored = await parallel(pop.map(c => () =>
agent(`Score this candidate 0-100 with justification:\n${c}`, { schema: SCORE, model: 'haiku' })
.then(s => ({ c, fit: s.score }))))
const elite = scored.filter(Boolean).sort((a, b) => b.fit - a.fit).slice(0, 4)
if (!elite.length) break // every scorer was skipped or failed; nothing left to evolve
log(`gen best=${elite[0].fit}`)
pop = (await parallel(elite.map((p, i) => () =>
agent(`Improve this candidate, attempt ${i}:\n${p.c}`, { schema: CAND }).then(r => r.text)))).filter(Boolean)
}
The budget is the stopping criterion — no fixed generation count, just "keep evolving while there's a meaningful token budget left." Scoring runs on Haiku because it's a cheap judgment; you'd spend Opus only on the work that pays for it.
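The optimizer snippet relies on seedVariant, SCORE, and CAND without defining them. The versions below are hypothetical fill-ins, shaped only by how the snippet uses them (s.score, r.text), plus a selectElite helper that isolates the deterministic selection step.

```javascript
// Hypothetical helpers the optimizer snippet leaves undefined.
const STYLES = ['terse', 'step-by-step', 'example-led', 'formal']

// Diversity by index: the same i always yields the same seed text.
function seedVariant(i) {
  const style = STYLES[i % STYLES.length]
  return `Variant ${i}: answer in a ${style} style.`
}

const SCORE = {
  type: 'object',
  properties: { score: { type: 'number' }, justification: { type: 'string' } },
  required: ['score'],
}
const CAND = {
  type: 'object',
  properties: { text: { type: 'string' } },
  required: ['text'],
}

// Deterministic selection: drop skipped entries, keep the top k by fitness.
function selectElite(scored, k) {
  return scored.filter(Boolean).sort((a, b) => b.fit - a.fit).slice(0, k)
}

const elite = selectElite([{ c: 'a', fit: 40 }, null, { c: 'b', fit: 90 }, { c: 'c', fit: 65 }], 2)
console.log(elite.map((e) => e.c)) // [ 'b', 'c' ]
console.log(seedVariant(5))        // Variant 5: answer in a step-by-step style.
```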
A quorum oracle for high-stakes answers. When a single model call isn't trustworthy enough — a decision where a confident wrong answer is costly — you can poll several independent agents and let deterministic code enforce a quorum, escalating a split to a tiebreaker. The voting logic lives in JS precisely because you don't want a model deciding whether a model agreed with itself.
const votes = (await parallel(Array.from({ length: 7 }, (_, i) => () =>
agent(`${args.question}\n(independent analyst #${i}, do not assume consensus)`, { schema: VOTE }))))
.filter(Boolean)
const tally = votes.reduce((m, v) => (m[v.answer] = (m[v.answer] || 0) + 1, m), {})
const [winner, n] = Object.entries(tally).sort((a, b) => b[1] - a[1])[0] ?? [null, 0] // no votes at all falls through to the adjudicator
return n >= 4 ? { answer: winner, confidence: n / 7 }
: await agent(`Analysts split ${JSON.stringify(tally)}. Adjudicate.`, { schema: VOTE, model: 'opus' })
The independence is doing the real work: seven analysts told not to assume consensus, each with its index in the prompt so they're genuinely distinct runs. JS tallies; anything short of a four-vote majority escalates to a more expensive adjudicator.
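Because the quorum rule is plain code, it can be pulled out and tested without any model calls. The sketch below is one way to do that; the VOTE schema is a guess (only answer is used by the tally), and decide is a hypothetical helper rather than part of the workflow API.

```javascript
// VOTE is a guess at the schema; only `answer` is used by the tally.
const VOTE = {
  type: 'object',
  properties: { answer: { type: 'string' }, rationale: { type: 'string' } },
  required: ['answer'],
}

// Pure quorum check: returns the winner, or answer: null when nothing reaches quorum.
function decide(votes, quorum) {
  const tally = new Map()
  for (const v of votes.filter(Boolean)) {
    tally.set(v.answer, (tally.get(v.answer) ?? 0) + 1)
  }
  let winner = null
  let best = 0
  for (const [answer, count] of tally) {
    if (count > best) { winner = answer; best = count }
  }
  return best >= quorum
    ? { answer: winner, confidence: best / votes.length, tally: Object.fromEntries(tally) }
    : { answer: null, confidence: 0, tally: Object.fromEntries(tally) }
}

const clear = decide([{ answer: 'yes' }, { answer: 'yes' }, { answer: 'no' }, { answer: 'yes' },
                      { answer: 'yes' }, null, { answer: 'no' }], 4)
const split = decide([{ answer: 'yes' }, { answer: 'no' }, { answer: 'maybe' }], 4)
console.log(clear.answer, clear.confidence.toFixed(2)) // yes 0.57
console.log(split.answer)                              // null
```

A null answer is the signal to escalate; a skipped analyst (the null entry) still counts in the denominator, so a thin quorum reports lower confidence.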
These aren't exotic; they're standard structures — BFS, a genetic algorithm, a voting protocol — with the judgment step delegated. A tournament bracket, a beam search, a red-team/blue-team loop all follow the same shape. If you can write the algorithm, you can run it here, as long as the algorithm is deterministic and the intelligence lives in the agents.
The mental model
A workflow is a deterministic JavaScript controller that spawns and awaits LLM subagents. The script reasons about control flow; the agents do the thinking. meta is read statically, so it must be a pure literal. agent() and its companions are injected globals, not imports. Execution is a controller with suspension points: the script runs, suspends at each await agent(), a real subagent runs to completion, the promise resolves, and the script continues. The sandbox is a minimal ECMAScript context (no require, no process, no fetch), so every side effect goes through an agent. And determinism is guarded at two layers, on the clock and on randomness, because resume depends on the script being replayable.
The deterministic shell is the feature, not a limitation to route around. Everything nondeterministic is pushed into the subagents, where it belongs — and that is exactly what makes a long workflow resumable, auditable, and cheap to reason about.