Agentic Design School
Module 2 of 6
45–55 minutes

Orchestrating Design Agent Teams

Orchestration Patterns

The three patterns that cover most real cases: fan-out for independent parallel work, pipelines for staged work with handoffs, and critic pairs where one agent produces and another reviews against written criteria.

Duration: 45–55 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Match fan-out, pipeline, and critic-pair patterns to task shapes.
  • Define boundaries, outputs, and merge rules before any worker starts.
  • Run an orchestrator role — human or agent — that owns the brief, the merge log, and the stop conditions.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13 · 16:9

Orchestration Patterns

Orchestrating Design Agent Teams · Module 2 of 6

  • Three patterns cover most of it: fan-out, pipeline, critic pair
  • The orchestrator role: brief, boundaries, merge log, stop conditions
  • Worker prompts and merge rules written before any worker runs
  • The failure modes, and a four-worker run traced end to end

Module 1 settled whether to split. This module is about how to run the split so the merge is a review, not a rescue.

Slide notes

Module 1 ended with a decision map: most tasks stay with one agent, and the ones that split do so because they have genuinely independent zones, parallel deadlines, or distinct expertise. This module assumes that decision has been made and answers the next question — what shape the split takes and who runs it.

Keep the framing modest. There are only three patterns worth teaching because, in practice, almost every multi-agent design setup reduces to one of them or a mix: fan-out when the work is parallel and independent, a pipeline when it is staged, and a critic pair when the point is independent review rather than parallel production. Resist the temptation to present orchestration as an architecture discipline with a dozen topologies; the extra patterns you see in vendor material are mostly these three with marketing names.

The other thing to set up front is that orchestration is mostly writing. The artifacts that decide the outcome — the orchestration brief, the worker briefs, the shared constraints, the merge log — are plain files. The product surfaces that schedule and message between agents change quarterly; the files transfer across all of them. That is also why this module's worked example is quoted from real run artifacts rather than described from memory.

Narration for this slide

Welcome back. In Module 1 you decided whether a task genuinely needs more than one agent. This module is about what happens after you say yes. We are going to cover the three patterns that handle most real cases — fan-out for parallel independent work, pipelines for staged work with handoffs, and critic pairs where one agent produces and another reviews against written criteria. We will define the orchestrator role properly, write worker prompts and merge rules before anything runs, look hard at the failure modes, and trace a real four-worker run end to end, with its costs. Let's start with fan-out.

Slide 2 of 13 · 16:9

Fan-out: parallel zones, one merge point

Fan-out splits work into independent zones, runs one worker per zone, and brings everything back through a single merge.

  • One orchestrator brief states the user job, ownership boundaries, known overlaps, and merge gates
  • Each worker owns one zone and writes exactly one output — a spec, a finding file, a component
  • Workers do not read or edit each other's outputs; isolation is the point
  • Everything converges at one merge review, which is where the design decisions actually get made
  • Scales well when reading volume, not judgment, is the bottleneck — audits, surveys, multi-surface specs

Fan-out buys parallelism and pays for it at the seams — the places where two zones meet and nobody owns the join.

Slide notes

Fan-out is the pattern people picture when they hear multi-agent: many workers running at once, each on its own slice. The orchestration article on this site calls the underlying decomposition spatial — split by place: pages, bands, dashboard panels, canvas regions, component folders. Each worker gets one zone and one owned output file, which is what keeps the merge tractable. The merge point is singular on purpose; multiple partial merges are how inconsistencies survive into the final artifact.

The clearest production example is the design-system audit workflow: one agent per component folder, each comparing its folder against the documented tokens and returning structured findings, with an orchestration script collecting results outside the conversation context. In the workflow's traced case, 64 agents covered a 60-component system in about 70 minutes and returned 312 findings — work that takes weeks by hand. That case also shows where fan-out earns its keep: the per-zone work is reading and comparison, the judgment work is concentrated in the report review afterwards.

The cost is the seams. Spacing rhythm drifts between zones, and two zones express the same signal two different ways — in the five-surface run traced later in this module, two workers proposed two different treatments for the same last-reviewed signal, both individually defensible. Fan-out does not remove that class of problem; it converts it into something the merge review is explicitly looking for, which is why the merge gates are written before any worker runs.
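
The fan-out shape described in these notes can be sketched in a few lines of Python. This is a minimal illustration, not the audit workflow's actual script: the zone names, the `Finding` fields, and the `run_worker` stub are all hypothetical, and a real worker would be an agent call rather than a dictionary lookup. It shows the structure only: isolated workers, one owned output each, and a single merge that looks for signals with competing proposals.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    zone: str      # the zone that produced the finding
    signal: str    # what the finding is about, e.g. "last-reviewed"
    proposal: str  # the treatment the worker proposes


def run_worker(zone: str) -> list[Finding]:
    """Hypothetical worker: sees only its own zone, returns only its own findings."""
    canned = {
        "zone-a": [Finding("zone-a", "last-reviewed", "green pill")],
        "zone-b": [Finding("zone-b", "last-reviewed", "yellow badge")],
        "zone-c": [Finding("zone-c", "cta-label", "primary button")],
    }
    return canned[zone]


def merge(results: list[list[Finding]]) -> dict[str, list[Finding]]:
    """Single merge point: group by signal, keep signals with competing proposals."""
    by_signal: dict[str, list[Finding]] = {}
    for findings in results:
        for finding in findings:
            by_signal.setdefault(finding.signal, []).append(finding)
    return {
        signal: group
        for signal, group in by_signal.items()
        if len({f.proposal for f in group}) > 1
    }


zones = ["zone-a", "zone-b", "zone-c"]
with ThreadPoolExecutor(max_workers=len(zones)) as pool:
    # Each worker receives only its zone; none reads another worker's output.
    results = list(pool.map(run_worker, zones))

conflicts = merge(results)
print(sorted(conflicts))  # prints ['last-reviewed']
```

The merge here only detects the duplicated signal. Choosing between the two proposals stays a design decision for the merge review.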

Narration for this slide

The first pattern is fan-out. You split the work into zones that genuinely do not depend on each other, give each zone to one worker with one owned output, and bring everything back through a single merge review. The clearest production example is a design-system audit: one agent per component folder, sixty-odd agents in parallel, findings collected into one drift report in about seventy minutes. Fan-out is the right shape when the bottleneck is reading volume rather than judgment. The price is the seams: the places where two zones meet, where spacing drifts and the same signal gets expressed two different ways. The merge review exists to catch exactly that, which is why you write its rules before anyone starts.

Slide 3 of 13 · 16:9

Pipelines: staged work with gates between stages

A pipeline splits by stage rather than by zone: each stage's output is the next stage's input, and every handoff is a gate.

  • Stages run in sequence — spec, build, verify — so parallelism is low by design
  • Merging is nearly free: there is nothing to reconcile, only the next stage to feed
  • Every handoff is a natural review point with an owner and an evidence requirement
  • Stages can be different agents, different models, or different people — the contract is the artifact passed between them
  • Production evidence: a 13-role book pipeline whose failures clustered at the export boundary, not in the drafting stages

Pipelines buy checkpoints and pay in wall-clock time. Choose them when the boundaries are stages, not places.

Slide notes

The pipeline is the least glamorous pattern and often the most useful one for design work that ends in a deliverable: a spec stage, an implementation stage, a verification stage, each handing a concrete artifact to the next. It barely parallelises, because stages wait on each other, but it merges almost for free — there are no competing versions of the same decision, only the question of whether each stage's output is good enough to feed the next.

What it buys is gates. Every handoff is a place where a human or an executable check can look at the artifact before more work is stacked on top of it. The traced production example from this school's case-study article is a book pipeline with thirteen agent roles defined in one configuration file — architect, researcher, drafter, reviewer, reviser, consistency checker, designer roles, assembler, exporter — with the review threshold and iteration cap in config. Its fix commits concentrated almost entirely at the export and publication boundary: PDF rendering, pagination, publish guardrails. The drafting stages, protected by their review loop, were never the problem. That is the general lesson: in pipelines, defects enter at the handoffs and at the boundary where the artifact leaves the pipeline, so that is where the gates belong.

The failure mode is propagation. A weak early stage feeds every later stage, and if the gates rubber-stamp — approve because the artifact arrived, not because it met criteria — the pipeline becomes a slow way to produce the same mistakes a single agent would have made faster. Module 5 takes this pattern to its full length as the canvas-to-production pipeline; here the goal is just to recognise when staged work is the honest shape of the task.
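
A gated pipeline can be sketched as a list of (stage, gate) pairs, where each gate checks the artifact against criteria rather than merely confirming it arrived. Everything named here (`run_pipeline`, `GateFailed`, the three stub stages and their string checks) is a hypothetical stand-in. In a real run the stages would be agents or people and the gates would be owner approvals or executable checks.

```python
from typing import Callable

Stage = Callable[[dict], dict]
Gate = Callable[[dict], bool]


class GateFailed(Exception):
    """Raised when an artifact does not meet a gate's written criteria."""


def run_pipeline(artifact: dict, steps: list[tuple[str, Stage, Gate]]) -> dict:
    """Run stages in order; stop at the first gate whose criteria are not met."""
    for name, stage, gate in steps:
        artifact = stage(artifact)
        if not gate(artifact):  # criteria check, not a rubber stamp
            raise GateFailed(name)
    return artifact


def spec(artifact: dict) -> dict:
    return {**artifact, "spec": "card grid: structure, states, do-not list"}


def build(artifact: dict) -> dict:
    return {**artifact, "build": f"implements: {artifact['spec']}"}


def verify(artifact: dict) -> dict:
    return {**artifact, "verified": "do-not list" in artifact["build"]}


steps = [
    ("spec", spec, lambda a: "do-not list" in a["spec"]),
    ("build", build, lambda a: a["build"].startswith("implements")),
    ("verify", verify, lambda a: a["verified"]),
]

result = run_pipeline({"job": "redesign card grid"}, steps)
print(result["verified"])  # prints True
```

Because a failed gate raises immediately, a weak early stage cannot feed later stages, which is the propagation failure described above.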

Narration for this slide

The second pattern is the pipeline. Instead of splitting by place, you split by stage: a spec stage, a build stage, a verify stage, each handing its output to the next. Pipelines barely parallelise — stages wait on each other — but they merge almost for free, because there is nothing to reconcile. What you are really buying is checkpoints: every handoff is a gate with an owner. The production evidence is telling. A thirteen-role book pipeline we trace in the articles had its review loop protect the drafting stages from day one — and nearly every fix landed at the export boundary instead. Gates belong at the handoffs. The failure mode is propagation: a weak early stage feeds everything after it, and a gate that rubber-stamps is not a gate.

Slide 4 of 13 · 16:9

Critic pairs: a producer and an independent reviewer

One agent produces against the brief; a second, read-only agent reviews against explicit criteria. The pair loops until the criteria pass or the iteration cap is hit.

  • The critic is read-only and forbidden from proposing fixes — findings only, each tied to a criterion and evidence
  • Criteria are written before the first draft: severity levels, the constraint violated, the evidence line
  • An iteration cap and a pass threshold stop the loop from running forever
  • Independence is the point — the critic did not write the work, so it is not grading its own homework
  • The critic's pass is evidence for the human gate, not a replacement for it

A critic pair is a design crit made repeatable: the reviewer never grabs the pen, and every finding points at a criterion.

Slide notes

The critic pair is the smallest orchestration pattern and the one most teams should try first, because it works inside a single session as easily as across two. The producer drafts against the brief; the critic reads the output, the brief, and the constraints, and returns findings — severity, the constraint violated, the evidence — without editing anything. The producer revises against the findings, and the loop runs until the criteria pass or a cap is reached.

Two production examples ground this. In the five-surface orchestration run traced later in this module, the fifth worker was a read-only QA agent whose brief explicitly forbade proposing fixes; it caught both cross-worker conflicts as P1 findings, plus the gaps no brief had assigned. In the book pipeline, the producer-critic loop is parameterised in configuration: a chapter must score at least seven from the reviewer, with at most three revision passes. Putting the threshold and the cap in config makes the editorial standard a reviewable diff rather than a vibe.

The discipline that makes the pattern work is the read-only restriction. The moment the critic starts editing, you have two producers and no reviewer, and the independence that justified the second agent is gone. The other discipline is criteria: a critic told to review for quality produces essays; a critic given the design system's rules, the brief's review criteria, and a severity scale produces findings you can act on. And the critic's pass is never the ship decision — it is evidence presented at the human gate, the same way executable checks are.
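
The producer-critic loop with a pass threshold and an iteration cap can be sketched as follows. The threshold of seven and the cap of three revisions come from the book pipeline described above. The `produce` and `critique` stubs, the `Review` type, and the canned scores are hypothetical, included only so the sketch runs deterministically.

```python
from dataclasses import dataclass

PASS_SCORE = 7      # pass threshold quoted for the book pipeline
MAX_REVISIONS = 3   # at most three revision passes


@dataclass(frozen=True)
class Review:
    score: int
    findings: tuple[str, ...]  # findings only; the critic never returns edits


def produce(brief: str, findings: tuple[str, ...], round_no: int) -> str:
    """Hypothetical producer: returns a new draft that addresses the findings."""
    return f"{brief} (draft {round_no}, addressed {len(findings)} findings)"


def critique(draft: str, round_no: int) -> Review:
    """Hypothetical read-only critic: scores the draft and edits nothing."""
    canned_scores = [5, 6, 8]  # stand-in scores, one per round
    score = canned_scores[min(round_no, len(canned_scores) - 1)]
    if score >= PASS_SCORE:
        return Review(score, ())
    return Review(score, (f"P1: criterion unmet in round {round_no}",))


def critic_loop(brief: str) -> tuple[str, Review, int]:
    draft = produce(brief, (), 0)
    review = critique(draft, 0)
    revisions = 0
    while review.score < PASS_SCORE and revisions < MAX_REVISIONS:
        revisions += 1
        draft = produce(brief, review.findings, revisions)
        review = critique(draft, revisions)
    # The result is evidence for the human gate, not the ship decision.
    return draft, review, revisions


draft, review, revisions = critic_loop("chapter 1")
print(revisions, review.score)  # prints 2 8
```

Keeping `PASS_SCORE` and `MAX_REVISIONS` as named constants mirrors the point about configuration: changing the editorial standard shows up as a reviewable diff.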

Narration for this slide

The third pattern is the critic pair. One agent produces; a second agent reviews — and the reviewer is read-only. It returns findings, each tied to a criterion you wrote down before the first draft existed, with severity and evidence, and it is forbidden from fixing anything itself. The producer revises, and the loop runs until the criteria pass or you hit the iteration cap. In the book pipeline this is literally configuration: a pass score of seven, a maximum of three revision rounds. In the five-surface run we trace later, the read-only QA worker caught both conflicts and the gaps nobody owned. The rule that keeps the pattern honest: the critic never grabs the pen, and its pass is evidence for your decision — not the decision itself.

Slide 5 of 13 · 16:9

The three patterns, side by side

Same building blocks — bounded workers, written contracts, gates — arranged three different ways.

Three orchestration patterns shown side by side.

  • Fan-out with merge: an orchestrator brief with boundaries, overlaps, and gates dispatches three parallel workers, each owning one zone and one output, converging into a single merge review that resolves against the system and logs rejected options; use it when zones are independent and reading volume is the bottleneck; it fails when seams drift, signals duplicate, and the merge accepts everything.
  • Sequential pipeline with gates: a spec stage feeds a build stage feeds a verify-and-ship stage, with an owner-approval gate and a checks-pass gate between stages; use it when each stage's output is the next stage's input and checkpoints matter; it fails when a weak early stage propagates and gates rubber-stamp.
  • Generator-critic pair: a producer drafts against the brief and a read-only critic returns findings tied to explicit criteria with an iteration cap, ending at a human gate before shipping; use it when quality criteria can be written down and review must be independent; it fails when criteria are vague, rounds run unbounded, or the critic starts editing the work.

Fan-out buys parallelism and pays at the seams; the pipeline buys checkpoints and pays in wall-clock time; the critic pair buys independent review and pays in iteration rounds. Each panel carries its own use-when and failure mode.

Mixing patterns is normal: the worked example later in this module is fan-out for four workers plus a critic layered across all of them.

Slide notes

Walk the diagram left to right and keep the comparison honest rather than promotional. Fan-out: one orchestrator contract at the top, parallel workers in the middle, one merge review at the bottom. Point out that the merge card is green — it is a decision step, not a collation step — and that the failure strip names the two ways it goes wrong: seams and duplicated signals on the work side, and merge-by-concatenation on the review side.

The pipeline column is deliberately vertical and slow-looking. The gates between stages are ink-coloured because they are the same kind of human-or-check gate the fundamentals course introduced; the point of the pattern is that every handoff gets one. The critic-pair column shows the loop: draft across, findings back, criteria and an iteration cap underneath, and a human gate before anything ships. The green output cards in all three columns make the same point — every pattern ends at a human decision, whatever the topology above it looks like.

Close on the mixing note. Real runs combine these: the five-surface case study is spatial fan-out for its four surface workers plus a functional, read-only critic across all of them, and a canvas-to-production pipeline often contains a critic pair inside its verify stage. The patterns are vocabulary for designing a run, not boxes a run must fit inside.

Narration for this slide

Here are the three patterns side by side. On the left, fan-out: one orchestrator brief, parallel workers each owning a zone, and a single merge review where the real decisions happen. In the middle, the pipeline: spec feeds build feeds verify, with a gate at every handoff — slow on purpose, because the checkpoints are the value. On the right, the critic pair: a producer and a read-only critic looping against written criteria, with an iteration cap, ending at a human gate. Notice each panel carries its own failure mode — seams and lazy merges, propagation and rubber-stamp gates, vague criteria and critics that start editing. And notice you can mix them. The worked example later is fan-out plus a critic layered on top.

Slide 6 of 13 · 16:9

Choosing the pattern: what each one buys and costs

The choice follows the boundaries of the task: places, stages, or criteria.

| | Fan-out | Pipeline | Critic pair |
| --- | --- | --- | --- |
| Split by | Place: zones, surfaces, components | Stage: spec, build, verify | Role: producer vs reviewer |
| Parallelism | High | Low (stages wait on each other) | None (it is a loop) |
| Merge cost | Highest: seams and duplicated signals | Near zero: outputs feed forward | None: one artifact, many findings |
| Typical failure | Merge by concatenation | Weak early stage propagates | Vague criteria, unbounded rounds |
| Pick it when | Reading volume is the bottleneck | Handoffs and checkpoints matter most | Independent review is the point |

If you cannot say which row of this table your task lives in, the decomposition is not done — go back to the Module 1 decision map.

Slide notes

This table compresses the decomposition guidance from the orchestration article into a slide-sized decision aid. The split-by row is the one to anchor on: the right pattern is usually obvious once you can name where the natural boundaries of the task are. If the boundaries are places — pages, components, canvas regions — fan-out. If the boundaries are stages — something has to be specified before it can be built before it can be verified — pipeline. If there is really only one artifact and the question is whether it meets the bar, critic pair.

Be explicit about the merge-cost row, because it is the one people underestimate. Fan-out's merge is a real design review with conflicts to resolve; budget for it the way the worked example later does — it took roughly a fifth of the run's total time. The pipeline's merge cost is near zero but its wall-clock cost is the highest, because nothing overlaps. The critic pair has no merge at all but pays in rounds, which is why the iteration cap exists.

The last row doubles as a sanity check on the Module 1 decision. Tasks that resist every row — tightly coupled taste work, one shared file, no clear owner — are the tasks that should not have been split in the first place. The functional decomposition variant from the article, splitting by skill rather than place, is deliberately absent as a column: it merges worst of all and is best used the way the case study uses it, as a single read-only critic layered over a fan-out rather than as a team of writing specialists.
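
The split-by row can be read as a lookup. A minimal sketch, assuming the task's boundary type has already been named; the `Boundary` enum and `choose_pattern` function are illustrative names, not part of any tool:

```python
from enum import Enum


class Boundary(Enum):
    PLACES = "places"      # pages, components, canvas regions
    STAGES = "stages"      # spec before build before verify
    CRITERIA = "criteria"  # one artifact, judged against a written bar
    NONE = "none"          # tightly coupled work with no clear boundary


PATTERN_FOR = {
    Boundary.PLACES: "fan-out",
    Boundary.STAGES: "pipeline",
    Boundary.CRITERIA: "critic pair",
}


def choose_pattern(boundary: Boundary) -> str:
    """Return the pattern for a boundary type, or the Module 1 fallback."""
    return PATTERN_FOR.get(boundary, "do not split: keep one agent")


print(choose_pattern(Boundary.STAGES))  # prints pipeline
print(choose_pattern(Boundary.NONE))    # prints do not split: keep one agent
```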

Narration for this slide

Here is the choice in one table. Read the first row and most decisions make themselves: if the boundaries of your task are places — surfaces, components, regions — fan out. If they are stages, where one thing must exist before the next can start, build a pipeline. If there is one artifact and the question is whether it meets the bar, run a critic pair. The row people underestimate is merge cost. Fan-out's merge is a genuine design review and it takes real time. The pipeline barely merges but barely parallelises. The critic pair never merges but pays in rounds, which is why you cap them. And if your task does not fit any row cleanly, that is the signal to go back to Module 1 and not split at all.

Slide 7 of 13 · 16:9

The orchestrator role: a contract, not a contributor

Whether the orchestrator is you or a lead agent, its deliverables are the same four artifacts — and none of them is design content.

  • The brief: user job stated once, the deliverable, and one-sentence ownership boundaries per worker
  • Named overlaps: the conflicts the merge should expect, written down before they happen
  • The merge log: agreements, conflicts with decisions and rejected options, unowned gaps
  • Stop conditions: gates, iteration caps, and the point at which the run falls back to one agent
  • The documented failure: the lead starts doing the workers' jobs itself — structural discipline fixes it, not willpower

If the orchestrator is producing design content outside the contract and the merge log, the decomposition is being bypassed.

Slide notes

The orchestrator is closer to a design lead than to a prompt writer. It decides the decomposition, writes the contract every worker reads, sets the gates work must pass, and runs the merge review where separate outputs become one coherent design again. That is true whether the role is played by a person in one session or delegated to a lead agent on a team surface — the artifacts are identical, which is what makes the role portable across platforms.

Walk the four deliverables. The orchestration brief states the user job once so no worker reinvents it, points everyone at the same shared constraints file, and gives each worker a one-sentence boundary. Named overlaps matter more than they look: the five-surface run deliberately left one in — every surface needed a position on the same freshness signal — so the merge arrived expecting that conflict instead of being surprised by it. The merge log is the durable output: decisions, rejected options with reasons, and gaps logged as open questions rather than answered on the spot. Stop conditions include the fallback path — if workers keep colliding or the merge starts looking like a rescue, hand the briefs, constraints, surviving outputs, and merge log to a single agent and let it finish in one context.

The canonical failure is documented in the agent-teams guidance and reproduced in the traced run: the lead starts implementing instead of delegating. In the run, the orchestrating session caught itself drafting the hero spec before the worker pass had run and discarded the text. The fix is structural: the orchestrator's outputs are the contract and the log, full stop. As of June 2026, Claude Code's experimental Agent Teams surface adds product support for this role — a shared task list, messaging, and plan approval — but the role exists with or without it.

Narration for this slide

Every pattern needs an orchestrator, and the role is easy to get wrong. The orchestrator — whether that is you or a lead agent — owns four things. The brief, which states the user job once and gives each worker a one-sentence boundary. The named overlaps, so the merge expects its conflicts instead of being surprised by them. The merge log, where decisions and rejected options get written down. And the stop conditions — the gates, the iteration caps, and the point where you fall back to a single agent. What the orchestrator does not do is design content. The documented failure mode, in the product docs and in our own traced run, is the lead starting to do a worker's job itself. The fix is structural: if the orchestrator is producing anything beyond the contract and the log, your decomposition is being bypassed.

Slide 8 of 13 · 16:9

Worker prompts: one job, one boundary, one reviewable output

A worker brief from the traced five-surface run. Most of its text is about what the worker must not do.

worker-briefs/02-articles-index-card-grid.md (from the run)
# Worker 2: Articles Index Card Grid Spec

Scope: the card grid only — the repeated card and the grid that holds it.
Do not specify the index hero, the article page, or the newsletter band.
Do not write code.

Inputs: DESIGN.md, ../shared-constraints.md, your boundary in the
orchestration brief.

Output: outputs/02-articles-index-card-grid-spec.md with structure,
hierarchy, states, responsive behaviour, and a do-not list.

Acceptance criteria:
- Builds on AccentCard and ArticleCard; badges use SchoolBadge.
- No new colours; no nested cards; radius within the 8px ceiling.
- Long-title and missing-summary behaviour is explicit.

Design direction is absent on purpose: taste lives in the design system and the shared constraints, not in each worker's brief.

Slide notes

The anatomy of a worker brief does not grow with the team — only the count does. Scope with explicit exclusions, the inputs the worker may read, one task, one owned output file, and acceptance criteria the orchestrator can check in a minute or two without redoing the work. At two workers a boundary violation is an awkward overlap; at five, a worker that wanders re-decides things three other workers own and the merge inherits a tangle, which is why most of this brief's text is exclusions.

Point out what is missing: design direction. The brief does not say what the card should look like or which fields matter most. Taste lives in DESIGN.md and in one shared constraints file every brief points at and no brief restates — duplicating constraints into each brief is how drift starts, because one copy gets edited and four do not. That separation also keeps briefs cheap; the five briefs in the traced run took roughly a quarter of an hour between them, and cheap briefs matter because the alternative is skipping them.

Two more notes from production. First, shared constraints reduce violations but do not eliminate them: one worker in the run introduced an off-token tint anyway and was sent back at the gate, which is what the acceptance check is for. Second, a critic worker's brief has a behavioural boundary rather than a spatial one — it owns nothing, reads everything, and is forbidden from proposing fixes — and that restriction is what keeps its review independent. On platforms with subagent definitions, the worker brief and the worker type are where the same restrictions get enforced mechanically: read-only tools for critics, one writable path for producers.
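
The brief anatomy in these notes (scope with exclusions, inputs, one owned output, acceptance criteria) is regular enough to lint before a worker starts. The sketch below is an assumption about how such a check might look, not something the traced run is documented to have used. The `lint_brief` function and its rules are hypothetical, and the sample text is an abridged copy of the brief on this slide.

```python
import re

# Abridged from the Worker 2 brief shown on the slide.
BRIEF = """\
# Worker 2: Articles Index Card Grid Spec

Scope: the card grid only.
Do not specify the index hero, the article page, or the newsletter band.
Do not write code.

Inputs: DESIGN.md, ../shared-constraints.md, your boundary in the
orchestration brief.

Output: outputs/02-articles-index-card-grid-spec.md with structure,
hierarchy, states, responsive behaviour, and a do-not list.

Acceptance criteria:
- Builds on AccentCard and ArticleCard; badges use SchoolBadge.
"""


def lint_brief(text: str) -> list[str]:
    """Return a list of problems; an empty list means the anatomy is complete."""
    problems = []
    for label in ("Scope:", "Inputs:", "Output:", "Acceptance criteria:"):
        if not re.search(rf"^{re.escape(label)}", text, re.MULTILINE):
            problems.append(f"missing {label}")
    if not re.search(r"^Do not ", text, re.MULTILINE):
        problems.append("no explicit exclusions")
    outputs = re.findall(r"^Output:\s*(\S+)", text, re.MULTILINE)
    if len(outputs) != 1:
        problems.append("expected exactly one owned output file")
    return problems


print(lint_brief(BRIEF))  # prints []
```

A lint like this checks that the sections exist, not that the boundary is well chosen; that judgment stays with the orchestrator.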

Narration for this slide

Here is a real worker brief from the traced run, and notice how much of it is about what not to do. Scope with explicit exclusions. The exact inputs the worker may read. One task, one owned output file, and acceptance criteria the orchestrator can check in a couple of minutes. What is missing is design direction — on purpose. Taste lives in the design system and in one shared constraints file that every brief points at; the brief carries the boundary and the deliverable. That keeps briefs cheap, and cheap briefs are the ones that actually get written. One honesty note: constraints reduce violations, they do not eliminate them. One worker in this run invented an off-token tint anyway. The acceptance check at the gate is what caught it.

Slide 9 of 13 · 16:9

Merge rules and conflict handling, decided up front

Gates invented after the outputs arrive bend around whatever came back. Write the merge protocol before any worker runs.

  • Ownership audit first: did every worker stay inside its boundary, and what arrived that two workers both claim
  • Classify conflicts: token-semantics, seam and alignment, duplicated signal, unowned gap
  • Resolve against the design system and the user job — never against whichever worker wrote the most
  • Record rejected options and the reason, so decisions are not relitigated next run
  • Log unowned gaps as open questions for a human; inventing answers mid-merge is how scope re-enters

A merge produced by concatenation — accept everything because nothing is individually wrong — is how multi-agent design work fails while looking like it succeeded.

Slide notes

The merge is the part of the run you cannot delegate away, and at team scale it has a shape worth following in order. First the ownership audit, which is mechanical: list every place a worker specified another worker's surface. Second, classification, because conflicts at this scale fall into recognisable classes — token-semantics conflicts, where two surfaces use the system's colours or type roles to mean different things; seam conflicts, where spacing or band order breaks where two owned regions meet; duplicated-signal conflicts, where the same information is expressed twice in competing ways; and unowned gaps, the decisions nobody was assigned and therefore nobody made. Third, resolution against the constitution rather than the contributors. Fourth, a written record of what was rejected and why.
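
One way to keep the merge honest is to give the merge log a structure that refuses incomplete entries. The sketch below is an assumed data model, not the format of the traced run's log: `ConflictClass`, `Entry`, and `MergeLog` are hypothetical names, and the gap entry is invented for illustration. The four conflict classes are the ones named above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConflictClass(Enum):
    TOKEN_SEMANTICS = "token-semantics"
    SEAM = "seam and alignment"
    DUPLICATED_SIGNAL = "duplicated signal"
    UNOWNED_GAP = "unowned gap"


@dataclass
class Entry:
    kind: ConflictClass
    summary: str
    decision: str = ""
    rejected: dict[str, str] = field(default_factory=dict)  # option -> reason


@dataclass
class MergeLog:
    entries: list[Entry] = field(default_factory=list)

    def record(self, entry: Entry) -> None:
        if entry.kind is ConflictClass.UNOWNED_GAP:
            if entry.decision:
                raise ValueError("gaps are logged as open questions, not decided mid-merge")
        elif not entry.decision or any(not reason for reason in entry.rejected.values()):
            raise ValueError("a resolved conflict needs a decision and a reason per rejected option")
        self.entries.append(entry)

    def open_questions(self) -> list[str]:
        return [e.summary for e in self.entries if e.kind is ConflictClass.UNOWNED_GAP]


log = MergeLog()
log.record(Entry(
    ConflictClass.TOKEN_SEMANTICS,
    "freshness signal treatment",
    decision="yellow badge",
    rejected={"lab-green pill": "green is reserved for workflow and proof cues"},
))
log.record(Entry(ConflictClass.UNOWNED_GAP, "hypothetical gap: empty-state copy"))
print(log.open_questions())  # prints ['hypothetical gap: empty-state copy']
```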

In the traced run the merge log resolved both conflicts in about fourteen minutes, and the reasoning is worth quoting because it shows what resolving against the system means: the lab-green freshness pill was rejected on token semantics — the design system reserves green for workflow and proof cues and assigns badges to yellow — not on quality. The hero's card strip was kept as a placement decision but stripped of its own card anatomy, because a component definition has exactly one owner. Both rejected options are recorded with reasons, which is what stops the next session relitigating them.

The mechanical parts of this can move into automated gates: a completion hook or CI check can verify that a worker touched only its owned file, grep for raw colour values, and require the do-not section. What cannot be automated is the resolution itself — choosing between the green pill and the yellow badge is a design decision with a reason attached, and the reason is the deliverable. Automate detection and discipline; keep judgment in the merge log, with a name on it.
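
The three mechanical checks named above (owned file only, no raw colour values, a do-not section present) might look like this as a gate function. It is a sketch under assumptions: the `gate_check` name, the colour regex, and the expectation that the do-not section is a markdown heading are all illustrative, and a real hook would read the touched paths from version control.

```python
import re

# Matches hex colours such as #fff or #e6f4ea, and rgb()/rgba() calls.
RAW_COLOUR = re.compile(r"#[0-9a-fA-F]{3,8}\b|rgba?\(")
# Assumes the do-not section is a markdown heading, e.g. "## Do-not list".
DO_NOT_HEADING = re.compile(r"^#+\s*do[- ]not", re.IGNORECASE | re.MULTILINE)


def gate_check(owned_path: str, touched_paths: list[str], spec_text: str) -> list[str]:
    """Return detected problems; resolving them remains a human decision."""
    problems = []
    stray = sorted(p for p in touched_paths if p != owned_path)
    if stray:
        problems.append(f"touched files outside ownership: {', '.join(stray)}")
    if RAW_COLOUR.search(spec_text):
        problems.append("raw colour value found; use design tokens")
    if not DO_NOT_HEADING.search(spec_text):
        problems.append("missing do-not section")
    return problems


clean = "## Do-not list\n- No nested cards\nBadges use the yellow token."
print(gate_check("outputs/02.md", ["outputs/02.md"], clean))  # prints []
```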

Narration for this slide

Now the merge, and the rule that matters most: write the merge protocol before any worker runs, because gates invented after the outputs arrive bend around whatever came back. The protocol has an order. Audit ownership first — who stayed inside their boundary. Classify the conflicts: token semantics, seams, duplicated signals, unowned gaps. Resolve against the design system and the user job, never against whichever worker wrote the most or answered last. Record what you rejected and why. And log the gaps nobody owned as open questions instead of inventing answers on the spot. The cheapest way to fail here is merge by concatenation — accepting everything because nothing is individually wrong. That is how multi-agent work fails while looking like it succeeded.

Slide 10 of 13 · 16:9

Failure modes: how orchestrated runs actually go wrong

Almost all of these are boundary and review failures, not tool failures — and every one appeared either in the traced run or in the platform documentation.

  • Racing edits: two workers with write access to the same file or component definition
  • Divergent assumptions: workers answer the same question differently because the overlap was never named
  • Silent scope growth: a worker wanders past its boundary and re-decides things other workers own
  • The lead does the work: the orchestrator starts drafting instead of delegating
  • Merge by concatenation, and teams left running after the work is done — idle workers keep consuming tokens

Each failure has a written counterpart: ownership per file, named overlaps, exclusions in every brief, contract-only orchestrator output, a merge protocol, and a shutdown step.

Slide notes

Run through these as a pre-flight checklist rather than a horror list, because each one maps to an artifact already covered in this module. Racing edits are prevented mechanically: one owned output per worker at spec scale, and git worktrees — an isolated checkout per worker — once outputs become edits to shared code or canvas files. Briefs decide who may change what; isolation makes sure even a worker that ignores its brief cannot quietly overwrite someone else's work.
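
For the worktree step, `git worktree add -b <branch> <path> <commit-ish>` creates an isolated checkout on a new branch. A minimal sketch of building that command per worker follows; the branch and directory naming scheme and the `worktree_command` helper are hypothetical, and the command is only printed here, not executed.

```python
def worktree_command(worker_id: str, base: str = "main") -> list[str]:
    """Build the argv for one isolated checkout per worker (naming is illustrative)."""
    branch = f"worker/{worker_id}"
    path = f"../worktrees/{worker_id}"
    return ["git", "worktree", "add", "-b", branch, path, base]


for worker_id in ("01-hero", "02-card-grid"):
    print(" ".join(worktree_command(worker_id)))
```

To actually create the checkouts, each list could be passed to `subprocess.run(cmd, check=True)` from inside the repository.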

Divergent assumptions are the conflict class the traced run manufactured deliberately: two workers each proposed a freshness treatment, both defensible, jointly incoherent. Naming the overlap in the orchestration brief converts that from a surprise into an expected comparison. Silent scope growth is the hero worker specifying card anatomy that belonged to the grid worker — caught because the QA critic was reading for exactly that. The lead doing the work is documented in the agent-teams guidance and reproduced in the run; the structural fix is that the orchestrator's only outputs are the contract and the log.

The last two are operational. Merge by concatenation was covered on the previous slide. Leaving the team running matters on hosted surfaces because, per the platform cost guidance as of June 2026, token use scales with team size and idle teammates keep consuming until the team is shut down. Add the quiet one from the article: no baseline, so nobody can say afterwards whether the orchestration bought anything. The worked example next exists precisely because the run kept one.

Narration for this slide

Here is how these runs actually fail, and notice none of it is exotic. Two workers racing on the same file — prevented by one owned output each, and worktrees once specs become edits. Divergent assumptions — two workers answering the same question differently because nobody named the overlap. Silent scope growth — a worker wandering into someone else's surface. The lead doing the work itself instead of delegating, which is documented in the platform docs and happened in our own run. Merge by concatenation. And teams left running after the work is done, quietly burning tokens. Every one of these has a written counterpart you have already seen in this module: boundaries, named overlaps, exclusions, a contract-only orchestrator, a merge protocol, and a shutdown step.

Slide 11 of 13 · 16:9

Worked example: a four-worker fan-out, traced end to end

A five-surface redesign-spec run on this school's own site: four surface workers, one read-only QA critic, one orchestrator — run twice, once orchestrated and once as a single-agent baseline.

                    | Single-agent baseline                  | Orchestrated run (4 workers + QA)
Wall clock          | ~28 minutes, one pass                  | ~77 min sequential; ~46 min if parallel (estimate)
Setup               | ~3 minutes                             | ~22 min: contract, constraints, five briefs
Conflicts surfaced  | None (one context decided implicitly)  | 2 resolved on the record, 2 unowned gaps logged
Independent review  | None                                   | Read-only QA pass: two P1s, one P2, two P3s
Tokens              | ~1× (estimate)                         | ~3.5–4× (estimate)

The tax bought a decision record, conflicts caught before implementation, and reusable briefs — not speed. Speed only arrives when workers genuinely run in parallel.

Slide notes

This run is quoted from the orchestration article, which executed it on scratch files specifically so it could be traced honestly. The shape: four spatial workers — home hero, articles card grid, article header, newsletter band — plus a fifth functional worker, the read-only QA critic, and an orchestrator that wrote the contract, the shared constraints, the five briefs, and then ran the merge. One overlap was left in deliberately: every surface needed a position on the same last-reviewed signal.

Be upfront about the two honesty notes the article itself carries. The worker passes ran sequentially because the drafting environment could not spawn parallel workers, so the 46-minute parallel figure is an estimate of what the middle phase would compress to, not a measurement. And token figures are estimates, because per-pass accounting was not available; the published external figure for scale — Anthropic reporting multi-agent research systems at roughly fifteen times the tokens of a single chat — is a research-workload number quoted for context, not a design benchmark.
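
To make the difference between a measured sequential time and a parallel estimate concrete, here is a minimal sketch of one common way to form such an estimate: setup and merge stay fixed, and the worker phase shrinks from the sum of the passes to the longest single pass. The article's own method is not stated here, and the minutes below are illustrative placeholders, not the traced run's per-pass timings.

```python
def wall_clock(setup: float, worker_passes: list[float], merge: float) -> tuple[float, float]:
    """Return (sequential, parallel) wall-clock totals in minutes."""
    # Sequential: every worker pass waits for the previous one.
    sequential = setup + sum(worker_passes) + merge
    # Parallel: the worker phase lasts only as long as the slowest pass.
    parallel = setup + max(worker_passes) + merge
    return sequential, parallel


# Illustrative minutes only; NOT the traced run's figures.
sequential, parallel = wall_clock(setup=10, worker_passes=[5, 7, 4, 6], merge=8)
print(sequential, parallel)
```

Setup and merge do not compress under this model, which is why a parallel run can still take longer than a single-agent baseline.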

What the extra cost bought is the substance of the comparison. The baseline produced a perfectly usable combined spec faster, and decided the freshness rule implicitly along the way — but it surfaced no conflicts, graded its own work, thinned out state coverage on two surfaces, and left no record of rejected options. The orchestrated run produced two conflicts resolved on the record, an independent QA pass, two unowned gaps logged for a human, one worker revision caught at the gate, and a set of briefs reusable next time these surfaces change. Whether that is worth three to four times the tokens depends on how much the surfaces matter and how many people touch them next — which is exactly the Module 1 question, now with numbers attached.

Narration for this slide

Let's trace a real run. Five surfaces on this school's own site: four workers each owning one surface, a read-only QA critic across all of them, and an orchestrator that wrote the contract, the constraints, and the briefs, then ran the merge. The same brief was also run once with a single agent as a baseline. The baseline took about twenty-eight minutes and produced a usable spec. The orchestrated run took about seventy-seven minutes as executed — roughly forty-six if the workers had genuinely run in parallel, and that is an estimate — at three and a half to four times the tokens. So what did the tax buy? Two conflicts surfaced and resolved on the record before implementation, an independent QA pass, two gaps nobody owned logged for a human, and reusable briefs. Not speed. The honest conclusion: orchestration pays in decision quality and records first, and in speed only when the work truly runs in parallel.

Slide 12 of 13 · 16:9

Exercise: choose and sketch a pattern for one large task

Take one genuinely large design task from your backlog — one that passed the Module 1 decision map — and design the run on paper. Do not run it yet.

  • Name the pattern — fan-out, pipeline, critic pair, or a stated mix — and the boundary type that justifies it
  • Write the orchestration brief: user job in two sentences, one-sentence ownership boundary per worker, named overlaps
  • Draft one worker brief in full: scope with exclusions, inputs, one output, acceptance criteria
  • Write the merge rules: conflict classes you expect, what resolves them, where rejected options get recorded
  • Set the stop conditions: gates, iteration caps, and the trigger for falling back to a single agent

Keep the pages. Module 3 reuses this sketch when the workers stop being prompts and start being MCP-connected tools.

Slide notes

The exercise is paper-only on purpose, like the Module 1 classification exercise it builds on. The expensive mistakes in orchestration are decomposition mistakes, and they are all visible before any agent runs: a boundary that cannot be stated in one sentence, an overlap nobody named, a merge rule that does not exist yet. Sketching the run costs twenty to thirty minutes; discovering the same problems mid-run costs the coordination tax plus the rework.
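
For participants who keep their sketch as a file rather than on paper, the three visible-before-run mistakes named above can be turned into a small pre-flight check. This is a sketch under stated assumptions: the `WorkerSketch` and `RunSketch` shapes, the `preflight` function, and the sample content are hypothetical, and the one-sentence test is deliberately crude.

```python
import re
from dataclasses import dataclass, field


@dataclass
class WorkerSketch:
    name: str
    boundary: str  # ownership boundary, meant to be one sentence
    exclusions: list[str] = field(default_factory=list)


@dataclass
class RunSketch:
    workers: list[WorkerSketch]
    named_overlaps: list[str]
    merge_rules: list[str]
    stop_conditions: list[str]


def preflight(sketch: RunSketch) -> list[str]:
    """List decomposition problems that are visible before any agent runs."""
    problems = []
    for worker in sketch.workers:
        # Crude one-sentence test: split on sentence-ending punctuation.
        sentences = [s for s in re.split(r"[.!?]+", worker.boundary) if s.strip()]
        if len(sentences) != 1:
            problems.append(f"{worker.name}: boundary is not one sentence")
        if not worker.exclusions:
            problems.append(f"{worker.name}: brief has no exclusions")
    if len(sketch.workers) > 1 and not sketch.named_overlaps:
        problems.append("no overlaps named")
    if not sketch.merge_rules:
        problems.append("no merge rules written")
    if not sketch.stop_conditions:
        problems.append("no stop conditions set")
    return problems


# Hypothetical sketch content, for illustration only.
sketch = RunSketch(
    workers=[
        WorkerSketch("hero", "Owns the home hero and nothing below it.", ["card anatomy"]),
        WorkerSketch("card-grid", "Owns the card grid. Also decides card anatomy."),
    ],
    named_overlaps=["last-reviewed signal"],
    merge_rules=[],
    stop_conditions=["three revision rounds maximum"],
)
problems = preflight(sketch)
print(problems)
```

The check only proves the artifacts exist. Whether a boundary is the right one still takes a human reading it.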

Steer participants towards tasks with real surface area — a multi-surface refresh, a system-wide audit, a canvas-to-code push — because small tasks make every pattern look unnecessary, which is the correct conclusion for small tasks. The most common gap to watch for in review: people write thorough worker briefs and skip the merge rules entirely, which is the written equivalent of merge by concatenation. The second most common: the orchestration brief quietly contains design direction, which means the author is planning to do the workers' jobs while pretending to delegate them.

If running this live, have people swap sketches and attack each other's boundaries — can you find a file, component, or decision two workers would both touch? That review takes ten minutes and reliably finds the racing-edit and divergent-assumption failures before they cost anything. The sketch carries forward: Module 3 takes one of these runs and wires real tools into it over MCP, and the boundaries drawn here become tool scopes there.

Narration for this slide

Your turn. Take one large design task — something that genuinely passed the Module 1 decision map — and design the run on paper. Name the pattern and say why: what kind of boundary does the task actually have — places, stages, or criteria? Write the orchestration brief with a one-sentence boundary per worker and the overlaps named. Draft one worker brief in full, exclusions included. Write the merge rules before you are tempted to skip them. And set the stop conditions, including the point where you would fall back to a single agent. Do not run it yet. Keep the pages — in Module 3, this exact sketch gets real tools wired into it.

Slide 13 of 13 · 16:9

Summary, and the bridge to MCP chaining

  • Three patterns cover most of it: fan-out for independent zones, pipelines for staged handoffs, critic pairs for independent review
  • The choice follows the task's boundaries — places, stages, or criteria — and mixing patterns is normal
  • The orchestrator's deliverables are the brief, the named overlaps, the merge log, and the stop conditions — not design content
  • Worker briefs are mostly exclusions; merge rules are written before any worker runs
  • The traced run's tax was real — roughly 3.5–4× tokens — and it bought decision records and caught conflicts, not speed

Module 3 puts real tools inside these patterns: chaining MCP-connected canvas, browser, ticket, and code tools so each worker's reach is scoped as deliberately as its brief.

Slide notes

Recap by connecting the slides rather than re-reading them: the three patterns are arrangements of the same building blocks — bounded workers, written contracts, gates — and the comparison table turns the choice into a question about where the task's boundaries are. The orchestrator role and the worker brief are the human side of the system; the merge protocol and the failure-mode checklist are what keep the parallel work coherent; and the worked example is the honest accounting that stops anyone selling this as free speed.

If participants take one habit from the module, make it the order of writing: contract before briefs, briefs before any worker runs, gates before any output exists. Every failure in the checklist slide traces back to skipping one of those, and every artifact involved is a plain file that survives whichever orchestration product is fashionable this quarter.

Preview Module 3 concretely. So far the workers have been agents working on files. Real design runs need them to reach tools: a canvas server to read and write design files, a browser to capture screenshots, a ticketing system to file the follow-ups, analytics to ground decisions. MCP is how those tools join the run, and chaining them raises the same questions this module answered for workers — what each connection may see, what it may change, and what happens when one link in the chain fails. The boundaries drawn in this module's exercise become tool scopes there.

Narration for this slide

Let's close. Three patterns cover most of what you will need: fan-out when the boundaries are places, pipelines when they are stages, critic pairs when the point is independent review — and mixing them is normal. The orchestrator owns the contract, the named overlaps, the merge log, and the stop conditions, and nothing else. Worker briefs are mostly exclusions, merge rules are written before anyone runs, and the traced run was honest about the bill: three and a half to four times the tokens, paid back in decision records and conflicts caught early rather than in speed. In Module 3 we put real tools inside these patterns — canvas, browser, tickets, and code connected over MCP — and the boundaries you sketched in this exercise become tool scopes. See you there.

Module transcript
Module 2, narrated slide by slide

Slide 1 · Orchestration Patterns

Welcome back. In Module 1 you decided whether a task genuinely needs more than one agent. This module is about what happens after you say yes. We are going to cover the three patterns that handle most real cases — fan-out for parallel independent work, pipelines for staged work with handoffs, and critic pairs where one agent produces and another reviews against written criteria. We will define the orchestrator role properly, write worker prompts and merge rules before anything runs, look hard at the failure modes, and trace a real four-worker run end to end, with its costs. Let's start with fan-out.

Slide 2 · Fan-out: parallel zones, one merge point

The first pattern is fan-out. You split the work into zones that genuinely do not depend on each other, give each zone to one worker with one owned output, and bring everything back through a single merge review. The clearest production example is a design-system audit: one agent per component folder, sixty-odd agents in parallel, findings collected into one drift report in about an hour. Fan-out is the right shape when the bottleneck is reading volume rather than judgment. The price is the seams — the places where two zones meet, where spacing drifts and the same signal gets expressed two different ways. The merge review exists to catch exactly that, which is why you write its rules before anyone starts.

Slide 3 · Pipelines: staged work with gates between stages

The second pattern is the pipeline. Instead of splitting by place, you split by stage: a spec stage, a build stage, a verify stage, each handing its output to the next. Pipelines barely parallelise — stages wait on each other — but they merge almost for free, because there is nothing to reconcile. What you are really buying is checkpoints: every handoff is a gate with an owner. The production evidence is telling. A thirteen-role book pipeline we trace in the articles had its review loop protect the drafting stages from day one — and nearly every fix landed at the export boundary instead. Gates belong at the handoffs. The failure mode is propagation: a weak early stage feeds everything after it, and a gate that rubber-stamps is not a gate.

Slide 4 · Critic pairs: a producer and an independent reviewer

The third pattern is the critic pair. One agent produces; a second agent reviews — and the reviewer is read-only. It returns findings, each tied to a criterion you wrote down before the first draft existed, with severity and evidence, and it is forbidden from fixing anything itself. The producer revises, and the loop runs until the criteria pass or you hit the iteration cap. In the book pipeline this is literally configuration: a pass score of seven, a maximum of three revision rounds. In the five-surface run we trace later, the read-only QA worker caught both conflicts and the gaps nobody owned. The rule that keeps the pattern honest: the critic never grabs the pen, and its pass is evidence for your decision — not the decision itself.
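
The loop in this narration can be sketched in a few lines. The pass score of seven and the cap of three revision rounds come from the narration. Everything else is an assumption for illustration: that "pass" means a score of at least seven, the `critic_loop` name, and the stub `review` and `revise` functions, which stand in for real agents.

```python
from typing import Callable

PASS_SCORE = 7   # from the narration; ">=" is an assumption
MAX_ROUNDS = 3   # maximum revision rounds, from the narration


def critic_loop(
    draft: str,
    review: Callable[[str], tuple[int, list[str]]],
    revise: Callable[[str, list[str]], str],
) -> tuple[str, str, int]:
    """Return (final draft, status, revision rounds used)."""
    rounds = 0
    while True:
        # The critic is read-only: it returns a score and findings, never a new draft.
        score, findings = review(draft)
        if score >= PASS_SCORE:
            return draft, "passed", rounds
        if rounds == MAX_ROUNDS:
            # Cap reached: hand the draft and findings to a human instead of looping.
            return draft, "escalate", rounds
        draft = revise(draft, findings)  # only the producer changes the draft
        rounds += 1


# Stub agents, for illustration only.
def improving_review(draft: str) -> tuple[int, list[str]]:
    return 4 + draft.count("[fixed]"), ["criterion 2 not met"]


def stuck_review(draft: str) -> tuple[int, list[str]]:
    return 5, ["criterion 2 not met"]


def revise(draft: str, findings: list[str]) -> str:
    return draft + " [fixed]"


passed = critic_loop("v0", improving_review, revise)
stuck = critic_loop("v0", stuck_review, revise)
print(passed[1], passed[2], stuck[1], stuck[2])
```

Both exits return the draft to whoever holds the human gate. A "passed" status is evidence for that decision and does not replace it.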

Slide 5 · The three patterns, side by side

Here are the three patterns side by side. On the left, fan-out: one orchestrator brief, parallel workers each owning a zone, and a single merge review where the real decisions happen. In the middle, the pipeline: spec feeds build feeds verify, with a gate at every handoff — slow on purpose, because the checkpoints are the value. On the right, the critic pair: a producer and a read-only critic looping against written criteria, with an iteration cap, ending at a human gate. Notice each panel carries its own failure mode — seams and lazy merges, propagation and rubber-stamp gates, vague criteria and critics that start editing. And notice you can mix them. The worked example later is fan-out plus a critic layered on top.

Slide 6 · Choosing the pattern: what each one buys and costs

Here is the choice in one table. Read the first row and most decisions make themselves: if the boundaries of your task are places — surfaces, components, regions — fan out. If they are stages, where one thing must exist before the next can start, build a pipeline. If there is one artifact and the question is whether it meets the bar, run a critic pair. The row people underestimate is merge cost. Fan-out's merge is a genuine design review and it takes real time. The pipeline barely merges but barely parallelises. The critic pair never merges but pays in rounds, which is why you cap them. And if your task does not fit any row cleanly, that is the signal to go back to Module 1 and not split at all.

Slide 7 · The orchestrator role: a contract, not a contributor

Every pattern needs an orchestrator, and the role is easy to get wrong. The orchestrator — whether that is you or a lead agent — owns four things. The brief, which states the user job once and gives each worker a one-sentence boundary. The named overlaps, so the merge expects its conflicts instead of being surprised by them. The merge log, where decisions and rejected options get written down. And the stop conditions — the gates, the iteration caps, and the point where you fall back to a single agent. What the orchestrator does not do is design content. The documented failure mode, in the product docs and in our own traced run, is the lead starting to do a worker's job itself. The fix is structural: if the orchestrator is producing anything beyond the contract and the log, your decomposition is being bypassed.

Slide 8 · Worker prompts: one job, one boundary, one reviewable output

Here is a real worker brief from the traced run, and notice how much of it is about what not to do. Scope with explicit exclusions. The exact inputs the worker may read. One task, one owned output file, and acceptance criteria the orchestrator can check in a couple of minutes. What is missing is design direction — on purpose. Taste lives in the design system and in one shared constraints file that every brief points at; the brief carries the boundary and the deliverable. That keeps briefs cheap, and cheap briefs are the ones that actually get written. One honesty note: constraints reduce violations, they do not eliminate them. One worker in this run invented an off-token tint anyway. The acceptance check at the gate is what caught it.

Slide 9 · Merge rules and conflict handling, decided up front

Now the merge, and the rule that matters most: write the merge protocol before any worker runs, because gates invented after the outputs arrive bend around whatever came back. The protocol has an order. Audit ownership first — who stayed inside their boundary. Classify the conflicts: token semantics, seams, duplicated signals, unowned gaps. Resolve against the design system and the user job, never against whichever worker wrote the most or answered last. Record what you rejected and why. And log the gaps nobody owned as open questions instead of inventing answers on the spot. The cheapest way to fail here is merge by concatenation — accepting everything because nothing is individually wrong. That is how multi-agent work fails while looking like it succeeded.
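
One way to make merge by concatenation visible is to audit the merge log itself. The sketch below uses the four conflict classes from the narration. The `MergeEntry` shape, the `audit_merge_log` function, and the sample entries are hypothetical, not the traced run's actual log.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConflictClass(Enum):
    TOKEN_SEMANTICS = "token semantics"
    SEAM = "seam"
    DUPLICATED_SIGNAL = "duplicated signal"
    UNOWNED_GAP = "unowned gap"


@dataclass
class MergeEntry:
    topic: str
    conflict_class: ConflictClass
    resolution: Optional[str]  # None means logged as an open question
    rejected: list[str]
    reason: str


def audit_merge_log(named_overlaps: list[str], log: list[MergeEntry]) -> list[str]:
    """Flag signs that the merge accepted outputs without deciding anything."""
    issues = []
    logged_topics = {entry.topic for entry in log}
    for overlap in named_overlaps:
        if overlap not in logged_topics:
            issues.append(f"{overlap}: named overlap has no merge entry")
    for entry in log:
        if entry.conflict_class is ConflictClass.UNOWNED_GAP:
            # Gaps nobody owned stay open questions rather than invented answers.
            if entry.resolution is not None:
                issues.append(f"{entry.topic}: unowned gap was answered on the spot")
        elif not entry.rejected or not entry.reason:
            issues.append(f"{entry.topic}: no rejected option or reason recorded")
    return issues


# Hypothetical log entries, for illustration only.
log = [
    MergeEntry("last-reviewed signal", ConflictClass.DUPLICATED_SIGNAL,
               "one shared treatment", ["second worker's variant"], "matches the design system"),
    MergeEntry("grid empty state", ConflictClass.UNOWNED_GAP, None, [], "no worker owned it"),
]
clean = audit_merge_log(["last-reviewed signal"], log)
empty = audit_merge_log(["last-reviewed signal"], [])
print(clean, empty)
```

An empty log after a run with named overlaps is the clearest symptom: the outputs were accepted, but nothing was decided.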

Slide 10 · Failure modes: how orchestrated runs actually go wrong

Here is how these runs actually fail, and notice none of it is exotic. Two workers racing on the same file — prevented by one owned output each, and worktrees once specs become edits. Divergent assumptions — two workers answering the same question differently because nobody named the overlap. Silent scope growth — a worker wandering into someone else's surface. The lead doing the work itself instead of delegating, which is documented in the platform docs and happened in our own run. Merge by concatenation. And teams left running after the work is done, quietly burning tokens. Every one of these has a written counterpart you have already seen in this module: boundaries, named overlaps, exclusions, a contract-only orchestrator, a merge protocol, and a shutdown step.

Slide 11 · Worked example: a four-worker fan-out, traced end to end

Let's trace a real run. Five surfaces on this school's own site: four workers each owning one surface, a read-only QA critic across all of them, and an orchestrator that wrote the contract, the constraints, and the briefs, then ran the merge. The same brief was also run once with a single agent as a baseline. The baseline took about twenty-eight minutes and produced a usable spec. The orchestrated run took about seventy-seven minutes as executed — roughly forty-six if the workers had genuinely run in parallel, and that is an estimate — at three and a half to four times the tokens. So what did the tax buy? Two conflicts surfaced and resolved on the record before implementation, an independent QA pass, two gaps nobody owned logged for a human, and reusable briefs. Not speed. The honest conclusion: orchestration pays in decision quality and records first, and in speed only when the work truly runs in parallel.

Slide 12 · Exercise: choose and sketch a pattern for one large task

Your turn. Take one large design task — something that genuinely passed the Module 1 decision map — and design the run on paper. Name the pattern and say why: what kind of boundary does the task actually have — places, stages, or criteria? Write the orchestration brief with a one-sentence boundary per worker and the overlaps named. Draft one worker brief in full, exclusions included. Write the merge rules before you are tempted to skip them. And set the stop conditions, including the point where you would fall back to a single agent. Do not run it yet. Keep the pages — in Module 3, this exact sketch gets real tools wired into it.

Slide 13 · Summary, and the bridge to MCP chaining

Let's close. Three patterns cover most of what you will need: fan-out when the boundaries are places, pipelines when they are stages, critic pairs when the point is independent review — and mixing them is normal. The orchestrator owns the contract, the named overlaps, the merge log, and the stop conditions, and nothing else. Worker briefs are mostly exclusions, merge rules are written before anyone runs, and the traced run was honest about the bill: three and a half to four times the tokens, paid back in decision records and conflicts caught early rather than in speed. In Module 3 we put real tools inside these patterns — canvas, browser, tickets, and code connected over MCP — and the boundaries you sketched in this exercise become tool scopes. See you there.