Agentic Design School
Module 6 of 6
45–55 minutes

Orchestrating Design Agent Teams

Operating a Design Agent Team

Making the capability durable: cost visibility and budgets, permissions and audit trails, review capacity that scales with output volume, onboarding people into the system, and the governance that keeps speed from outrunning accountability.

Duration: 45–55 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Set up cost tracking and budgets per workflow rather than one opaque bill.
  • Define permissions, audit trails, and escalation paths for agent-driven changes.
  • Plan review capacity so human attention scales with the volume agents produce.

Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13 · 16:9

Operating a Design Agent Team

Orchestrating Design Agent Teams · Module 6 of 6

  • Cost: per-workflow budgets, not one opaque bill
  • Permissions and audit trails for agent-driven changes
  • Review capacity that scales with what agents produce
  • Onboarding, governance, and what to do when it goes wrong

The system is only as good as its operations. Everything the previous five modules built becomes durable — or quietly dies — in this layer.

Slide notes

This is the closing module of the course, and it deliberately contains no new orchestration patterns. Modules 1 to 5 covered when to split work, how to structure fan-outs, pipelines and critic pairs, how to chain MCP tools, how to run concurrent agents on a canvas, and how to build canvas-to-production pipelines with gates. All of that can work brilliantly for one motivated person and still fail as a team capability, because nobody owns the harness, nobody can say what last month cost, and review quietly became the bottleneck nobody planned for.

Frame the module as the difference between a demo and an operation. A demo needs one good run. An operation needs the run to be repeatable by someone other than its author, affordable at the volume the team actually produces, reviewable without burning out the reviewers, and explainable when something goes wrong. Those four properties — repeatable, affordable, reviewable, explainable — map directly onto the module's sections: harness ownership and onboarding, cost and budgets, review capacity and gates, audit trails and failure review.

Set the expectation that some of this material is unglamorous on purpose. Permissions tables and operating agreements do not demo well. They are also the parts that decide whether the capability survives its first incident, its first invoice surprise, and its first staff change.

Narration for this slide

Welcome to the final module of Orchestrating Design Agent Teams. The previous five modules gave you the patterns: when to split work, how to run fan-outs and pipelines, how to chain tools, how to share a canvas, how to build a pipeline to production. This module is about keeping all of that running once it belongs to a team rather than to you. That means cost you can see and budget, permissions and audit trails for changes agents make, review capacity that scales with the volume agents produce, and a way to onboard people without a six-week ramp. None of it is glamorous. All of it decides whether the capability lasts.

Slide 2 of 13 · 16:9

Cost: three metering models, one decision

For a design team the metering model matters more than the sticker price, because visual iteration is heavy: screenshots in, regenerated layouts out, long design-system context.

  • Subscription windows (Claude): predictable monthly cost; heavy days hit the allowance and stall the loop
  • Credit metering (ChatGPT/Codex; Google's tiered limits): never stalls, but heavy review days cost multiples of normal ones
  • BYOK / API (OpenCode, direct API): spend tracks usage exactly — and tracking it becomes someone's job
  • The ladder as of June 2026: roughly $20 entry, $100 daily-use, $200 all-day tiers across the major vendors

None of the metering models is wrong; they fail differently. Pick the one whose failure mode your team can actually manage.

Slide notes

The figures here are date-stamped on purpose: every price referenced in this module was captured from official vendor pages on 1 June 2026, and pricing has changed at all three major vendors within the eight weeks before that date. Treat the numbers as a snapshot and the structure as the durable part. The structure is a three-rung ladder — entry tiers around $20 a month, middle tiers around $100 pitched at daily professional use, and top tiers around $200 pitched at all-day use — plus a bring-your-own-key route where you pay for the model rather than the seat.

What actually differs between vendors is metering, and orchestration multiplies whatever you choose. Subscription windows give a rolling allowance; a four-worker fan-out plus a critique agent on a heavy day exhausts it and the team waits. Credit metering never stalls, but the same fan-out costs several normal days of usage, and the surprise lands on the bill instead of on the schedule. BYOK tracks spend to actual tokens: agencies that bill clients love it, and finance teams with no owner for key management hate it.

The operational point for this module: parallel and background agents are precisely what the middle and top tiers exist for. If the team's workflows lean on the patterns from Modules 2 to 5, budget for the tier above the one that looks sufficient on paper, and measure a normal week before committing — the meter is better evidence than any heuristic.
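
As a way of reading the meter before committing to a tier, here is a minimal sketch of how the same measured week behaves under a fixed allowance versus per-unit billing. Everything in it is an assumption for illustration: the usage units, the allowance of 100, the price of 0.5 per unit, and the names `DayUsage`, `stalled_days`, and `billed_cost`. Real subscription windows roll over hours rather than resetting per day, and no vendor's actual limits or rates are implied.

```python
from dataclasses import dataclass


@dataclass
class DayUsage:
    label: str
    units: float  # abstract usage units read from the team's own meter


def stalled_days(days, allowance_per_day):
    """Subscription-style view: days whose usage exceeds the allowance stall the loop."""
    return [d.label for d in days if d.units > allowance_per_day]


def billed_cost(days, price_per_unit):
    """Credit or BYOK view: every unit is billed, so heavy days simply cost more."""
    return {d.label: round(d.units * price_per_unit, 2) for d in days}


# Hypothetical week: Wednesday is a heavy review day with a fan-out plus a critic.
week = [
    DayUsage("Mon", 40), DayUsage("Tue", 35), DayUsage("Wed", 160),
    DayUsage("Thu", 45), DayUsage("Fri", 38),
]

stalled = stalled_days(week, allowance_per_day=100)
costs = billed_cost(week, price_per_unit=0.5)
print("Days that stall under a fixed allowance:", stalled)
print("Billed cost per day:", costs)
```

The same heavy day shows up as a stall in one model and as a bill several times a normal day in the other, which is the trade-off the slide describes.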

Narration for this slide

Start with cost, because it is the first thing leadership asks about and the last thing most teams can answer. As of June 2026, the major vendors have converged on a similar ladder: roughly twenty dollars for entry, around a hundred for daily use, around two hundred for all-day use — those numbers will drift, so verify them on the official pages. What matters more is how usage is metered. Subscription windows are predictable but stall on heavy days. Credit metering never stalls but surprises you on the bill. Bring-your-own-key tracks spend exactly, as long as someone owns the tracking. Orchestration multiplies all of this — fan-outs and critique agents are exactly what the bigger tiers exist for. Pick the failure mode you can manage.

Slide 3 of 13 · 16:9

From one opaque bill to per-workflow budgets

The unit of cost management is the workflow, not the subscription. Each recurring workflow gets a budget, an owner, and a keep-or-cut review.

Workflow | Typical run | Budget question | Keep / cut signal
Design system audit at scale | Monthly fan-out across repos | Cost per cycle vs a day of manual auditing | Findings still being actioned next cycle
Multi-direction concept exploration | Per project, 3–5 directions | Cost per direction vs the decision it unblocks | Directions still genuinely distinct
Visual QA / parity sweeps | Per release or per PR | Cost per release vs defects caught | Catch rate above what review alone finds
Design ops reporting | Per reporting cycle | Run cost vs the day of manual assembly it replaces | Leadership still reads and acts on it
Exploratory one-off runs | Ad hoc | A monthly cap, not per-run accounting | Anything repeated twice becomes a workflow

A run that costs more in spend and review attention than the decision it informs is not worth repeating — cutting it is an operations decision, not a failure.

Slide notes

The shift this slide asks for is small but structural: stop reasoning about the subscription and start reasoning about workflows. A subscription is one number that nobody can act on. A per-workflow view — this is what the monthly audit costs, this is what a concept exploration costs, this is what the reporting cycle costs — turns cost into a series of small, reviewable decisions that the people running the workflows can actually own.

The budget question column matters more than precise dollar figures. For most workflows the honest comparison is against the manual alternative: the design ops report that replaces a day of screenshots and spreadsheets, the audit that replaces a week of clicking through screens. Where the comparison comes out badly — a workflow that consumes real spend and real review attention and produces findings nobody actions — the right move is to cut or redesign it, and teams need explicit permission to do that without treating it as an indictment of the whole approach.

The last row carries the discipline: exploratory runs deserve headroom, because that is where new workflows come from, but a capped pool rather than per-run accounting. The boundary rule — anything repeated twice gets promoted to a named workflow with a budget — keeps exploration from becoming the place where untracked spend hides.
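
The per-workflow view and the promotion rule can be sketched as a small record plus two checks. This is an illustrative sketch, not a prescribed tool: the `Workflow` fields, the sample figures, and the reading of "repeated twice" as "the same exploratory label appears at least twice" are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    owner: str
    budget_per_cycle: float
    spend_this_cycle: float
    signal_holds: bool  # the keep / cut signal, e.g. findings are still being actioned


def keep_or_cut(wf):
    """Apply the keep-or-cut review: the signal decides first, the budget second."""
    if not wf.signal_holds:
        return "cut or redesign"
    if wf.spend_this_cycle > wf.budget_per_cycle:
        return "keep, review budget"
    return "keep"


def promotion_candidates(exploratory_labels):
    """Exploratory runs seen at least twice should become named workflows."""
    counts = Counter(exploratory_labels)
    return sorted(label for label, n in counts.items() if n >= 2)


# Hypothetical data for one review cycle.
audit = Workflow("design-system-audit", "platform-design pair", 120.0, 95.0, True)
weekly_report = Workflow("weekly-report-variant", "design ops lead", 30.0, 28.0, False)
runs = ["icon-cleanup", "empty-state-ideas", "icon-cleanup"]

print(keep_or_cut(audit), "|", keep_or_cut(weekly_report))
print("Promote to named workflows:", promotion_candidates(runs))
```

Note that the under-budget report is still cut: staying within budget does not rescue a workflow whose output nobody acts on.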

Narration for this slide

Here is the practical move: budget per workflow, not per subscription. The monthly design system audit, the concept explorations, the visual QA sweeps, the ops report — each one gets a rough budget, an owner, and an honest comparison against the manual work it replaces. That comparison is the keep-or-cut signal. If the audit's findings are still sitting unactioned next cycle, or the report nobody reads costs real money and review time, cut it or redesign it — that is operations working, not the approach failing. And keep a capped pool for exploratory runs, with one rule: anything you repeat twice becomes a named workflow with a budget of its own.

Slide 4 of 13 · 16:9

Permissions: who and what may change which surfaces

An agent that can edit anything will eventually edit the wrong thing. Permissions are scoped per surface, and they apply to agents and people alike.

  • Map the surfaces: design system repo, product repos, shared canvas files, tokens, tickets, deployed sites
  • Default deny on the high-blast-radius surfaces: tokens, shared components, anything that publishes
  • Scope MCP credentials to the least access that still works — read-only wherever read-only is enough
  • Drafts and scratch branches are open; main, the design system, and production are gated
  • The permission map is written down and versioned, not remembered

Permissions are not distrust of the agent. They are the written form of who is allowed to change what — which the team needed anyway.

Slide notes

The permissions conversation goes wrong when it is framed as a referendum on whether agents can be trusted. The more useful framing is that agentic work forces the team to write down something that was always true but rarely explicit: which surfaces are high blast radius and who may change them. A token file change propagates everywhere. A shared component change lands in every product that consumes it. A push to main or a publish to a live site is visible to users. Those surfaces were already supposed to be controlled; agents just make the volume of change high enough that informal control stops working.

In practice the controls live in three places. Platform permission systems and sandboxes decide what an agent session may touch — directory scope, allowed commands, whether it can push or publish. MCP credentials decide what connected tools allow — and most design-tool MCP servers can be connected with read-only scopes, which is enough for the majority of chained workflows from Module 3. Repository and canvas permissions decide what merges or publishes regardless of who or what proposed the change.

The operating principle is asymmetric: be generous on scratch space and drafts, because that is where iteration speed lives, and strict on the surfaces where mistakes propagate. And keep the map versioned next to the harness — a permission scheme that lives in one person's memory does not survive that person's holiday.
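
One way to keep the map "written down and versioned" is as a small data file checked in next to the harness, with a lookup that denies anything not listed. The sketch below assumes a hypothetical layout: the surface names, the `writers` and `gated` keys, and the `decide` function are illustrative and are not any platform's real permission API.

```python
# Hypothetical permission map. Surfaces that are not listed are denied by default.
PERMISSION_MAP = {
    "scratch-branch": {"gated": False, "writers": {"agent", "designer"}},
    "draft-canvas": {"gated": False, "writers": {"agent", "designer"}},
    "tokens": {"gated": True, "writers": {"designer"}},
    "design-system": {"gated": True, "writers": {"agent", "designer"}},
}


def decide(actor, surface):
    """Return 'allow', 'gate', or 'deny' for a proposed write to a surface."""
    entry = PERMISSION_MAP.get(surface)
    if entry is None or actor not in entry["writers"]:
        return "deny"  # default deny: unknown surface or unlisted actor
    return "gate" if entry["gated"] else "allow"


for actor, surface in [
    ("agent", "scratch-branch"),
    ("agent", "tokens"),
    ("designer", "tokens"),
    ("agent", "production-site"),
]:
    print(f"{actor} -> {surface}: {decide(actor, surface)}")
```

The gate applies to people as well as agents: in this sketch a designer's token change still returns "gate", while scratch space stays open to both.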

Narration for this slide

Next, permissions. The question is not whether you trust the agent — it is which surfaces have a high blast radius, and who or what may change them. Token files, shared components, anything that publishes to users: those propagate everywhere, so they default to deny and go through gates. Scratch branches and draft canvases stay open, because that is where speed lives. Scope MCP credentials to the least access that works — read-only is enough more often than you would think. And write the map down, versioned next to your harness. A permission scheme that lives in someone's head is not an operating model; it is a single point of failure.

Slide 5 of 13 · 16:9

Audit trails: what changed, why, and on whose approval

When agents produce most of the volume, the question 'where did this come from?' must have a boring, checkable answer.

  • Every agent-driven change lands as a reviewable diff — branch, PR, or canvas snapshot, never silent edits in place
  • The trail records the brief or workflow that produced it, not just the file delta
  • The named approver is recorded with the change — accountability stays with a person
  • Run logs and merge logs from the orchestrator are kept with the output
  • Reports cite their computation: every figure links to the script, export, or file it came from

An audit trail is what lets you say 'this change came from this brief, was approved by this person, and here is the evidence' — months later, to someone who was not there.

Slide notes

Audit trails sound like compliance overhead until the first time someone asks where a change came from and the honest answer is 'an agent did it and we are not sure why'. The volume agents produce is exactly what makes informal memory stop working: when one person merged three changes a week they could reconstruct the story; when workflows land thirty changes a week nobody can.

The good news is that most of the trail is infrastructure the team already has, used consistently rather than built from scratch. Diffable artifacts and version control — the foundation laid back in the design-as-code material — mean every change is already a diff with an author, a timestamp, and a reviewer. What agentic work adds is two requirements. First, the change should reference what produced it: the brief, the saved workflow command, or the orchestration run, so a reviewer can judge the change against its intent rather than guessing. Second, approval must be attributable to a person. Checks can pass automatically; accountability cannot be automated, and the approver's name on the gate is what makes the review gates from Module 5 mean something.

Reporting closes the loop: the design ops reporting workflow this module draws on requires every figure in a report to link to the script, export, or findings file that computed it. That same standard — numbers with provenance — is what makes an operations review possible at all, and it is why the worked example later in this module can be argued about productively.
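
The two requirements agentic work adds, a reference to what produced the change and a named approver, can be checked mechanically before a change is accepted. A minimal sketch follows; the `AuditEntry` field names and the sample values are hypothetical, not a schema from any particular tool.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditEntry:
    change_ref: str  # branch, PR, or canvas snapshot identifier
    source: str      # brief, saved workflow command, or orchestration run
    approver: str    # a named person, not a bot or a team alias
    run_log: str     # where the orchestrator's run and merge logs are kept


def missing_fields(entry):
    """Return the names of fields that are empty, in declaration order."""
    return [name for name, value in asdict(entry).items() if not value.strip()]


complete = AuditEntry("pr-spacing-tokens", "brief: spacing scale cleanup",
                      "named reviewer", "runs/spacing-cleanup.log")
unapproved = AuditEntry("pr-button-radius", "brief: radius audit", "", "runs/radius.log")

print(missing_fields(complete))
print(missing_fields(unapproved))
```

An entry with any missing field is a candidate for blocking at the gate, since it cannot later answer where the change came from or who approved it.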

Narration for this slide

Audit trails answer one question: where did this change come from? With agent volume, you cannot answer that from memory. So every agent-driven change lands as a diff — a branch, a pull request, a canvas snapshot — never a silent edit. The change references the brief or the workflow that produced it, so a reviewer judges it against intent. The approver is named, because checks can be automated but accountability cannot. And reports follow the same rule: every number links to the script or export that computed it. Months later, when someone asks why the spacing token changed, you have a boring, checkable answer. Boring is the goal.

Slide 6 of 13 · 16:9

The operating model in one picture

Four layers and a cadence. Each layer has a named owner; the cadence is what keeps the layers from going stale.

Diagram of the operating layers of a design agent team. Four cards: harness ownership held by a named owner with instruction files, skills, and agent definitions versioned in the repo; budgets and cost tracked per workflow with runs that are not worth repeating cut; review gates where every change has a named approver, scoped permissions, an audit trail, and an escalation path; and reporting where agents assemble adoption, QA debt, and throughput figures that each link to their source and a human signs the report. A cadence band beneath connects to all four layers: weekly spend checks, per-change gates and audit entries, a signed ops report each cycle, and a quarterly review of the harness and policies.
Harness ownership, budgets and cost, review gates with accountability, and reporting — each with a named owner. The cadence band underneath is what keeps them current: weekly spend checks, per-change gates, a signed report each cycle, and a quarterly harness and policy review.

If a layer has no named owner or no slot in the cadence, it is decoration. It will be accurate for about a month and misleading after that.

Slide notes

Walk the diagram left to right, then land on the cadence band, because the band is the part teams skip. Harness ownership covers the instruction files, skills, agent definitions, and saved workflow commands the whole course has been building — versioned in the repo, changed through the same review gate as anything else, and owned by a named person or pair, typically a design lead or a platform-design pairing. Budgets and cost is the per-workflow view from the previous slides, owned by whoever actually signs the invoice. Review gates carry the permissions, audit trail, and accountability material — the gate is only real if a named reviewer holds it and an escalation path exists for when it fails. Reporting is the design ops reporting workflow: agents assemble adoption, QA debt, and throughput evidence, every figure links to its computation, and the report stays a draft until the lead signs it.

The cadence band is the difference between an operating model and a diagram of one. Weekly spend checks catch the workflow that quietly tripled in cost. Per-change gates and audit entries are continuous. The reporting cycle produces the signed artifact leadership sees. The quarterly review is where the harness, the permission map, and the policies themselves get re-examined — because the platforms, prices, and tools underneath this all churn, and an operating agreement written in January is partly wrong by June.

The ownership point bears repeating: every box on this diagram has an owner line at the bottom. That is deliberate. Shared ownership of operations reliably means no ownership.

Narration for this slide

Here is the whole operating model in one picture. Four layers. Harness ownership: the instruction files, skills, and saved workflows, versioned and owned by a named person. Budgets and cost: tracked per workflow, owned by whoever signs the invoice. Review gates: scoped permissions, audit trails, and a named approver on every change. Reporting: agents assemble the evidence, every figure links to its source, and a human signs it. Underneath sits the cadence — weekly spend checks, per-change gates, a signed report each cycle, and a quarterly review of the harness and the policies themselves. Two tests for whether this is real: every layer has a named owner, and every layer has a slot in the cadence. Miss either and it is decoration.

Slide 7 of 13 · 16:9

Review capacity: the bottleneck moves to human attention

Orchestration multiplies output. It does not multiply the people qualified to judge it. Plan review capacity like any other constrained resource.

  • Estimate review load per workflow up front: a four-worker fan-out is four reviews plus a merge, not one
  • Tier the review: executable checks first, critic agents next, human judgment last and only on what survives
  • Match artifact size to review appetite — smaller, more frequent diffs beat one enormous drop
  • Name the reviewer when the run is planned, not when the output arrives
  • When review queues grow, reduce parallelism — more agents behind a saturated reviewer just deepens the queue

Unreviewed agent output is not capacity gained; it is risk deferred. The honest unit of team throughput is reviewed work.

Slide notes

This is the constraint the whole course has been pointing at. Module 1 listed multiplied review as part of the coordination tax; Module 2's orchestrator role owned the merge log; Module 5's pipeline put a gate at every stage. Operations is where those add up to a number: how many reviews per week the team's senior people can actually do well, and what happens when the workflows produce more than that.

The tiering is what makes the number manageable. Executable checks — token audits, type checks, accessibility scans, screenshot comparisons — run on everything and cost no human attention. Critic agents reviewing against explicit criteria filter the obvious misses, with the limits the critic-pair pattern already covered: a critic catches criteria violations, not taste. Human judgment then goes only to what survives, which is where it belongs. Teams that skip the tiering end up with senior designers spending entire days reviewing agent output, which is both a morale problem and a quality problem, because attention degrades long before the queue empties.

Two failure patterns to name explicitly. Rubber-stamping: when the queue grows, approvals get faster and shallower, and the gates from Module 5 become theatre — the audit trail will show suspiciously uniform approval times if you look. And the parallelism reflex: the instinct when output slows is to add more agents, which is exactly backwards when the constraint is the reviewer. Reducing parallelism to match review capacity is usually the change that improves both speed and quality.
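
Both the capacity arithmetic and the rubber-stamping signal can be sketched in a few lines. The function names, the weekly capacity of 12 reviews, the minimum sample of five approvals, and the 0.15 uniformity threshold are assumptions chosen for illustration; a team would calibrate them against its own audit trail.

```python
from statistics import mean, pstdev


def review_load(fan_out_workers):
    """Reviews implied by one fan-out run: one per worker plus the merge."""
    return fan_out_workers + 1


def max_runs_per_week(weekly_review_capacity, fan_out_workers):
    """How many fan-out runs the reviewers can absorb without a growing queue."""
    return weekly_review_capacity // review_load(fan_out_workers)


def looks_rubber_stamped(approval_minutes, max_cv=0.15):
    """Flag suspiciously uniform approval times using the coefficient of variation."""
    if len(approval_minutes) < 5:
        return False  # too few approvals to say anything
    avg = mean(approval_minutes)
    if avg == 0:
        return True
    return pstdev(approval_minutes) / avg < max_cv


print(review_load(4))                                 # four reviews plus a merge
print(max_runs_per_week(12, 4))                       # runs a 12-review week can carry
print(looks_rubber_stamped([2, 2, 2, 2, 2, 2]))       # uniform approvals
print(looks_rubber_stamped([3, 18, 7, 25, 11, 4]))    # varied approvals
```

A flag from `looks_rubber_stamped` is a prompt to look at the queue, not a verdict on the reviewer: uniform times are what saturation tends to produce.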

Narration for this slide

Now the constraint that sneaks up on every team: review. Orchestration multiplies what gets produced, but it does not multiply the people qualified to judge it. So plan review like a resource. Estimate the load when you plan the run — a four-worker fan-out is four reviews and a merge, not one. Tier it: automated checks on everything, critic agents against explicit criteria, and human judgment only on what survives. Name the reviewer before the run starts. And watch for the two failure modes: rubber-stamping when the queue grows, and adding more agents when the real constraint is the reviewer. Unreviewed output is not capacity — it is risk you have postponed.

Slide 8 of 13 · 16:9

Onboarding without the six-week ramp

A new designer should be productive inside the system in days, because the system itself carries most of what they need to know.

  • The harness is the onboarding: instruction files, skills, and saved workflows encode how this team works
  • Start them on bounded, low-blast-radius work: drafts, audits, scratch branches — not the design system repo
  • Pair their first runs with the workflow's owner; review their briefs before reviewing their output
  • Give them the operating agreement, the permission map, and one traced run to read on day one
  • Promote autonomy by surface, not by tenure: access to gated surfaces follows demonstrated review judgment

If onboarding takes six weeks, the knowledge is in people's heads instead of in the harness — fix the harness, not the induction deck.

Slide notes

Onboarding is the test of whether the team built a system or a collection of personal setups. If everything important lives in the harness — the instruction files, the skills, the agent definitions, the saved workflow commands, the conventions for the shared canvas — then a new designer inherits most of it by cloning the repo and reading three documents: the operating agreement, the permission map, and one traced run of a real workflow with its review notes. If the important knowledge lives in the heads of the two people who built the workflows, onboarding takes six weeks and the capability has a bus-factor problem regardless of how good the patterns are.

The sequencing matters. New people start on surfaces where mistakes are cheap: audits, explorations, scratch-branch prototypes. Their early briefs get reviewed before their outputs do, because briefing quality is the leading indicator — a designer who writes sharp briefs and sensible review criteria can be trusted with bigger surfaces quickly; one who skips the gates cannot, regardless of seniority. That is also why autonomy should expand by surface rather than by tenure: access to the design system repo or publish rights follows demonstrated review judgment, which the audit trail conveniently documents.

One caution: do not turn onboarding into tool training. The platform CLI takes an afternoon to learn. The team's standards, conventions, and judgment about what good looks like are the actual content, and the harness plus reviewed practice is how those transfer.

Narration for this slide

Onboarding is where you find out whether you built a system or a pile of personal setups. If the harness carries the team's knowledge — instruction files, skills, saved workflows, conventions — a new designer gets most of it by cloning the repo and reading three things: the operating agreement, the permission map, and one traced run. Start them on low-blast-radius work: audits, drafts, scratch branches. Review their briefs before their output, because briefing quality is the leading indicator. And expand their access surface by surface, based on demonstrated judgment, not tenure. If onboarding still takes six weeks, the problem is not the new person — it is that the knowledge never made it into the harness.

Slide 9 of 13 · 16:9

Governance that enables vs governance that strangles

The purpose of policy is to make the safe path the fast path. If the governed route is slower than the workaround, people will take the workaround.

Decision | Strangling version | Enabling version
Approving runs | Every agent run needs sign-off in advance | Pre-approved workflows run freely; new surfaces and new tools need sign-off once
Permissions | Blanket bans on agent access to design files | Scoped write access with gates on tokens, shared components, and publishing
Cost control | Spend frozen until a full ROI case exists | Per-workflow budgets with a monthly keep-or-cut review
Quality | A central committee reviews all agent output | Named reviewers per workflow, tiered checks, audit trail spot-checks
Reporting | Per-person productivity dashboards | Team-level distributions; the report informs prioritisation, never individual evaluation

Govern surfaces and workflows, not individual runs. Review the policies on the same quarterly cadence as the harness — stale policy is how shadow usage starts.

Slide notes

Governance earns its bad reputation when it is written by people who do not run the workflows, applied per run instead of per surface, and never revisited. The strangling column is not a caricature; each row is a real policy pattern that produces the same outcome — designers route around the official path, usage goes unmonitored precisely because it is unofficial, and the organisation ends up with less visibility and more risk than if it had set enabling rules up front.

The enabling column shares one design principle: govern the things that are stable — surfaces, workflows, budgets, reviewer roles — rather than the things that happen hundreds of times a week. A pre-approved workflow with a named owner, a budget, and a gate does not need per-run permission. A new tool, a new MCP connection with write access, or a new surface does need a decision, once, recorded where the next person can find it.

The reporting row deserves an explicit pause because it is where governance most often poisons trust. The design ops reporting workflow this module draws on builds the boundary into the synthesis agent's own instructions: throughput is reported as team-level distributions, never per person, and the report informs prioritisation conversations rather than performance reviews. The moment per-person numbers appear, people optimise for the report instead of the work, and the cooperation the whole system depends on evaporates. Put that boundary in the operating agreement and in the agent definitions, so it survives staff changes on both sides.
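
The team-level boundary can also be enforced in the computation itself, so per-person figures never exist in the report's inputs. The sketch below is one possible shape, assuming hypothetical per-change records with an `author` field and a `cycle_days` measure; the summary function reads only the measure and discards the author.

```python
from statistics import median, quantiles


def team_distribution(records):
    """Summarise cycle time as a team-level distribution; author fields are never read."""
    values = sorted(r["cycle_days"] for r in records)
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {"changes": len(values), "q1": q1, "median": median(values), "q3": q3}


# Hypothetical raw records as they might come out of the audit trail.
records = [
    {"author": "designer-a", "cycle_days": 4},
    {"author": "designer-b", "cycle_days": 6},
    {"author": "designer-a", "cycle_days": 8},
    {"author": "designer-c", "cycle_days": 10},
    {"author": "designer-b", "cycle_days": 12},
]

summary = team_distribution(records)
print(summary)
```

Because the summary carries only counts and quartiles, anything built on top of it cannot be turned into a per-person dashboard without going back to the raw records, which is a decision someone would have to make deliberately.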

Narration for this slide

Governance gets a bad name because most of it is written to prevent things rather than to enable them safely. The pattern that works: govern surfaces and workflows, not individual runs. Pre-approved workflows run freely; new tools and new surfaces need a decision once. Scoped access with gates beats blanket bans, because bans just create workarounds you cannot see. Per-workflow budgets with a keep-or-cut review beat spending freezes. And one line worth writing into both the policy and the agents themselves: reporting describes team-level distributions, never individuals. The moment the numbers become a surveillance tool, people optimise for the report instead of the work — and the whole system loses the trust it runs on.

Slide 10 of 13 · 16:9

Failure review: when an agent-driven change goes wrong

Something will get through: a broken token change, a published page that should not have shipped, a parity miss in front of users. The response is a process, not a panic.

  • Contain first: revert the diff or roll back the snapshot — this is why everything lands as a reviewable change
  • Reconstruct from the trail: which brief, which run, which gate approved it, on what evidence
  • Locate the failure in the system: brief, harness rule, check, gate, or reviewer load — not 'the agent was wrong'
  • Fix the layer, not the instance: encode the recurring fix in the harness, add the missing check, resize the review load
  • Keep using the workflow unless the failure shows the surface should not have been automated at all

'The agent made a mistake' is a description, not a diagnosis. The diagnosis is which layer of the operating model let the mistake reach users.

Slide notes

The failure review exists for the same reason post-incident reviews exist in engineering: the first serious failure decides whether the team learns or retreats. Without an agreed process, the default response is either to quietly stop using the workflow — losing the capability over one incident — or to quietly absorb the failure and change nothing, which guarantees a repeat.

The sequence on the slide is deliberately ordinary. Containment is cheap precisely because of the operational choices made earlier: diffable artifacts, scratch branches, gates before publishing. Reverting a merged diff or restoring a canvas snapshot takes minutes, which is the strongest practical argument for those choices. Reconstruction uses the audit trail; if the trail cannot answer 'which brief produced this and who approved it?', that gap is itself the first finding. Diagnosis is where discipline matters: 'the agent was wrong' is where the analysis starts, not where it ends. Almost every real failure traces to a layer humans own: a brief that under-specified, a harness rule that did not exist, a check that was not run, a gate that was rubber-stamped because the reviewer was saturated, or a surface that should never have been writable by an agent in the first place.

The fix should land in the layer, and the harness is usually where it goes — the same loop the fundamentals course established, where recurring corrections get encoded rather than repeated. And close every review with the proportionality question: does this failure mean the workflow needs a better gate, or that this surface should go back to fully manual? Both are legitimate answers. Deciding deliberately is the difference between operating a system and being operated by it.
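
The review sequence lends itself to a short record that is not considered closed until every step has an answer. This is a sketch under assumptions: the `Layer` values mirror the layers named on the slide, while the `FailureReview` fields, the two decision strings, and the `open_items` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    BRIEF = "brief"
    HARNESS_RULE = "harness rule"
    CHECK = "check"
    GATE = "gate"
    REVIEWER_LOAD = "reviewer load"


DECISIONS = ("keep workflow", "return surface to manual")


@dataclass
class FailureReview:
    change_ref: str
    contained: bool          # diff reverted or snapshot rolled back
    brief_ref: str           # reconstructed from the audit trail
    approver: str            # who held the gate
    layer: Optional[Layer]   # the diagnosis; None means "the agent was wrong" so far
    fix: str                 # what changes in that layer
    decision: str            # the proportionality answer


def open_items(review):
    """List the steps of the review that still lack an answer."""
    items = []
    if not review.contained:
        items.append("contain")
    if not review.brief_ref or not review.approver:
        items.append("reconstruct")
    if review.layer is None:
        items.append("diagnose")
    if not review.fix:
        items.append("fix the layer")
    if review.decision not in DECISIONS:
        items.append("decide")
    return items


draft = FailureReview("pr-token-change", True, "brief: spacing cleanup",
                      "named reviewer", None, "", "")
closed = FailureReview("pr-token-change", True, "brief: spacing cleanup",
                       "named reviewer", Layer.CHECK,
                       "add token audit to executable checks", "keep workflow")
print(open_items(draft))
print(open_items(closed))
```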

Narration for this slide

Eventually something gets through: a token change that breaks a product surface, a page published before it should have been. What matters is the response. Contain first: revert the diff, roll back the snapshot. That is why everything lands as a reviewable change. Then reconstruct from the audit trail: which brief, which run, who approved it. Then diagnose properly. 'The agent made a mistake' is a description, not a diagnosis; the real question is which layer let it through: the brief, the harness, a missing check, a saturated reviewer. Fix that layer, encode the fix in the harness, and keep the workflow running unless the failure tells you the surface should never have been automated. Both answers are allowed. Choose deliberately.

Slide 11 of 13 · 16:9

Worked example: an operations review of a running team

A five-squad product organisation, one platform-design pair, six repositories, and a quarterly operations review built on the design ops reporting workflow.

| Layer | What the review found | Decision taken |
| --- | --- | --- |
| Reporting | Token coverage 61% → 78% over two quarters; gain concentrated in the two repos the previous review funded | Keep the workflow; same counting scripts next cycle so the trend stays comparable |
| Cost | Report run ~35–40 minutes; lead's assembly time down from a day to ~2 hours per cycle | Budget held; one rarely-read weekly variant of the report cut |
| Review capacity | Audit fan-out produced findings faster than squads actioned them; backlog growing | Audit moved from monthly to six-weekly; findings capped per cycle per squad |
| Backlog signal | Nine component requests older than 90 days, all from the two lowest-coverage squads | Flagged as a question, not a conclusion; follow-up found requests had stalled, two funded |

Because every figure linked to its computation, the review argued about priorities instead of arguing about the numbers.

Slide notes

This example is assembled from the case studies documented in the school's design ops reporting workflow — a five-squad organisation tracking adoption quarter over quarter, and a design ops lead replacing a day of manual assembly — read here through the operations lens of this module. The figures are from those documented runs, not a benchmark; say so when presenting.

Walk the rows as the four layers of the operating model doing their jobs. The reporting layer produced the headline trend — 61 to 78 percent token coverage across six repositories over two quarters — and because every number linked to the counting script and the per-repo JSON, the squad leads argued about what to fund next rather than disputing the measurement; the gain concentrating in the two repos the previous review had funded is what evidence-led prioritisation looks like when it works. The cost layer is undramatic, which is the point: the run took about 40 minutes, the lead's effort dropped from a day to roughly two hours per cycle, and the keep-or-cut review removed a weekly report variant nobody read.
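
One way to picture "every number linked to its computation" is a figure that carries its sources with it. The sketch below rests on stated assumptions: the definition of coverage (style values that reference a token, divided by all style values counted), the `RepoCount` and `Figure` types, the file paths, and the counts are all invented for illustration and are not the data from the documented runs.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RepoCount:
    repo: str
    tokenised: int  # style values that reference a design token (assumed definition)
    total: int      # all style values counted in the repo
    source: str     # path to the per-repo JSON the counts came from (hypothetical)


@dataclass(frozen=True)
class Figure:
    label: str
    value: float
    sources: Tuple[str, ...]  # provenance travels with the number


def coverage(counts) -> Figure:
    """Aggregate per-repo counts into one coverage figure with its sources."""
    total = sum(c.total for c in counts)
    if total == 0:
        raise ValueError("no style values counted")
    tokenised = sum(c.tokenised for c in counts)
    return Figure(
        label="token coverage (%)",
        value=round(100 * tokenised / total, 1),
        sources=tuple(c.source for c in counts),
    )


# Illustrative counts only; not figures from the case study.
counts = [
    RepoCount("repo-a", tokenised=120, total=200, source="reports/repo-a.json"),
    RepoCount("repo-b", tokenised=90, total=100, source="reports/repo-b.json"),
]
figure = coverage(counts)
print(figure.label, figure.value, "computed from", ", ".join(figure.sources))
```

Keeping the same counting function from cycle to cycle is what makes the trend comparable, which is the decision recorded in the reporting row.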

The review-capacity row is the honest one: the audit workflow was producing findings faster than squads could action them, which is the bottleneck-moves-to-attention problem in miniature. The fix was to slow production to match absorption — less parallelism, longer cadence, capped findings — exactly the counterintuitive move the review-capacity slide argued for. And the backlog finding shows the synthesis boundary working: the correlation between stale component requests and low-coverage squads was flagged as a question for humans to investigate, not asserted as a conclusion. The follow-up conversation, not the report, found the cause.
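
Capping findings per cycle per squad can be sketched as a simple split between this cycle's batch and a deferred list. Ranking by a numeric severity is an assumption made for this example, since the case study does not say how the cap was applied. The squad names and findings are hypothetical.

```python
from collections import defaultdict

def cap_findings(findings, cap_per_squad):
    """Split findings into this cycle's batch and a deferred list.

    findings: iterable of (squad, severity, title) tuples, where a lower
    severity number means more urgent (an assumption for this sketch).
    """
    by_squad = defaultdict(list)
    for finding in findings:
        by_squad[finding[0]].append(finding)

    batch, deferred = [], []
    for squad in sorted(by_squad):
        # Most urgent first; sorted() is stable, so ties keep input order.
        ranked = sorted(by_squad[squad], key=lambda f: f[1])
        batch.extend(ranked[:cap_per_squad])
        deferred.extend(ranked[cap_per_squad:])
    return batch, deferred


# Hypothetical findings from one audit run.
findings = [
    ("checkout", 2, "hard-coded spacing in cart summary"),
    ("checkout", 1, "detached button component"),
    ("checkout", 3, "legacy colour value in footer"),
    ("search", 1, "unlinked text style in results list"),
]
batch, deferred = cap_findings(findings, cap_per_squad=2)
print(len(batch), "findings this cycle;", len(deferred), "deferred")
```

The deferred list is not discarded. It is the visible backlog that tells the team whether the cadence and the cap match what squads can absorb.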

Narration for this slide

Let's see the operating model working, using the documented case from the design ops reporting workflow: five squads, six repos, a platform-design pair, and a quarterly review. The reporting layer showed token coverage rising from sixty-one to seventy-eight percent over two quarters, and because every figure linked to its counting script, the leads argued about priorities, not about the numbers. The cost layer was boring in the best way: forty-minute runs, a day of manual assembly down to two hours, one unread report variant cut. The interesting finding was review capacity — the audit was producing findings faster than squads could action them, so the team slowed it down rather than speeding it up. And the backlog signal, nine stale component requests from the lowest-coverage squads, was raised as a question. The follow-up conversation found the answer. That is operations doing its job.

Slide 12 of 13 · 16:9

Exercise: draft your one-page operating agreement

One page, written for your actual team and your actual workflows. If it does not fit on a page, it will not be read, and if it is not read, it does not exist.

  • Workflows: list the recurring agent workflows you run, each with an owner and a rough per-cycle budget
  • Surfaces: name the gated surfaces — tokens, shared components, publishing — and who may approve changes to each
  • Review: for each workflow, name the reviewer and the checks that run before human review
  • Reporting: what gets reported, on what cadence, with the no-per-person-metrics line written in
  • Failure and cadence: the first three steps when a change goes wrong, and the date of the first quarterly review

Write it with the people who will live under it, date it, and put the first quarterly review in the calendar before you circulate it.

Slide notes

The exercise asks for a draft, not a policy programme. One page is a forcing function: it keeps the agreement at the level of surfaces, workflows, owners, and cadences, and below the level of legalese nobody will maintain. The five sections mirror the module — workflows and budgets, gated surfaces and permissions, review and checks, reporting boundaries, failure response and cadence — so participants can lift answers directly from the slides and from their own current practice.

Steer people towards honesty over aspiration. The agreement should describe what the team will actually do next month, not an idealised operation: if there is no design ops report today, the reporting line might be a single adoption count per quarter; if review capacity is one senior designer, the workflow list should be short enough for that person to gate properly. An agreement that promises more review than the team can deliver is worse than a modest one, because it trains everyone to ignore the document.

Two details disproportionately predict whether the agreement survives. The first is writing it with the people who will live under it (the designers running the workflows and at least one engineering counterpart for the gated surfaces) rather than presenting it to them. The second is scheduling the first quarterly review before circulating it: the platforms, prices, and tools underneath this module all churn, and the agreement is only trustworthy if its own expiry is built in. If participants did the earlier modules' exercises, the workflow list and the canvas conventions they already drafted slot straight into sections one and two.
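
For teams that keep the agreement next to the harness in a repository, the five sections can also be held as data and checked for gaps before circulation. This is a sketch under assumptions: the `Workflow` and `Agreement` types, the field names, and the example entries are hypothetical, and the one-page document remains the thing people read.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Workflow:
    name: str
    owner: str = ""
    budget_per_cycle: Optional[float] = None  # rough figure, in whatever unit the team budgets
    reviewer: str = ""
    pre_review_checks: list = field(default_factory=list)


@dataclass
class Agreement:
    workflows: list
    gated_surfaces: dict  # surface name -> list of named approvers
    first_quarterly_review: Optional[date] = None


def missing_items(agreement: Agreement) -> list:
    """List what the draft still leaves unnamed or unscheduled."""
    problems = []
    for wf in agreement.workflows:
        if not wf.owner:
            problems.append(f"{wf.name}: no owner")
        if wf.budget_per_cycle is None:
            problems.append(f"{wf.name}: no per-cycle budget")
        if not wf.reviewer:
            problems.append(f"{wf.name}: no named reviewer")
        if not wf.pre_review_checks:
            problems.append(f"{wf.name}: no checks before human review")
    for surface, approvers in agreement.gated_surfaces.items():
        if not approvers:
            problems.append(f"{surface}: no named approver")
    if agreement.first_quarterly_review is None:
        problems.append("first quarterly review is not scheduled")
    return problems


# Hypothetical draft with one complete workflow and one incomplete one.
draft = Agreement(
    workflows=[
        Workflow("monthly design system audit", owner="platform-design pair",
                 budget_per_cycle=50.0, reviewer="senior designer",
                 pre_review_checks=["token lint"]),
        Workflow("visual QA sweep", owner="squad lead"),
    ],
    gated_surfaces={"tokens": ["design system lead"], "publishing": []},
)
for problem in missing_items(draft):
    print("-", problem)
```

A check like this only catches blanks. Whether the named reviewer actually has the capacity is still a judgment the team has to make honestly.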

Narration for this slide

Your turn. Draft the one-page operating agreement for your team. List the recurring workflows, each with an owner and a rough budget. Name the gated surfaces — tokens, shared components, anything that publishes — and who approves changes to each. For every workflow, name the reviewer and the checks that run first. Write down what gets reported, on what cadence, including the line that says no per-person metrics. And finish with the failure steps and the date of your first quarterly review. Keep it to one page, write it with the people who will live under it, and be honest about what you will actually do — a modest agreement that is followed beats an impressive one that is ignored.

Slide 13 of 13 · 16:9

Summary, and where to go next

  • Cost is managed per workflow: a budget, an owner, and a keep-or-cut review — with prices verified on the day you decide
  • Permissions and audit trails make agent-driven change explainable: scoped surfaces, named approvers, diffs with provenance
  • Review capacity is the real constraint — tier the checks, name the reviewer up front, and reduce parallelism before adding agents
  • Onboarding, governance, and failure review all lean on the same asset: a harness and an operating agreement that are written down and revisited quarterly
  • Orchestration earns its keep only when the operating layer makes it repeatable, affordable, reviewable, and explainable

This closes the course. The patterns came first; the operating layer is what lets a team keep running them after the novelty wears off.

Slide notes

Close the module by tying the operating layer back to the course's opening argument. Module 1 said more agents is a cost, not a feature, and that the coordination tax has to be repaid by the task. Operations is where that accounting becomes visible at team scale: the per-workflow budgets, the review-capacity planning, and the keep-or-cut reviews are the mechanism that keeps the team honest about which orchestrated workflows actually repay their cost — and gives it permission to cut the ones that do not without abandoning the capability.

For where to go next: the school's curriculum continues in three directions. Designers who want to strengthen the foundations under this course should work through the briefing, harness, and critique material in Agentic Design Fundamentals if they have not already — every operating problem in this module gets easier when briefs and harnesses are strong. Those responsible for the production side should go deeper on the design-to-code and design-system courses, which expand the pipeline and audit material from Modules 5 and 6. And for the facts that this module deliberately date-stamped — pricing, plan structures, platform changes — the school's pricing and platform articles are re-verified on a recurring cadence and are the reference to check before any decision; the figures cited here were captured on 1 June 2026 and will drift.

If participants drafted the operating agreement, the closing ask is simple: put the first quarterly review in the calendar now, and bring the signed first design ops report to it. An operating model that has survived one honest review cycle is no longer a course exercise; it is how the team works.

Narration for this slide

Let's close the course. Operating a design agent team comes down to four layers and a cadence. Cost managed per workflow, with budgets, owners, and the honesty to cut runs that are not worth it. Permissions and audit trails that make every agent-driven change explainable. Review capacity treated as the real constraint, because it is. And a harness, an operating agreement, and a reporting cycle that are written down, owned, and revisited quarterly — since the tools and prices underneath all of this keep moving. The patterns from the earlier modules are what make orchestration powerful. The operating layer is what makes it durable. Thanks for taking the course — now go put the first quarterly review in the calendar.

Module transcript
Module 6, narrated slide by slide

Slide 1 · Operating a Design Agent Team

Welcome to the final module of Orchestrating Design Agent Teams. The previous five modules gave you the patterns: when to split work, how to run fan-outs and pipelines, how to chain tools, how to share a canvas, how to build a pipeline to production. This module is about keeping all of that running once it belongs to a team rather than to you. That means cost you can see and budget, permissions and audit trails for changes agents make, review capacity that scales with the volume agents produce, and a way to onboard people without a six-week ramp. None of it is glamorous. All of it decides whether the capability lasts.

Slide 2 · Cost: three metering models, one decision

Start with cost, because it is the first thing leadership asks about and the last thing most teams can answer. As of June 2026, the major vendors have converged on a similar ladder: roughly twenty dollars for entry, around a hundred for daily use, around two hundred for all-day use — those numbers will drift, so verify them on the official pages. What matters more is how usage is metered. Subscription windows are predictable but stall on heavy days. Credit metering never stalls but surprises you on the bill. Bring-your-own-key tracks spend exactly, as long as someone owns the tracking. Orchestration multiplies all of this — fan-outs and critique agents are exactly what the bigger tiers exist for. Pick the failure mode you can manage.

Slide 3 · From one opaque bill to per-workflow budgets

Here is the practical move: budget per workflow, not per subscription. The monthly design system audit, the concept explorations, the visual QA sweeps, the ops report — each one gets a rough budget, an owner, and an honest comparison against the manual work it replaces. That comparison is the keep-or-cut signal. If the audit's findings are still sitting unactioned next cycle, or the report nobody reads costs real money and review time, cut it or redesign it — that is operations working, not the approach failing. And keep a capped pool for exploratory runs, with one rule: anything you repeat twice becomes a named workflow with a budget of its own.

Slide 4 · Permissions: who and what may change which surfaces

Next, permissions. The question is not whether you trust the agent — it is which surfaces have a high blast radius, and who or what may change them. Token files, shared components, anything that publishes to users: those propagate everywhere, so they default to deny and go through gates. Scratch branches and draft canvases stay open, because that is where speed lives. Scope MCP credentials to the least access that works — read-only is enough more often than you would think. And write the map down, versioned next to your harness. A permission scheme that lives in someone's head is not an operating model; it is a single point of failure.

Slide 5 · Audit trails: what changed, why, and on whose approval

Audit trails answer one question: where did this change come from? With agent volume, you cannot answer that from memory. So every agent-driven change lands as a diff — a branch, a pull request, a canvas snapshot — never a silent edit. The change references the brief or the workflow that produced it, so a reviewer judges it against intent. The approver is named, because checks can be automated but accountability cannot. And reports follow the same rule: every number links to the script or export that computed it. Months later, when someone asks why the spacing token changed, you have a boring, checkable answer. Boring is the goal.

Slide 6 · The operating model in one picture

Here is the whole operating model in one picture. Four layers. Harness ownership: the instruction files, skills, and saved workflows, versioned and owned by a named person. Budgets and cost: tracked per workflow, owned by whoever signs the invoice. Review gates: scoped permissions, audit trails, and a named approver on every change. Reporting: agents assemble the evidence, every figure links to its source, and a human signs it. Underneath sits the cadence — weekly spend checks, per-change gates, a signed report each cycle, and a quarterly review of the harness and the policies themselves. Two tests for whether this is real: every layer has a named owner, and every layer has a slot in the cadence. Miss either and it is decoration.

Slide 7 · Review capacity: the bottleneck moves to human attention

Now the constraint that sneaks up on every team: review. Orchestration multiplies what gets produced, but it does not multiply the people qualified to judge it. So plan review like a resource. Estimate the load when you plan the run — a four-worker fan-out is four reviews and a merge, not one. Tier it: automated checks on everything, critic agents against explicit criteria, and human judgment only on what survives. Name the reviewer before the run starts. And watch for the two failure modes: rubber-stamping when the queue grows, and adding more agents when the real constraint is the reviewer. Unreviewed output is not capacity — it is risk you have postponed.

Slide 8 · Onboarding without the six-week ramp

Onboarding is where you find out whether you built a system or a pile of personal setups. If the harness carries the team's knowledge — instruction files, skills, saved workflows, conventions — a new designer gets most of it by cloning the repo and reading three things: the operating agreement, the permission map, and one traced run. Start them on low-blast-radius work: audits, drafts, scratch branches. Review their briefs before their output, because briefing quality is the leading indicator. And expand their access surface by surface, based on demonstrated judgment, not tenure. If onboarding still takes six weeks, the problem is not the new person — it is that the knowledge never made it into the harness.

Slide 9 · Governance that enables vs governance that strangles

Governance gets a bad name because most of it is written to prevent things rather than to enable them safely. The pattern that works: govern surfaces and workflows, not individual runs. Pre-approved workflows run freely; new tools and new surfaces need a decision once. Scoped access with gates beats blanket bans, because bans just create workarounds you cannot see. Per-workflow budgets with a keep-or-cut review beat spending freezes. And one line worth writing into both the policy and the agents themselves: reporting describes team-level distributions, never individuals. The moment the numbers become a surveillance tool, people optimise for the report instead of the work — and the whole system loses the trust it runs on.

Slide 10 · Failure review: when an agent-driven change goes wrong

Eventually something gets through: a token change that breaks a product surface, a page published before it should have been. What matters is the response. Contain first: revert the diff, roll back the snapshot. That is why everything lands as a reviewable change. Then reconstruct from the audit trail: which brief, which run, who approved it. Then diagnose properly. "The agent made a mistake" is a description, not a diagnosis. The real question is which layer let it through: the brief, the harness, a missing check, a saturated reviewer. Fix that layer, encode the fix in the harness, and keep the workflow running unless the failure tells you the surface should never have been automated. Both answers are allowed. Choose deliberately.

Slide 11 · Worked example: an operations review of a running team

Let's see the operating model working, using the documented case from the design ops reporting workflow: five squads, six repos, a platform-design pair, and a quarterly review. The reporting layer showed token coverage rising from sixty-one to seventy-eight percent over two quarters, and because every figure linked to its counting script, the leads argued about priorities, not about the numbers. The cost layer was boring in the best way: forty-minute runs, a day of manual assembly down to two hours, one unread report variant cut. The interesting finding was review capacity — the audit was producing findings faster than squads could action them, so the team slowed it down rather than speeding it up. And the backlog signal, nine stale component requests from the lowest-coverage squads, was raised as a question. The follow-up conversation found the answer. That is operations doing its job.

Slide 12 · Exercise: draft your one-page operating agreement

Your turn. Draft the one-page operating agreement for your team. List the recurring workflows, each with an owner and a rough budget. Name the gated surfaces — tokens, shared components, anything that publishes — and who approves changes to each. For every workflow, name the reviewer and the checks that run first. Write down what gets reported, on what cadence, including the line that says no per-person metrics. And finish with the failure steps and the date of your first quarterly review. Keep it to one page, write it with the people who will live under it, and be honest about what you will actually do — a modest agreement that is followed beats an impressive one that is ignored.

Slide 13 · Summary, and where to go next

Let's close the course. Operating a design agent team comes down to four layers and a cadence. Cost managed per workflow, with budgets, owners, and the honesty to cut runs that are not worth it. Permissions and audit trails that make every agent-driven change explainable. Review capacity treated as the real constraint, because it is. And a harness, an operating agreement, and a reporting cycle that are written down, owned, and revisited quarterly — since the tools and prices underneath all of this keep moving. The patterns from the earlier modules are what make orchestration powerful. The operating layer is what makes it durable. Thanks for taking the course — now go put the first quarterly review in the calendar.