Agentic Design School
Module 3 of 6
45–55 minutes

Orchestrating Design Agent Teams

MCP Chaining Across Design Tools

Single MCP connections are useful; chains are where orchestration earns its keep — research feeding a canvas, the canvas feeding code, the browser feeding screenshots back into review — with attention to credentials, scopes, and what happens when one link fails.

Duration: 45–55 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Design a tool chain where each MCP connection has a defined role and scope.
  • Handle credentials, permissions, and data boundaries across chained tools.
  • Build chains that fail loudly and partially rather than silently and completely.

Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13 · 16:9

MCP Chaining Across Design Tools

Orchestrating Design Agent Teams · Module 3 of 6

  • From one connection to a chain: why the chain is the workflow
  • A map of the design-relevant MCP servers and what each one is for
  • Scopes, credentials, and data boundaries decided before the first run
  • Failure handling and observability: chains that break loudly, not silently

A single MCP server removes the copy-paste tax for one tool. A chain lets the agent compare sources of truth — and that is where orchestration starts paying for itself.

Slide notes

Module 2 ended with patterns — fan-out, pipelines, critic pairs — described in terms of agents and briefs. This module supplies the plumbing those patterns run on when the work spans tools: the research board, the design canvas, the repository, and the browser, all reachable from one agent session over MCP. The framing to establish up front is that a chain is not a fancier connection; it is a workflow design decision. Which tool is the source of truth for which fact, what may be written where, and what happens when a link is down are design questions, not configuration trivia.

Set expectations on prerequisites. The module assumes participants already know what MCP is and have connected at least one server — that material is covered in the school's MCP for Designers article and in the foundations course. What it does not assume is any experience running multiple servers in one session, which is genuinely different in its failure modes and its cost profile.

Also flag the honesty posture for the module: the worked example later draws on this school's own sync experiments, including a remote write path that is currently blocked at the authentication boundary. The point of including the failure is that it is the most common real outcome, and designing for it is most of the skill.

Narration for this slide

Welcome to Module 3. So far this course has talked about agents and patterns. This module is about the plumbing that lets those patterns reach your actual tools: MCP, and specifically what happens when one agent session has several MCP servers active at once — a research board, a design canvas, the repository, the browser. A single connection is convenient. A chain is something more: it lets the agent compare sources of truth and move work between systems without you ferrying screenshots and hex codes by hand. It also fails in new ways, costs more context, and raises real questions about credentials and data boundaries. That is what we cover here.

Slide 2 of 13 · 16:9

Why chains, not just connections

One server removes the copy-paste tax for one tool. Two or more servers in the same session let the agent compare and move — which is what orchestrated workflows actually need.

  • Compare sources of truth: design variables against the repo's tokens, the canvas against the live page
  • Move content between systems: research board to canvas, canvas to code, code to screenshots
  • One context window: every tool result lands where the agent is already reasoning
  • The cost: more credentials, more failure points, and more context spent before reasoning starts

Chain when the workflow needs a comparison or a handover between systems. If one tool holds everything the task needs, one connection is enough.

Slide notes

Anchor the distinction in jobs rather than architecture. A single design-server connection answers questions about one tool: what variables does this frame use, what does this board say. The workflows that justify this module need two sources at once: token sync compares design-tool variables against the repository's CSS custom properties; content sync reads real research from a board and pushes it into mockups; an implementation review reads intent from the design server and reality from the browser server and reports the difference. None of those is possible with one connection, and all of them collapse back into manual copy-paste without MCP.

The mechanism is mundane and worth saying plainly: multiple MCP servers can be active in the same agent session, the agent picks tools across all of them, and every result lands in the same context window. There is no special chaining feature — the chain is just the order of tool calls the agent makes, steered by your prompt and harness.

Name the costs in the same breath, because the rest of the module is about managing them. Each connected server adds tool definitions to every request, each remote server adds a credential to manage, and screenshot-heavy chains burn context quickly. The honest default is to connect what the task needs, not everything the project owns.

Narration for this slide

Why chain at all? Because the workflows that matter need two sources of truth at once. Token sync compares the design tool's variables against the tokens in your repo. Content sync pulls real research off a board and into your mockups. An implementation review reads intent from the canvas and reality from the browser, and reports the difference. None of that works with a single connection. The mechanism is simple: several MCP servers active in one session, and the agent calls tools across all of them, with every result landing in the same context window. The chain is just the order of those calls. The cost is real too — more credentials, more failure points, more context burned — and managing that cost is most of this module.

Slide 3 of 13 · 16:9

The design-relevant servers, mapped to roles

Every server in a chain should have one defined role. As of June 2026, the useful ones split into canvases, boards, browsers, and the systems around them.

Role in the chain | Servers | Reads | Writes
Design canvas | Figma (remote, beta), Pencil, Paper | Variables, structure, screenshots | Frames, components (keep behind a prompt)
Research and content | Miro, Notion | Boards, stickies, real content | Diagrams, docs (rarely needed in a chain)
Browser and evidence | Playwright, Chrome DevTools | Screenshots, console, network | Nothing (evidence only)
Tickets and review | GitHub, Jira, Linear servers | Issues, PRs, comments | Status updates, review comments
The repository | File tools (no MCP needed) | Tokens, components, DESIGN.md | Code and token changes, on a branch

The repository is usually the strongest link in the chain precisely because it needs no server: every write is a diff someone can review and revert.

Slide notes

The table groups servers by the role they play in a chain rather than by vendor, because that is the decision a chain designer actually makes: where does intent come from, where does content come from, where does evidence come from, and where does the artifact live. Most real chains use one server per role, not every server you have.

A few date-stamped specifics worth voicing. Figma's official MCP server is in beta as of June 2026, comes in remote and desktop forms, reads design context, variables and screenshots, and can also write to the canvas — rate limits vary by seat type, so check before building a daily ritual on it. Pencil and Paper are the connected-canvas pair: local servers, no auth, and the canvas file itself is diffable. Playwright and Chrome DevTools are official Microsoft and Google servers respectively, and they are the half of the chain designers most often forget — without them the chain produces work but no evidence. Ticketing servers matter mostly for orchestrated team workflows, where the chain's output needs to land where the team already tracks work.

The last row is the quiet point: the repository participates in every chain through ordinary file tools, no MCP required, and it is where writes are safest because they arrive as reviewable, revertible diffs. When in doubt about where a chain should write, the answer is usually the repo.

Narration for this slide

Here is the landscape, organised by role rather than logo. Design canvases — Figma's official server, Pencil, Paper — hold the intent: variables, structure, screenshots, and increasingly the ability to write back. Research tools like Miro and Notion hold the content and the findings. Browser servers — Playwright, Chrome DevTools — hold the evidence: screenshots, console errors, what the page actually does. Ticketing servers connect the chain to where your team tracks work. And the repository is in every chain without needing a server at all, which is exactly why it is the safest place to write: everything lands as a diff you can review and revert. Pick one server per role. You almost never need them all.

Slide 4 of 13 · 16:9

Chain design: which tool feeds which step

Design the chain on paper before connecting anything. Each link gets a role, a direction, and a single source of truth for the facts it carries.

  • Name the steps of the workflow first, then assign one server to each step
  • Mark each link as read or write — most links should be read-only
  • Declare the source of truth per fact: tokens, content, layout, behaviour
  • Decide where the human gates sit — usually between the last read and the first write
  • Write the chain into the harness so every session runs it the same way

A chain is a workflow with tools attached, not a pile of connections. If you cannot say which server owns which fact, the chain is not designed yet.

Slide notes

The discipline here mirrors the worker-prompt discipline from Module 2: define the boundary before the work starts. For a chain, the boundary questions are which server feeds which step, in which direction, and which system is the source of truth for each kind of fact. Tokens might be owned by the repository with the canvas as a renderer of them, or owned by Figma variables with the repo following — both are workable, but a chain where both sides think they own the tokens will quietly overwrite one of them.

The read/write marking matters because it determines the permission posture and the blast radius. Most links in a healthy chain are reads: read the board, read the variables, read the screenshot. Writes should be few, late in the chain, and ideally aimed at reviewable surfaces — a branch in the repo, a clearly named page in the canvas — rather than at shared live boards.

The last bullet connects this module back to the harness material from the foundations course. A chain that lives only in one person's prompt history is a demo. A chain documented in the project's instruction file — which servers, which order, which tools are pre-approved, where the gates are — is a workflow the whole team and every future session inherits.
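
One way to make the on-paper design checkable is to write it down as data. The sketch below is a minimal illustration in Python, not part of any MCP SDK or agent platform: the `Link` and `validate_chain` names and the example chain are hypothetical. It checks two of the rules above: every fact has exactly one owner, and a human gate sits at or before the first write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One step of the chain: which server it uses and in which direction."""
    step: str
    server: str
    mode: str                  # "read" or "write"
    owns: tuple = ()           # facts this link is the source of truth for
    gate_before: bool = False  # True if a human approves before this step


def validate_chain(links):
    """Return a list of design problems; an empty list means the chain passes."""
    problems = []
    owners = {}
    for link in links:
        if link.mode not in ("read", "write"):
            problems.append(f"{link.step}: mode must be 'read' or 'write'")
        for fact in link.owns:
            if fact in owners:
                problems.append(
                    f"fact '{fact}' is owned by both {owners[fact]} and {link.server}"
                )
            else:
                owners[fact] = link.server

    # A human gate must sit at or before the first write in the chain.
    gated = False
    for link in links:
        gated = gated or link.gate_before
        if link.mode == "write":
            if not gated:
                problems.append(f"{link.step}: first write has no human gate before it")
            break
    return problems


# Hypothetical chain: names are placeholders, not real server identifiers.
chain = [
    Link("gather", "research-board", "read", owns=("content",)),
    Link("read intent", "design-canvas", "read", owns=("layout",)),
    Link("build", "repository", "write", owns=("tokens",), gate_before=True),
    Link("verify", "browser", "read", owns=("behaviour",)),
]

print(validate_chain(chain))
```

A chain where both the canvas and the repository claim `tokens` would come back with a problem naming both owners, which is the quiet-overwrite case described above.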

Narration for this slide

Before you connect anything, design the chain on paper. List the steps of the workflow, then assign one server to each step. Mark every link as read or write — and notice that most links should be reads. Decide which system is the source of truth for each kind of fact: who owns the tokens, who owns the content, who owns layout. That single decision prevents the worst chain failure, where two systems both think they are authoritative and the agent happily overwrites one with the other. Then place your human gates, usually between the last read and the first write. And finally, write the chain down in the harness, so it runs the same way in every session — not just the one where you remembered the magic prompt.

Slide 5 of 13 · 16:9

One session, several servers

What the agent reads from each link, what it writes where, and the four places chains usually break.

Diagram of one agent session connected to four links: a research board (Miro or Notion) the agent reads stickies and content from and writes nothing to; a design canvas (Figma, Pencil or Paper) it reads variables, structure and screenshots from and writes frames to behind a prompt; the repository where it reads tokens and DESIGN.md and writes code on a branch; and a browser server (Playwright or DevTools) used read-only for screenshots and console output as review evidence. A band along the bottom lists the typical break points: OAuth and credential rejection on remote servers, local apps or dev servers not running, stale connections in long sessions, and context burn from screenshots and tool definitions.
Every link has a defined read and write surface, and the writes that matter land in reviewable places — a branch, a named canvas page. The bottom band is where chains actually fail: credentials, apps that are not running, stale connections, and context burn.

The chain looks like architecture but behaves like a workflow: the reads gather truth, the writes are few and reviewable, and the break points are mostly boring operational facts.

Slide notes

Walk the diagram left to right as a typical chain run. The research board is read-only by policy: it feeds the brief and supplies real content, and there is rarely a good reason for an automated chain to write back to a shared live board. The design canvas is the intent source — variables, structure, reference screenshots — and increasingly a write target too, but writes there belong behind a permission prompt and aimed at a clearly named page. The repository participates through ordinary file tools and is where the artifact actually lives; its writes are branches and diffs, which is what makes them safe. The browser closes the loop: screenshots and console output as evidence for review, never writes.

Then spend equal time on the bottom band, because it is the part people skip until it bites. Remote servers fail at the credential boundary — OAuth registration rejected, enterprise policy blocking the connector — often before a single tool is listed. Local servers fail because the app behind them is not running: the design editor, the desktop app, the dev server. Long sessions go stale and tool calls start hanging. And context burn is the silent one: tool definitions for every connected server ride along with every request, screenshots are large, and a chain can spend its reasoning budget on plumbing.

The design lesson is that none of these is exotic. They are operational facts you can plan for: check connections at the start of a run, keep sessions lean, and have a fallback for the link most likely to be blocked.
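
Checking connections at the start of a run can be as small as one cheap probe per link. The following is a sketch under assumptions: `preflight` is a hypothetical helper, and the two probe functions stand in for whatever "list the server's tools" or "ping the dev server" looks like in a given setup. The point it illustrates is collecting every failure verbatim instead of stopping at the first one.

```python
def preflight(probes):
    """Run one cheap probe per link and collect failures instead of raising.

    `probes` maps a link name to a zero-argument callable that raises on failure.
    Returns (ok, failures), where failures maps a link name to the error text.
    """
    ok, failures = [], {}
    for name, probe in probes.items():
        try:
            probe()
            ok.append(name)
        except Exception as exc:  # keep going so every blocked link is reported
            failures[name] = f"{type(exc).__name__}: {exc}"
    return ok, failures


# Hypothetical probes; a real one would call the server or the local app.
def canvas_probe():
    return ["get_variables", "get_screenshot"]


def dev_server_probe():
    raise ConnectionRefusedError("dev server is not running on localhost:3000")


ok, failures = preflight({"design-canvas": canvas_probe, "browser": dev_server_probe})
print("ready:", ok)
print("blocked:", failures)
```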

Narration for this slide

Here is the whole module in one picture. One agent session, four links. The research board feeds content and findings — read-only. The design canvas holds the intent — the agent reads variables, structure, and screenshots, and if it writes, it writes behind a prompt to a clearly named page. The repository is where the artifact lives, and its writes are branches and diffs, which is why it is the safest place to write. The browser closes the loop with screenshots and console output — evidence, never edits. And along the bottom, the four places chains actually break: credentials rejected on remote servers, local apps that simply are not running, connections that go stale in long sessions, and context burned on screenshots before the reasoning starts. None of that is exotic. All of it is plannable.

Slide 6 of 13 · 16:9

Scopes and credentials: least access that still works

Every server in the chain is a dependency with access to your tools. Scope it like one — per project, per role, with writes left on ask.

  • Connect servers per project, not globally — a client project sees only its own servers
  • Authenticate with accounts scoped to this project's files, not the whole workspace
  • Pre-approve only the read tools the chain actually uses; leave every write tool on ask

.claude/settings.json: a checked-in permission posture for one chain (excerpt)
{
  "enabledMcpjsonServers": ["pencil", "playwright"],
  "permissions": {
    "allow": [
      "mcp__pencil__get_variables",
      "mcp__pencil__get_screenshot",
      "mcp__playwright__browser_navigate",
      "mcp__playwright__browser_take_screenshot"
    ],
    "ask": [
      "mcp__pencil__batch_design",
      "mcp__pencil__set_variables",
      "mcp__figma__*"
    ]
  }
}

The friction of clicking approve on a write is the feature, not the bug. It is the cheapest gate in the entire chain.

Slide notes

The principle is least access that still works, applied at three layers. The first is which servers are connected at all: project scope rather than global, in a checked-in file the team can review, so a client project never silently inherits servers from another. The second is whose credentials the OAuth grant or token belongs to: an account with access to this project's files, not an admin account with the whole organisation's workspace behind it. The third is which tools are pre-approved versus prompted: read tools can be allowed once a server is trusted, write tools stay on ask.

The code excerpt shows the Claude Code shape — a checked-in settings file naming which servers are enabled and which specific tools are allowed without prompting — but the same posture exists on the other platforms: Gemini CLI has an allowlist for which servers may connect, OpenCode keeps enabled flags in a reviewable JSON file, and Codex CLI's config.toml should be reviewed like any other dotfile change. The portable idea is that the permission posture is itself a reviewable artifact, not a per-person preference.

There is also a prompt-injection angle worth one sentence: everything a server returns — sticky-note text, layer names, page content — enters the agent's context as untrusted input, and the reason writes stay behind prompts is so an instruction smuggled in through a board cannot quietly become a canvas edit or a file write.
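
Because the posture is a reviewable artifact, it can also be linted before review. The sketch below assumes the settings shape shown in the excerpt and nothing more: `lint_permissions` and the `WRITE_HINTS` naming heuristic are hypothetical, not features of Claude Code or of any MCP server. It flags pre-approved tools that look like writes, and entries that allow a whole server.

```python
import json

# Assumption: tool names containing these verbs are treated as writes. This is a
# naming heuristic for illustration, not something MCP servers guarantee.
WRITE_HINTS = ("set_", "batch_", "create_", "update_", "delete_", "write_")


def lint_permissions(settings):
    """Flag pre-approved tools that look like writes or that allow a whole server."""
    findings = []
    for tool in settings.get("permissions", {}).get("allow", []):
        name = tool.split("__")[-1]
        if name == "*" or tool.count("__") < 2:
            findings.append(f"{tool}: allows every tool on the server, including writes")
        elif any(hint in name for hint in WRITE_HINTS):
            findings.append(f"{tool}: looks like a write tool; move it to 'ask'")
    return findings


# A deliberately loosened variant of the excerpt, to show what gets flagged.
settings = json.loads("""
{
  "permissions": {
    "allow": [
      "mcp__pencil__get_variables",
      "mcp__pencil__set_variables",
      "mcp__figma__*"
    ],
    "ask": ["mcp__pencil__batch_design"]
  }
}
""")

for finding in lint_permissions(settings):
    print(finding)
```

A heuristic like this will miss write tools with unusual names, so it supplements the human review of the settings file rather than replacing it.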

Narration for this slide

Every server in your chain is a dependency with access to real tools, so scope it the way you would scope any dependency. Connect servers per project, in a checked-in file, so each project only sees what it needs. Authenticate with an account that can reach this project's files — not the whole workspace. And split the tools: pre-approve the read tools the chain actually uses, and leave every write tool on ask. That little approve click before a canvas edit or a file write feels like friction, but it is the cheapest gate in the whole chain — and it is also your defence against content from a board or a file smuggling instructions into the agent. The posture in this snippet is itself reviewable, which is the point: it lives in the repo, not in someone's head.

Slide 7 of 13 · 16:9

Data boundaries: what may leave which system

A chain moves data between systems by design. Decide which moves are allowed before the agent makes them for you.

  • Map what each server can see: research data, unreleased designs, client names, production content
  • Remote servers mean data leaves your machine — know which links are remote and what crosses them
  • Research data often carries consent limits: participant quotes do not belong on every canvas
  • Client boundaries: one project's chain must not be able to read another client's files
  • Write the allowed moves into the harness — "content from the board may go to the canvas, not to tickets"

The chain will move whatever it can reach. Data boundaries are the part of chain design where you decide what it is allowed to reach.

Slide notes

This is the slide where the orchestration view and the governance view meet. A chain is, by definition, a machine for moving information between systems — that is its value, and it is also the risk. The questions to settle per chain: what can each connected server see, which links are remote (meaning data leaves your machine and transits a vendor's infrastructure), and which moves between systems are actually permitted by your client agreements, your research consent, and your own policies.

Give concrete examples rather than abstract policy. Research boards often contain participant quotes and personally identifying detail collected under a consent form that did not mention pushing them into a Figma file shared with a vendor. Unreleased product designs flowing out through a remote server may be fine for an internal project and a contractual problem for a client engagement. And in agency or multi-client settings, the strongest boundary is structural: per-project server config and per-project credentials mean the chain physically cannot reach the other client's files, which beats relying on the agent to remember a rule.

The practical move is to write the allowed flows into the harness in plain language — what may leave the research board, where canvas content may go, what must never appear in tickets — so the boundary survives staff changes and session restarts. Keep the claims conservative: this is a fast-moving area, and as of June 2026 most teams are still working these policies out as they go.
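
The plain-language rule stays the primary record, but where a script wraps the chain, the same allowed flows can be mirrored as a small allowlist. This is a sketch with hypothetical names (`ALLOWED_FLOWS`, `check_flow`, and the system labels); it encodes the example rule that board content may go to the canvas but not to tickets.

```python
# Hypothetical allowlist of moves between systems, written as (source, destination).
ALLOWED_FLOWS = {
    ("research-board", "canvas"),
    ("canvas", "repository"),
    ("browser", "repository"),
}


def check_flow(source, destination, allowed=ALLOWED_FLOWS):
    """Raise before the move happens if the allowlist does not permit it."""
    if (source, destination) not in allowed:
        raise PermissionError(
            f"move {source} -> {destination} is not in the allowed flows"
        )
    return True


print(check_flow("research-board", "canvas"))

try:
    check_flow("research-board", "tickets")
except PermissionError as exc:
    print("blocked:", exc)
```

An allowlist is the deliberate choice here: a move nobody thought about is refused by default instead of permitted by default.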

Narration for this slide

A chain exists to move data between systems, so somebody has to decide which moves are allowed — and it should be you, before the run, not the agent, during it. Map what each server can see. Know which links are remote, because remote means the data leaves your machine. Be careful with research: participant quotes were collected under consent that probably did not mention pushing them onto a shared canvas. And in client work, make the boundary structural — per-project servers and per-project credentials, so the chain simply cannot reach the wrong client's files. Then write the allowed flows into the harness in plain language. The chain will move whatever it can reach; data boundaries are how you decide what it can reach.

Slide 8 of 13 · 16:9

Failure handling: loud and partial beats silent and complete

Decide per link what should happen when it fails. The goal is a chain that stops at the broken link, keeps what it gathered, and tells a human.

Failure | What you want the chain to do
Remote server rejects credentials | Stop that link, report the exact error, fall back to the local path if one exists
Local app or dev server not running | Stop and ask; the fix is human: open the app, start the server, retry
Connection goes stale mid-session | Retry once, then surface it; do not let the agent improvise around a dead link
A read returns partial or empty results | Carry on with what is real, label the gap explicitly in the output
A write fails halfway | Stop the chain; never continue writes downstream of an unconfirmed write

Partial results with a clear label are useful. A complete-looking result built on a silently failed link is the most expensive output a chain can produce.

Slide notes

The learning objective behind this slide is the phrase fail loudly and partially rather than silently and completely. The worst chain outcome is not a crash — it is a polished-looking deliverable assembled around a link that quietly failed: a token sync report that never actually read the design variables, a review that compared the page against a stale screenshot, a canvas update that wrote half its frames and stopped. Agents are good at producing plausible output, and plausible output over a hole is exactly what slips through review.

Walk the table by failure class. Credential failures on remote servers are the most common first-run failure and usually need a human or an admin; the useful behaviours are reporting the exact error and falling back to a local path where one exists — a plugin, a file import, a CLI. Apps not running are trivially fixable but only by a human, so the chain should stop and say so rather than guess. Stale connections deserve one retry, not an improvised workaround. Partial reads are fine if the gap is labelled. And failed writes are the hard rule: nothing downstream of an unconfirmed write should run, because cleaning up a half-applied change across two systems is far worse than rerunning a chain.

Most of this is implemented in prompts and harness instructions rather than code: tell the agent explicitly to verify each connection before relying on it, to report failures verbatim, and to stop rather than improvise when a write does not confirm.
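
Where part of a chain is driven by a script rather than only by prompts, the failure table can be carried as data. The sketch below is one possible shape, with hypothetical names throughout (`Failure`, `POLICY`, `run_chain`); each step is assumed to be a callable that returns `None` on success or a `Failure` value. It shows the two behaviours that matter most: a single retry on a stale connection, and nothing running downstream of a failed write.

```python
from enum import Enum


class Failure(Enum):
    CREDENTIALS_REJECTED = "remote server rejects credentials"
    APP_NOT_RUNNING = "local app or dev server not running"
    STALE_CONNECTION = "connection goes stale mid-session"
    PARTIAL_READ = "read returns partial or empty results"
    WRITE_FAILED = "write fails halfway"


# Action per failure class, following the table on this slide.
POLICY = {
    Failure.CREDENTIALS_REJECTED: "stop_link_and_fallback",
    Failure.APP_NOT_RUNNING: "stop_and_ask",
    Failure.STALE_CONNECTION: "retry_once_then_surface",
    Failure.PARTIAL_READ: "continue_and_label_gap",
    Failure.WRITE_FAILED: "stop_chain",
}

# Actions after which no later step may run.
STOPPING = {"stop_and_ask", "retry_once_then_surface", "stop_chain"}


def run_chain(steps):
    """Run (name, callable) steps in order and return a log of (step, outcome)."""
    log = []
    for name, step in steps:
        failure = step()
        if failure is Failure.STALE_CONNECTION:
            failure = step()  # one retry; whatever it returns is final
        if failure is None:
            log.append((name, "ok"))
            continue
        action = POLICY[failure]
        log.append((name, f"{failure.value}: {action}"))
        if action in STOPPING:
            break
    return log


steps = [
    ("read board", lambda: Failure.PARTIAL_READ),
    ("write branch", lambda: Failure.WRITE_FAILED),
    ("take screenshots", lambda: None),
]
for entry in run_chain(steps):
    print(entry)
```

In this run the partial read is logged and the chain continues, the failed write stops it, and the screenshot step never executes.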

Narration for this slide

Now the part that separates a demo chain from one you can rely on: what happens when a link fails. The principle is simple — fail loudly and partially, never silently and completely. The most expensive thing a chain can produce is a polished-looking result built around a link that quietly broke: a token report that never actually read the variables, a review against a stale screenshot. So decide the behaviour per failure class. Credentials rejected: report the exact error and fall back to the local path if there is one. App not running: stop and ask, because the fix is human. Stale connection: retry once, then surface it. Partial reads: carry on, but label the gap. And a failed write stops the chain, full stop — nothing runs downstream of a write you cannot confirm.

Slide 9 of 13 · 16:9

Observability: knowing what the chain actually did

If you cannot reconstruct what the chain read, wrote, and skipped, you cannot review its output — you can only trust it.

  • Keep the tool-call trail: which servers, which tools, which files and frames, in what order
  • Have the chain write a dated status record: what succeeded, what failed, what was skipped
  • Make writes land on reviewable surfaces: branches, named canvas pages, draft tickets
  • Watch context spend: screenshots and tool definitions are the silent budget of every chain
  • Review the trail, not just the deliverable — that is where silent failures show up

A dated status file that says one link is still blocked is worth more than a beautiful deliverable you cannot account for.

Slide notes

Observability sounds like an engineering luxury until the first time someone asks what changed on the shared canvas last Tuesday and nobody can answer. For design chains the bar is modest and reachable: you should be able to reconstruct, after the fact, which servers the session used, what it read, what it wrote and where, and which steps failed or were skipped. Most agent platforms show the tool-call trail in the session; the discipline is keeping it — in the run log, the pull request description, or a status file — rather than letting it evaporate when the terminal closes.

The dated status record is a pattern worth stealing from this school's own sync pipeline: every run writes a small JSON file recording what was rebuilt, what was pushed, and which path failed with which error. When the remote write path was blocked at OAuth, the status file said so honestly, and the next attempt started from evidence instead of memory. The same idea scales down to a paragraph at the end of a chain prompt: list what you read, what you wrote, and what you could not do.
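
A dated status record needs very little code. The sketch below is an illustration only: the school's actual file format is not shown in this module, so the field names, the `chain-status-` filename prefix, and the `write_status` helper are assumptions. It writes one JSON file per run, listing what succeeded, what failed with which error, and what was skipped.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_status(directory, succeeded, failed, skipped):
    """Write a dated JSON status record and return its path.

    `failed` maps a step name to the error text, recorded verbatim.
    """
    now = datetime.now(timezone.utc)
    record = {
        "run_at": now.isoformat(timespec="seconds"),
        "succeeded": list(succeeded),
        "failed": dict(failed),
        "skipped": list(skipped),
    }
    path = Path(directory) / f"chain-status-{now:%Y-%m-%d}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path


# Example run, written to a temporary directory so it leaves the repo untouched.
path = write_status(
    tempfile.mkdtemp(),
    succeeded=["read variables", "build on branch"],
    failed={"remote canvas write": "HTTP 403 at OAuth client registration"},
    skipped=["canvas refresh"],
)
print(path.read_text(encoding="utf-8"))
```

In a real chain the directory would be a path inside the repository, so the record is reviewed alongside the diff it describes.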

The context-spend bullet is observability of a different kind: a chain that is degrading because its context is full of screenshots does not announce itself. Keeping review sessions separate from build sessions, requesting specific frames rather than whole pages, and disconnecting unused servers are the boring fixes. This connects forward to Module 6, where cost and audit trails get treated as team-level operations rather than personal habits.

Narration for this slide

Observability for a design chain is not dashboards — it is being able to answer, afterwards, what did this chain read, what did it write, and what did it skip. Keep the tool-call trail instead of letting it vanish with the terminal. Have the chain write a small dated status record: what succeeded, what failed, what was skipped — our own sync pipeline does exactly this, and the day the remote write path was blocked, that status file was the difference between evidence and guesswork. Aim writes at reviewable surfaces: branches, named canvas pages, drafts. And watch the context spend, because a chain drowning in screenshots degrades quietly. Then review the trail, not just the deliverable — the trail is where silent failures show up.

Slide 10 of 13 · 16:9

Worked example: research to canvas to code to review

One chain, four links, traced as a sequence of reads, writes, and gates. The shape is what transfers; the servers are interchangeable.

Step | Link | Reads / writes | Gate
1. Gather | Research board (remote, read-only) | Reads findings and real content for the brief | Human confirms scope before any write
2. Read intent | Design canvas server | Reads variables, structure, a reference screenshot |
3. Build | Repository (file tools) | Writes components and token updates on a branch | Diff reviewed before merge
4. Verify | Browser server | Reads screenshots at two breakpoints, console output | Findings reported P0–P3, human decides fixes
5. Record | Status file in the repo | Writes what ran, what failed, what was skipped | Trail reviewed alongside the deliverable

Eight to twelve tool calls, one branch, one evidence pack, one status record — and the first attempt rarely survives without at least one connection retry.

Slide notes

Walk the chain as a story rather than a feature list. The task shape is one this course keeps returning to: take a researched piece of content or a design decision, get it built in the product's real materials, and verify the result against the design source. Step one reads the research board for the findings and the real content — read-only, and gated by a human confirming scope, because this is the point where the brief is still cheap to change. Step two reads the design canvas: variables, structure, and a reference screenshot, which together form the design contract. Step three is the build, and it deliberately writes to the repository on a branch rather than to any live surface. Step four turns the browser server on the result: screenshots at a desktop and a mobile width plus console output, compared against the contract from step two and reported as P0 to P3 findings. Step five writes the status record.

Be honest about effort and failure rates, drawing on the school's MCP article: assembled from documented tool surfaces, a chain like this is roughly eight to twelve tool calls plus the comparison and write-up, and in practice the first attempt rarely survives without one connection or wrong-node retry — typically a design editor or dev server that was not running. That is a few minutes of agent time for a review a designer would spend half an hour doing less systematically by tab-switching.

Point out what makes the chain safe rather than just fast: every write lands on a reviewable surface, the only irreversible-looking step (the canvas) is read-only here, and the gates sit where the chain design slide put them, before the first write, with a further gate after the evidence comes back.

Narration for this slide

Let's trace one chain end to end. The job: take researched content, build it in the product's real materials, and verify the result against the design source. Step one, read the research board — read-only — and have a human confirm scope while changing course is still cheap. Step two, read the design canvas: variables, structure, a reference screenshot. That is the design contract. Step three, build — writing to the repository, on a branch, never to a live surface. Step four, point the browser at the result: screenshots at two breakpoints, console output, compared against the contract and reported as P0 to P3 findings. Step five, write the status record. Eight to twelve tool calls, a few minutes of agent time — and expect the first attempt to need one retry, usually because something was not running.

Slide 11 of 13 · 16:9

Field evidence: where our own chain broke

This school runs a chain like this against its own design system. Two findings from those runs, dated June 2026, travel well.

  • Code-to-canvas sync to OpenPencil works reliably — but only one way; the agent-writes-canvas direction still depends on a desktop MCP bridge that does not always connect
  • The remote Figma write path was blocked at OAuth client registration (HTTP 403) before a single tool was listed — the failure was at the credential boundary, not in the payload
  • Both failures were survivable because fallbacks existed: a file-backed CLI build for the canvas, a local plugin for Figma
  • Every run writes a dated status file, so the next attempt starts from evidence instead of memory

Plan for the authentication boundary to be the weakest link, and keep a boring local fallback — a plugin, a file import, a CLI — for the link most likely to be blocked.

Slide notes

These findings come from the school's two published field notes on syncing its design system outward: one to OpenPencil, one to Figma over MCP. The OpenPencil note shows the happy half: a watch loop rebuilds a file-backed canvas (token boards, component boards, key screens) from the same files the site is built from, so the canvas becomes a build output rather than an artifact anyone maintains by hand. The honest caveat in that note is that the round trip is not closed: the HTTP MCP bridge to the desktop app did not connect reliably, so the chain runs one way, code to canvas.
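A watch loop of this kind can be small. The sketch below is not the school's implementation; it assumes a hypothetical `rebuild` callable that regenerates the canvas file, and it detects changes by hashing file contents rather than relying on a file-watching library.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


def snapshot(root: Path) -> Dict[str, str]:
    """Map every file under root to a hash of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def rebuild_if_changed(
    root: Path, previous: Dict[str, str], rebuild: Callable[[], None]
) -> Dict[str, str]:
    """Call rebuild() once if any source file was added, removed, or edited."""
    current = snapshot(root)
    if current != previous:
        rebuild()
    return current
```

A loop would call `rebuild_if_changed` every second or so, passing the returned snapshot back in as `previous`. The direction stays one way by construction: source files in, canvas file out.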

The Figma note is the more instructive failure. The same generated payload was pointed at Figma's remote MCP server, and the standalone client was rejected during OAuth client registration with an HTTP 403 before any tool was listed. Nothing in the payload or the script was at fault — the failure sat exactly at the authentication boundary, which is where most agent-writes-to-vendor-SaaS workflows currently break. The chain still shipped because two fallbacks existed: the Figma MCP tools registered through another agent platform authenticated fine for inspection and targeted writes, and a local development plugin handled full canvas refreshes without any remote endpoint.
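The shape of that recovery (try the remote path, keep the exact error, continue on the local path) can be sketched in a few lines. Everything named here is hypothetical: `CredentialRejected` stands in for whatever error a real client raises at the authentication boundary, and `remote` and `local` are placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CredentialRejected(Exception):
    """Hypothetical error for a rejection at the authentication boundary."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status


@dataclass
class LinkResult:
    path: str              # "remote" or "local-fallback"
    error: Optional[str]   # exact error from the remote attempt, if any


def sync_with_fallback(
    remote: Callable[[], None], local: Callable[[], None]
) -> LinkResult:
    """Try the remote link; on a credential rejection, record it and go local."""
    try:
        remote()
        return LinkResult(path="remote", error=None)
    except CredentialRejected as exc:
        # Loud: the exact error is kept for the status record.
        # Partial: the chain continues on the boring local path.
        local()
        return LinkResult(path="local-fallback", error=str(exc))
```

Only the credential failure triggers the fallback. Any other exception propagates, so a payload bug cannot hide behind the local path.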

The transferable lessons are the two on the slide: treat the credential boundary as the link most likely to fail, and keep a boring local fallback for it; and record every failure in a dated status file. Both are cheap, and both are exactly what the failure-handling and observability slides asked for — this is what they look like in practice.
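A dated status file needs very little machinery. As a minimal sketch, assuming a hypothetical `results` mapping from link name to `"ok"`, `"skipped"`, or an error string, and a file naming scheme invented for illustration:

```python
import json
from datetime import date
from pathlib import Path
from typing import Dict, Optional


def write_status(
    directory: Path, results: Dict[str, str], run_date: Optional[date] = None
) -> Path:
    """Write one JSON status record named after the run date and return its path."""
    run_date = run_date or date.today()
    record = {
        "date": run_date.isoformat(),
        "succeeded": sorted(k for k, v in results.items() if v == "ok"),
        "skipped": sorted(k for k, v in results.items() if v == "skipped"),
        # Anything else is treated as a failure and keeps its exact error text.
        "failed": {k: v for k, v in results.items() if v not in ("ok", "skipped")},
    }
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"status-{run_date.isoformat()}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

This version overwrites an earlier record from the same day; a chain that runs several times a day would want a timestamp in the name instead.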

Narration for this slide

This is not theoretical — we run a chain like this against this school's own design system, and it broke in instructive places. Syncing code to an OpenPencil canvas works reliably, but only one way: the direction where the agent writes back to the canvas still depends on a desktop bridge that does not always connect. And pushing the same design system to Figma over the remote MCP server failed at OAuth registration — a 403 before a single tool was even listed. The payload was fine; the credential boundary was the problem, and that is where these workflows most often break. Both runs still shipped, because fallbacks existed — a file-backed CLI build, a local plugin — and because every run writes a dated status file, so the next attempt started from evidence, not memory.

Slide 12 of 13 · 16:9

Exercise: map the chain for one of your workflows

Take one workflow you already run — a token sync, an implementation review, a content update — and design its chain on one page before connecting anything.

  • List the steps, then assign one server (or the repo) to each step, marked read or write
  • Name the source of truth for each kind of fact: tokens, content, layout, behaviour
  • Write the permission posture: which read tools are pre-approved, which writes stay on ask
  • Mark the data boundaries: what may leave which system, and which links are remote
  • For each link, write one line of failure handling — and name the fallback for the link most likely to break

If the page shows more than one or two write links, or no fallback for the remote links, the chain is not ready to run yet.

Slide notes

The exercise is deliberately a paper exercise, like the one in Module 2: the design of the chain is the skill, and connecting servers is the easy part once the design exists. Steer participants towards a workflow they already do manually — comparing design variables to the repo's tokens, reviewing a built page against its design source, refreshing mockups with real content — rather than inventing a new workflow to justify the tooling.

The checks built into the bullets mirror the module: roles and direction per link, a single source of truth per fact, a permission posture that could be checked into the repo, explicit data boundaries, and one line of failure handling per link including the fallback for whichever link is most likely to be blocked — usually the remote one with OAuth in front of it.
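For participants who want to keep the page in the repo, the readiness test on the slide can be expressed as a small check. This is a sketch under stated assumptions: the `Link` record is hypothetical, and the limit of two write links is one reading of "more than one or two".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    step: str
    server: str                     # a server name, or "repository"
    mode: str                       # "read" or "write"
    remote: bool = False            # True if data leaves the machine
    fallback: Optional[str] = None  # the boring local alternative, if any


def readiness_problems(links: List[Link], max_writes: int = 2) -> List[str]:
    """Return the reasons this chain is not ready to run; empty means ready."""
    problems = []
    writes = [link for link in links if link.mode == "write"]
    if len(writes) > max_writes:
        problems.append(f"{len(writes)} write links (limit {max_writes})")
    for link in links:
        if link.remote and link.fallback is None:
            problems.append(f"no fallback for remote link: {link.step}")
    return problems
```

An empty list does not mean the chain is good, only that it passes the two checks on the slide. Source of truth and data boundaries still need a human read.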

If running this live, have people swap pages and review each other's chains against two questions: where would this chain fail silently, and what would you not be able to reconstruct afterwards? Those two questions catch most of the weak designs, and they are the same questions Module 6 will ask at the level of the whole team's operations. Keep the page; the next module, on the shared canvas, builds on the same workflow.

Narration for this slide

Your turn. Pick one workflow you already run by hand — a token sync, an implementation review, refreshing mockups with real content — and design its chain on a single page. List the steps and assign one server, or the repo, to each, marked read or write. Name the source of truth for every kind of fact. Write the permission posture you would actually check in. Mark what may leave which system. And for every link, one line of failure handling — including the fallback for the link most likely to be blocked, which is usually the remote one. Then apply the test: more than a couple of write links, or no fallback for the remote ones, and the chain is not ready to run yet. Keep the page — you will reuse this workflow in the next module.

Slide 13 of 13 · 16:9

Summary, and the shared canvas ahead

  • Chain when the workflow needs a comparison or a handover between systems; one server per role, not every server you own
  • Design the chain on paper: roles, read/write direction, source of truth per fact, gates before writes
  • Scope credentials per project, pre-approve only reads, and keep every write behind a prompt
  • Build chains that fail loudly and partially — fallbacks for the credential boundary, stops on unconfirmed writes
  • Keep the trail: tool calls, dated status records, and writes that land on reviewable surfaces

Module 4 puts multiple agents on one shared canvas at the same time — zone ownership, conventions, and the review rhythm that keeps a fast-moving file coherent.

Slide notes

Recap by tying the slides back to the three learning objectives. Designing a chain where each connection has a defined role and scope was the server map, the chain-design slide, and the diagram. Handling credentials, permissions, and data boundaries was the least-access posture and the data-boundaries slide. Building chains that fail loudly and partially was the failure-handling table, the observability slide, and the field evidence — the OAuth 403 and the one-way canvas sync are what those principles look like when they meet a real vendor boundary.

Remind people of the date-stamps. The server landscape, Figma's beta status and rate limits, and the exact configuration shapes were verified against documentation in mid-2026 and will drift; the chain-design discipline, the permission posture, and the failure-handling rules are the parts that will still be true when the tool names change.

Then set up Module 4. This module had one agent operating several tools. The next one inverts it: several agents operating one shared surface — a canvas or design file — at the same time, which raises a different set of problems: zone ownership, naming and component conventions agents can actually follow, conflict cases around shared tokens and components, and the review rhythm that catches divergence before it spreads. The chain map from the exercise carries forward, because a shared canvas is usually one link in a larger chain.

Narration for this slide

Let's close. Chain when the workflow needs a comparison or a handover between systems, and use one server per role — not everything you own. Design the chain on paper first: roles, direction, a single source of truth per fact, and gates before the writes. Scope the credentials per project, pre-approve only the reads, keep writes behind a prompt. Build for failure: loud and partial, with a fallback for the credential boundary, and a hard stop on any unconfirmed write. And keep the trail — tool calls, a dated status record, writes on reviewable surfaces. In Module 4 we flip the picture: instead of one agent across many tools, several agents on one shared canvas at the same time — and the conventions and review rhythm that stop that from becoming chaos. See you there.

Module transcript
Module 3, narrated slide by slide

Slide 1 · MCP Chaining Across Design Tools

Welcome to Module 3. So far this course has talked about agents and patterns. This module is about the plumbing that lets those patterns reach your actual tools: MCP, and specifically what happens when one agent session has several MCP servers active at once — a research board, a design canvas, the repository, the browser. A single connection is convenient. A chain is something more: it lets the agent compare sources of truth and move work between systems without you ferrying screenshots and hex codes by hand. It also fails in new ways, costs more context, and raises real questions about credentials and data boundaries. That is what we cover here.

Slide 2 · Why chains, not just connections

Why chain at all? Because the workflows that matter need two sources of truth at once. Token sync compares the design tool's variables against the tokens in your repo. Content sync pulls real research off a board and into your mockups. An implementation review reads intent from the canvas and reality from the browser, and reports the difference. None of that works with a single connection. The mechanism is simple: several MCP servers active in one session, and the agent calls tools across all of them, with every result landing in the same context window. The chain is just the order of those calls. The cost is real too — more credentials, more failure points, more context burned — and managing that cost is most of this module.

Slide 3 · The design-relevant servers, mapped to roles

Here is the landscape, organised by role rather than logo. Design canvases — Figma's official server, Pencil, Paper — hold the intent: variables, structure, screenshots, and increasingly the ability to write back. Research tools like Miro and Notion hold the content and the findings. Browser servers — Playwright, Chrome DevTools — hold the evidence: screenshots, console errors, what the page actually does. Ticketing servers connect the chain to where your team tracks work. And the repository is in every chain without needing a server at all, which is exactly why it is the safest place to write: everything lands as a diff you can review and revert. Pick one server per role. You almost never need them all.

Slide 4 · Chain design: which tool feeds which step

Before you connect anything, design the chain on paper. List the steps of the workflow, then assign one server to each step. Mark every link as read or write — and notice that most links should be reads. Decide which system is the source of truth for each kind of fact: who owns the tokens, who owns the content, who owns layout. That single decision prevents the worst chain failure, where two systems both think they are authoritative and the agent happily overwrites one with the other. Then place your human gates, usually between the last read and the first write. And finally, write the chain down in the harness, so it runs the same way in every session — not just the one where you remembered the magic prompt.

Slide 5 · One session, several servers

Here is the whole module in one picture. One agent session, four links. The research board feeds content and findings — read-only. The design canvas holds the intent — the agent reads variables, structure, and screenshots, and if it writes, it writes behind a prompt to a clearly named page. The repository is where the artifact lives, and its writes are branches and diffs, which is why it is the safest place to write. The browser closes the loop with screenshots and console output — evidence, never edits. And along the bottom, the four places chains actually break: credentials rejected on remote servers, local apps that simply are not running, connections that go stale in long sessions, and context burned on screenshots before the reasoning starts. None of that is exotic. All of it is plannable.

Slide 6 · Scopes and credentials: least access that still works

Every server in your chain is a dependency with access to real tools, so scope it the way you would scope any dependency. Connect servers per project, in a checked-in file, so each project only sees what it needs. Authenticate with an account that can reach this project's files — not the whole workspace. And split the tools: pre-approve the read tools the chain actually uses, and leave every write tool on ask. That little approve click before a canvas edit or a file write feels like friction, but it is the cheapest gate in the whole chain — and it is also your defence against content from a board or a file smuggling instructions into the agent. The posture in this snippet is itself reviewable, which is the point: it lives in the repo, not in someone's head.

Slide 7 · Data boundaries: what may leave which system

A chain exists to move data between systems, so somebody has to decide which moves are allowed — and it should be you, before the run, not the agent, during it. Map what each server can see. Know which links are remote, because remote means the data leaves your machine. Be careful with research: participant quotes were collected under consent that probably did not mention pushing them onto a shared canvas. And in client work, make the boundary structural — per-project servers and per-project credentials, so the chain simply cannot reach the wrong client's files. Then write the allowed flows into the harness in plain language. The chain will move whatever it can reach; data boundaries are how you decide what it can reach.

Slide 8 · Failure handling: loud and partial beats silent and complete

Now the part that separates a demo chain from one you can rely on: what happens when a link fails. The principle is simple — fail loudly and partially, never silently and completely. The most expensive thing a chain can produce is a polished-looking result built around a link that quietly broke: a token report that never actually read the variables, a review against a stale screenshot. So decide the behaviour per failure class. Credentials rejected: report the exact error and fall back to the local path if there is one. App not running: stop and ask, because the fix is human. Stale connection: retry once, then surface it. Partial reads: carry on, but label the gap. And a failed write stops the chain, full stop — nothing runs downstream of a write you cannot confirm.
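The per-class behaviour described for this slide fits in one small function. This is a sketch, not a harness feature: the `Failure` names and the returned action strings are hypothetical, and stopping when credentials are rejected with no local path available is an assumption the narration does not spell out.

```python
from enum import Enum


class Failure(Enum):
    CREDENTIALS_REJECTED = "credentials rejected"
    APP_NOT_RUNNING = "app not running"
    STALE_CONNECTION = "stale connection"
    PARTIAL_READ = "partial read"
    FAILED_WRITE = "failed write"


def next_action(
    failure: Failure, has_local_fallback: bool = False, retries_used: int = 0
) -> str:
    """Decide what the chain does next for one failure class."""
    if failure is Failure.CREDENTIALS_REJECTED:
        # The exact error is always reported; the fallback is optional.
        if has_local_fallback:
            return "report exact error, use local fallback"
        return "report exact error, stop"  # assumption: no fallback means stop
    if failure is Failure.APP_NOT_RUNNING:
        return "stop and ask"  # the fix is human
    if failure is Failure.STALE_CONNECTION:
        return "retry" if retries_used < 1 else "surface to human"
    if failure is Failure.PARTIAL_READ:
        return "continue, label the gap"
    if failure is Failure.FAILED_WRITE:
        return "stop chain"  # nothing runs downstream of an unconfirmed write
    raise ValueError(f"unknown failure class: {failure!r}")
```

Writing the policy as data or code, rather than leaving it to the prompt, is what lets it run the same way in every session.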

Slide 9 · Observability: knowing what the chain actually did

Observability for a design chain is not dashboards — it is being able to answer, afterwards, what did this chain read, what did it write, and what did it skip. Keep the tool-call trail instead of letting it vanish with the terminal. Have the chain write a small dated status record: what succeeded, what failed, what was skipped — our own sync pipeline does exactly this, and the day the remote write path was blocked, that status file was the difference between evidence and guesswork. Aim writes at reviewable surfaces: branches, named canvas pages, drafts. And watch the context spend, because a chain drowning in screenshots degrades quietly. Then review the trail, not just the deliverable — the trail is where silent failures show up.

Slide 10 · Worked example: research to canvas to code to review

Let's trace one chain end to end. The job: take researched content, build it in the product's real materials, and verify the result against the design source. Step one, read the research board — read-only — and have a human confirm scope while changing course is still cheap. Step two, read the design canvas: variables, structure, a reference screenshot. That is the design contract. Step three, build — writing to the repository, on a branch, never to a live surface. Step four, point the browser at the result: screenshots at two breakpoints, console output, compared against the contract and reported as P0 to P3 findings. Step five, write the status record. Eight to twelve tool calls, a few minutes of agent time — and expect the first attempt to need one retry, usually because something was not running.

Slide 11 · Field evidence: where our own chain broke

This is not theoretical — we run a chain like this against this school's own design system, and it broke in instructive places. Syncing code to an OpenPencil canvas works reliably, but only one way: the direction where the agent writes back to the canvas still depends on a desktop bridge that does not always connect. And pushing the same design system to Figma over the remote MCP server failed at OAuth registration — a 403 before a single tool was even listed. The payload was fine; the credential boundary was the problem, and that is where these workflows most often break. Both runs still shipped, because fallbacks existed — a file-backed CLI build, a local plugin — and because every run writes a dated status file, so the next attempt started from evidence, not memory.

Slide 12 · Exercise: map the chain for one of your workflows

Your turn. Pick one workflow you already run by hand — a token sync, an implementation review, refreshing mockups with real content — and design its chain on a single page. List the steps and assign one server, or the repo, to each, marked read or write. Name the source of truth for every kind of fact. Write the permission posture you would actually check in. Mark what may leave which system. And for every link, one line of failure handling — including the fallback for the link most likely to be blocked, which is usually the remote one. Then apply the test: more than a couple of write links, or no fallback for the remote ones, and the chain is not ready to run yet. Keep the page — you will reuse this workflow in the next module.

Slide 13 · Summary, and the shared canvas ahead

Let's close. Chain when the workflow needs a comparison or a handover between systems, and use one server per role — not everything you own. Design the chain on paper first: roles, direction, a single source of truth per fact, and gates before the writes. Scope the credentials per project, pre-approve only the reads, keep writes behind a prompt. Build for failure: loud and partial, with a fallback for the credential boundary, and a hard stop on any unconfirmed write. And keep the trail — tool calls, a dated status record, writes on reviewable surfaces. In Module 4 we flip the picture: instead of one agent across many tools, several agents on one shared canvas at the same time — and the conventions and review rhythm that stop that from becoming chaos. See you there.