Agentic Design School
Module 5 of 6
40–50 minutes

Agentic Design Fundamentals

Design-as-Code and the Tool Landscape

Why diffable design files are the foundation the rest of the workflow stands on, what the current agent platforms and connected canvases actually offer, and how to pick a starting stack without betting the team on a moving target.

Duration: 40–50 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Explain why binary design files block agent workflows and what diffable alternatives exist.
  • Compare the four major CLI agent platforms on the dimensions that matter for design work.
  • Describe the connected-canvas landscape and where each tool fits.
  • Choose a starting stack for a team and defend the choice.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13

Design-as-Code and the Tool Landscape

Agentic Design Fundamentals · Module 5 of 6

  • The artifact decides what the agent can do
  • Diffable formats: .pen, .op, HTML, tokens, SVG
  • Agent platforms and connected canvases, compared honestly
  • A decision path for choosing a starting stack — and what migration really costs

This is the tools module — but the durable lesson is about formats, not vendors. Formats outlive every tool named on these slides.

Slide notes

This module is the one most likely to age, and it is worth saying that to the room before any product name appears. Versions, prices, and even whether a tool exists next quarter all move faster than course material can. The protection against that is the structure of the module: the format layer first, because diffable artifacts are the slow-moving foundation; the tools second, as a dated snapshot; and a decision path last, built on the differences that change slowly rather than the features that change weekly.

Set the relationship to the previous modules. Module 1 introduced diffability as a single property and promised the full treatment here. Module 4 built the harness — instruction files, tokens, skills — and that harness is exactly the asset that makes the tool choices in this module low-stakes, because it ports between platforms. If participants skipped Module 4, the choosing-a-stack slides will still work, but the migration slide will land harder.

Also flag what this module is not: it is not a recommendation of one platform or one canvas. The course's published platform and canvas articles are kept current and date-stamped; this module teaches the comparison dimensions and the decision structure, and points at those articles for the latest figures.

Narration for this slide

Welcome to Module 5. So far this course has been about practice — the loop, the brief, the harness. This module is about the ground all of that stands on: the formats your design work lives in, and the tools that can reach them. We will start with the property that decides everything — whether the artifact is diffable text or an opaque binary — then compare the four agent platforms and the connected canvases honestly, including prices and limits. And we will finish with a decision path for picking a starting stack, plus a clear-eyed look at what migrating an existing system actually involves. The tools will change. The way you choose between them does not have to.

Slide 2 of 13

The artifact decides what the agent can do

Before any tool comparison, one question: can the agent open the file your design lives in?

  • Agents work on text: they read it, edit it precisely, and produce a change you can inspect
  • Design decisions as text: DTCG token JSON, generated CSS variables, DESIGN.md
  • Design surfaces as text: .pen and .op canvas files, HTML and component code, SVG
  • If the source of truth is a format the agent cannot open, it improvises the missing decisions

Where the design lives is not an implementation detail. It is the decision that determines whether an agent can genuinely participate in the work.

Slide notes

This slide restates the Module 1 claim with the precision the rest of the module needs. An agent's reach is bounded by what it can read and write. A coding agent can open your repository, your CSS, your token JSON, and your markdown design system in seconds. It cannot open a proprietary binary design file at all, except through a vendor API or a plugin that returns a partial copy. When the canonical version of a design decision lives in a format the agent cannot open, the agent fills the gap with the most common pattern it knows — which is how you get plausible, generic output that was never grounded in your actual system.

Name the two families of text artifact. Decisions live as design tokens (structured DTCG JSON, transformed into CSS variables or theme files) and as written guidance such as DESIGN.md. Surfaces live as text in three forms: text-based canvas files such as .pen and .op, which store the canvas as a JSON tree of nodes, components, and variables; HTML or component code, where the artifact is already the implementation; and SVG.
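
To make "DTCG JSON transformed into CSS variables" concrete, here is a minimal Python sketch of that transform. It assumes every token value is a plain string (hex colours, dimensions written as strings) and handles only one feature beyond copying values: a DTCG alias such as {color.blue.500} becomes a var() reference, so the semantic link survives into the CSS. Token names are hypothetical, and a real build tool such as Style Dictionary also handles types, modes, and the structured colour values allowed by newer versions of the format.

```python
import json
import re

# Matches a DTCG alias such as "{color.blue.500}".
ALIAS = re.compile(r"^\{([^{}]+)\}$")


def flatten(group, path=()):
    """Yield (dotted.name, token) for every DTCG token, i.e. every object with a "$value"."""
    for key, node in group.items():
        if key.startswith("$") or not isinstance(node, dict):
            continue  # group-level metadata such as "$type" or "$description"
        if "$value" in node:
            yield ".".join(path + (key,)), node
        else:
            yield from flatten(node, path + (key,))


def css_name(dotted):
    return "--" + dotted.replace(".", "-")


def to_css(tokens_json):
    """Emit one CSS custom property per token; aliases stay aliases via var()."""
    lines = [":root {"]
    for name, token in flatten(json.loads(tokens_json)):
        value = token["$value"]
        alias = ALIAS.match(value) if isinstance(value, str) else None
        if alias:
            value = f"var({css_name(alias.group(1))})"
        lines.append(f"  {css_name(name)}: {value};")
    lines.append("}")
    return "\n".join(lines)


SOURCE = """{
  "color": {
    "$type": "color",
    "blue": {"500": {"$value": "#2563eb"}},
    "action": {"primary": {"$value": "{color.blue.500}"}}
  }
}"""

print(to_css(SOURCE))
```

Run against the sample, it prints a :root block with --color-blue-500 set to the hex value and --color-action-primary set to var(--color-blue-500). Keeping the alias as a reference is the design choice that matters: components bind to the semantic name, and only the token source and this generated file change when the decision changes.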

The practical consequence to land: most design-versus-implementation arguments are not disagreements about taste. They are two copies of the same decision quietly diverging because the decision was recorded in a place only one application could read. Design-as-code is the inversion that stops that drift, and it is what makes the rest of this module's tooling worth discussing at all.

Narration for this slide

Here is the question to ask before comparing any tools: can the agent open the file your design actually lives in? Agents work on text. They read it, change it precisely, and show you the change. So the design-as-code move is to let decisions live as text — token JSON, generated CSS variables, a DESIGN.md — and increasingly to let the surfaces live as text too: .pen and .op canvas files, HTML, component code, SVG. When the source of truth is a binary only one app can open, the agent improvises the missing decisions, and you spend review correcting work that was never grounded in your system. Where the design lives decides what the agent can do. Everything else in this module follows from that.

Slide 3 of 13

Why binary files block agents

Four things break when the design lives in an opaque binary: read access, precise edits, honest review, and history.

  • Read access — the agent sees an exported picture or an API copy, never the artifact itself
  • Precise edits — changes go through a translation layer, so values get re-typed and drift
  • Honest review — git records only that a binary changed; review collapses to "looks fine to me"
  • History — nobody can answer when the primary colour changed, who changed it, and why
  • Diffable text restores all four: line-level diffs, blame, revert, CI checks, and agent participation

A diff shows exactly what changed and nothing else. That single property is what turns an agent's design change from an act of faith into a pull request.

Slide notes

Walk the failure modes concretely, because each one maps to a frustration designers already have. Read access: a flattened export carries no token names, no spacing intent, no component boundaries, so the agent guesses. Precise edits: in the hand-off world a colour change is an edit inside the design tool, a notification, a re-export, and a manual re-type into CSS — every hop a chance for drift. Honest review: a commit that touches a binary says only that a binary changed, so code review cannot see what moved and CI cannot lint it. History: six months later the simple question — when did this change, who changed it, why — has no answer.

Then show what diffability buys, using the worked evidence from the design-as-code article: a single semantic token change produced a two-file diff with one meaningful line in each — the token source recording the decision, the generated CSS recording the consequence — and the components that consumed the token did not appear in the diff at all because they only ever referenced the variable. That is the contract working: decisions change in one place, surfaces follow, and the review shows the exact blast radius.
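
The shape of that worked example can be reproduced with Python's standard-library difflib. The file contents and token names below are hypothetical stand-ins for a real token source and its generated CSS; what matters is the count: one removed and one added line in each of the two files, and nothing at all in the component that only references the variable.

```python
import difflib

TOKENS_BEFORE = """{
  "color": {
    "blue": {"500": {"$value": "#2563eb"}, "600": {"$value": "#1d4ed8"}},
    "action": {"primary": {"$value": "{color.blue.500}"}}
  }
}"""
CSS_BEFORE = """:root {
  --color-blue-500: #2563eb;
  --color-blue-600: #1d4ed8;
  --color-action-primary: var(--color-blue-500);
}"""
BUTTON_CSS = ".button { background: var(--color-action-primary); }"

# The decision being reviewed: primary actions move from blue-500 to blue-600.
TOKENS_AFTER = TOKENS_BEFORE.replace("{color.blue.500}", "{color.blue.600}")
CSS_AFTER = CSS_BEFORE.replace("var(--color-blue-500)", "var(--color-blue-600)")


def changed_lines(before, after):
    """Return the removed ("- ") and added ("+ ") lines, dropping context and "? " hints."""
    return [line for line in difflib.ndiff(before.splitlines(), after.splitlines())
            if line.startswith(("- ", "+ "))]


for name, before, after in [("tokens.json", TOKENS_BEFORE, TOKENS_AFTER),
                            ("tokens.css", CSS_BEFORE, CSS_AFTER),
                            ("button.css", BUTTON_CSS, BUTTON_CSS)]:
    print(name, changed_lines(before, after))
```

In a repository, git diff would show the same two-file blast radius; the sketch just makes the "components do not appear" part checkable.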

Be honest about the limit. Diffable does not mean every diff is pleasant. A canvas-file change that moves forty nodes is technically text but practically unreadable; you review it through the rendered screenshot, not line by line. The claim worth making is narrower: the decisions you care about reviewing — variables, components, bindings, tokens — produce diffs a human can actually read.
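
One way to act on that narrower claim is to split a canvas change into the parts a reviewer reads as text and the parts checked against a screenshot. The Python sketch below does this under a loudly hypothetical assumption: the JSON shape is invented for illustration and is not the real .pen or .op schema, which the respective tools define. Variable and binding changes come out as readable before/after pairs; layout moves are only listed, as a prompt to look at the render.

```python
import json

# Hypothetical canvas-file shape: NOT the actual .pen or .op schema, just a JSON tree
# with design variables at the top and positioned nodes that bind to them.
BEFORE = json.dumps({
    "variables": {"color.primary": "#2563eb", "space.4": "16px"},
    "children": [
        {"id": "hero", "x": 0, "y": 0, "fill": "$color.primary"},
        {"id": "card", "x": 0, "y": 480, "fill": "$color.surface"},
    ],
})
AFTER = json.dumps({
    "variables": {"color.primary": "#1d4ed8", "space.4": "16px"},
    "children": [
        {"id": "hero", "x": 0, "y": 40, "fill": "$color.primary"},
        {"id": "card", "x": 24, "y": 520, "fill": "$color.primary"},
    ],
})


def review_summary(before_json, after_json):
    """Separate decisions worth reading (variables, bindings) from layout moves."""
    before, after = json.loads(before_json), json.loads(after_json)
    old_vars, new_vars = before.get("variables", {}), after.get("variables", {})
    variables = {k: (old_vars.get(k), new_vars.get(k))
                 for k in sorted(old_vars.keys() | new_vars.keys())
                 if old_vars.get(k) != new_vars.get(k)}
    old_nodes = {n["id"]: n for n in before.get("children", [])}
    new_nodes = {n["id"]: n for n in after.get("children", [])}
    shared = sorted(old_nodes.keys() & new_nodes.keys())
    bindings = {i: (old_nodes[i].get("fill"), new_nodes[i].get("fill"))
                for i in shared if old_nodes[i].get("fill") != new_nodes[i].get("fill")}
    moved = [i for i in shared
             if (old_nodes[i].get("x"), old_nodes[i].get("y"))
             != (new_nodes[i].get("x"), new_nodes[i].get("y"))]
    return {"variables": variables, "bindings": bindings, "moved": moved}


print(review_summary(BEFORE, AFTER))
```

For this sample, the summary reports one variable change, one rebinding of the card's fill, and two moved nodes; only the first two need line-by-line reading.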

Narration for this slide

What actually breaks with a binary design file? Three things. The agent cannot read it — it sees an exported picture or a partial API copy, never the artifact. It cannot edit it precisely — every change goes through a translation layer where values get re-typed and drift. And nobody can review it honestly — git only knows that a binary changed, so review collapses to looks fine to me. Make the artifact text and all of that comes back: line-level diffs, blame, revert, CI checks. In the worked example from this course's source material, changing one semantic token produced a two-line diff and the components followed automatically. One caveat: not every text diff is readable. A forty-node layout change still gets reviewed by screenshot. The decisions, though — tokens, variables, components — those diff like sentences.

Slide 4 of 13

The design-as-code stack

Four layers. The artifacts at the base are the durable asset; everything above them can be swapped.

Layered stack diagram of the design-as-code workflow. At the base sit diffable design artifacts: .pen and .op canvas files, HTML and component code, DTCG token JSON, and SVG with markdown design docs. Above them sit the four agent platforms — Claude Code, Codex CLI, OpenCode, and Gemini CLI — which read and edit those artifacts directly so every change lands as a reviewable diff. Above the platforms sits the bridge layer of connected canvases and MCP servers — Paper, Pencil, OpenPencil, Open Design, and the Figma MCP server — joined to the platforms by dashed bidirectional MCP arrows. At the top sit the outputs the stack produces: prototypes, production UI, decks and docs, and audits and QA reports, with a note that outputs pass human review gates before shipping.
Diffable artifacts at the base, agent platforms reading and writing them directly, connected canvases and MCP servers as the bridge layer, and outputs at the top. Solid arrows are work flowing up; dashed arrows are MCP connections. The base layer and the harness are what travel when any tool above them is replaced.

Read the stack bottom-up when choosing, top-down when working. The investment goes at the bottom; the churn happens in the middle.

Slide notes

Walk the diagram from the base. The bottom layer is the diffable artifacts: .pen and .op canvas files, HTML and component code, DTCG token JSON, SVG and markdown design docs — text in the repository, owned by the team rather than a vendor. The next layer up is the agent platforms — Claude Code, Codex CLI, OpenCode, Gemini CLI — which read and edit those artifacts directly, so every change lands as a diff that goes through the same review gates as any other change. The bridge layer is the connected canvases and MCP servers — Paper, Pencil, OpenPencil, Open Design, and Figma's MCP server — which give the agent a visual surface or an external tool to read from and write to. The top layer is what comes out: prototypes, production UI, decks and audits.

Point at the two arrow styles. Solid arrows are work flowing upward — artifacts into platforms into outputs. Dashed arrows are MCP: one protocol, both directions, used to read frames, variables, and screenshots and to write nodes, styles, and tokens back. MCP is what stops every canvas-platform pairing from needing its own bespoke integration; the next slides cover the platforms and canvases individually and MCP gets its own slide after that.

The strategic reading is in the highlight: the base layer plus the Module 4 harness is the durable asset. Platforms get repriced, canvases are in early access or beta, and one of the four CLIs in the middle layer is being retired for individual users this very month. The teams that absorb that churn cheaply are the ones whose design knowledge lives at the bottom of this diagram, not the middle.

Narration for this slide

Here is the whole module in one picture. At the base: diffable design artifacts — .pen and .op files, HTML and components, token JSON, design docs — text that lives in your repository. Above them: the agent platforms — Claude Code, Codex CLI, OpenCode, Gemini CLI — reading and editing those files directly, so every change is a reviewable diff. The bridge layer is the connected canvases and MCP servers — Paper, Pencil, OpenPencil, Open Design, Figma's MCP — giving agents a visual surface to read and write through one shared protocol. And at the top, the outputs: prototypes, production UI, decks, audits. Read it bottom-up when you choose tools and top-down when you work. The base is where your investment belongs, because it is the layer that survives when anything above it changes.

Slide 5 of 13

Agent platforms compared, as of June 2026

All four read an instruction file, support skills, and speak MCP. The differences are operational: governance, sandboxing, and how you pay.

Platform | Governance and model access | Sandboxing and permissions | Cost shape (verified 2026-06) | Notable for design work
Claude Code | Anthropic; proprietary client and models | Granular permission rules plus a bash sandbox; plan mode | Subscription: Pro ~$20, Max $100–200, five-hour usage windows | Deepest skills and MCP ecosystem; strongest design-task track record
Codex CLI | OpenAI; client open source, models proprietary | OS-level sandbox modes separate from approval policy; cloud sandbox tasks | Included in ChatGPT plans from ~$20; token credits since April 2026 | Clearest sandbox story; tight GitHub workflow fit
OpenCode | Open source (MIT); 75+ model providers, bring your own key | Policy-based permissions; no OS sandbox runtime | Agent free; you pay for models (BYOK, credits, or ~$10 open-weight plan) | Reads AGENTS.md and existing harnesses unchanged; model flexibility
Gemini CLI | Google; transitioning to Antigravity for individuals | Per-tool approvals, optional sandbox | Free tier plus Google AI plans ~$8–200 | Largest context windows; strong multimodal input; verify the transition before adopting

Read the rows that change slowly — governance, sandboxing, instruction files — harder than the rows that change weekly. And date-stamp every price you write down.

Slide notes

This expands the one-line comparison from Module 1 into the dimensions that actually decide a team's week. Start with what is the same, because it is the reason the choice is reversible: all four read a project instruction file — AGENTS.md natively or via a one-line import — all four support the SKILL.md format, all four speak MCP, and all four have a plan-before-build mode. That convergence on open standards is why the Module 4 harness ports between them in roughly an hour.

Then the honest differences. Claude Code pairs the deepest design ecosystem — skills, plugins, design-adjacent MCP servers — with subscription-window economics; heavy iteration days can hit the five-hour windows. Codex CLI has the clearest containment story: OS-level sandbox modes separated from the approval policy, plus cloud sandbox tasks for parallel work; usage is metered as token credits inside ChatGPT plans, which never stalls but can surprise. OpenCode is open source and model-agnostic — over seventy-five providers, bring your own key — and the most accommodating of an existing harness, but there is no OS sandbox underneath its policy rules and the model, cost, and data-handling decisions become your job. Gemini CLI offers the largest context windows and a generous free tier, and is also mid-transition: individual access ends in June 2026 in favour of Antigravity, which is the strongest available evidence that deprecation risk belongs in the comparison.

Close with the method point: every figure in this table was verified on a stated date and will drift. Teach the habit of writing last verified next to any volatile fact, and of running your own brief through a candidate platform rather than trusting benchmarks — including this table.

Narration for this slide

Here are the four platforms on the dimensions that matter. First, what is identical: all four read an instruction file, all four support skills, all four speak MCP — which is exactly why your harness ports and the choice stays reversible. The differences are operational. Claude Code has the richest skills and MCP ecosystem, on a subscription with usage windows. Codex CLI has the clearest sandbox model and cloud sandbox tasks, metered as token credits inside ChatGPT plans. OpenCode is open source with seventy-five-plus model providers — maximum flexibility, and the cost and security decisions become yours. Gemini CLI has huge context and a free tier, and is mid-transition to Antigravity, which is your reminder that vendors retire tools. Every price here carries a date. Re-check it before you budget anything.

Slide 6 of 13

Canvases and formats: the diffability test

Five connected canvases and the incumbent, scored on the properties that decide whether an agent can genuinely work on the design.

Tool / format | Artifact | Diffable | Git-trackable | Agent-readable | MCP | Status (2026-06)
Paper | HTML/CSS canvas, cloud document | Partially: the artifact is markup, but the document lives in Paper's cloud | No (screenshots stand in for diffs) | Yes: read JSX, styles, screenshots | Local server, ~20 granular tools | Free tier with MCP-call cap; Pro ~$20/mo
Pencil | .pen JSON file in the repository | Yes | Yes | Yes: nodes, variables, screenshots | Bundled with the IDE extension | Proprietary; free during early access
OpenPencil | .op JSON file, open source (MIT) | Yes | Yes | Yes: typed variables, multi-axis themes | Yes, plus concurrent agent teams | v0.7.x; young project, watch activity
Open Design | HTML artifacts plus a skills and DESIGN.md systems library | Yes: outputs are markup and text | Yes | Yes: drives the agent CLIs you already run | Wraps existing agents rather than adding a server | Apache-2.0, pre-1.0 preview
Huashu Design | HTML-native design SKILL.md plus references and export scripts | Yes: the product is text | Yes | Yes: it is an instruction the agent loads | Not applicable (skill, not server) | MIT since May 2026; written largely in Chinese
Figma + MCP | Cloud document behind an API | No: binary-equivalent outside the tool | Variables exportable; the file itself, no | Partially: structured context via the MCP server | Official server, beta, read and write | Free during beta; usage-based pricing planned

The pattern: the closer the artifact is to text in your repository, the more honestly an agent can work on it — and the more honestly you can review what it did.

Slide notes

Walk the table by family rather than row by row. Paper is the HTML-native canvas: the design is already markup, the desktop app runs a local MCP server with roughly twenty granular tools, and the trade-off is that the document lives in Paper's cloud, so review leans on before-and-after screenshots rather than git diffs. Pencil and OpenPencil are the repo-resident pair: .pen and .op are JSON trees of nodes, components, and variables that live next to the code, diff cleanly at the decision level, and expose MCP so the agent in your editor can read and write the canvas — Pencil proprietary and free during early access, OpenPencil MIT-licensed with concurrent agent teams and the usual young-project caveats. Open Design and Huashu sit at a different layer: Open Design is an open-source shell that drives whatever agent CLIs you already have and bundles a large library of skills and DESIGN.md-style design systems; Huashu is the proof that an entire design tool can be one SKILL.md — taste, procedure, anti-slop rules, and export scripts shipped as text.

Figma is in the table because pretending it is not in the room helps nobody. Its MCP server is real, official, in beta, and no longer read-only — but the architectural difference stands: the artifact stays in Figma's cloud and the agent works through an API translation, not on a file in your repository. That is a legitimate arrangement; it just optimises for collaboration breadth rather than agent participation. The next-but-one slide treats Figma's position fairly on its own.

Remind the room that columns like status and pricing are dated June 2026, that three of these tools (Pencil, OpenPencil, Open Design) are pre-1.0 or in early access, and that Figma's MCP server is still in beta. The diffability and agent-readability columns are the slow-moving ones; weight them accordingly.

Narration for this slide

Now the canvases, run through the diffability test. Paper makes the canvas HTML and CSS — the artifact is already markup — with a local MCP server of about twenty tools; the document lives in Paper's cloud, so review uses screenshots rather than diffs. Pencil and OpenPencil keep the canvas in your repository as .pen and .op JSON files — diffable, git-trackable, agent-editable through MCP; one proprietary and free during early access, one open source with concurrent agent teams. Open Design wraps the agents you already run with a library of skills and design systems. Huashu Design ships an entire design tool as a single skill file. And Figma: real MCP server, genuinely useful, but the artifact stays in its cloud behind an API. The pattern is consistent — the closer the design is to text in your repo, the more an agent can honestly do with it.

Slide 7 of 13

MCP in one slide

The Model Context Protocol is how an agent reaches tools it does not contain: one protocol, many servers, each a narrow door into a design surface or the live page.

  • A server announces a short list of tools — get this frame's variables, take a screenshot, write this HTML
  • Local servers run on your machine (Paper, Pencil, Playwright); remote servers live at a URL with OAuth (Figma, Miro)
  • The same server config ports across Claude Code, Codex CLI, OpenCode, and Gemini CLI — only the syntax differs
  • Governed by a Linux Foundation body since late 2025, with an official registry — not one vendor's plugin system
  • Write actions stay behind the agent's permission prompts; treat third-party servers like any dependency you would audit

MCP replaces the copy-paste tax — exporting PNGs and re-typing hex values — with the agent reading the actual data. That is the whole trick.

Slide notes

Keep this deliberately plain, because the acronym makes it sound more mysterious than it is. Without MCP, the designer is the integration: export a frame as a PNG, paste it into chat, copy hex values one at a time, screenshot the half-built page, and ask the agent to compare two images it can barely see. The agent works from degraded copies of intent, guesses, and you correct the guesses. With MCP, a server exposes a few named tools — return the variables for this frame, screenshot this artboard, write these styles back — and the agent calls them mid-conversation, with the results landing in its context like a file it just read.
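
Under the hood, "announce a few named tools" and "call them mid-conversation" are JSON-RPC 2.0 messages using the tools/list and tools/call methods defined in the MCP specification. The Python sketch below plays the server side in-process so the message shapes are visible. It deliberately omits the initialization handshake, the transport (stdio or HTTP), and the official SDKs a real server would use; the get_variables tool and its data are hypothetical.

```python
import json

# Hypothetical tool a design-canvas server might announce; real servers publish their own names.
TOOLS = [{
    "name": "get_variables",
    "description": "Return the design variables bound in a frame",
    "inputSchema": {"type": "object",
                    "properties": {"frameId": {"type": "string"}},
                    "required": ["frameId"]},
}]
FRAMES = {"hero": {"color.action.primary": "#2563eb", "space.4": "16px"}}  # stand-in canvas data


def handle(message):
    """Answer one JSON-RPC 2.0 request the way an MCP server answers tools/list and tools/call."""
    req = json.loads(message)
    reply = {"jsonrpc": "2.0", "id": req.get("id")}
    if req["method"] == "tools/list":
        reply["result"] = {"tools": TOOLS}
    elif req["method"] == "tools/call":
        params = req.get("params", {})
        if params.get("name") != "get_variables":
            reply["error"] = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
        else:
            frame = params.get("arguments", {}).get("frameId")
            text = json.dumps(FRAMES.get(frame, {}))
            reply["result"] = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        reply["error"] = {"code": -32601, "message": "Method not found"}
    return json.dumps(reply)


# What the agent's client would send mid-conversation:
call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "get_variables", "arguments": {"frameId": "hero"}}}
print(json.loads(handle(json.dumps(call)))["result"]["content"][0]["text"])
```

The result the agent receives is structured text carrying the actual variable names and values, which is exactly the "copy-paste tax" the slide says MCP removes.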

Give the two distinctions that predict how servers behave. Local versus remote: local servers run on your machine, often inside a desktop app, and only exist while that app is running — Paper, Pencil, Playwright, Chrome DevTools all work this way; remote servers live at a URL with OAuth — Figma and Miro — and bring rate limits plus the question of what data leaves your machine. Design servers versus browser servers: the design server reads intent, the browser server observes the built reality, and chaining them is what makes a real design-versus-implementation review possible.
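
The slide's claim that one local server definition ports across all four platforms, with only the syntax changing, is easiest to see side by side. The Python sketch below renders a single hypothetical server into each platform's config shape. File locations and key names reflect this course's reading of each tool's documentation at the time of writing (an mcpServers object for Claude Code and Gemini CLI, an mcp block for OpenCode, [mcp_servers.<name>] tables for Codex CLI); treat them as assumptions and check current docs before copying. The package name example-canvas-mcp is a placeholder, and remote servers (URL plus OAuth) are not covered.

```python
import json

# One hypothetical local server, defined once. "example-canvas-mcp" is a placeholder package name.
SERVER = {"name": "canvas", "command": "npx", "args": ["-y", "example-canvas-mcp"]}


def mcp_servers_json(s):
    """Shape used by Claude Code (.mcp.json) and Gemini CLI (settings.json)."""
    return json.dumps({"mcpServers": {s["name"]: {"command": s["command"], "args": s["args"]}}},
                      indent=2)


def opencode_json(s):
    """OpenCode (opencode.json): a local server's command and args form one array."""
    return json.dumps({"mcp": {s["name"]: {"type": "local",
                                           "command": [s["command"], *s["args"]],
                                           "enabled": True}}}, indent=2)


def codex_toml(s):
    """Codex CLI (~/.codex/config.toml); JSON string quoting doubles as TOML quoting for plain ASCII."""
    args = ", ".join(json.dumps(a) for a in s["args"])
    return f"[mcp_servers.{s['name']}]\ncommand = {json.dumps(s['command'])}\nargs = [{args}]"


for label, render in [("Claude Code / Gemini CLI", mcp_servers_json),
                      ("OpenCode", opencode_json),
                      ("Codex CLI", codex_toml)]:
    print(f"# {label}\n{render(SERVER)}\n")
```

Keeping the definition in one place and generating the per-platform files is the same move as tokens: one source, several projections, so switching platforms is a rendering step rather than a rewrite.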

Two governance and hygiene points close it. MCP moved to neutral governance under a Linux Foundation body in late 2025 and has an official registry, so building team workflow on it is a safer investment than any single vendor's plugin system. And servers sit inside the agent's trust boundary: prefer official servers, check the registry before installing community ones, and keep write actions behind permission prompts — the same discipline Module 4 applied to the harness.
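
The hygiene rule in that paragraph (audited servers only, reads free, writes behind a prompt) reduces to a small policy function. This is a conceptual Python sketch, not how any of the four platforms implements permissions; each has its own permission settings. The tool names are hypothetical, and the deny-by-default stance for unaudited servers is an assumption that is stricter than the slide requires.

```python
from typing import Callable, Set

# Hypothetical tool names; each real server publishes its own.
READ_ONLY_TOOLS = {"get_variables", "get_screenshot", "list_frames"}


def allow_call(server: str, tool: str, audited_servers: Set[str],
               ask_human: Callable[[str], bool]) -> bool:
    """Decide whether an agent may run one MCP tool call."""
    if server not in audited_servers:
        return False  # not yet reviewed like any other dependency: refuse outright
    if tool in READ_ONLY_TOOLS:
        return True   # reads run without interrupting the loop
    # Anything else may write nodes, styles, or tokens back, so a person must say yes.
    return ask_human(f"Allow '{server}' to run write tool '{tool}'?")


audited = {"pencil", "playwright"}
print(allow_call("pencil", "get_variables", audited, ask_human=lambda q: False))   # True
print(allow_call("pencil", "write_styles", audited, ask_human=lambda q: False))    # False
print(allow_call("random-community-server", "get_variables", audited,
                 ask_human=lambda q: True))                                         # False
```

Treating unknown tool names as writes is the conservative default: a server that adds a new tool later gets a prompt, not silent access.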

Narration for this slide

MCP — the Model Context Protocol — is how an agent reaches tools it does not contain. A server announces a few named tools: give me this frame's variables, take a screenshot, write this HTML back. The agent calls them mid-conversation and the results land in its context. Local servers run on your machine, inside Paper or Pencil or a browser driver; remote ones, like Figma's, live at a URL behind OAuth. The same server works on all four agent platforms — only the config syntax changes. And it is not one vendor's plugin system: the protocol sits under neutral governance with an official registry. What it removes is the copy-paste tax — exported PNGs and re-typed hex values. What it requires is the same hygiene as any dependency: official servers first, write access behind permission prompts.

Slide 8 of 13

Where Figma still fits, and where it does not

Figma is not the villain of this module. It is a different optimisation, and most teams will run both worlds for a while.

  • Still wins: real-time multiplayer with non-technical collaborators, mature component and prototyping tooling, the place stakeholders already are
  • Moving toward the stack: an official MCP server with read and write, and announced DTCG-conformant variable export
  • Still does not fit: the file is not in your repository, changes are not diffs, and agents work through an API copy of the decision
  • The pragmatic bridge: tokens — keep the decisions in DTCG JSON and let Figma variables and the codebase both consume them
  • Ripping Figma out to satisfy an architecture diagram trades a real collaboration surface for a theoretical one

Text formats win on review, history, CI, and agent participation. Figma wins on collaboration breadth and editing ergonomics. Tokens are the bridge between the two.

Slide notes

This slide exists to take the religion out of the conversation. Some of the room will hear this module as Figma is dead, and that reading produces bad decisions. The fair statement, dated June 2026: text-based formats win on review, history, CI, and agent participation; the canvas-first incumbent wins on real-time collaboration with non-technical people, mature component and prototyping tooling, an enormous plugin ecosystem, and being where stakeholders already live. Figma is also narrowing the gap from its side — the Dev Mode MCP server is official, in beta, and no longer read-only, and DTCG-conformant variable export has been announced — so the trajectory is toward interoperability, not away from it.

What does not change is architecture. The Figma file stays in Figma's cloud; the agent works through an API translation rather than on a file in the repository; a change to the file is not a diff anyone outside the tool can read. For exploration, stakeholder workshops, and collaborative iteration, that costs little. For the production loop this course teaches — agent edits, review gates, CI checks — it is the difference between participating directly and participating through a window.

Give the practical posture: most teams run both for a while, and tokens are the bridge. Keep decisions in DTCG JSON in the repository, sync them into Figma variables and into the codebase, and the canvas and the runtime become two projections of one source. The anti-pattern is the purity move — ripping out the tool your designers, PMs, and stakeholders actually use to satisfy a diagram. The other anti-pattern is pretending the agent can fully participate in work whose source of truth it cannot open.
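
The "tokens are the bridge" posture only holds if someone notices when the two projections drift apart. A minimal CI-style check is sketched below in Python, under two labelled assumptions: the Figma side has already been exported to a flat map of slash-separated variable names to values (Figma's real variables API returns a richer structure with collections and modes, which this skips), and token values are plain strings. It resolves DTCG aliases in the repository source and reports any Figma variable that is missing or disagrees.

```python
import json

SOURCE = """{
  "color": {
    "blue": {"500": {"$value": "#2563eb"}},
    "action": {"primary": {"$value": "{color.blue.500}"}}
  }
}"""
# Hypothetical flattened export of Figma variables (name -> value).
FIGMA = {"color/blue/500": "#2563EB", "color/action/primary": "#1D4ED8"}


def resolve(tokens):
    """Flatten DTCG tokens to {dotted.name: value}, following {alias} references."""
    flat = {}

    def walk(group, path):
        for key, node in group.items():
            if key.startswith("$") or not isinstance(node, dict):
                continue
            if "$value" in node:
                flat[".".join(path + [key])] = node["$value"]
            else:
                walk(node, path + [key])

    def value_of(name, seen=()):
        value = flat[name]
        if isinstance(value, str) and value.startswith("{") and value.endswith("}"):
            target = value[1:-1]
            if target in seen:
                raise ValueError(f"alias cycle through {name}")
            return value_of(target, seen + (name,))
        return value

    walk(tokens, [])
    return {name: value_of(name) for name in flat}


def drift(source_json, figma_vars):
    problems = []
    for name, value in resolve(json.loads(source_json)).items():
        figma_name = name.replace(".", "/")
        figma_value = figma_vars.get(figma_name)
        if figma_value is None:
            problems.append(f"missing in Figma: {figma_name}")
        elif str(figma_value).lower() != str(value).lower():  # hex case is not a real difference
            problems.append(f"drift: {figma_name} is {figma_value}, source says {value}")
    return problems


for problem in drift(SOURCE, FIGMA):
    print(problem)  # a CI job would exit non-zero here
```

With the sample data, the blue-500 value matches despite the case difference, and the check flags color/action/primary as drifted. The repository stays the source; the check only tells you which projection fell behind.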

Narration for this slide

So where does Figma fit? Honestly: it still wins at the things it has spent a decade building — multiplayer collaboration with non-technical people, mature component tooling, and being where your stakeholders already are. And it is moving toward this stack, with an official MCP server that can now write, and announced token export. What has not changed is the architecture: the file lives in Figma's cloud, changes are not diffs, and agents work on an API copy of the decision rather than the decision itself. The pragmatic answer for most teams is both, with tokens as the bridge — decisions in DTCG JSON, consumed by Figma variables on one side and the codebase on the other. Do not rip out a real collaboration surface to satisfy a diagram. And do not pretend an agent can fully work on a file it cannot open.

Slide 9 of 13

Cost models in plain terms

The sticker prices converge on the same ladder. What differs is how usage is metered — and design work is exactly the kind of usage that finds the limits.

  • The ladder, mid-2026: entry tiers around $20, daily-use tiers around $100, all-day tiers around $200 per month
  • Subscription windows (Claude): predictable cost; heavy iteration days hit the rolling allowance and the loop stalls
  • Credit metering (ChatGPT/Codex, Google's tiers): the loop never stalls, but a heavy review day costs multiples of a normal one
  • Bring-your-own-key (OpenCode, direct API): spend tracks usage, not seats — and tracking it becomes somebody's actual job
  • Connected canvases add their own line: Paper has a free tier with an MCP-call cap and a ~$20 Pro plan; Pencil is free during early access; Figma's MCP pricing arrives after beta

Visual iteration is token-heavy: screenshots in, regenerated layouts out, long design-system context. Pick the failure mode your team can manage, and measure a real week before upgrading.

Slide notes

Keep every number on this slide attached to its date — these figures were captured on the first of June 2026, all three major vendors changed plans or metering in the eight weeks before that, and the official pricing pages are the only source that counts on the day a team decides. What is more durable than the numbers is the shape. The three vendors have converged on the same individual ladder — roughly twenty, one hundred, and two hundred dollars a month — and they describe limits relative to other tiers, five times Pro, twenty times Pro, never as absolute counts. Anyone quoting a hard number of messages per tier is making it up.

The metering models are the real comparison. Subscription windows fail by making you wait: a heavy iteration day exhausts the rolling allowance and the loop stalls mid-critique. Credit metering fails by surprising you: nothing stalls, but a screenshot-heavy review day costs multiples of a normal one. Bring-your-own-key fails by becoming nobody's job: spend spreads across personal keys until finance asks a question no one can answer. None of these is wrong; the right plan is the one whose failure mode the team can actually manage. Design work stresses all three because it is multimodal and iterative — screenshots in, screenshots out, long token and design-system context, repeated regeneration.

Two closing notes. The team and business tiers mostly sell seat administration — central billing, SSO, usage visibility — rather than more capability per person; that matters from about three designers up. And there is a habits dividend that works on every plan: scope context to the task, crop screenshots to the component, use cheaper models for drafts, and measure a real week of usage before paying for the next tier.
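
Measuring a real week is mostly arithmetic. The Python sketch below takes one week of logged token usage and projects two numbers for a metered plan: a monthly estimate, and how many typical days the heaviest day costs, which is the figure that decides whether credit metering will surprise the team. Every number in it is hypothetical; real token counts come from the platform's usage reporting, and real rates from the vendor pricing page on the day you decide. It says nothing about subscription windows, which vendors express relative to other tiers rather than in tokens.

```python
# Hypothetical measurements from one working week: millions of tokens per day.
WEEK = {"Mon": 1.2, "Tue": 0.9, "Wed": 4.8, "Thu": 1.1, "Fri": 1.0}
PRICE_PER_MILLION = 6.00      # hypothetical blended $/1M tokens; use your provider's current rates
WEEKS_PER_MONTH = 52 / 12     # about 4.33


def project(week, price_per_million):
    """Project metered monthly spend and how far the heaviest day exceeds a typical one."""
    daily_cost = {day: tokens * price_per_million for day, tokens in week.items()}
    ordered = sorted(daily_cost.values())
    typical = ordered[len(ordered) // 2]          # median day (upper middle for even counts)
    heaviest_day = max(daily_cost, key=daily_cost.get)
    return {
        "monthly_estimate": round(sum(ordered) * WEEKS_PER_MONTH, 2),
        "heaviest_day": heaviest_day,
        "heavy_vs_typical": round(daily_cost[heaviest_day] / typical, 1),
    }


print(project(WEEK, PRICE_PER_MILLION))
```

In this made-up week the screenshot-heavy Wednesday costs a little over four typical days, which is the pattern the slide warns about: a flat monthly average hides the day that actually triggers the limit or the bill.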

Narration for this slide

What does this cost? The sticker prices have converged: entry tiers around twenty dollars, daily-use tiers around a hundred, all-day tiers around two hundred — figures dated June 2026, so check the pricing pages before you decide anything. The real difference is metering. Claude's subscription windows fail by making you wait when a heavy day exhausts the allowance. Credit metering on Codex and Google never stalls, but a screenshot-heavy review day can cost as much as several normal days. Bring-your-own-key on OpenCode tracks usage exactly — and tracking it becomes someone's job. The canvases add their own line: Paper has a capped free tier and a Pro plan, Pencil is free during early access, Figma's MCP pricing lands after beta. Pick the failure mode you can manage, and measure one real week before upgrading.

Slide 10 of 13 · 16:9

Choosing a starting stack: a decision path, not a verdict

Five questions, ordered by how much each one constrains the answer. The footer matters more than any single answer.

  • 1. What does the team already pay for? The agent included in an existing Claude or ChatGPT plan is the cheapest one to evaluate seriously
  • 2. What are the safety requirements? Client repositories and regulated work favour Codex's OS sandbox or Claude Code's permission rules; pure policy-based control may not be enough
  • 3. What does the deliverable need to be? Marketing pages and prototypes suit an HTML-native canvas like Paper; product UI tied to a component library suits a repo-resident .pen or .op file; an existing Figma system points at its MCP server plus tokens
  • 4. How much model and licensing flexibility do you need? Open source and bring-your-own-key point at OpenCode and OpenPencil — with the cost and security work that follows
  • 5. How exposed are you to churn? Prefer open formats, date-stamp volatile facts, and assume at least one tool in your stack changes terms within a year
  • Footer: keep the artifacts and the harness in open formats — that is what makes every answer above reversible

A starting stack is an experiment you can run this month, not an architecture you must defend for three years. Choose for reversibility, then choose quickly.

Slide notes

Frame the decision path the way the platform article does: ordered by how much each question constrains the answer, not by how interesting it is to debate. Existing subscriptions come first because they make the evaluation nearly free — most organisations already pay for Claude or ChatGPT, and the agent inside that plan is the cheapest serious trial. Safety rails come second because they are non-negotiable where they apply at all. The deliverable question is the design-specific one this module adds: what the team makes most often points at a canvas family — HTML-native for marketing and prototype work, repo-resident files for product UI that lives with a component library, and the Figma-plus-MCP-plus-tokens route for teams whose system already lives there. Model flexibility and licensing constraints point at the open-source options, with the honest reminder that flexibility is work you take on, not a free feature. Churn tolerance is last but real: the Gemini CLI retirement happened to a tool that was open source, actively maintained, and backed by one of the largest companies on earth.

A sensible default for a team starting from zero, stated as a default rather than a verdict: the agent platform an existing subscription already includes, one connected canvas matched to the dominant deliverable, tokens in DTCG JSON from day one, and the harness from Module 4 in AGENTS.md and skill files. That stack can be assembled in a week and abandoned in an afternoon, which is exactly the property to optimise for.
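The five questions can also be read as a small, ordered function. The sketch below is hypothetical. The field names, the two-plan lookup, and the canvas labels simplify this slide; they are not a tool anyone ships, and a real team would weigh more than four inputs. Flexibility is checked straight after the subscription question because it is the one answer that can override it. The safety check then runs against whichever agent survived.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical lookups condensed from the decision-path slide.
AGENT_FOR_PLAN = {"claude": "Claude Code", "chatgpt": "Codex CLI"}
CANVAS_FOR_DELIVERABLE = {
    "marketing": "HTML-native canvas (e.g. Paper)",
    "product_ui": "repo-resident .pen or .op file (Pencil or OpenPencil)",
    "figma_system": "Figma MCP server plus tokens",
}


@dataclass
class Team:
    existing_plan: str | None   # "claude", "chatgpt", or None
    strict_sandbox: bool        # client repositories or regulated work
    deliverable: str            # a key of CANVAS_FOR_DELIVERABLE
    needs_open_source: bool     # model or licensing flexibility required


def starting_stack(team: Team) -> dict[str, object]:
    notes: list[str] = []
    # Q1: the agent inside an existing plan is the cheapest serious trial.
    agent = AGENT_FOR_PLAN.get(team.existing_plan or "", "undecided: trial one platform")
    # Q4, checked early because it can override Q1: open source and bring-your-own-key.
    if team.needs_open_source:
        agent = "OpenCode"
        notes.append("budget for model cost tracking and security review")
    # Q2: where safety rails apply, they are non-negotiable.
    if team.strict_sandbox and agent not in ("Codex CLI", "Claude Code"):
        notes.append("verify the sandbox meets client or regulatory requirements")
    # Q3: the dominant deliverable points at a canvas family.
    canvas = CANVAS_FOR_DELIVERABLE[team.deliverable]
    if team.needs_open_source and team.deliverable == "product_ui":
        canvas = "OpenPencil (.op files in the repo)"
    # Q5: assume churn.
    notes.append("date-stamp volatile facts; expect one tool to change terms within a year")
    # Footer: open formats keep every answer above reversible.
    return {
        "agent": agent,
        "canvas": canvas,
        "tokens": "DTCG JSON from day one",
        "harness": "AGENTS.md plus skill files",
        "notes": notes,
    }


print(starting_stack(Team("claude", False, "marketing", False)))
```

The point of writing it down is not automation. It is that every branch ends in the same two open-format lines, which is what keeps the output cheap to revisit.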

Close on the footer. Every answer above is reversible only if the design knowledge lives in open formats — the artifacts at the base of the stack diagram and the harness files. If the knowledge lives in one platform's proprietary config, the team has converted a low-stakes choice into a hostage situation, and no amount of careful comparison fixes that.

Narration for this slide

How do you actually choose? Five questions, in order. What does the team already pay for — because the agent inside an existing plan is the cheapest one to evaluate. What are the safety requirements — client work and regulated environments push you toward the platforms with real sandboxing. What is the deliverable — marketing pages suit an HTML-native canvas, product UI suits a repo-resident design file, an existing Figma system suits its MCP server plus tokens. How much model and licensing flexibility do you need — open source buys freedom and hands you the cost and security work. And how exposed are you to churn — assume something in your stack changes terms within a year. Then the footer, which matters most: keep the artifacts and the harness in open formats. That is what makes every one of these answers reversible, and a reversible choice does not deserve months of deliberation.

Slide 11 of 13 · 16:9

Migration reality: what moving an existing system involves

Adopting design-as-code in an existing product is incremental work with known friction points, not a heroic rewrite.

  • Naming collisions are the first wall: generated token names rarely match the variables already deployed — map, migrate surface by surface, or rename, and none of it is free
  • Start with the decisions that change most often: brand colours, status colours, radius, a few spacing steps — a small enforced contract beats a complete ignored one
  • Generated files need a freshness check in CI, or hand edits silently drift from the source
  • The big-bang rename and the everything-tokenized-on-day-one project are the two classic ways this stalls
  • What ports when you later switch tools: the token JSON, the harness files, the MCP server list. What does not: permissions, pricing, and the team's muscle memory

Budget for the unglamorous parts — naming, mapping, CI checks, and a dip in speed while habits rebuild. The teams that stall are the ones that pretended those costs were zero.
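Of the three answers to a naming collision, the mapping layer is the easiest to sketch. This assumes token-derived variables such as --color-brand-primary and existing production names such as --brand; both names, and the function, are hypothetical. A generated shim forwards the old names to the new ones, so components can migrate surface by surface while the old names keep working. The check on unknown targets matters, because a mapping that points at a variable nobody generates fails silently in the browser.

```python
from __future__ import annotations


def legacy_shim(mapping: dict[str, str], generated: set[str]) -> str:
    """Emit legacy CSS variables that forward to token-derived ones (names given without '--')."""
    unknown = sorted(t for t in mapping.values() if t not in generated)
    if unknown:
        raise KeyError(f"mapping targets not generated from tokens: {unknown}")
    lines = [":root {", "  /* Legacy names kept during migration; delete as surfaces move. */"]
    for legacy, target in sorted(mapping.items()):
        lines.append(f"  --{legacy}: var(--{target});")
    lines.append("}")
    return "\n".join(lines) + "\n"


print(legacy_shim({"brand": "color-brand-primary", "radius": "radius-md"},
                  {"color-brand-primary", "radius-md"}))
```

Because the shim is generated, it can sit under the same freshness check as the token CSS, and shrinking it is a visible measure of migration progress.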

Slide notes

This slide keeps the module honest, because the worked evidence behind it includes failures. In the design-as-code case study, the token-to-CSS pipeline took three iterations: the sync script's first version copied DTCG alias strings straight into the CSS, the browser silently dropped the invalid declarations, and the card rendered unstyled — a quiet failure, not a loud one. The fix was one added sentence in the prompt about what aliases must become. The second friction was naming: the generated variables followed the token path while the existing codebase used shorter names, which means any real adoption chooses between renaming deployed variables, maintaining a mapping layer, or migrating component by component. None of those is free, and pretending otherwise is how token projects stall.
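The alias failure is easy to reproduce and easy to guard against. Below is a minimal sketch of a token-to-CSS sync. It is not the case study's script. It assumes DTCG-style JSON (nested groups, a $value on each token, aliases written as {group.token}) and plain string or numeric values only. It turns each alias into a var() reference, which is one reasonable answer to what aliases must become; resolving to the raw value is the other. It refuses aliases to tokens that do not exist and emits names in sorted order, so a CI freshness check can compare the committed file byte for byte.

```python
from __future__ import annotations

import re

ALIAS = re.compile(r"^\{([^{}]+)\}$")  # a whole-value reference such as {color.brand.primary}


def flatten(group: dict, path: list[str] | None = None) -> dict[str, object]:
    """Map dotted token paths to their raw $value, skipping $-prefixed group properties."""
    path = path or []
    out: dict[str, object] = {}
    for key, node in group.items():
        if key.startswith("$") or not isinstance(node, dict):
            continue
        if "$value" in node:
            out[".".join(path + [key])] = node["$value"]
        else:
            out.update(flatten(node, path + [key]))
    return out


def css_var(dotted: str) -> str:
    return "--" + dotted.replace(".", "-")


def to_css(tokens: dict) -> str:
    flat = flatten(tokens)
    lines = [":root {"]
    for dotted, value in sorted(flat.items()):
        if not isinstance(value, (str, int, float)):
            raise TypeError(f"{dotted}: composite values are out of scope for this sketch")
        text = str(value)
        match = ALIAS.match(text)
        if match:
            target = match.group(1)
            if target not in flat:
                raise KeyError(f"{dotted} aliases unknown token {target}")
            # A raw "{...}" alias is not valid for the property that uses it, so copying it fails quietly.
            text = f"var({css_var(target)})"
        lines.append(f"  {css_var(dotted)}: {text};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def is_fresh(tokens: dict, committed_css: str) -> bool:
    """CI freshness check: the committed file must match a fresh regeneration exactly."""
    return to_css(tokens) == committed_css


example_tokens = {"color": {"$type": "color",
                            "brand": {"primary": {"$value": "#0055ff"}},
                            "action": {"$value": "{color.brand.primary}"}}}
print(to_css(example_tokens))
# :root {
#   --color-action: var(--color-brand-primary);
#   --color-brand-primary: #0055ff;
# }
```

In CI, is_fresh returning False means someone hand-edited the generated file or forgot to regenerate it; either way the job should fail before review.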

Give the sequencing that works: tokenize the decisions that actually change — brand colours, status colours, radius, a couple of spacing steps — enforce them with a no-raw-values check and a generated-file freshness check in CI, route every token change through the same pull-request gate humans use, and only then expand coverage. If a .pen or .op canvas joins later, bind its variables to the same token names so the canvas and the runtime stay two projections of one source.
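A no-raw-values check can start as a heuristic lint. This sketch uses hypothetical names throughout and assumes stylesheets arrive as a mapping of path to text, with the generated tokens file as the only place raw colour literals are allowed. It inspects only the value side of property: value lines, so it misses minified CSS and can still trip on unusual selectors. An established linter rule, such as Stylelint's color-no-hex, is the sturdier long-term option; the contract is the same either way: outside the generated file, colours arrive through var().

```python
from __future__ import annotations

import re

# Hex literals of colour length (3, 4, 6 or 8 digits) and the rgb/hsl colour functions.
RAW_COLOUR = re.compile(
    r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})(?![\w-])"
    r"|\b(?:rgba?|hsla?)\("
)


def raw_value_violations(files: dict[str, str], generated: set[str]) -> list[str]:
    """Return 'path:line' findings for raw colours outside the generated token files."""
    findings: list[str] = []
    for path, text in sorted(files.items()):
        if path in generated:
            continue  # the generated file is where raw values are supposed to live
        for lineno, line in enumerate(text.splitlines(), start=1):
            _, sep, value = line.partition(":")
            if sep and RAW_COLOUR.search(value):
                findings.append(f"{path}:{lineno}: raw colour value; use a token var()")
    return findings


stylesheets = {
    "tokens.css": ":root {\n  --color-brand: #0055ff;\n}\n",
    "card.css": ".card {\n  color: var(--color-brand);\n  border-color: #ccc;\n}\n",
}
print(raw_value_violations(stylesheets, {"tokens.css"}))
```

In CI, fail the job whenever the list is non-empty, and run it on the same pull requests that carry token changes.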

Then the switching-cost honesty from the platform material: when a team later changes agent platform or canvas, the harness, tokens, and MCP server list move in about an hour; permissions, plans, and platform-only features get rebuilt; and the team's muscle memory — keyboard habits, prompt shapes, permission reflexes — takes a week or two, during which output dips. Portability makes switching cheap, not free. Saying that out loud up front is what keeps the second migration from becoming a crisis of confidence in the whole approach.
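The claim that the MCP server list moves in about an hour rests on the config being mostly a change of syntax. As we understand the formats at the time of writing, Claude Code's project .mcp.json and Gemini CLI's settings.json both hold an mcpServers object, while Codex CLI reads [mcp_servers.name] tables from its config.toml; check each platform's current documentation before relying on this. The sketch converts local, command-based servers only and skips remote, URL-based servers, which it leaves to be ported by hand. The server names and the package in the example are hypothetical.

```python
from __future__ import annotations

import json
import re

BARE_KEY = re.compile(r"^[A-Za-z0-9_-]+$")


def toml_str(value: str) -> str:
    # For ordinary strings, json.dumps output is also a valid TOML basic string.
    return json.dumps(value, ensure_ascii=False)


def toml_key(name: str) -> str:
    # TOML bare keys allow letters, digits, '_' and '-'; anything else gets quoted.
    return name if BARE_KEY.match(name) else toml_str(name)


def mcp_json_to_codex_toml(mcp_json: str) -> str:
    """Convert an mcpServers JSON object into [mcp_servers.<name>] TOML tables."""
    servers = json.loads(mcp_json).get("mcpServers", {})
    tables: list[str] = []
    for name, spec in sorted(servers.items()):
        if "command" not in spec:
            continue  # remote (URL-based) server: left to be ported by hand
        lines = [f"[mcp_servers.{toml_key(name)}]", f"command = {toml_str(spec['command'])}"]
        if spec.get("args"):
            lines.append("args = [" + ", ".join(toml_str(a) for a in spec["args"]) + "]")
        if spec.get("env"):
            pairs = ", ".join(f"{toml_str(k)} = {toml_str(v)}" for k, v in sorted(spec["env"].items()))
            lines.append("env = { " + pairs + " }")
        tables.append("\n".join(lines))
    return "\n\n".join(tables) + "\n"


example = """{"mcpServers": {
  "pencil": {"command": "npx", "args": ["-y", "example-pencil-mcp"]},
  "figma": {"type": "http", "url": "https://example.invalid/mcp"}
}}"""
print(mcp_json_to_codex_toml(example))
```

Whatever the converter skips is the honest list of what gets rebuilt by hand, which is the same point the switching-cost paragraph makes about permissions and platform-only features.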

Narration for this slide

What does migration actually involve? Less than a rewrite, more than a demo. The first wall is naming: generated token names rarely match the variables already in production, so you map, migrate surface by surface, or rename — and none of those is free. Start small: tokenize the decisions that change most, enforce them with CI checks, and route changes through normal pull requests. The two classic stalls are the big-bang rename and the attempt to tokenize everything on day one. And when you later switch tools — you will — the token JSON, the harness, and your MCP list move in about an hour; permissions and pricing get rebuilt; and the team's muscle memory takes a couple of weeks to recover. Budget for the unglamorous parts. That is the difference between adoption and a stalled project.

Slide 12 of 13 · 16:9

Exercise: run your toolchain through the diffability test

Map where your design decisions and surfaces actually live today, and what an agent could honestly do with each one. One page, thirty minutes.

  • List every place a design decision lives: design files, token files, theme CSS, wikis, slide decks, someone's head
  • For each, score it: diffable, git-trackable, agent-readable, reachable over MCP — yes, partially, or no
  • Mark the single source of truth for colour, type, spacing, and components — or admit there is more than one
  • Pick a candidate starting stack using the five-question decision path, and write one sentence defending it
  • Note the first migration step you could take this month, and the one cost you expect it to carry

Bring the page to Module 6. The quality gates there are written against the artifacts you just mapped — and they only work on artifacts an agent can read.

Slide notes

Like the Module 1 exercise, this one is analogue on purpose: no installs, no terminal, one page. The inventory step is where the learning happens, because most teams discover that their design decisions live in more places than anyone admits: the canvas file, a theme CSS file that has drifted from it, a brand deck, a wiki page last updated two years ago, and the memory of whoever has been there longest. Scoring each location against the four properties (diffable, git-trackable, agent-readable, MCP-reachable) turns the vague sense that "our setup is messy" into a specific map of where an agent could help today and where it would be guessing.

The source-of-truth question is deliberately uncomfortable. If colour has two sources of truth, the team does not have a design system; it has two, and an agent will faithfully follow whichever one it can read. That observation alone justifies the exercise for most teams.

The last two bullets connect the exercise to action. The defence sentence forces the decision path to actually be used rather than admired — existing subscription, safety needs, deliverable type, flexibility, churn — and the first-migration-step question keeps the output small and real: a tokens file for the five most-changed decisions, an MCP server connected to one canvas, or a single component rebuilt against tokens. Collect the expected-cost answers if running this live; they map almost exactly onto the friction points from the migration slide, which is reassuring rather than discouraging — the costs are known, named, and budgetable.

Narration for this slide

Time to apply this to your own setup. Take one page and thirty minutes. First, list every place a design decision actually lives — the canvas file, the theme CSS, the wiki, the deck, somebody's memory. Score each one: is it diffable, git-trackable, agent-readable, reachable over MCP? Then mark the single source of truth for colour, type, spacing, and components — or admit that there is more than one, which is the most common and most useful finding. Next, pick a candidate starting stack using the five questions from the decision path, and write one sentence defending it. Finally, note the first migration step you could take this month and the cost you expect it to carry. Keep the page. Module 6 builds its quality gates on exactly the artifacts you just mapped.

Slide 13 of 13 · 16:9

Summary, and what comes next

  • The artifact decides what the agent can do: diffable text — .pen, .op, HTML, tokens, SVG — is the foundation; binary files block read access, precise edits, and honest review
  • The four platforms converge on open standards and differ in governance, sandboxing, and cost shape — the harness ports, so the choice is reversible
  • Connected canvases bridge visual editing and readable files; MCP is the protocol that connects platforms, canvases, and tools without bespoke integrations
  • Figma still wins on collaboration breadth; text formats win on review and agent participation; tokens are the bridge most teams will run for a while
  • Choose a starting stack with the five-question path, keep the knowledge in open formats, and budget honestly for migration friction

Module 6 closes the course: critique, executable quality gates, the failure modes that keep recurring, and an honest account of what agents are still bad at.

Slide notes

Recap by tracing the module's spine rather than re-reading the bullets. The format question came first because it is the slow-moving one: diffability is what lets an agent participate and lets a human review the participation, and that property does not expire when a vendor changes its pricing. The tool comparisons sat on top of that as a dated snapshot — accurate on the day they were verified, and built so that the comparison dimensions outlive the specific cells. The decision path and the migration slide turned the landscape into something a team can act on this month without betting itself on a moving target.

If one idea should survive the module, it is the asymmetry: the artifacts and the harness are the durable assets, the platforms and canvases are the replaceable layer, and a team that keeps its design knowledge in open formats can treat every tool decision in this module as low-stakes. That is also the answer to the anxiety this module sometimes produces — the sense that everything is changing too fast to choose anything. Choose quickly, keep it reversible, re-verify on a cadence.

Preview Module 6 concretely: it takes the artifacts and stack from this module and asks how you know the work is good — structured critique with named dimensions, screenshot evidence and visual QA, checks that can be made executable, the recurring failure modes worth automating against, and a plain statement of what agents are still bad at. The exercise page from this module feeds directly into it, because quality gates can only run on artifacts the agent can read.

Narration for this slide

Let's close the module. The artifact decides what the agent can do — diffable text is the foundation, and binary files block reading, precise edits, and honest review. The four platforms have converged on open standards and differ in governance, sandboxing, and how you pay; because the harness ports, the choice stays reversible. Connected canvases give you visual editing on top of readable files, and MCP is the protocol that ties platforms, canvases, and tools together. Figma keeps winning at collaboration; text wins at review and agent participation; tokens bridge the two. Choose your starting stack with the five questions, keep the knowledge in open formats, and budget honestly for migration. Module 6 is the last one: critique, quality gates, and what agents are still bad at. See you there.

Module transcript
Module 5, narrated slide by slide

Slide 1 · Design-as-Code and the Tool Landscape

Welcome to Module 5. So far this course has been about practice — the loop, the brief, the harness. This module is about the ground all of that stands on: the formats your design work lives in, and the tools that can reach them. We will start with the property that decides everything — whether the artifact is diffable text or an opaque binary — then compare the four agent platforms and the connected canvases honestly, including prices and limits. And we will finish with a decision path for picking a starting stack, plus a clear-eyed look at what migrating an existing system actually involves. The tools will change. The way you choose between them does not have to.

Slide 2 · The artifact decides what the agent can do

Here is the question to ask before comparing any tools: can the agent open the file your design actually lives in? Agents work on text. They read it, change it precisely, and show you the change. So the design-as-code move is to let decisions live as text — token JSON, generated CSS variables, a DESIGN.md — and increasingly to let the surfaces live as text too: .pen and .op canvas files, HTML, component code, SVG. When the source of truth is a binary only one app can open, the agent improvises the missing decisions, and you spend review correcting work that was never grounded in your system. Where the design lives decides what the agent can do. Everything else in this module follows from that.

Slide 3 · Why binary files block agents

What actually breaks with a binary design file? Three things. The agent cannot read it: it sees an exported picture or a partial API copy, never the artifact. It cannot edit it precisely: every change goes through a translation layer where values get re-typed and drift. And nobody can review it honestly: git only knows that a binary changed, so review collapses to "looks fine to me". Make the artifact text and all of that comes back: line-level diffs, blame, revert, CI checks. In the worked example from this course's source material, changing one semantic token produced a two-line diff and the components followed automatically. One caveat: not every text diff is readable. A forty-node layout change still gets reviewed by screenshot. The decisions, though, diff like sentences: tokens, variables, components.

Slide 4 · The design-as-code stack

Here is the whole module in one picture. At the base: diffable design artifacts — .pen and .op files, HTML and components, token JSON, design docs — text that lives in your repository. Above them: the agent platforms — Claude Code, Codex CLI, OpenCode, Gemini CLI — reading and editing those files directly, so every change is a reviewable diff. The bridge layer is the connected canvases and MCP servers — Paper, Pencil, OpenPencil, Open Design, Figma's MCP — giving agents a visual surface to read and write through one shared protocol. And at the top, the outputs: prototypes, production UI, decks, audits. Read it bottom-up when you choose tools and top-down when you work. The base is where your investment belongs, because it is the layer that survives when anything above it changes.

Slide 5 · Agent platforms compared, as of June 2026

Here are the four platforms on the dimensions that matter. First, what is identical: all four read an instruction file, all four support skills, all four speak MCP — which is exactly why your harness ports and the choice stays reversible. The differences are operational. Claude Code has the richest skills and MCP ecosystem, on a subscription with usage windows. Codex CLI has the clearest sandbox model and cloud sandbox tasks, metered as token credits inside ChatGPT plans. OpenCode is open source with seventy-five-plus model providers — maximum flexibility, and the cost and security decisions become yours. Gemini CLI has huge context and a free tier, and is mid-transition to Antigravity, which is your reminder that vendors retire tools. Every price here carries a date. Re-check it before you budget anything.

Slide 6 · Canvases and formats: the diffability test

Now the canvases, run through the diffability test. Paper makes the canvas HTML and CSS — the artifact is already markup — with a local MCP server of about twenty tools; the document lives in Paper's cloud, so review uses screenshots rather than diffs. Pencil and OpenPencil keep the canvas in your repository as .pen and .op JSON files — diffable, git-trackable, agent-editable through MCP; one proprietary and free during early access, one open source with concurrent agent teams. Open Design wraps the agents you already run with a library of skills and design systems. Huashu Design ships an entire design tool as a single skill file. And Figma: real MCP server, genuinely useful, but the artifact stays in its cloud behind an API. The pattern is consistent — the closer the design is to text in your repo, the more an agent can honestly do with it.

Slide 7 · MCP in one slide

MCP — the Model Context Protocol — is how an agent reaches tools it does not contain. A server announces a few named tools: give me this frame's variables, take a screenshot, write this HTML back. The agent calls them mid-conversation and the results land in its context. Local servers run on your machine, inside Paper or Pencil or a browser driver; remote ones, like Figma's, live at a URL behind OAuth. The same server works on all four agent platforms — only the config syntax changes. And it is not one vendor's plugin system: the protocol sits under neutral governance with an official registry. What it removes is the copy-paste tax — exported PNGs and re-typed hex values. What it requires is the same hygiene as any dependency: official servers first, write access behind permission prompts.

Slide 8 · Where Figma still fits, and where it does not

So where does Figma fit? Honestly: it still wins at the things it has spent a decade building — multiplayer collaboration with non-technical people, mature component tooling, and being where your stakeholders already are. And it is moving toward this stack, with an official MCP server that can now write, and announced token export. What has not changed is the architecture: the file lives in Figma's cloud, changes are not diffs, and agents work on an API copy of the decision rather than the decision itself. The pragmatic answer for most teams is both, with tokens as the bridge — decisions in DTCG JSON, consumed by Figma variables on one side and the codebase on the other. Do not rip out a real collaboration surface to satisfy a diagram. And do not pretend an agent can fully work on a file it cannot open.

Slide 9 · Cost models in plain terms

What does this cost? The sticker prices have converged: entry tiers around twenty dollars, daily-use tiers around a hundred, all-day tiers around two hundred. Those figures are dated June 2026, so check the pricing pages before you decide anything. The real difference is metering. Claude's subscription windows fail by making you wait when a heavy day exhausts the allowance. Credit metering on Codex and Google never stalls, but a screenshot-heavy review day can cost as much as several normal days. Bring-your-own-key on OpenCode tracks usage exactly, and tracking it becomes someone's job. The canvases add their own line: Paper has a capped free tier and a Pro plan, Pencil is free during early access, and Figma's MCP pricing lands after beta. Pick the failure mode you can manage, and measure one real week before upgrading.

Slide 10 · Choosing a starting stack: a decision path, not a verdict

How do you actually choose? Five questions, in order. What does the team already pay for — because the agent inside an existing plan is the cheapest one to evaluate. What are the safety requirements — client work and regulated environments push you toward the platforms with real sandboxing. What is the deliverable — marketing pages suit an HTML-native canvas, product UI suits a repo-resident design file, an existing Figma system suits its MCP server plus tokens. How much model and licensing flexibility do you need — open source buys freedom and hands you the cost and security work. And how exposed are you to churn — assume something in your stack changes terms within a year. Then the footer, which matters most: keep the artifacts and the harness in open formats. That is what makes every one of these answers reversible, and a reversible choice does not deserve months of deliberation.

Slide 11 · Migration reality: what moving an existing system involves

What does migration actually involve? Less than a rewrite, more than a demo. The first wall is naming: generated token names rarely match the variables already in production, so you map, migrate surface by surface, or rename — and none of those is free. Start small: tokenize the decisions that change most, enforce them with CI checks, and route changes through normal pull requests. The two classic stalls are the big-bang rename and the attempt to tokenize everything on day one. And when you later switch tools — you will — the token JSON, the harness, and your MCP list move in about an hour; permissions and pricing get rebuilt; and the team's muscle memory takes a couple of weeks to recover. Budget for the unglamorous parts. That is the difference between adoption and a stalled project.

Slide 12 · Exercise: run your toolchain through the diffability test

Time to apply this to your own setup. Take one page and thirty minutes. First, list every place a design decision actually lives — the canvas file, the theme CSS, the wiki, the deck, somebody's memory. Score each one: is it diffable, git-trackable, agent-readable, reachable over MCP? Then mark the single source of truth for colour, type, spacing, and components — or admit that there is more than one, which is the most common and most useful finding. Next, pick a candidate starting stack using the five questions from the decision path, and write one sentence defending it. Finally, note the first migration step you could take this month and the cost you expect it to carry. Keep the page. Module 6 builds its quality gates on exactly the artifacts you just mapped.

Slide 13 · Summary, and what comes next

Let's close the module. The artifact decides what the agent can do — diffable text is the foundation, and binary files block reading, precise edits, and honest review. The four platforms have converged on open standards and differ in governance, sandboxing, and how you pay; because the harness ports, the choice stays reversible. Connected canvases give you visual editing on top of readable files, and MCP is the protocol that ties platforms, canvases, and tools together. Figma keeps winning at collaboration; text wins at review and agent participation; tokens bridge the two. Choose your starting stack with the five questions, keep the knowledge in open formats, and budget honestly for migration. Module 6 is the last one: critique, quality gates, and what agents are still bad at. See you there.