Agentic Design School
Module 1 of 6
40–50 minutes

Design Systems for Agents

Design Tokens Are Agent Instructions

Tokens stop being a styling convenience and become the primary way the system tells an agent what is allowed. This module covers naming that carries intent, semantic versus raw values, and what happens when tokens are missing or vague.

Duration: 40–50 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Explain why semantic token names change agent behaviour where raw values do not.
  • Audit an existing token set for names that mislead or under-specify.
  • Define the token rules an agent must follow and where they live in the harness.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13

Design Tokens Are Agent Instructions

Design Systems for Agents · Module 1 of 6

  • The token name is the instruction, not just the value
  • Raw values vs semantic tokens: what an agent does with each
  • What agents do when tokens are missing: hardcode and guess
  • Token rules in the harness, and how to audit your own set

If a design decision is not exposed as a token an agent can read, the agent will make that decision for you — differently every run.

Slide notes

This module sets the foundation for the whole course: the design system only works for agents to the degree that its decisions are exposed as inspectable project material. Tokens are the smallest, most concrete unit of that material, which is why the course starts here rather than with DESIGN.md or audits.

Be explicit about the reframe in the title. Most teams treat tokens as a styling convenience — a way to avoid repeating hex values and to support theming. That framing undersells them. When an agent generates UI, the token file is one of the few places it can learn what the product has actually decided: which red means escalated, how tight a table row should be, what radius a chip is allowed. The name and description carry the instruction; the value is almost the least interesting part.

Set expectations on prerequisites. Participants should already run a design system of some kind — even a small one — and should have seen an agent generate UI at least once. The module does not require any particular token tooling; the examples use the DTCG JSON format and CSS variables because they are the most portable, but the principles apply to Tailwind themes and canvas variables equally.

Narration for this slide

Welcome to Design Systems for Agents. This first module is about design tokens, but not in the way you have probably discussed them before. We are not going to talk about tokens as a styling convenience or a theming mechanism. We are going to talk about them as instructions — the most concrete way your system tells an agent what is allowed. We will look at what an agent actually does with raw values versus semantic names, what happens when tokens are missing, where the token rules belong in your harness, and how to audit the set you already have. Let's start with why this matters at all.

Slide 2 of 13

Agents need decisions, not vibes

When agent output looks generic, the prompt usually gets the blame. More often, the project never exposed its decisions.

  • "Calm, dense, premium, operational" gives direction — it does not pick a warning colour or a row height
  • An agent fills every unstated decision with the most common pattern in its training data
  • Tokens turn taste into inspectable project material: named, typed, described, versioned
  • They do not replace taste — they make taste repeatable across runs and across people

A token is a named design decision. The name and description are what the agent acts on; the value is just the payload.

Slide notes

Open with the failure everyone has seen: the agent produces something polished and plausible, and it does not look like your product. The instinct is to write a longer prompt asking for better taste. That rarely works, because the problem is not eloquence — it is that the actual decisions live in someone's head, in Figma comments, or in chat history, none of which the agent reads at generation time.

The definition to land is that a token is a named design decision, not a variable that holds a colour. The Design Tokens Community Group format makes this explicit: a token has a type, a value, and a description, and the description is where the meaning lives. For agentic work the practical point is that the agent reads the name and the description and decides where the value belongs — which is exactly the behaviour you cannot get from a palette swatch.
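To make the definition concrete, here is a minimal Python sketch of what an agent can extract from a DTCG-style file: each token's dotted name, its type, and its description. It assumes the DTCG convention of $value, $type and $description keys on token objects, skips group-level type inheritance for brevity, and uses an illustrative token rather than one from a real system.

```python
import json

# Illustrative DTCG-style source: groups nest, and any object with "$value" is a token.
SOURCE = """
{
  "color": {
    "status": {
      "escalated": {
        "foreground": {
          "$type": "color",
          "$value": "#dc2626",
          "$description": "Escalated work needing immediate attention. Text and icons only; never decoration."
        }
      }
    }
  }
}
"""


def flatten_tokens(tree, path=()):
    """Yield (dotted_name, token) pairs for every object that carries a $value."""
    for key, node in tree.items():
        if key.startswith("$") or not isinstance(node, dict):
            continue  # skip group-level metadata such as $type or $description
        if "$value" in node:
            yield ".".join(path + (key,)), node
        else:
            yield from flatten_tokens(node, path + (key,))


def agent_view(source_text):
    """Return name -> (type, description): the parts of a token that carry meaning."""
    tokens = json.loads(source_text)
    return {
        name: (tok.get("$type"), tok.get("$description", ""))
        for name, tok in flatten_tokens(tokens)
    }


if __name__ == "__main__":
    for name, (kind, description) in agent_view(SOURCE).items():
        print(f"{name} [{kind}]: {description}")
```

Note what the sketch leaves out: the value. The name and the description are the parts that tell the agent where the value belongs.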

Close with the limit so nobody overclaims: tokens make taste applicable and testable, they do not generate it. Choosing that escalated work gets a hot foreground colour while blocked work stays muted is a design decision a person makes once. The token is how that decision survives contact with the next hundred agent runs.

Narration for this slide

Here is the failure that motivates this whole module. You ask an agent for a dashboard, and what comes back is polished, plausible, and generic. The temptation is to blame the prompt. But words like calm, dense, or premium give direction — they do not pick the warning colour or the table row height. Every decision you have not stated, the agent fills in with the most common pattern it has seen, which is by definition not your product. Tokens fix the supply problem. A token is a named design decision the agent can read: a name, a type, a value, and a description that says when it applies. The taste is still yours. The token is how it survives the next hundred runs.

Slide 3 of 13

Raw values vs semantic tokens

The same colour can reach the agent two ways. What the agent can do with it is completely different.

|  | Raw value or primitive | Semantic token |
|---|---|---|
| What the agent sees | #dc2626, red-600, spacing-4 | color.status.escalated.foreground |
| What it can infer | A red exists somewhere in the palette | This red means escalated work, used on text and icons |
| When it applies it | Wherever red looks plausible: alerts, accents, decoration | Only where the product means escalated |
| When the system changes | Every usage is an orphan to hunt down | One alias update propagates everywhere |
| What review can check | Almost nothing; the value is legal anywhere | Name matches usage; misuse is findable drift |

Primitives answer what values exist. Semantics answer what they mean. Agents need the meaning to behave.

Slide notes

This is the core distinction of the module, so take time with it. Primitive tokens — blue-500, gray-900, spacing-4 — are real and necessary; they define the raw ingredients. But if primitives are all the agent can see, it still has to guess when each value belongs, and its guess will be the statistically average answer rather than your product's answer. Semantic tokens carry the product meaning: status.escalated, surface.panel, table.row.compact. Component tokens go one step further and carry local rules, like the radius a status chip uses in dense tables.

The last two rows are where the payoff for agentic work sits. Propagation: when a semantic token aliases a primitive, a rebrand or a contrast fix is a one-line change at the source rather than a hunt across components. Verifiability: a reviewer — human or scripted — can check that a status chip uses the status token, because the name encodes the expectation. With raw values, there is nothing to check against; any red is as legal as any other.
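Propagation falls out of aliasing. The following is a minimal Python sketch, assuming flattened dotted names and DTCG's curly-brace reference syntax (for example {color.red.600}); the token names and hex values are illustrative. Semantic tokens point at a primitive, so one edit to the primitive changes every resolved semantic value.

```python
import re

# Flattened tokens: the primitive holds a value, semantic tokens alias it by reference.
TOKENS = {
    "color.red.600": "#dc2626",
    "color.status.escalated.foreground": "{color.red.600}",
    "color.action.danger": "{color.red.600}",
}

REFERENCE = re.compile(r"^\{([^{}]+)\}$")


def resolve(name, tokens, seen=None):
    """Follow alias references until a raw value is reached; reject cycles."""
    seen = set() if seen is None else seen
    if name in seen:
        raise ValueError(f"alias cycle at {name}")
    seen.add(name)
    value = tokens[name]
    match = REFERENCE.match(value)
    return resolve(match.group(1), tokens, seen) if match else value


def resolved(tokens):
    """Resolve every token to its final value."""
    return {name: resolve(name, tokens) for name in tokens}


if __name__ == "__main__":
    # A contrast fix is one edit to the primitive...
    fixed = {**TOKENS, "color.red.600": "#b91c1c"}
    # ...and every semantic alias resolves to the new value.
    for name, value in resolved(fixed).items():
        print(name, "->", value)
```

A raw #dc2626 pasted into a component has no reference to follow, which is the orphan problem in the table.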

Be honest that maintaining a semantic layer costs effort, and small teams often skip it. The argument of this course is that agents change that calculus: the semantic layer is no longer documentation overhead, it is the interface through which most generated UI will be produced.

Narration for this slide

Here is the distinction the whole module turns on. A raw value — a hex code, red-600, spacing-4 — tells the agent that a value exists. A semantic token — color dot status dot escalated dot foreground — tells the agent what it means. With raw values, the agent applies red wherever red looks plausible: alerts, accents, decoration. With the semantic name, it applies it where the product means escalated, and nowhere else. The difference compounds. When the system changes, semantic tokens propagate the change; raw values become orphans you have to hunt. And review changes too: a name can be checked against its usage. A raw value is legal everywhere, which means it can be checked nowhere.

Slide 4 of 13

Naming that carries intent

A good token name answers four questions before the agent asks them: where, how strong, in what state, how dense.

  • Surface — where it sits: surface.page, surface.panel, surface.overlay
  • Emphasis — how strongly it speaks: text.primary, text.secondary, action.primary, action.quiet
  • State — what it signals: status.escalated, status.blocked, action.danger, input.invalid
  • Density — how much fits: spacing.table.row.compact, spacing.section.relaxed
  • Descriptions finish the job: "escalated means immediate attention — do not reuse as decoration"

Name the decision, not the appearance. "brand-red-dark" describes a colour; "status.escalated.foreground" describes a rule.

Slide notes

The four dimensions on the slide — surface, emphasis, state, density — are not the only possible taxonomy, but they cover the decisions agents most often get wrong: putting content on the wrong surface level, making everything equally loud, reusing a status colour as decoration, and defaulting to roomy spacing in products that need to be dense. If the token names answer those four questions, the agent has far fewer gaps to fill with guesses.

Contrast intent-carrying names with appearance-carrying names. brand-red-dark, blue-light, big-padding all describe what the value looks like, which the agent can already see from the value itself; the name adds nothing. status.escalated.foreground, surface.panel, spacing.table.row.compact describe when the value applies, which is exactly the information the value cannot carry on its own.

Descriptions matter more for agents than they ever did for humans, because the agent actually reads them at generation time. A one-sentence description that states the rule and the boundary — escalated is for immediate attention, do not reuse it as decoration — is the cheapest piece of agent instruction you will ever write. The DTCG format has a dedicated description field; Tailwind themes and CSS variables can carry the same information in adjacent comments or in DESIGN.md, which Module 2 covers.

Narration for this slide

So what does a name that carries intent look like? It answers the questions the agent would otherwise guess. Surface: where does this sit — page, panel, overlay? Emphasis: how strongly should it speak — primary text, quiet action, danger action? State: what does it signal — escalated, blocked, invalid? Density: how much should fit — compact table rows or a relaxed marketing section? Compare that with names like brand-red-dark or big-padding. Those describe the appearance, which the agent can already see from the value. Name the decision instead, and add a one-sentence description that states the boundary — escalated means immediate attention, never decoration. Agents actually read those sentences. They are the cheapest instructions you will ever write.

Slide 5 of 13

Where tokens live

Most systems hold tokens in more than one of these. The agent needs to know which one is the source of truth and which are generated.

| Location | Form | What the agent does with it |
|---|---|---|
| Token source files | DTCG-style JSON: primitives, semantics, components, themes | Reads names, aliases, and descriptions before generating |
| Generated CSS variables | tokens.css / Tailwind v4 theme variables | References var(--token) in output; checks runtime values |
| Component code | Tailwind classes and props mapped to the theme | Reuses existing mappings instead of inventing classes |
| Canvas variables | .pen and .op design variables, Figma variables over MCP | Reads or syncs the same decisions on the visual surface |
| DESIGN.md | Prose rules referencing the files above | Learns which tokens are preferred, dangerous, or forbidden |

Tell the agent which layer is edited deliberately and which layers are built from it. Source and generated must never be confused.

Slide notes

Walk the table as a stack rather than a list of alternatives. Source token files — DTCG-style JSON split into primitives, semantics, component tokens, and theme overrides — are the layer a person edits deliberately. Build tooling such as Style Dictionary, or a Tailwind v4 theme, turns those into the CSS variables and utility classes the application actually consumes. Component code maps those variables into props and classes. Canvas tools hold the same decisions as design variables: Figma variables reachable over MCP, and connected-canvas formats like .pen and .op, which store variables as readable JSON and generate CSS custom properties on export. DESIGN.md sits alongside all of it and carries the judgment — which tokens are preferred, which are dangerous — which is Module 2's subject.
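The build step is conceptually small. The Python sketch below shows the shape of what Style Dictionary or a Tailwind v4 theme does for you, without using either tool's API: walk the DTCG-style source, derive a custom property name by replacing dots with dashes (the convention behind var(--color-status-escalated-foreground) later in this module), turn aliases into var() references, and mark the output as generated. The source contents and header wording are assumptions.

```python
import json

HEADER = "/* Generated from tokens/. Do not edit by hand; edit the source and rebuild. */"

# Illustrative DTCG-style source with one primitive and one semantic alias.
SOURCE = """
{
  "color": {
    "red": {"600": {"$type": "color", "$value": "#dc2626"}},
    "status": {
      "escalated": {
        "foreground": {"$type": "color", "$value": "{color.red.600}"}
      }
    }
  }
}
"""


def css_var_name(dotted_name):
    """color.status.escalated.foreground -> --color-status-escalated-foreground"""
    return "--" + dotted_name.replace(".", "-")


def css_value(value):
    """Turn an alias like {color.red.600} into var(--color-red-600); pass raw values through."""
    if isinstance(value, str) and value.startswith("{") and value.endswith("}"):
        return f"var({css_var_name(value[1:-1])})"
    return str(value)


def flatten(tree, path=()):
    """Yield (dotted_name, $value) for every token in a nested DTCG-style tree."""
    for key, node in tree.items():
        if key.startswith("$") or not isinstance(node, dict):
            continue
        if "$value" in node:
            yield ".".join(path + (key,)), node["$value"]
        else:
            yield from flatten(node, path + (key,))


def build_css(source_text):
    """Emit a :root block of custom properties from a DTCG-style source file."""
    lines = [HEADER, ":root {"]
    for name, value in sorted(flatten(json.loads(source_text))):
        lines.append(f"  {css_var_name(name)}: {css_value(value)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_css(SOURCE), end="")
```

Emitting aliases as var() references, rather than resolving them to hex at build time, is one choice among several: it keeps the semantic-to-primitive link alive at runtime, which helps theming, while resolving at build time gives flatter output.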

The instruction that matters for agents is the source-versus-generated distinction. If the agent cannot tell which files are edited and which are built, it will eventually edit a generated file, and the next build silently erases its work — or worse, the edit survives and the source and output disagree. State it explicitly in the harness: edit tokens/, never edit src/styles/tokens.css.
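Stating the rule is the first defence; a mechanical check is the second. One possible approach, sketched here in Python as an assumption rather than a feature of any particular build tool, is for the build to stamp the generated file with a SHA-256 hash of its own body, so a CI step or the agent's verification step can detect a hand edit before the next build erases it. The header format and function names are hypothetical.

```python
import hashlib

STAMP_PREFIX = "/* generated-from: tokens/ sha256="


def stamp(body):
    """Prefix generated CSS with a header recording the SHA-256 of its body."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{STAMP_PREFIX}{digest} */\n{body}"


def is_hand_edited(file_text):
    """True if the file lacks a stamp or its body no longer matches the stamped hash."""
    header, _, body = file_text.partition("\n")
    if not header.startswith(STAMP_PREFIX):
        return True  # no stamp: nothing proves the file came from the build
    recorded = header[len(STAMP_PREFIX):].removesuffix(" */")
    return hashlib.sha256(body.encode("utf-8")).hexdigest() != recorded


if __name__ == "__main__":
    generated = stamp(":root {\n  --color-red-600: #dc2626;\n}\n")
    print(is_hand_edited(generated))                                # False: straight from the build
    print(is_hand_edited(generated.replace("#dc2626", "#e11d48")))  # True: someone edited the output
```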

Do not resolve the multi-tool sync question here. Module 5 covers choosing a source of truth across Figma, canvas, and code, and keeping the mirrors honest. For this module it is enough that the agent can find the token contract in the repository it is working in.

Narration for this slide

Where do these tokens actually live? In most real systems, several places at once. Source files in a JSON token format, where a person edits the decision. Generated CSS variables or a Tailwind theme, which the product consumes at runtime. Component code that maps those variables to props and classes. Canvas variables — Figma variables over MCP, or dot-pen and dot-op files where the variables are readable JSON. And DESIGN.md, which carries the judgment about which tokens to prefer. The instruction your agent needs most is the boundary between source and generated. Edit the token files; never hand-edit the built CSS. Get that wrong and the next build quietly erases the work — or leaves the source and the product disagreeing.

Slide 6 of 13

What agents do when tokens are missing

An agent never leaves a decision blank. It fills the gap with the centre of its training data, and the result still looks finished.

  • Hardcodes a plausible value: a near-miss red, a generic shadow, an arbitrary padding
  • Picks differently on every run — drift compounds across components and teams
  • Falls back to framework defaults: rounded-xl cards, soft blue accents, roomy spacing
  • Reuses meaningful colours as decoration because nothing says it cannot
  • Output passes a glance test, which is exactly why the drift survives review

The dangerous part is not that the guess is ugly. It is that the guess is plausible, unnamed, and different every time.

Slide notes

This slide names the default behaviour so it stops being surprising. Agents are completion systems: when the project does not state a decision, the model supplies the most statistically common answer for that kind of UI. That is where the recurring tells come from — the soft blue accent, the large rounded card, the generous padding, the marketing gradient. None of it is wrong in general; it is just not your product.

The drift mechanics are worth spelling out. A hardcoded near-miss value has no name, so nothing connects it to the system. The next run, or the next teammate's run, picks a slightly different near-miss. Each one passes a quick visual review, because each one is internally consistent and polished. Six weeks later the audit finds eleven reds that mean four different things, and unwinding them is a project. The school's audit article and Module 4 of this course deal with finding that drift; this module is about not creating it.
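Once the palette is named, near-misses can be found mechanically. This Python sketch compares a hardcoded hex value against the token palette and reports the nearest token and its distance: zero means the code should simply reference the token, a small non-zero distance is the near-miss that slipped through review, and anything far away needs a token proposal or removal. Plain Euclidean distance in RGB is a deliberate simplification (a perceptual metric such as CIEDE2000 would be more faithful), and the palette and the threshold of 40 are illustrative assumptions.

```python
import math

# Illustrative named palette: token name -> resolved hex value.
PALETTE = {
    "color.status.escalated.foreground": "#dc2626",
    "color.status.warning.foreground": "#d97706",
    "color.text.secondary": "#4b5563",
}


def hex_to_rgb(value):
    """'#dc2626' -> (220, 38, 38); accepts 3- or 6-digit hex."""
    digits = value.lstrip("#")
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def nearest_token(raw_hex, palette):
    """Return (token_name, distance) for the closest palette entry in RGB space."""
    target = hex_to_rgb(raw_hex)
    return min(
        ((name, math.dist(target, hex_to_rgb(value))) for name, value in palette.items()),
        key=lambda pair: pair[1],
    )


def classify(raw_hex, palette, near_miss_threshold=40.0):
    """Label a hardcoded value as an exact token match, a near-miss, or unrelated."""
    name, distance = nearest_token(raw_hex, palette)
    if distance == 0:
        return f"exact: use var(--{name.replace('.', '-')})"
    if distance <= near_miss_threshold:
        return f"near-miss of {name} (distance {distance:.1f})"
    return "no close token: propose one or remove the value"


if __name__ == "__main__":
    for raw in ("#dc2626", "#e02424", "#2563eb"):
        print(raw, "->", classify(raw, PALETTE))
```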

Also flag the subtler failure: misuse of meaningful colours. If warning orange is the most exciting colour in the palette, an unconstrained agent will use it to add visual interest. Only a token rule — warning is for recoverable risk, never decoration — gives a reviewer or a script something to point at when that happens.

Narration for this slide

What happens when the token is missing? The agent does not stop and ask, and it does not leave the decision blank. It fills the gap with the most common answer in its training data. A plausible red that is close to yours but not yours. A generic card shadow. Arbitrary padding. Framework defaults — large radii, soft blue, roomy spacing. And because each guess is internally consistent and polished, it passes a quick look. That is the trap. The drift is not ugly; it is plausible, unnamed, and different on every run. Six weeks later you have eleven reds meaning four different things, and nobody decided any of them. The fix is upstream: expose the decision before the agent has to invent it.

Slide 7 of 13

The token's journey vs the hardcoded value's journey

The same colour decision, travelling two ways into agent-generated code.

Two-row diagram comparing a token-backed path with a hardcoded path. In the top row, a semantic token named color.status.escalated.foreground is defined in the token source file, built into a generated CSS variable, read by the agent along with its description, and referenced as a CSS variable in the generated component, leading to an outcome card stating the code stays in sync with the system. In the bottom row, no token contract exists, the agent guesses a plausible raw value, the component carries a hardcoded hex value with no link to the system, and the outcome card states the value drifts when the system later changes.
Top: the decision keeps its name from source file to generated component, so a change at the source updates every use and review can verify usage. Bottom: with no token contract the agent guesses, the value lands in the component with no name, and it drifts the moment the system moves on.

The two paths produce nearly identical screenshots on day one. They diverge the first time the system changes.

Slide notes

Walk the top row first. The decision is made once, by a person, in the token source file: the name says what it means, the description says when it applies, and the value is an alias to a primitive. The build turns it into a CSS variable with the same name — generated, never hand-edited. The agent, with the token contract in its context, reads the name and description and writes var(--color-status-escalated-foreground) into the component. The outcome card is the point: one change at the source updates every use, a reviewer or a script can verify the usage mechanically, and themes and dark mode come along for free because the component points at the system rather than at a value.

Then the bottom row. Without a contract, the decision arrives as a phrase in a prompt — use our red for urgent — or not at all. The agent fills the gap with a plausible value, the value lands in the component as a raw hex, and from that moment it has no name, no owner, and no connection to anything. When the system's red changes, this one stays. The audit finds it months later as a finding to fix, multiplied by every unbriefed run in between.

Emphasise the highlight: on day one, screenshots of the two paths look almost identical. That is why screenshot-only review does not catch this class of problem, and why the token rules in the next slide are written as harness rules rather than review reminders.

Narration for this slide

Here is the whole module in one picture. Top row: the decision is made once, in the token source file, with a name and a description. The build turns it into a CSS variable with the same name. The agent reads the name, understands the meaning, and writes the variable into the component. Now the code points at the system — change the source and every use follows, and a reviewer can check the usage mechanically. Bottom row: no contract, so the agent guesses. A plausible red lands in the component as a raw hex with no name and no owner. Day one, the two screenshots look the same. Then the system changes, and only one of these moves with it. That is the difference between a token and a value.

Slide 8 of 13

Token rules in the harness

The rules live where the agent reads them at the start of every run — DESIGN.md and the instruction file — not in each prompt.

DESIGN.md — token use rules (excerpt)
## Token use

Read tokens/semantic.tokens.json before creating or changing UI.

- Use semantic tokens for product meaning:
  color.status.* for states, color.surface.* for surfaces,
  spacing.table.row.compact for queue density.
- Never use primitive palette values or raw hex/rgb in components.
- Never invent new colours, shadows, radii, or spacing values.
- If a needed token is missing, propose it first — name, value,
  description — and wait for approval before using it.
- src/styles/tokens.css is generated. Edit tokens/, never the output.
- Run scripts/verify-token-usage before reporting the work as done.

Rules in the harness apply to every run by default. Rules in the prompt apply only when someone remembers to paste them.

Slide notes

The harness here means the standing material the agent loads at the start of every session: DESIGN.md, the platform's instruction file (CLAUDE.md, AGENTS.md, or equivalent), and the token files those documents point at. The distinction from prompting matters operationally — anything that has to be repeated per task will eventually be forgotten, and the run that forgets it is the run that introduces the drift.

Walk the rules and why each earns its place. Read the token files first: output quality tracks what the agent can see. Semantic over primitive: the meaning is the instruction. Never hardcode and never invent: these are the anti-slop rules — the same pattern brand-spec approaches use, where all generated markup references var(--token) and nothing else. Propose missing tokens before using them: this is the most important behavioural rule, because it converts the agent's gap-filling instinct into a visible design decision a human approves. Source versus generated: prevents silent overwrites. Run the verification script: a small script that flags raw hex, rgb(), arbitrary radius and spacing values gives the agent mechanical feedback before a human spends attention on the work.
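The verification script does not need to be sophisticated to be useful. Here is a minimal Python sketch of a hypothetical verify-token-usage, assuming components are text files (CSS, JSX/TSX, templates) and that approved values reach them only through var(--token). The patterns are heuristics to tune, not a published tool; a hex-looking word such as #add will produce a false positive.

```python
import re
import sys

# Each rule pairs a human-readable reason with a pattern for a value that bypasses tokens.
RULES = [
    ("raw hex colour", re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")),
    ("raw rgb()/hsl() colour", re.compile(r"\b(?:rgba?|hsla?)\(")),
    ("arbitrary Tailwind value", re.compile(r"\b(?:rounded|shadow|gap|[pm][xytrbl]?)-\[[^\]]+\]")),
    ("raw px in radius/spacing",
     re.compile(r"\b(?:border-radius|padding|margin|gap)(?:-[a-z]+)*\s*:[^;]*\d+px")),
]


def verify_token_usage(source, filename="<component>"):
    """Return 'file:line: reason: snippet' strings for every rule hit; empty means clean."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for reason, pattern in RULES:
            match = pattern.search(line)
            if match:
                findings.append(f"{filename}:{line_no}: {reason}: {match.group(0)}")
    return findings


if __name__ == "__main__":
    problems = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            problems.extend(verify_token_usage(handle.read(), path))
    print("\n".join(problems) or "token usage OK")
    if problems:
        raise SystemExit(1)  # non-zero exit gives the agent a mechanical fail signal
```

Invoked with component file paths as arguments, it prints each finding and exits with status 1 if there are any, which gives the agent a pass or fail signal before a person spends attention on the work.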

Keep DESIGN.md guidance short and pointed at the files; do not paste the whole token set into it. Module 2 covers the full document; Module 3 covers how the review gate uses these same rules as entry criteria.

Narration for this slide

Where do these rules live? Not in your prompts — in the harness: DESIGN.md and the instruction file the agent loads at the start of every run. Here is the shape. Read the token files before touching UI. Use semantic tokens for meaning. Never use raw hex or primitive palette values in components. Never invent new colours, shadows, radii, or spacing. If a token is missing, propose it — name, value, description — and wait for approval, instead of quietly hardcoding. Know which files are source and which are generated. And run the verification script before calling the work done. Rules in the harness apply to every run by default. Rules in a prompt apply only when somebody remembers to paste them.

Slide 9 of 13

Vague guidance vs token-backed instructions

The same intent, written two ways. Only one of them changes what the agent does.

| Vague guidance | Token-backed instruction |
|---|---|
| Use our blue palette | Use color.action.primary only for primary actions |
| Make the cards consistent | Panels use radius.panel.default and no decorative shadow |
| Use warning colours where needed | Warning is recoverable risk and escalated is immediate attention: different tokens |
| Keep the spacing tight | Queue rows use spacing.table.row.compact; eight rows visible at 1440px |
| Match the design system | Run verify-token-usage and attach desktop and mobile screenshots |

Good token instructions name the decision, the place it applies, and the way it will be verified.

Slide notes

Each row pairs a sentence that feels like guidance with one that actually constrains behaviour. The left column is not wrong, it is just unenforceable: the agent can satisfy use our blue palette with any blue in any role, and a reviewer cannot point at a violation because no rule was stated. The right column gives the agent a token name, a scope, and in the strongest rows a verification step.

The warning-versus-escalated row deserves a comment because it is the most common semantic failure in real audits: one orange doing the work of three different meanings. The fix is rarely a new colour — it is usually splitting one token into tokens whose names carry the distinction, and writing one sentence about the boundary.

The last row is the bridge to review. Match the design system is a hope; run the script and attach the screenshots is an instruction whose completion can be checked. That pattern — every rule paired with the evidence that proves it was followed — is the backbone of the review gates in Module 3 and the audits in Module 4.

Narration for this slide

Here is the same intent written two ways. Use our blue palette — versus use color action primary, only for primary actions. Make the cards consistent — versus panels use the panel radius token and no decorative shadow. Use warning colours where needed — versus warning means recoverable risk and escalated means immediate attention, and they are different tokens. The left column feels like guidance, but the agent can satisfy it with anything, and a reviewer can point at nothing. The right column names the decision, the place it applies, and — in the best rows — how it will be verified. That last part matters: an instruction you can check is worth ten you can only hope for.

Slide 10 of 13

Auditing an existing token set for ambiguity

Before adding new tokens, find the ones that mislead. An agent will follow a bad name as faithfully as a good one.

  • Names that describe appearance, not intent: brand-red-dark, blue-light, big-radius
  • One token doing several jobs: the same warning colour for overdue, blocked, and escalated
  • Missing layers: primitives exist but nothing says what they mean in the product
  • No descriptions, or descriptions that restate the value instead of the rule
  • Duplicates and near-misses: three greys within a few percent of each other, none preferred
  • Decisions with no token at all: density, elevation, focus styles living only in people's heads

Every ambiguous name is an instruction the agent will misread politely and consistently.

Slide notes

This audit is small enough to run in an afternoon and it is worth doing before any of the later modules, because everything downstream — DESIGN.md, review gates, drift audits — assumes the token names mean what they say. An agent is a literal reader: a token named brand-red-dark will be used wherever a dark red seems nice, because that is all the name claims.

The overloaded-token check usually produces the most valuable findings. When one colour serves overdue, blocked, and escalated, no instruction can teach the agent the distinction, because the system itself has not made it. The fix is a design decision, not a renaming exercise — which is exactly why this work belongs to the design system lead rather than to whoever maintains the build.
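The overloaded-token check can start from usage rather than from names. If you can list which token each product state currently renders with, inverting that map shows every token carrying more than one meaning. A minimal Python sketch; the usage map is illustrative and would in practice be collected by hand or by a script over component code.

```python
from collections import defaultdict

# Which token each product state currently renders with (illustrative).
USAGE = {
    "overdue": "color.orange.400",
    "blocked": "color.orange.400",
    "escalated": "color.orange.400",
    "resolved": "color.status.resolved.foreground",
}


def overloaded_tokens(usage):
    """Return {token: sorted states} for every token that serves more than one state."""
    states_by_token = defaultdict(list)
    for state, token in usage.items():
        states_by_token[token].append(state)
    return {
        token: sorted(states)
        for token, states in states_by_token.items()
        if len(states) > 1
    }


if __name__ == "__main__":
    for token, states in overloaded_tokens(USAGE).items():
        print(f"{token} carries {len(states)} meanings: {', '.join(states)}")
```

The output is a list of design decisions to make, not renames to perform: someone still has to decide how overdue, blocked, and escalated should differ.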

The last bullet is the inverse problem and the easiest to miss: decisions the team makes constantly that have no token at all. Density is the classic one. If compact table rhythm exists only as a habit in two senior designers' heads, every agent run and every new hire re-decides it. Listing those missing tokens is as valuable as fixing the misnamed ones. You can run this audit with the agent itself — ask it to read the token files and report names it finds ambiguous, with reasons — and treat its misreadings as data about where the names fail.
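Parts of the audit can be scripted before you ask the agent. The Python sketch below checks three patterns from the slide: appearance words in names, missing descriptions, and descriptions that only restate the value. Run it over semantic and component tokens only, since primitives such as red-600 are supposed to describe appearance. The word list is an assumption to tune, and a clean report only means the obvious problems are gone; overloaded meanings and missing decisions still need a person.

```python
import re

# Words that describe how a value looks rather than when it applies (tune for your system).
APPEARANCE_WORDS = {"red", "orange", "blue", "green", "grey", "gray",
                    "dark", "light", "big", "small", "brand"}


def audit_token(name, value, description):
    """Return a list of human-readable problems for one flattened token."""
    problems = []
    name_words = set(re.split(r"[.\-_]", name.lower()))
    appearance = sorted(name_words & APPEARANCE_WORDS)
    if appearance:
        problems.append("name describes appearance: " + ", ".join(appearance))
    text = (description or "").strip().lower()
    if not text:
        problems.append("no description")
    else:
        description_words = set(re.findall(r"[a-z0-9#]+", text))
        # A description made only of appearance words or the value itself states no rule.
        if description_words <= APPEARANCE_WORDS | {str(value).lower()}:
            problems.append("description restates the value instead of the rule")
    return problems


def audit(tokens):
    """tokens: {name: (value, description)} -> {name: problems} for tokens with findings."""
    report = {}
    for name, (value, description) in tokens.items():
        problems = audit_token(name, value, description)
        if problems:
            report[name] = problems
    return report
```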

Narration for this slide

Before you add anything, audit what you have, because an agent follows a bad name just as faithfully as a good one. Look for names that describe appearance instead of intent — brand-red-dark tells the agent nothing about when to use it. Look for one token doing several jobs, like a single warning colour covering overdue, blocked, and escalated. Look for missing layers — primitives with no semantic names on top. Check the descriptions: do they state the rule, or just restate the value? Find the near-duplicate greys nobody chose between. And list the decisions that have no token at all — density, elevation, focus styles. A useful trick: ask the agent to read your token files and tell you which names it finds ambiguous. Its misreadings are a map of where your names fail.

Slide 11 of 13

Worked example: one component, before and after

A status chip for a support queue, generated from the same prompt before and after the token cleanup.

|  | Before cleanup | After cleanup |
|---|---|---|
| Tokens available | Primitive palette only: red-500, orange-400, gray-100 | color.status.* semantics, component.statusChip.radius, spacing.table.row.compact |
| Status colours | One orange for overdue, blocked, and escalated | Distinct semantic tokens; escalated visibly hottest |
| Implementation | Raw hex values and a rounded-full pill | var(--color-status-*) and the chip radius token |
| Verification | Nothing to check against | verify-token-usage passes; names match usage |
| Correction cost | Three review rounds re-explaining the meanings | One round; one missing token proposed and approved |

The prompt did not change. The project changed — and the agent's behaviour followed the project.

Slide notes

This example follows the dashboard scenario from the school's token article. The component is deliberately small — a status chip in a support queue table — because small components show the mechanism clearly. The prompt was held constant across both runs: build the status chip for the queue table, follow the design system.

Before the cleanup, the project exposed only a primitive palette. The agent produced a competent chip: a rounded-full pill, raw hex colours, and one orange carrying every non-normal state. Each review round consisted of re-explaining meanings the project had never written down — escalated is hotter than blocked, chips stay compact in dense tables — and each explanation lived only in that conversation, so the next run would need it again.

After the cleanup, the same prompt hit a project with semantic status tokens, a component token for chip radius, a density token for the queue rows, and the harness rules from earlier in this module. The implementation referenced the variables by name, the verification script passed, and the one review round that remained was a real design conversation: the agent proposed a missing token for a state nobody had specified, and the human decided what it should be. That is the shape of the division of labour this course is building towards — the agent surfaces the gap, the human makes the call, the token records it.
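The after-state implementation and the propose-first rule can be sketched together. Given the set of token names the project defines, the hypothetical generator below emits CSS that references each status token by name, and when a status has no token it records a proposal stub instead of inventing a value. The token naming follows this module's examples; the function names, the proposal fields, and the "waiting" state are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Flattened token names the project defines (illustrative).
DEFINED = {
    "color.status.escalated.foreground",
    "color.status.blocked.foreground",
    "color.status.overdue.foreground",
    "component.statusChip.radius",
}


@dataclass
class TokenProposal:
    """A missing decision surfaced for a person to make, instead of a guessed value."""
    name: str
    value: Optional[str] = None  # left empty on purpose: a person chooses it
    description: str = "TODO: state the rule and its boundary"


def css_var(name):
    """Reference a token by name using the dot-to-dash custom property convention."""
    return f"var(--{name.replace('.', '-')})"


def chip_css(statuses, defined):
    """Return (css_text, proposals): one rule per status with a token, one proposal per gap."""
    rules, proposals = [], []
    for status in statuses:
        token = f"color.status.{status}.foreground"
        if token not in defined:
            proposals.append(TokenProposal(name=token))
            continue  # no hardcoded fallback: the gap waits for approval
        rules.append(
            f".chip--{status} {{ color: {css_var(token)}; "
            f"border-radius: {css_var('component.statusChip.radius')}; }}"
        )
    return "\n".join(rules), proposals


if __name__ == "__main__":
    css, proposals = chip_css(["escalated", "blocked", "waiting"], DEFINED)
    print(css)
    for proposal in proposals:
        print("proposed:", proposal)
```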

Narration for this slide

Let's trace one component. A status chip for a support queue, same prompt both times: build the chip, follow the design system. Before the cleanup, the project only had a primitive palette. The agent produced a perfectly competent pill — raw hex, one orange doing the work of overdue, blocked, and escalated — and it took three review rounds of re-explaining meanings the project had never written down. After the cleanup — semantic status tokens, a chip radius token, a density token, and the harness rules — the same prompt produced code that referenced the variables by name and passed the verification script. One review round remained, and it was a real one: the agent found a state with no token and proposed one. The prompt never changed. The project did.

Slide 12 of 13

Exercise: rename five tokens to carry intent

Work on your own system. Do not invent a new one — the value is in confronting the names you already ship.

  • Pick five tokens an agent would plausibly misuse: appearance names, overloaded meanings, missing descriptions
  • For each, write the new name using surface, emphasis, state, or density — and a one-sentence description stating the rule and its boundary
  • Note where the rename has to land: source JSON, generated CSS, component code, canvas variables
  • Add one token for a decision that currently has no token at all
  • Optional: ask an agent to review the five new names and report what it would still get wrong
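To find candidates for the first bullet, a short script can pre-screen a token file before a human or an agent reads it. The sketch below assumes tokens flattened to dotted or hyphenated names, loosely following the Design Tokens Community Group draft format (`$value`, `$description`). The appearance word list is hypothetical and should be tuned to your own palette. It flags appearance names, missing descriptions, and descriptions that merely restate the value. It cannot detect one token doing several jobs; that still needs a person or the optional agent review. Primitive palette tokens will be flagged by design, so run it over the layer components are meant to use.

```python
# Hypothetical list of name segments that describe appearance rather than intent.
APPEARANCE_WORDS = {"red", "orange", "blue", "grey", "gray", "dark", "light", "big", "small", "bright"}


def audit_tokens(tokens: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each suspicious token name to the reasons an agent might misuse it."""
    findings: dict[str, list[str]] = {}
    for name, token in tokens.items():
        reasons = []
        # Split "brand-red-dark" or "color.status.blocked" into name segments.
        segments = set(name.replace("-", ".").split("."))
        if segments & APPEARANCE_WORDS:
            reasons.append("name describes appearance, not intent")
        description = token.get("$description", "").strip()
        if not description:
            reasons.append("no description")
        elif token["$value"].lower() in description.lower():
            reasons.append("description restates the value")
        if reasons:
            findings[name] = reasons
    return findings


tokens = {
    "brand-red-dark": {"$value": "#b91c1c"},
    "spacing.big": {"$value": "24px", "$description": "24px of spacing."},
    "color.status.escalated.foreground": {
        "$value": "{color.red.700}",
        "$description": "Text for rows that need immediate attention; never decorative.",
    },
}

for name, reasons in audit_tokens(tokens).items():
    print(name, "->", "; ".join(reasons))
```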

Keep the list. It becomes the token section of your DESIGN.md in Module 2 and the seed of your audit rules in Module 4.

Slide notes

Keep the scope tight: five tokens, not a taxonomy redesign. The learning is in the discipline of writing a name and a single sentence that would survive being read by a literal-minded agent with no other context. Most participants find the sentence harder than the name. Stating the boundary ("not for decoration", "only in dense tables", "never on the page background") is where the actual design decision gets made explicit.

The third bullet is there so the exercise stays honest about cost. A rename touches the source file, the generated output, every component that referenced the old name, and possibly canvas variables in Figma or a connected canvas. That is not a reason to avoid renaming; it is a reason to know the blast radius before proposing it, and it previews the migration and sync work in Module 5. Renaming is also a task agents do well under supervision, since it is mechanical once the new name is decided.
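Before proposing a rename, it helps to count where the old name is referenced. The sketch below is minimal and rests on three hypothetical assumptions:

- Generated CSS variables are the dotted token path with dots replaced by hyphens.
- The source file is flat and keyed by dotted names. Real token files often nest groups, which would need a JSON-aware lookup to find the definition itself.
- The project's files can be read into a path-to-text mapping.

```python
import re


def css_var(token_path: str) -> str:
    """Hypothetical build rule: dotted token path -> CSS custom property name."""
    return "--" + token_path.replace(".", "-")


def blast_radius(token_path: str, files: dict[str, str]) -> dict[str, int]:
    """Count references to a token (dotted path or CSS variable) in each file that mentions it."""
    names = "|".join(re.escape(n) for n in (token_path, css_var(token_path)))
    # The look-arounds stop "--color-warning" from matching inside "--color-warning-subtle".
    pattern = re.compile(rf"(?<![\w.-])(?:{names})(?![\w.-])")
    counts = {}
    for path, text in files.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[path] = hits
    return counts


files = {
    "tokens/color.json": '{"color.warning": {"$value": "#c2410c"}, "color.status.blocked": {"$value": "{color.warning}"}}',
    "build/tokens.css": ":root {\n  --color-warning: #c2410c;\n  --color-warning-subtle: #ffedd5;\n}",
    "src/QueueChip.css": ".chip--overdue { color: var(--color-warning); }\n.chip--escalated { color: var(--color-warning); }",
    "src/Button.css": ".btn { color: var(--color-action-primary); }",
}

print(blast_radius("color.warning", files))
```

The generated file appears in the count but should not be edited by hand; rebuilding from the renamed source updates it. The component references are the mechanical part an agent can migrate under supervision. In this example those are the two uses in the chip stylesheet.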

If participants try the optional step, the agent's review of the new names is the most useful feedback in the exercise. Where the agent still hesitates or misapplies a token, the name or the description is carrying too little. If you are running this live, collect those cases; they make excellent material for the DESIGN.md drafting session in the next module.

Narration for this slide

Now make it concrete on your own system. Pick five tokens an agent would plausibly misuse — appearance names, one colour doing three jobs, anything without a description. For each one, write a new name that carries intent, using surface, emphasis, state, or density, plus one sentence that states the rule and its boundary. Note everywhere the rename would have to land: source files, generated CSS, components, canvas variables. Then add one token for a decision that currently lives only in someone's head. If you want the extra step, ask an agent to review your five new names and tell you what it would still get wrong — that feedback is gold. Keep the list. You will use it in the next module.

Slide 13 of 13

Summary, and the bridge to DESIGN.md

  • Tokens are named design decisions — the name and description are the instruction, the value is the payload
  • Semantic names change agent behaviour; raw values leave every placement decision to a guess
  • Missing tokens do not stay blank: the agent hardcodes a plausible value and the drift compounds quietly
  • Token rules live in the harness — never hardcode, prefer semantic, propose missing tokens — not in each prompt
  • Audit the names you already have: ambiguous tokens are instructions the agent will misread consistently

Module 2 builds the document that ties this together: DESIGN.md as the contract both the agent and the team read — including the judgment that token files cannot carry.

Slide notes

Recap by connecting the bullets into one argument rather than re-reading them. Tokens are how the system speaks to an agent at the level of individual decisions; semantic naming is what makes that speech precise; missing or vague tokens get answered with plausible guesses that compound into drift; and the rules that prevent the guessing belong in standing harness material, where they apply to every run rather than to the runs someone remembered to brief.

Name the limit honestly: tokens carry decisions, but they cannot carry judgment. A token file cannot say that the dashboard should never be wrapped in decorative cards, that warning colour is being overused as visual interest, or what the product should refuse to do. That judgment needs prose a human can read in ten minutes and an agent can load in every session — which is exactly what DESIGN.md is for, and why it is the next module rather than an afterthought.

If participants did the exercise, ask them to bring the five renamed tokens and the one new token to Module 2. The token section of their DESIGN.md draft starts from that list, and Module 4 turns the same list into audit rules with evidence requirements.

Narration for this slide

Let's close. A token is a named design decision, and for an agent the name and the description are the instruction — the value is just the payload. Semantic names change behaviour; raw values leave every placement to a guess. And missing tokens do not stay blank: the agent fills them with something plausible, and the drift compounds quietly until an audit finds it. The rules that prevent this live in the harness, not in individual prompts, and the first job is auditing the names you already have. But tokens cannot carry judgment — what the product refuses to do, which patterns keep going wrong. That needs a document both the agent and the team actually read. That document is DESIGN.md, and it is where Module 2 picks up. See you there.

Module transcript
Module 1, narrated slide by slide

Slide 1 · Design Tokens Are Agent Instructions

Welcome to Design Systems for Agents. This first module is about design tokens, but not in the way you have probably discussed them before. We are not going to talk about tokens as a styling convenience or a theming mechanism. We are going to talk about them as instructions — the most concrete way your system tells an agent what is allowed. We will look at what an agent actually does with raw values versus semantic names, what happens when tokens are missing, where the token rules belong in your harness, and how to audit the set you already have. Let's start with why this matters at all.

Slide 2 · Agents need decisions, not vibes

Here is the failure that motivates this whole module. You ask an agent for a dashboard, and what comes back is polished, plausible, and generic. The temptation is to blame the prompt. But words like calm, dense, or premium give direction — they do not pick the warning colour or the table row height. Every decision you have not stated, the agent fills in with the most common pattern it has seen, which is by definition not your product. Tokens fix the supply problem. A token is a named design decision the agent can read: a name, a type, a value, and a description that says when it applies. The taste is still yours. The token is how it survives the next hundred runs.

Slide 3 · Raw values vs semantic tokens

Here is the distinction the whole module turns on. A raw value — a hex code, red-600, spacing-4 — tells the agent that a value exists. A semantic token — color dot status dot escalated dot foreground — tells the agent what it means. With raw values, the agent applies red wherever red looks plausible: alerts, accents, decoration. With the semantic name, it applies it where the product means escalated, and nowhere else. The difference compounds. When the system changes, semantic tokens propagate the change; raw values become orphans you have to hunt. And review changes too: a name can be checked against its usage. A raw value is legal everywhere, which means it can be checked nowhere.

Slide 4 · Naming that carries intent

So what does a name that carries intent look like? It answers the questions the agent would otherwise guess. Surface: where does this sit — page, panel, overlay? Emphasis: how strongly should it speak — primary text, quiet action, danger action? State: what does it signal — escalated, blocked, invalid? Density: how much should fit — compact table rows or a relaxed marketing section? Compare that with names like brand-red-dark or big-padding. Those describe the appearance, which the agent can already see from the value. Name the decision instead, and add a one-sentence description that states the boundary — escalated means immediate attention, never decoration. Agents actually read those sentences. They are the cheapest instructions you will ever write.

Slide 5 · Where tokens live

Where do these tokens actually live? In most real systems, several places at once. Source files in a JSON token format, where a person edits the decision. Generated CSS variables or a Tailwind theme, which the product consumes at runtime. Component code that maps those variables to props and classes. Canvas variables — Figma variables over MCP, or dot-pen and dot-op files where the variables are readable JSON. And DESIGN.md, which carries the judgment about which tokens to prefer. The instruction your agent needs most is the boundary between source and generated. Edit the token files; never hand-edit the built CSS. Get that wrong and the next build quietly erases the work — or leaves the source and the product disagreeing.

Slide 6 · What agents do when tokens are missing

What happens when the token is missing? The agent does not stop and ask, and it does not leave the decision blank. It fills the gap with the most common answer in its training data. A plausible red that is close to yours but not yours. A generic card shadow. Arbitrary padding. Framework defaults — large radii, soft blue, roomy spacing. And because each guess is internally consistent and polished, it passes a quick look. That is the trap. The drift is not ugly; it is plausible, unnamed, and different on every run. Six weeks later you have eleven reds meaning four different things, and nobody decided any of them. The fix is upstream: expose the decision before the agent has to invent it.

Slide 7 · The token's journey vs the hardcoded value's journey

Here is the whole module in one picture. Top row: the decision is made once, in the token source file, with a name and a description. The build turns it into a CSS variable with the same name. The agent reads the name, understands the meaning, and writes the variable into the component. Now the code points at the system — change the source and every use follows, and a reviewer can check the usage mechanically. Bottom row: no contract, so the agent guesses. A plausible red lands in the component as a raw hex with no name and no owner. Day one, the two screenshots look the same. Then the system changes, and only one of these moves with it. That is the difference between a token and a value.

Slide 8 · Token rules in the harness

Where do these rules live? Not in your prompts — in the harness: DESIGN.md and the instruction file the agent loads at the start of every run. Here is the shape. Read the token files before touching UI. Use semantic tokens for meaning. Never use raw hex or primitive palette values in components. Never invent new colours, shadows, radii, or spacing. If a token is missing, propose it — name, value, description — and wait for approval, instead of quietly hardcoding. Know which files are source and which are generated. And run the verification script before calling the work done. Rules in the harness apply to every run by default. Rules in a prompt apply only when somebody remembers to paste them.

Slide 9 · Vague guidance vs token-backed instructions

Here is the same intent written two ways. Use our blue palette — versus use color action primary, only for primary actions. Make the cards consistent — versus panels use the panel radius token and no decorative shadow. Use warning colours where needed — versus warning means recoverable risk and escalated means immediate attention, and they are different tokens. The left column feels like guidance, but the agent can satisfy it with anything, and a reviewer can point at nothing. The right column names the decision, the place it applies, and — in the best rows — how it will be verified. That last part matters: an instruction you can check is worth ten you can only hope for.

Slide 10 · Auditing an existing token set for ambiguity

Before you add anything, audit what you have, because an agent follows a bad name just as faithfully as a good one. Look for names that describe appearance instead of intent — brand-red-dark tells the agent nothing about when to use it. Look for one token doing several jobs, like a single warning colour covering overdue, blocked, and escalated. Look for missing layers — primitives with no semantic names on top. Check the descriptions: do they state the rule, or just restate the value? Find the near-duplicate greys nobody chose between. And list the decisions that have no token at all — density, elevation, focus styles. A useful trick: ask the agent to read your token files and tell you which names it finds ambiguous. Its misreadings are a map of where your names fail.

Slide 11 · Worked example: one component, before and after

Let's trace one component. A status chip for a support queue, same prompt both times: build the chip, follow the design system. Before the cleanup, the project only had a primitive palette. The agent produced a perfectly competent pill — raw hex, one orange doing the work of overdue, blocked, and escalated — and it took three review rounds of re-explaining meanings the project had never written down. After the cleanup — semantic status tokens, a chip radius token, a density token, and the harness rules — the same prompt produced code that referenced the variables by name and passed the verification script. One review round remained, and it was a real one: the agent found a state with no token and proposed one. The prompt never changed. The project did.

Slide 12 · Exercise: rename five tokens to carry intent

Now make it concrete on your own system. Pick five tokens an agent would plausibly misuse — appearance names, one colour doing three jobs, anything without a description. For each one, write a new name that carries intent, using surface, emphasis, state, or density, plus one sentence that states the rule and its boundary. Note everywhere the rename would have to land: source files, generated CSS, components, canvas variables. Then add one token for a decision that currently lives only in someone's head. If you want the extra step, ask an agent to review your five new names and tell you what it would still get wrong — that feedback is gold. Keep the list. You will use it in the next module.

Slide 13 · Summary, and the bridge to DESIGN.md

Let's close. A token is a named design decision, and for an agent the name and the description are the instruction — the value is just the payload. Semantic names change behaviour; raw values leave every placement to a guess. And missing tokens do not stay blank: the agent fills them with something plausible, and the drift compounds quietly until an audit finds it. The rules that prevent this live in the harness, not in individual prompts, and the first job is auditing the names you already have. But tokens cannot carry judgment — what the product refuses to do, which patterns keep going wrong. That needs a document both the agent and the team actually read. That document is DESIGN.md, and it is where Module 2 picks up. See you there.