Slide 1 — Design Tokens Are Agent Instructions
Welcome to Design Systems for Agents. This first module is about design tokens, but not in the way you have probably discussed them before. We are not going to talk about tokens as a styling convenience or a theming mechanism. We are going to talk about them as instructions — the most concrete way your system tells an agent what is allowed. We will look at what an agent actually does with raw values versus semantic names, what happens when tokens are missing, where the token rules belong in your harness, and how to audit the set you already have. Let's start with why this matters at all.
Slide 2 — Agents need decisions, not vibes
Here is the failure that motivates this whole module. You ask an agent for a dashboard, and what comes back is polished, plausible, and generic. The temptation is to blame the prompt. But words like calm, dense, or premium give direction; they do not pick the warning colour or the table row height. The agent fills every decision you have not stated with the most common pattern it has seen, and the most common pattern is, almost by definition, not your product. So the real problem is supply: the agent is missing decisions, not adjectives. Tokens supply them. A token is a named design decision the agent can read: a name, a type, a value, and a description that says when it applies. The taste is still yours. The token is how it survives the next hundred runs.
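Here is a minimal Python sketch of those four parts. The `Token` record, its field names, and the hex value are illustrative assumptions, not a specific token format. Real formats carry the same information under their own keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A named design decision: the name and description are the instruction."""
    name: str         # the decision, e.g. "color.status.escalated.foreground"
    type: str         # what kind of value this is, e.g. "color" or "dimension"
    value: str        # the payload (hypothetical hex value below)
    description: str  # when it applies, and where it must not be used


escalated_fg = Token(
    name="color.status.escalated.foreground",
    type="color",
    value="#b42318",
    description="Text on escalated items: needs immediate attention. Never decoration.",
)


def as_instruction(token: Token) -> str:
    """Render the token roughly the way an agent reads it in a token file."""
    return f"{token.name} ({token.type}) = {token.value}: {token.description}"


print(as_instruction(escalated_fg))
```

Only `value` is the part a screenshot shows. The other three fields are what the agent needs to place it correctly.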
Slide 3 — Raw values vs semantic tokens
Here is the distinction the whole module turns on. A raw value — a hex code, red-600, spacing-4 — tells the agent that a value exists. A semantic token — color dot status dot escalated dot foreground — tells the agent what it means. With raw values, the agent applies red wherever red looks plausible: alerts, accents, decoration. With the semantic name, it applies it where the product means escalated, and nowhere else. The difference compounds. When the system changes, semantic tokens propagate the change; raw values become orphans you have to hunt. And review changes too: a name can be checked against its usage. A raw value is legal everywhere, which means it can be checked nowhere.
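The following sketch shows why semantic names propagate a change and raw values do not. It assumes a two-layer set: a primitive palette plus semantic aliases. The curly-brace alias syntax is hypothetical, loosely modeled on common JSON token formats, and the palette values are made up.

```python
# Primitive palette: values that exist, with no meaning attached.
primitives = {
    "red.600": "#d92d20",
    "red.700": "#b42318",
    "orange.500": "#ef6820",
}

# Semantic layer: names that say what a value means. Each one aliases a primitive.
semantic = {
    "color.status.escalated.foreground": "{red.700}",
    "color.status.warning.foreground": "{orange.500}",
}


def resolve(name: str) -> str:
    """Follow a semantic alias like "{red.700}" down to its raw value."""
    ref = semantic[name]
    if ref.startswith("{") and ref.endswith("}"):
        return primitives[ref[1:-1]]
    return ref


# Two components: one references the decision, one hardcodes today's value.
component_by_name = "color.status.escalated.foreground"
component_hardcoded = "#b42318"

print(resolve(component_by_name))  # #b42318, same as the hardcoded value today

# The system changes: escalated moves to a different red.
semantic["color.status.escalated.foreground"] = "{red.600}"
print(resolve(component_by_name))  # #d92d20, the change propagated
print(component_hardcoded)         # #b42318, now an orphan
```

The component that used the name never changed. Only the alias did.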
Slide 4 — Naming that carries intent
So what does a name that carries intent look like? It answers the questions the agent would otherwise guess. Surface: where does this sit — page, panel, overlay? Emphasis: how strongly should it speak — primary text, quiet action, danger action? State: what does it signal — escalated, blocked, invalid? Density: how much should fit — compact table rows or a relaxed marketing section? Compare that with names like brand-red-dark or big-padding. Those describe the appearance, which the agent can already see from the value. Name the decision instead, and add a one-sentence description that states the boundary — escalated means immediate attention, never decoration. Agents actually read those sentences. They are the cheapest instructions you will ever write.
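The four questions can also become a rough lint pass over token names. This sketch assumes small, hypothetical vocabularies for surface, emphasis, state, and density, plus a list of appearance words. A real system would define its own lists, and no word list replaces reading the description.

```python
# Hypothetical vocabularies; a real system would define its own.
INTENT_WORDS = {
    "surface": {"page", "panel", "overlay"},
    "emphasis": {"primary", "quiet", "danger"},
    "state": {"escalated", "blocked", "invalid", "warning"},
    "density": {"compact", "relaxed"},
}
APPEARANCE_WORDS = {"red", "blue", "orange", "dark", "light", "big", "small"}


def review_name(name: str, description: str = "") -> list[str]:
    """Return problems an agent would hit when reading this token name."""
    parts = set(name.replace("-", ".").split("."))
    problems = []
    appearance = sorted(parts & APPEARANCE_WORDS)
    if appearance:
        problems.append(f"{name}: describes appearance ({', '.join(appearance)})")
    if not any(parts & words for words in INTENT_WORDS.values()):
        problems.append(f"{name}: no surface, emphasis, state, or density in the name")
    if not description.strip():
        problems.append(f"{name}: no description stating when it applies")
    return problems


print(review_name("brand-red-dark"))
print(review_name("color.status.escalated.foreground",
                  "Immediate attention only; never decoration."))
```

Under these assumptions, `brand-red-dark` collects three problems: appearance words, no intent axis, and no description. The semantic name with a boundary sentence passes.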
Slide 5 — Where tokens live
Where do these tokens actually live? In most real systems, several places at once. Source files in a JSON token format, where a person edits the decision. Generated CSS variables or a Tailwind theme, which the product consumes at runtime. Component code that maps those variables to props and classes. Canvas variables — Figma variables over MCP, or dot-pen and dot-op files where the variables are readable JSON. And DESIGN.md, which carries the judgment about which tokens to prefer. The instruction your agent needs most is the boundary between source and generated. Edit the token files; never hand-edit the built CSS. Get that wrong and the next build quietly erases the work — or leaves the source and the product disagreeing.
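The source/generated boundary can be sketched as a one-way build step. This is a minimal illustration, not a real token pipeline. The `$type`/`$value`/`$description` keys are modeled on the W3C Design Tokens Community Group draft format. The dots-to-hyphens naming rule and the header comment are assumptions.

```python
import json

# Source of truth: the file a person (or an agent) edits.
SOURCE = """
{
  "color": {
    "status": {
      "escalated": {
        "foreground": {
          "$type": "color",
          "$value": "#b42318",
          "$description": "Immediate attention only; never decoration."
        }
      }
    }
  },
  "radius": {
    "chip": {
      "$type": "dimension",
      "$value": "999px",
      "$description": "Status chips only."
    }
  }
}
"""


def flatten(node: dict, path: tuple = ()) -> dict:
    """Walk nested groups; any object with a "$value" key is a token."""
    if "$value" in node:
        return {".".join(path): node}
    tokens = {}
    for key, child in node.items():
        if not key.startswith("$"):  # skip group-level metadata
            tokens.update(flatten(child, path + (key,)))
    return tokens


def build_css(source: str) -> str:
    """Generate the runtime artifact: same names, dots become hyphens."""
    lines = [
        "/* GENERATED from the token source. Do not edit by hand; edit the source and rebuild. */",
        ":root {",
    ]
    for name, token in sorted(flatten(json.loads(source)).items()):
        lines.append(f"  --{name.replace('.', '-')}: {token['$value']};")
    lines.append("}")
    return "\n".join(lines)


print(build_css(SOURCE))
```

The header is aimed at the agent as much as at people. It states, in the file itself, that the CSS is an output and that any fix belongs in the source.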
Slide 6 — What agents do when tokens are missing
What happens when the token is missing? The agent does not stop and ask, and it does not leave the decision blank. It fills the gap with the most common answer in its training data. A plausible red that is close to yours but not yours. A generic card shadow. Arbitrary padding. Framework defaults — large radii, soft blue, roomy spacing. And because each guess is internally consistent and polished, it passes a quick look. That is the trap. The drift is not ugly; it is plausible, unnamed, and different on every run. Six weeks later you have eleven reds meaning four different things, and nobody decided any of them. The fix is upstream: expose the decision before the agent has to invent it.
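One way to surface this kind of drift is to inventory every hardcoded colour and where it appears. The following is a sketch under stated assumptions. The file contents are invented, and a dict stands in for reading files from disk. The regex only catches hex codes, not `rgb()` or named colours.

```python
import re
from collections import defaultdict

# Stand-in for component files; in practice you would read them from disk.
FILES = {
    "Alert.tsx": 'style={{ color: "#D92D20" }}',
    "Badge.css": ".badge { background: #d92d20; border-color: #e04f16; }",
    "Card.css": ".card { box-shadow: 0 4px 12px #0000001a; color: #c4320a; }",
    "Chip.css": ".chip { color: var(--color-status-escalated-foreground); }",
}

# 3-, 4-, 6-, or 8-digit hex colours that end at a word boundary.
HEX = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def hex_inventory(files: dict) -> dict:
    """Map each distinct hardcoded colour (lowercased) to the files using it."""
    seen = defaultdict(set)
    for filename, text in files.items():
        for match in HEX.findall(text):
            seen[match.lower()].add(filename)
    return dict(seen)


for colour, where in sorted(hex_inventory(FILES).items()):
    print(colour, sorted(where))
```

Here the inventory turns up three nearby reds and oranges that nobody named, plus a shadow colour. The chip that uses a variable contributes nothing.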
Slide 7 — The token's journey vs the hardcoded value's journey
Here is the whole module in one picture. Top row: the decision is made once, in the token source file, with a name and a description. The build turns it into a CSS variable with the same name. The agent reads the name, understands the meaning, and writes the variable into the component. Now the code points at the system — change the source and every use follows, and a reviewer can check the usage mechanically. Bottom row: no contract, so the agent guesses. A plausible red lands in the component as a raw hex with no name and no owner. Day one, the two screenshots look the same. Then the system changes, and only one of these moves with it. That is the difference between a token and a value.
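Because the variable keeps the token's name, a reviewer or a script can check that every reference points at a real token. Here is a minimal sketch. The set of generated names is hypothetical and written out by hand rather than parsed from build output.

```python
import re

# Variables the build generated from the token source (hypothetical list).
GENERATED = {
    "--color-status-escalated-foreground",
    "--color-status-escalated-background",
    "--radius-chip",
}

# Captures the custom property name inside var(...).
VAR_REF = re.compile(r"var\(\s*(--[\w-]+)")


def unknown_vars(component_css: str, known: set) -> list:
    """Return variable references that do not point at any token in the system."""
    return [name for name in VAR_REF.findall(component_css) if name not in known]


chip = """
.chip {
  color: var(--color-status-escalated-foreground);
  background: var(--color-status-escalated-bg);
  border-radius: var(--radius-chip);
}
"""
print(unknown_vars(chip, GENERATED))
```

The made-up `--color-status-escalated-bg` is exactly the kind of plausible name an agent might guess. The check reports it instead of letting it fail quietly at runtime.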
Slide 8 — Token rules in the harness
Where do these rules live? Not in your prompts — in the harness: DESIGN.md and the instruction file the agent loads at the start of every run. Here is the shape. Read the token files before touching UI. Use semantic tokens for meaning. Never use raw hex or primitive palette values in components. Never invent new colours, shadows, radii, or spacing. If a token is missing, propose it — name, value, description — and wait for approval, instead of quietly hardcoding. Know which files are source and which are generated. And run the verification script before calling the work done. Rules in the harness apply to every run by default. Rules in a prompt apply only when somebody remembers to paste them.
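The last rule presumes that a verification script exists. Here is a minimal sketch of one. It assumes components are CSS text and that primitive palette variables follow a hypothetical `--red-600` naming pattern. A dict stands in for files on disk.

```python
import re

# Hypothetical primitive palette pattern: components must not reference these directly.
PRIMITIVE_VAR = re.compile(r"var\(\s*--(?:red|orange|blue|gray)-\d{2,3}\s*\)")
RAW_HEX = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def verify(files: dict) -> list:
    """Check component sources against the harness rules; return violations."""
    violations = []
    for filename, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for hit in RAW_HEX.findall(line):
                violations.append(f"{filename}:{lineno}: raw hex {hit}; use a semantic token")
            for hit in PRIMITIVE_VAR.findall(line):
                violations.append(f"{filename}:{lineno}: primitive {hit}; use a semantic token")
    return violations


components = {
    "Chip.css": ".chip {\n  color: var(--color-status-escalated-foreground);\n}",
    "Banner.css": ".banner {\n  color: #b42318;\n  border-color: var(--red-600);\n}",
}

problems = verify(components)
for problem in problems:
    print(problem)
print("PASS" if not problems else f"FAIL: {len(problems)} violation(s)")
```

A real script would scan the component directories and exit non-zero on failure, so that neither the agent nor CI can mistake a failing run for a finished one.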
Slide 9 — Vague guidance vs token-backed instructions
Here is the same intent written two ways. Use our blue palette — versus use color action primary, only for primary actions. Make the cards consistent — versus panels use the panel radius token and no decorative shadow. Use warning colours where needed — versus warning means recoverable risk and escalated means immediate attention, and they are different tokens. The left column feels like guidance, but the agent can satisfy it with anything, and a reviewer can point at nothing. The right column names the decision, the place it applies, and — in the best rows — how it will be verified. That last part matters: an instruction you can check is worth ten you can only hope for.
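The right-hand column is checkable because each instruction names a token and a place. Here is a sketch of that check. It assumes a hypothetical table of allowed selectors per token and (selector, token) pairs already extracted from component styles.

```python
# Hypothetical scope rules: each semantic token lists where it may be used.
ALLOWED = {
    "--color-action-primary": {".button-primary"},
    "--color-status-warning-foreground": {".chip-warning", ".banner-warning"},
    "--color-status-escalated-foreground": {".chip-escalated"},
}

# (selector, token) pairs pulled from component styles; invented for illustration.
usages = [
    (".button-primary", "--color-action-primary"),
    (".link-subtle", "--color-action-primary"),
    (".chip-escalated", "--color-status-warning-foreground"),
]


def scope_violations(usages: list, allowed: dict) -> list:
    """Flag tokens used outside the places their instruction names."""
    return [
        f"{token} used in {selector}; allowed only in {sorted(allowed.get(token, set()))}"
        for selector, token in usages
        if selector not in allowed.get(token, set())
    ]


for violation in scope_violations(usages, ALLOWED):
    print(violation)
```

The second violation is the warning/escalated confusion from the slide: a recoverable-risk colour on an escalated chip.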
Slide 10 — Auditing an existing token set for ambiguity
Before you add anything, audit what you have, because an agent follows a bad name just as faithfully as a good one. Look for names that describe appearance instead of intent — brand-red-dark tells the agent nothing about when to use it. Look for one token doing several jobs, like a single warning colour covering overdue, blocked, and escalated. Look for missing layers — primitives with no semantic names on top. Check the descriptions: do they state the rule, or just restate the value? Find the near-duplicate greys nobody chose between. And list the decisions that have no token at all — density, elevation, focus styles. A useful trick: ask the agent to read your token files and tell you which names it finds ambiguous. Its misreadings are a map of where your names fail.
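Two of these audit checks are easy to automate. The sketch below flags near-duplicate colours, and it flags descriptions that are empty or merely repeat the value. The token slice is invented. Plain RGB distance with a threshold of 10 is a crude stand-in for a perceptual colour difference.

```python
from itertools import combinations

# A small slice of an existing token set; names, values, and descriptions are invented.
tokens = {
    "gray-150": ("#e9eaeb", "Light gray."),
    "gray-200": ("#e4e5e7", "#e4e5e7"),
    "border-subtle": ("#e6e7e9", "Dividers between table rows."),
    "text-muted": ("#6b7280", ""),
}


def rgb(hex_value: str) -> tuple:
    """Parse "#rrggbb" into an (r, g, b) tuple of ints."""
    h = hex_value.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def near_duplicates(tokens: dict, threshold: float = 10.0) -> list:
    """Pairs of tokens whose colours are within `threshold` in plain RGB distance."""
    pairs = []
    for (a, (va, _)), (b, (vb, _)) in combinations(tokens.items(), 2):
        distance = sum((x - y) ** 2 for x, y in zip(rgb(va), rgb(vb))) ** 0.5
        if distance < threshold:
            pairs.append((a, b, round(distance, 1)))
    return pairs


def weak_descriptions(tokens: dict) -> list:
    """Tokens whose description is missing or just restates the value."""
    return [name for name, (value, desc) in tokens.items()
            if not desc.strip() or desc.strip().lower() == value.lower()]


print(near_duplicates(tokens))
print(weak_descriptions(tokens))
```

Note that `Light gray.` passes, even though it only restates appearance. String checks catch the easy cases. The judgment calls still need a reader, human or agent.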
Slide 11 — Worked example: one component, before and after
Let's trace one component. A status chip for a support queue, same prompt both times: build the chip, follow the design system. Before the cleanup, the project only had a primitive palette. The agent produced a perfectly competent pill — raw hex, one orange doing the work of overdue, blocked, and escalated — and it took three review rounds of re-explaining meanings the project had never written down. After the cleanup — semantic status tokens, a chip radius token, a density token, and the harness rules — the same prompt produced code that referenced the variables by name and passed the verification script. One review round remained, and it was a real one: the agent found a state with no token and proposed one. The prompt never changed. The project did.
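In the after run, the agent found a state with no token and proposed one. A coverage check can make that step routine. This sketch uses hypothetical queue states, token names, and a hypothetical proposal shape. None of it is taken from the example project.

```python
# Ticket states the support queue can show; "waiting-on-customer" is the one nobody tokenized.
QUEUE_STATES = ["open", "overdue", "blocked", "escalated", "waiting-on-customer"]

# After the cleanup: one semantic token per meaning, instead of one orange for three jobs.
STATUS_TOKENS = {
    "color.status.open.foreground": "Default state; no action needed yet.",
    "color.status.overdue.foreground": "Past its response target; recoverable.",
    "color.status.blocked.foreground": "Cannot progress until something external changes.",
    "color.status.escalated.foreground": "Immediate attention only; never decoration.",
}


def missing_states(states: list, tokens: dict) -> list:
    """States the chip can display that have no status token."""
    return [s for s in states if f"color.status.{s}.foreground" not in tokens]


def propose(state: str) -> dict:
    """Draft a proposal for review: name, value, description. Not applied until approved."""
    return {
        "name": f"color.status.{state}.foreground",
        "value": "TBD by design",
        "description": "TODO: state the meaning and where it must not be used.",
        "status": "proposed",
    }


for state in missing_states(QUEUE_STATES, STATUS_TOKENS):
    print(propose(state))
```

The proposal deliberately leaves the value and description open. The harness rule is to propose and wait for approval, not to hardcode.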
Slide 12 — Exercise: rename five tokens to carry intent
Now make it concrete on your own system. Pick five tokens an agent would plausibly misuse — appearance names, one colour doing three jobs, anything without a description. For each one, write a new name that carries intent, using surface, emphasis, state, or density, plus one sentence that states the rule and its boundary. Note everywhere the rename would have to land: source files, generated CSS, components, canvas variables. Then add one token for a decision that currently lives only in someone's head. If you want the extra step, ask an agent to review your five new names and tell you what it would still get wrong — that feedback is gold. Keep the list. You will use it in the next module.
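A rough search helps with the step of noting everywhere a rename would have to land. This sketch assumes three spellings of one name: dotted in the source, hyphenated with `--` in CSS, and slash-separated in canvas variables. A dict stands in for files. It matches literal spellings only, so a definition nested across JSON keys still has to be found by walking the token file.

```python
import re


def name_forms(dotted: str) -> dict:
    """The spellings one token name takes across the pipeline (conventions assumed)."""
    return {
        "source": dotted,                        # aliases in token source files
        "css": "--" + dotted.replace(".", "-"),  # generated CSS variable and its uses
        "canvas": dotted.replace(".", "/"),      # canvas variable group path
    }


def rename_impact(old: str, files: dict) -> dict:
    """List every file that mentions the old name, exactly, in any of its forms."""
    hits = {}
    for form, spelling in name_forms(old).items():
        # Exact-name match: no name characters immediately before or after.
        pattern = re.compile(r"(?<![\w./-])" + re.escape(spelling) + r"(?![\w./-])")
        found = sorted(f for f, text in files.items() if pattern.search(text))
        if found:
            hits[form] = found
    return hits


files = {
    "tokens/semantic.json": '"danger": {"$value": "{brand.red.dark}"}',
    "build/tokens.css": "--brand-red-dark: #b42318;\n--brand-red-dark-hover: #912018;",
    "src/Chip.css": ".chip { color: var(--brand-red-dark); }",
    "canvas/variables.json": '{"name": "brand/red/dark"}',
}
print(rename_impact("brand.red.dark", files))
```

`build/tokens.css` shows up in the results, but it is generated. Rebuild it from the renamed source rather than editing it.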
Slide 13 — Summary, and the bridge to DESIGN.md
Let's close. A token is a named design decision, and for an agent the name and the description are the instruction — the value is just the payload. Semantic names change behaviour; raw values leave every placement to a guess. And missing tokens do not stay blank: the agent fills them with something plausible, and the drift compounds quietly until an audit finds it. The rules that prevent this live in the harness, not in individual prompts, and the first job is auditing the names you already have. But tokens cannot carry judgment — what the product refuses to do, which patterns keep going wrong. That needs a document both the agent and the team actually read. That document is DESIGN.md, and it is where Module 2 picks up. See you there.