Section 1
Agents need decisions, not vibes
When an agent produces a generic interface, the prompt is often blamed first. Sometimes the prompt is weak. More often, the project has not exposed its design decisions in a form the agent can use.
A designer might say the product should feel calm, dense, premium, operational, or editorial. Those words help with direction, but they do not tell an agent which foreground color to use on a warning badge, how tight a table row should be, what radius belongs on a card, or when a shadow is forbidden.
Design tokens turn those decisions into inspectable project material. They do not replace taste. They make taste easier to apply, test, and repeat.
Section 2
What tokens give the agent
A token is a named design decision. It can describe a color, spacing step, type scale, radius, shadow, breakpoint, motion duration, or component-specific value. The value alone is not the useful part. What helps an agent is the name, type, description, and relationship to other tokens.
The Design Tokens Community Group format exists so design decisions can move between tools with a shared vocabulary. Style Dictionary can transform token input into platform outputs. Tokens Studio helps teams manage token sets and themes. Tailwind CSS v4 exposes theme variables that can make tokens available as CSS variables and utility classes.
For agentic design, the practical point is simple: tokens should be close enough to the codebase that an agent can inspect them before it creates UI, and structured enough that the agent can distinguish primitive values from semantic decisions.
- Primitive tokens answer: what raw values exist?
- Semantic tokens answer: what should this value mean in the product?
- Component tokens answer: what should this component use in a particular state?
- Theme tokens answer: how does the same decision change by brand, mode, or context?
- Generated tokens answer: what does the application actually consume at runtime?
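Because a token's name, type, and description carry as much weight as its value, it can help to check that they are actually present. The following is a minimal sketch, assuming tokens are stored as nested objects where any node with a `$value` is a token. The `findUndocumentedTokens` helper and the sample tree are hypothetical, and the sketch ignores `$type` inherited from a parent group.

```javascript
// Walk a token tree and list tokens that lack a $type or $description.
// Assumption: a node with a $value is a token; every other object is a group.
function findUndocumentedTokens(node, path = []) {
  if (node === null || typeof node !== "object") return []
  if ("$value" in node) {
    const missing = ["$type", "$description"].filter((key) => !node[key])
    return missing.length ? [{ token: path.join("."), missing }] : []
  }
  return Object.entries(node).flatMap(([key, child]) =>
    findUndocumentedTokens(child, [...path, key])
  )
}

// Illustrative tree only; not taken from a real token file.
const tokens = {
  spacing: {
    4: { $type: "dimension", $value: "1rem" },
    table: {
      row: {
        compact: {
          $type: "dimension",
          $value: "0.625rem",
          $description: "Row padding for dense queue tables.",
        },
      },
    },
  },
}

console.log(findUndocumentedTokens(tokens))
```

A token such as `spacing.4` passes through any build step without complaint, yet it gives an agent nothing to reason with beyond the raw number.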
Projects to inspect
Stage 1: Source tokens
Stage 2: Generated CSS variables
Stage 3: Components and screens
Stage 4: Agent visual QA
Review gates: token names match usage; runtime values match source; screenshots prove the result.
The agent should read source tokens and design notes before generating UI, then verify the generated interface against runtime tokens and screenshots.
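The "runtime values match source" gate can be partly mechanical. The following is a rough sketch, assuming the expected values have already been resolved from source tokens and that the generated CSS uses simple `--name: value;` declarations. `parseCssVariables` and `findMismatches` are hypothetical helpers, and the mismatching value is invented to show a failure.

```javascript
// Collect custom properties from CSS text. Later declarations overwrite
// earlier ones, so pass one theme block at a time.
function parseCssVariables(cssText) {
  const vars = {}
  for (const [, name, value] of cssText.matchAll(/(--[\w-]+)\s*:\s*([^;]+);/g)) {
    vars[name] = value.trim()
  }
  return vars
}

// Compare expected values (from source tokens) with the generated output.
function findMismatches(expected, cssText) {
  const actual = parseCssVariables(cssText)
  return Object.entries(expected)
    .filter(([name, value]) => actual[name] !== value)
    .map(([name, value]) => ({ name, expected: value, actual: actual[name] ?? null }))
}

// Illustrative input: the second variable has drifted from its source value.
const generated = `:root {
  --color-surface-panel: #ffffff;
  --color-status-escalated-foreground: #ef4444;
}`
const expected = {
  "--color-surface-panel": "#ffffff",
  "--color-status-escalated-foreground": "#dc2626",
}

console.log(findMismatches(expected, generated))
```

An empty result only says the values agree. Whether the screens use those variables correctly is still a question for the screenshot review.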
Section 3
Case study: a dashboard keeps drifting
Imagine a support-operations dashboard. The design system says the interface should be compact and work-focused. The product has a queue table, filters, status chips, risk indicators, and a detail panel. Every agent run produces something plausible, but the details drift.
One version uses a soft blue for every important state. Another adds large rounded cards around each section. Another makes warning states orange but uses the same color for overdue, blocked, and escalated tickets. None of these outputs is obviously broken, but each one weakens the product.
The fix is not to write a longer paragraph asking for better taste. The fix is to expose the actual decisions: status colors, table density, radius scale, type scale, surface hierarchy, and allowed component states.
- Warning color reused as decoration
- Table row height too loose
- Status chip ignores semantic token
- Card radius larger than system rule
- Mobile order hides active queue
- Dark mode contrast not verified
A token-backed review compares the generated interface to the intended hierarchy, states, density, and runtime values.
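One drift from this case study, a single color standing in for several statuses, is easy to detect once status colors are named. The following is a small sketch, assuming a flat map of status name to resolved color. The `findSharedStatusColors` helper and the sample values are hypothetical.

```javascript
// Group status names by the color they resolve to. Any group with more than
// one status means two different meanings share a single color.
function findSharedStatusColors(statusColors) {
  const byColor = new Map()
  for (const [status, color] of Object.entries(statusColors)) {
    const key = color.toLowerCase()
    byColor.set(key, [...(byColor.get(key) ?? []), status])
  }
  return [...byColor]
    .filter(([, statuses]) => statuses.length > 1)
    .map(([color, statuses]) => ({ color, statuses }))
}

// Illustrative values only: overdue and blocked reuse the warning orange.
const drifted = {
  warning: "#f97316",
  overdue: "#f97316",
  blocked: "#F97316",
  escalated: "#dc2626",
}

console.log(findSharedStatusColors(drifted))
```

A shared color is not always wrong, so a result here is a prompt for a design decision rather than an automatic failure.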
Section 4
Project file structure for agent-readable tokens
Token files should not be scattered across chat history, Figma comments, and untracked screenshots. Put the source of truth in the project, put generated output where the application consumes it, and put agent instructions beside the files the agent needs to inspect.
The exact folders can change by stack, but the ownership should stay clear. Source tokens are edited deliberately. Generated files are built. Examples teach taste. Review scripts prove that the implementation did not drift.
my-product/
├── DESIGN.md
├── AGENTS.md
├── CLAUDE.md
├── tokens/
│   ├── README.md
│   ├── primitives.tokens.json
│   ├── semantic.tokens.json
│   ├── component.tokens.json
│   └── themes/
│       ├── light.tokens.json
│       └── dark.tokens.json
├── src/
│   ├── styles/
│   │   ├── globals.css
│   │   └── tokens.css
│   └── components/
│       ├── ui/
│       └── dashboard/
├── examples/
│   ├── screenshots/
│   │   ├── dashboard-good.png
│   │   ├── dashboard-drift.png
│   │   └── mobile-queue-order.png
│   └── token-reviews/
│       ├── good-status-chip.md
│       └── bad-status-chip.md
├── agent-workflows/
│   └── token-audit/
│       ├── brief.md
│       ├── checklist.md
│       └── report-template.md
└── scripts/
    ├── build-tokens.js
    ├── verify-token-usage.js
    └── capture-dashboard-screenshots.js
Section 5
Primitive, semantic, and component tokens
Primitive tokens are raw ingredients. They are useful, but they are not enough for agents. If an agent only sees blue-500, gray-900, and spacing-4, it still has to guess when each value belongs.
Semantic tokens carry product meaning. Component tokens carry local implementation rules. A status chip should not ask the agent to choose from the palette. It should point to a status token with a name that explains the state.
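Pointing a semantic token at a primitive depends on alias resolution: a value written as `{path.to.token}` is followed until a literal is reached. The following is a minimal sketch of that lookup, assuming the curly-brace alias syntax and nested `$value` objects. `getToken` and `resolveValue` are hypothetical helpers, not part of any particular tool.

```javascript
// Look up a dotted path such as "color.primitive.red.600" in a token tree.
function getToken(tree, path) {
  return path.split(".").reduce((node, key) => (node ? node[key] : undefined), tree)
}

// Follow "{...}" aliases until a literal value is reached.
function resolveValue(tree, value, seen = new Set()) {
  const match = typeof value === "string" ? value.match(/^\{([^{}]+)\}$/) : null
  if (!match) return value // already a literal
  const path = match[1]
  if (seen.has(path)) throw new Error(`Circular alias: ${path}`)
  const target = getToken(tree, path)
  if (target?.$value === undefined) throw new Error(`Unknown token: ${path}`)
  return resolveValue(tree, target.$value, new Set(seen).add(path))
}

const tokens = {
  color: {
    primitive: { red: { 600: { $type: "color", $value: "#dc2626" } } },
    status: {
      escalated: {
        foreground: { $type: "color", $value: "{color.primitive.red.600}" },
      },
    },
  },
}

console.log(resolveValue(tokens, "{color.status.escalated.foreground}")) // prints #dc2626
```

Throwing on an unknown path is a deliberate choice in this sketch: a broken alias should stop the build rather than emit an empty value.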
{
  "color": {
    "primitive": {
      "red": {
        "600": {
          "$type": "color",
          "$value": "#dc2626",
          "$description": "Base red used only through semantic aliases."
        }
      }
    },
    "status": {
      "escalated": {
        "foreground": {
          "$type": "color",
          "$value": "{color.primitive.red.600}",
          "$description": "Text and icon color for escalated work that requires immediate attention."
        }
      }
    }
  },
  "component": {
    "statusChip": {
      "radius": {
        "$type": "dimension",
        "$value": "{radius.sm}",
        "$description": "Status chips stay compact and should not use pill styling in dense tables."
      }
    }
  }
}
Section 6
Make DESIGN.md explain the token rules
Token files give agents values. DESIGN.md gives agents judgment. It should explain which tokens are preferred, which are dangerous, and what mistakes keep happening.
Do not paste the whole token file into DESIGN.md. Reference the source files and write the rules an agent needs when it is deciding between valid options.
## Token Use

Read tokens/semantic.tokens.json before creating UI.

Use semantic tokens for product meaning:
- color.status.escalated.foreground for escalated work
- color.status.blocked.foreground for blocked work
- color.surface.panel for dashboard panels
- spacing.table.row.compact for queue density

Do not use primitive palette values directly in components unless creating a new semantic token.

Dashboard density:
- Queue rows should use the compact row rhythm.
- Status chips use component.statusChip.radius.
- Do not wrap the dashboard in decorative cards.
- If a new state needs a color, propose the semantic token first.
Section 7
Good vs bad token instructions
Bad token guidance gives the agent a color palette and asks it to make something consistent. Good token guidance explains meaning, ownership, and review criteria.
This distinction matters because agents are good at following explicit constraints. If the token rule is vague, the agent will still produce a polished result, but it may not belong to the product.
Good: use color.action.primary only for primary actions
Good: dashboard panels use radius.panel.default and no decorative shadow
Good: warning is for recoverable risk; escalated is for immediate attention
Good: queue rows use spacing.table.row.compact and must show eight rows at 1440px
Good: run verify-token-usage and attach desktop and mobile screenshots
Useful token instructions tell the agent what decision the token represents, where it applies, and how to verify it.
Section 8
Prompt with tokens, then review with tokens
A token-aware prompt should not simply say read the design system. It should name the files, state the user job, tell the agent when it can propose a new token, and require evidence after implementation.
The review should use the same token contract. If the implementation uses primitive values directly, creates one-off colors, or ignores component tokens, the agent should report that as design drift.
Use the design token contract before changing UI.

Task: tighten the support dashboard queue table.
User job: support leads need to scan risky tickets and assign work quickly.

Read first:
- DESIGN.md
- tokens/README.md
- tokens/semantic.tokens.json
- tokens/component.tokens.json
- examples/token-reviews/good-status-chip.md

Rules:
- Use semantic tokens for status meaning.
- Do not use primitive palette values directly in component code.
- Do not invent new colors, shadows, or radius values.
- If a missing token is required, propose the token before using it.

Output:
1. short implementation plan
2. changed files
3. token usage notes
4. desktop and mobile screenshots
5. remaining token risks
Section 9
Generate runtime tokens the agent can inspect
The agent should be able to inspect both source tokens and runtime output. If the application uses CSS variables, generated token files should be readable and predictable. That makes it easier for the agent to check whether component code is using the system instead of one-off values.
Tailwind v4 theme variables, CSS custom properties, or generated files from a token build step can all work. The key is to keep generated output traceable to source tokens and to tell agents which files are source and which files are generated.
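Traceability is easiest when the variable name can be derived from the token path. The following is a rough sketch of that idea, assuming token values have already been resolved to literals. `toCssVariableName` and `renderCssBlock` are hypothetical helpers, and the path-to-name rule is an assumption: a real project would normally rely on its token build tool, and its generated names may be shortened or reordered relative to the source path.

```javascript
// Turn "component.statusChip.radius" into "--component-status-chip-radius".
function toCssVariableName(tokenPath) {
  return (
    "--" +
    tokenPath
      .replace(/\./g, "-")
      .replace(/[A-Z]/g, (letter) => "-" + letter.toLowerCase())
  )
}

// Render one selector block from a map of token path to resolved value.
function renderCssBlock(selector, resolvedTokens) {
  const lines = Object.entries(resolvedTokens).map(
    ([tokenPath, value]) => `  ${toCssVariableName(tokenPath)}: ${value};`
  )
  return [`${selector} {`, ...lines, "}"].join("\n")
}

const css = [
  "/* generated from tokens/ - do not edit by hand */",
  renderCssBlock(":root", {
    "color.surface.panel": "#ffffff",
    "color.status.escalated.foreground": "#dc2626",
  }),
  renderCssBlock('[data-theme="dark"]', {
    "color.surface.panel": "#111827",
    "color.status.escalated.foreground": "#f87171",
  }),
].join("\n")

console.log(css)
```

The header comment matters as much as the variables: it tells an agent the file is build output, so fixes belong in the source tokens.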
/* src/styles/tokens.css - generated from tokens/ */
:root {
  --color-surface-panel: #ffffff;
  --color-status-escalated-foreground: #dc2626;
  --space-table-row-compact: 0.625rem;
  --radius-status-chip: 0.25rem;
}

[data-theme="dark"] {
  --color-surface-panel: #111827;
  --color-status-escalated-foreground: #f87171;
}

.status-chip[data-status="escalated"] {
  color: var(--color-status-escalated-foreground);
  border-radius: var(--radius-status-chip);
}
Section 10
Add a token review gate
A token review gate catches the quiet failures: the interface looks fine, but the agent used raw hex values, duplicated spacing, created a new shadow, or treated all status colors as decoration.
The review should be mechanical where possible and visual where necessary. A script can find raw hex values. A screenshot review can catch hierarchy and density drift. A human designer still decides whether the final interface expresses the product correctly.
1. Read token source
2. Inspect component usage
3. Check generated CSS
4. Capture screenshots
5. Report token drift
6. Approve or revise
A token review checks source files, component usage, generated CSS, screenshots, and the written change report before accepting the work.
Section 11
Case study: token drift in a billing settings page
A billing settings page looked close to the approved design, but the details kept feeling disconnected. The agent had used the right component library, yet the implementation used raw red for destructive actions, a one-off card shadow, arbitrary table padding, and a rounded pill status treatment that did not exist anywhere else in the product.
The design problem was not that the agent lacked taste. It lacked executable token instructions. Once the token contract named destructive actions, compact table density, card elevation, and status-chip radius, the same prompt produced an interface that looked less dramatic and more like the product.
Before: raw destructive red
Before: one-off card shadow
Before: arbitrary table padding
After: semantic danger token
After: card shadow token
After: compact table row token
The useful review is not a beauty contest. It points to concrete token drift: color, radius, spacing, and shadow.
Section 12
Add a small token verification script
A script cannot decide whether a design has good taste, but it can catch the violations that should never reach review: raw colors, arbitrary spacing, duplicated shadows, and local status styles. This gives the agent immediate feedback before a human spends attention on the page.
Keep the script narrow at first. It should report likely drift, not block every legitimate utility class. The goal is to make the agent explain exceptions and fix obvious violations.
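One way to let the agent explain exceptions is to allow a marked comment on the offending line and keep it visible in the report. The following is a sketch of that idea, assuming a `token-exception: <reason>` comment convention. Both the convention and the `scanText` helper are hypothetical, and the sketch scans a string so it can be tried without a project on disk.

```javascript
// Flag raw hex colors per line, and record whether the line carries a
// "token-exception: <reason>" comment (a hypothetical convention).
const RAW_HEX = /#[0-9a-fA-F]{3,8}\b/
const EXCEPTION = /token-exception:\s*(.+?)\s*(\*\/)?\s*$/

function scanText(file, text) {
  return text.split("\n").flatMap((line, index) => {
    if (!RAW_HEX.test(line)) return []
    const exception = line.match(EXCEPTION)
    return [
      {
        file,
        line: index + 1,
        status: exception ? "explained" : "drift",
        reason: exception ? exception[1] : null,
      },
    ]
  })
}

// Illustrative stylesheet content; the file name is invented.
const sample = [
  ".invoice { color: #dc2626; }",
  ".logo { fill: #0f172a; } /* token-exception: brand logo asset */",
  ".chip { color: var(--color-status-escalated-foreground); }",
].join("\n")

console.table(scanText("billing.css", sample))
```

Explained lines stay in the output on purpose: a reviewer can still reject the reason, and a growing list of exceptions is itself a sign that a token is missing.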
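// Report likely token drift: raw colors and arbitrary Tailwind values in UI code.
const fs = require("node:fs")
const path = require("node:path")

// Directories to scan. Missing ones are skipped so the script runs in any layout.
const targets = ["src/app", "src/components"]

// Each pattern flags a value that probably bypasses the token system.
const rawValuePatterns = [
  /#[0-9a-fA-F]{3,8}\b/, // raw hex colors
  /rgba?\(/, // rgb() and rgba()
  /hsla?\(/, // hsl() and hsla()
  /shadow-\[/, // Tailwind arbitrary values
  /rounded-\[/,
  /p-\[/,
  /m-\[/,
]

// Recursively list every file under dir.
function walk(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name)
    return entry.isDirectory() ? walk(full) : full
  })
}

const findings = []
for (const target of targets.filter((dir) => fs.existsSync(dir))) {
  for (const file of walk(target).filter((name) => /\.(tsx|ts|css)$/.test(name))) {
    const text = fs.readFileSync(file, "utf8")
    rawValuePatterns.forEach((pattern) => {
      if (pattern.test(text)) findings.push({ file, pattern: String(pattern) })
    })
  }
}

console.table(findings)
// A non-zero exit code lets CI or an agent loop treat findings as a failure.
process.exit(findings.length ? 1 : 0)
Section 13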
Good vs bad token reports
A token report should not say the page uses some custom colors. It should name the semantic mistake, the affected file, the likely token, and whether the fix is mechanical or needs design review.
Bad: Colors are inconsistent
Good: Billing danger action uses #dc2626 instead of --color-action-danger-foreground
Bad: Spacing is off
Good: Invoice table uses p-4 rows instead of --space-table-row-compact, reducing visible invoices
Bad: Cards look different
Good: Payment method card uses shadow-lg while settings cards use --shadow-card-flat
Good token findings are specific enough to fix without losing the design reason.
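A report stays specific more reliably when each finding has fixed fields. The following is a minimal sketch of one possible shape, covering the affected file, what was found, the likely token, the design consequence, and whether the fix is mechanical or needs design review. The field names, the `formatFinding` helper, and the file paths are all hypothetical.

```javascript
// Allowed values for who can resolve a finding.
const FIX_KINDS = ["mechanical", "design-review"]

// Render one finding as a single report line.
function formatFinding({ file, found, expectedToken, impact, fix }) {
  if (!FIX_KINDS.includes(fix)) throw new Error(`Unknown fix kind: ${fix}`)
  const consequence = impact ? `, ${impact}` : ""
  return `${file}: uses ${found} instead of ${expectedToken}${consequence} (${fix})`
}

// File paths are invented for illustration.
const report = [
  {
    file: "src/components/billing/InvoiceTable.tsx",
    found: "p-4 rows",
    expectedToken: "--space-table-row-compact",
    impact: "reducing visible invoices",
    fix: "mechanical",
  },
  {
    file: "src/components/billing/PaymentMethodCard.tsx",
    found: "shadow-lg",
    expectedToken: "--shadow-card-flat",
    fix: "design-review",
  },
].map(formatFinding)

console.log(report.join("\n"))
```

Requiring the `fix` field forces the agent to say whether it can make the change itself or whether a designer needs to decide.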
Section 14
Accessibility and compliance limits
Tokens can support accessibility, but they do not guarantee it. A color token might be named correctly and still fail contrast in a particular component state. A spacing token might be consistent and still make a mobile task order harder to use.
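Contrast is one of the few checks here that reduces to arithmetic. The following is a sketch of the WCAG 2.x contrast ratio calculation for 6-digit hex colors, applied to the escalated foreground and panel colors from the generated CSS in this article. Pairing the chip text with the panel surface is an assumption, since the real chip background may differ, and a passing ratio for one state says nothing about the others.

```javascript
// WCAG 2.x relative luminance for a 6-digit hex color such as "#dc2626".
function relativeLuminance(hex) {
  const linear = [1, 3, 5].map((start) => {
    const channel = parseInt(hex.slice(start, start + 2), 16) / 255
    return channel <= 0.03928 ? channel / 12.92 : ((channel + 0.055) / 1.055) ** 2.4
  })
  return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]
}

// Contrast ratio ranges from 1 (identical colors) to 21 (black on white).
function contrastRatio(foreground, background) {
  const [lighter, darker] = [relativeLuminance(foreground), relativeLuminance(background)].sort(
    (a, b) => b - a
  )
  return (lighter + 0.05) / (darker + 0.05)
}

// Assumed pairings: escalated text directly on the panel surface.
const statesToCheck = [
  { state: "escalated chip, light panel", foreground: "#dc2626", background: "#ffffff" },
  { state: "escalated chip, dark panel", foreground: "#f87171", background: "#111827" },
]

for (const { state, foreground, background } of statesToCheck) {
  const ratio = contrastRatio(foreground, background)
  const verdict = ratio >= 4.5 ? "meets" : "is below"
  console.log(`${state}: ${ratio.toFixed(2)}:1 ${verdict} the 4.5:1 threshold for normal text`)
}
```

The output supports a statement like "contrast checked for these two states". It does not cover hover, disabled, or focus states, and it does not show that the design is accessible.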
Be careful with claims. Do not let the article, the prompt, or the agent report say a design is accessible unless the relevant checks were actually performed. Prefer precise language: contrast checked for these states, focus styles inspected on these components, mobile reading order reviewed on this viewport.
For public examples, avoid implying legal or accessibility certification. The workflow can flag risks and create review evidence. It is not a substitute for a qualified accessibility or legal review where one is required.
Section 15
Reusable token audit workflow
Use this workflow when an agent-made interface looks polished but slightly off. The goal is to make the agent prove that it used the design system rather than merely imitating it.
The workflow is deliberately boring: read the token contract, inspect implementation, capture screenshots, list drift, fix in passes, and leave a report. That boring loop is what turns tokens from documentation into working agent instructions.
Audit this UI for design-token drift. Inputs: - DESIGN.md - tokens/ - src/styles/tokens.css - target component or route - desktop and mobile screenshots Check: 1. raw colors, spacing, radius, shadows, or typography values in component code 2. semantic token misuse 3. missing component tokens 4. status, focus, hover, disabled, loading, and error states 5. density and responsive behavior against DESIGN.md Output: - pass/fail summary - token drift findings - screenshots reviewed - proposed fixes - tokens that should be added or renamed - risks requiring human design review

