Agentic Design School
Module 5 of 7
45–55 minutes

Claude Code for Designers

Design Systems and Component Libraries

Using the agent on the system itself: building and documenting components, enforcing tokens, finding drift, and handling the long tail of system maintenance that teams never staff.

Duration: 45–55 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Brief component work against an existing system rather than from scratch.
  • Generate and maintain component documentation as part of the same run.
  • Run audits that find token violations and component drift with evidence.
  • Set review gates so system changes stay deliberate.

Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13 · 16:9

The system is the highest-leverage place for an agent

Claude Code for Designers · Module 5 of 7

  • Reading the existing system before changing anything
  • Building new components inside the system's rules
  • Documentation generated alongside the work, not after it
  • Audits, refactors, and the review gates that keep changes deliberate

One-off screens benefit from an agent. The design system multiplies that benefit across every screen built on top of it.

Slide notes

Modules 3 and 4 were about producing artifacts: a converted section, a working prototype. This module turns the agent on the thing those artifacts are made of — the design system itself. The leverage argument is straightforward: a fix to one screen helps one screen, while a fix to a token, a shared component, or its documentation propagates to everything built on top of it.

It is also the place where the failure cost is highest. A bad one-off screen embarrasses one review. A bad change to a shared component or a token file quietly breaks surfaces nobody was looking at. That asymmetry is why this module spends as much time on review gates and evidence as it does on generation.

Set expectations about prerequisites. The module assumes the project has some form of system to work against — token files or CSS variables, a set of shared components, and ideally a CLAUDE.md with conventions in it. If a participant's project has none of these, the exercise at the end still works: an audit is exactly how you find out what the de facto system actually is.

Narration for this slide

Welcome to Module 5. So far you have used Claude Code to produce things — converted sections, working prototypes. This module points the agent at the design system itself: the tokens, the shared components, the documentation, and the maintenance work that never gets staffed. The leverage is obvious once you see it. Fix one screen and you have fixed one screen. Fix a token, a component, or its documentation, and the fix reaches everything built on top of it. The risk scales the same way, which is why this module is as much about review gates and evidence as it is about generation. Let's start with how the agent reads a system.

Slide 2 of 13 · 16:9

Reading the existing system: tokens, components, conventions

System work starts with reading, not generating. The agent should know what already exists before it adds anything.

  • Token files: primitives, semantic tokens, component tokens, and the generated CSS the app actually consumes
  • Shared components: source, props and types, existing variants and states
  • Conventions: naming, file structure, and the rules already encoded in CLAUDE.md or DESIGN.md
  • Real usages across the product — how teams actually combine the pieces

An agent that has not read the system will reinvent it, politely and plausibly, one component at a time.

Slide notes

The single most common failure in agent-assisted system work is skipping this step. Asked to build a card, an agent that has not read the project will produce a perfectly competent generic card — its own colours, its own radius, its own spacing — and the system gains a duplicate instead of a member. The fix is procedural: the first instruction in any system task is to read the token files, the shared component directory, and the conventions, and to restate what it found before building.

Distinguish the layers of the token stack, because they answer different questions. Primitive tokens are the raw values; semantic tokens carry product meaning, like a status colour or a surface level; component tokens carry local rules, like a chip's radius. Generated output — the CSS variables or theme file the application actually consumes — is what the agent should check implementations against. The school's article on design tokens covers this structure in depth; the working summary is that an agent needs the names and meanings, not just the hex values.
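
The layering can be sketched in a few lines. This is a minimal illustration, not the school's or any tool's actual token format: every token name and value below is invented, and the rule that `color.action.primary` is emitted as `--color-action-primary` is an assumption.

```python
# Hypothetical token stack: every name and value here is invented for illustration.
PRIMITIVES = {"blue.600": "#1d4ed8", "radius.2": "4px"}

# Semantic tokens carry product meaning and point at primitives.
SEMANTIC = {"color.action.primary": "blue.600", "radius.control": "radius.2"}

# Component tokens carry local rules and point at semantic tokens.
COMPONENT = {"chip.radius": "radius.control", "button.background": "color.action.primary"}


def resolve(name: str) -> str:
    """Follow references through the layers until a raw value is reached."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError(f"circular token reference at {name}")
        seen.add(name)
        if name in COMPONENT:
            name = COMPONENT[name]
        elif name in SEMANTIC:
            name = SEMANTIC[name]
        elif name in PRIMITIVES:
            return PRIMITIVES[name]
        else:
            raise KeyError(f"unknown token: {name}")


def generated_css() -> str:
    """Emit the CSS variables an application would consume (assumed naming rule)."""
    lines = []
    for name in sorted({**SEMANTIC, **COMPONENT}):
        lines.append(f"  --{name.replace('.', '-')}: {resolve(name)};")
    return ":root {\n" + "\n".join(lines) + "\n}"


print(generated_css())
```

The generated output contains only values, while the names and their layering live in the token files. That is why the agent should read both.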

CLAUDE.md is where the conventions that cannot be inferred from code belong: the spacing scale, the naming pattern, the rule that components never use raw palette values. Anthropic's guidance, as of mid-2026, is to keep that file under roughly 200 lines and to push large reference material — full brand guidelines, icon inventories — into skills that load on demand. Module 6 picks up the skills half of that split.

Narration for this slide

Every system task starts the same way: the agent reads before it builds. That means the token files — primitives, semantic tokens, component tokens, and the generated CSS the application actually uses. It means the shared components: their source, their props, their existing variants and states. And it means the conventions — naming, structure, the rules you have already written into CLAUDE.md or DESIGN.md. Skip this step and the agent will reinvent your system politely, one plausible generic component at a time. So the first line of any system brief is simple: read what exists, and tell me what you found, before you change anything.

Slide 3 of 13 · 16:9

Decisions the agent can use vs vibes it has to guess

Tokens and conventions only help if they carry meaning, not just values. The difference shows up directly in what the agent builds.

Vague guidance | Token-backed instruction
Use our blue palette | Use color.action.primary only for primary actions
Make cards consistent | Panels use radius.panel.default and no decorative shadow
Use warning colours where needed | Warning is recoverable risk; escalated means immediate attention
Keep spacing tight | Queue rows use spacing.table.row.compact (eight rows visible at 1440px)
Match the design system | Run the token check and attach desktop and mobile screenshots

Agents follow explicit constraints well and guess taste poorly. Encode the decision, not the adjective.

Slide notes

This table comes from the school's article on design tokens as agent instructions, and it is the practical heart of the reading step. The left column is how design intent is usually communicated to humans, who fill the gaps with shared context. An agent fills the same gaps with the most statistically common interpretation, which is how you get a soft blue used for every state and a pill radius on a dense table chip.

The right column has three properties worth naming. Each instruction points at a token by name, so the agent can find it and a script can verify it was used. Each explains the meaning behind the value — warning versus escalated is a product decision, not a colour preference. And the last row builds verification into the instruction itself: evidence is requested up front rather than hoped for later.
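
As a rough sketch of what "a script can verify it was used" might look like, the check below reads a stylesheet, collects the CSS variables it references, and reports required tokens that are missing and references that match no known token. The token list, the sample CSS, and the mapping from dotted token names to `--dashed-names` are all assumptions for illustration.

```python
import re

# Assumed naming: the token color.action.primary is emitted as --color-action-primary.
KNOWN_TOKENS = {"color.action.primary", "radius.panel.default", "spacing.table.row.compact"}

VAR_PATTERN = re.compile(r"var\(\s*--([a-z0-9-]+)")


def css_name(token: str) -> str:
    """Convert a dotted token name to its assumed CSS variable name."""
    return token.replace(".", "-")


def check_token_usage(css: str, required: set[str]) -> dict[str, list[str]]:
    """Report required tokens that are missing and references that match no known token."""
    used = set(VAR_PATTERN.findall(css))
    known = {css_name(t) for t in KNOWN_TOKENS}
    return {
        "missing_required": sorted(t for t in required if css_name(t) not in used),
        "unknown_references": sorted(used - known),
    }


# Hypothetical stylesheet: the panel uses a variable the system does not define.
sample = """
.primary-button { background: var(--color-action-primary); }
.panel { border-radius: var(--radius-card); }
"""
print(check_token_usage(sample, {"color.action.primary", "radius.panel.default"}))
```

On the sample, the panel rule is reported twice: the required `radius.panel.default` is missing, and `radius-card` is a reference the system does not know. A vague instruction gives a script nothing comparable to check.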

The place these instructions live matters as much as their wording. Token meanings and the most important rules belong in DESIGN.md or CLAUDE.md so every session gets them; the full token files stay the source of truth that the agent reads directly. Do not paste an entire token export into the instruction file — reference the files and write only the judgment the agent needs when choosing between valid options.

Narration for this slide

Here is the difference between guidance an agent can use and guidance it has to guess. 'Use our blue palette' tells it almost nothing — it will pick a plausible blue and move on. 'Use color.action.primary only for primary actions' names a token, carries a meaning, and can be checked by a script. The same goes for spacing, radius, and status colours: encode the decision, not the adjective. And notice the last row — the best instructions ask for evidence up front: run the token check, attach the screenshots. Agents are good at following explicit constraints and bad at guessing taste. Write your system down in the form they are good at.

Slide 4 of 13 · 16:9

The component library loop

One loop covers building, documenting, verifying, and releasing — with the human gates where system changes become deliberate.

Loop diagram of agent-assisted component library work. The agent reads the existing system — token files, shared components, and conventions in CLAUDE.md and DESIGN.md — then builds the new component and its variants inside those rules on a scratch branch. The work passes a human design review gate before the agent documents the component in the same run, reading props from the type definitions and usage guidance from real usages, and updating the Storybook or MDX page. The agent then verifies with evidence: a token audit, a type check, and screenshots of every state. Release is a human decision, and a dashed feedback line returns drift findings to the token files and CLAUDE.md.
Read the system, build inside its rules, pass the design review gate, document and verify in the same run, and release on a human decision. Drift findings flow back into the tokens and CLAUDE.md so the next pass starts from a better system.

Documentation and verification sit inside the loop, in the same run as the build — not in a later sprint that never arrives.

Slide notes

Walk the loop and name the owner of each step. Reading the system and building inside its rules are agent-run, but the build happens on a scratch branch — shared component code never changes directly on main. The design review gate is human and sits before documentation deliberately: there is no point documenting a component the system designer is about to reject or reshape.

The two steps after the gate are the ones teams habitually skip when humans do this work by hand, which is exactly why they belong to the agent. Documentation is generated in the same run, from the same sources — props from the type definitions, usage guidance from real usages, the Storybook story or MDX page updated alongside the source. Verification produces evidence rather than assurance: the token audit and type check pass, and screenshots of every variant and state are attached so the human review has something concrete to look at.

The dashed feedback line is the part that compounds. When verification finds drift — a missing semantic token, a convention the agent had to guess — the finding goes back into the token files or CLAUDE.md, not just into the conversation. Each pass through the loop should leave the system slightly better documented and slightly harder to drift from. That is the same harness principle from the brief-and-loop modules, applied to the system itself.

Narration for this slide

Here is the loop this module is built around. The agent reads the system first — tokens, components, conventions. It builds the new component inside those rules, on a scratch branch. Then a human gate: the system designer reviews the diff before anything else happens. After approval, the agent documents the component in the same run — props from the types, do and don't guidance from real usages, the Storybook page updated too — and then verifies with evidence: token audit, type check, screenshots of every state. Release is your decision. And the dashed line matters most: drift findings go back into the tokens and CLAUDE.md, so every pass leaves the system stronger.

Slide 5 of 13 · 16:9

Building a new component inside the system's rules

The brief for system work is mostly constraints. The component is the easy part; belonging to the system is the hard part.

  • Name what to extend or compose — and what the new component must not duplicate
  • Point at the tokens it should consume and forbid raw values outright
  • List the states that must exist: hover, focus, disabled, loading, error, empty
  • Require a plan first: anatomy, props, variants, and any token it thinks is missing
  • Keep the work on a scratch branch until it passes review

If the agent thinks the system is missing a token or a variant, it should propose one — not quietly invent a one-off.

Slide notes

A component brief against an existing system reads differently from a mockup-to-code brief. Less of it describes what to build and more of it describes what already exists and must be respected: the components to compose or extend, the semantic tokens to consume, the naming convention, the file location, the states that the system treats as mandatory. The visual description can be short because the system carries most of the visual decisions already.

The plan-first requirement earns its keep on system work specifically. Asking for the proposed anatomy, the prop API, and the variant list before any code exists lets you catch the expensive mistakes — a duplicate of an existing component under a new name, a prop API that fights the rest of the library, a variant explosion nobody asked for — at the point where they cost a sentence to fix.

The highlight line is a rule worth putting verbatim into CLAUDE.md: when the system cannot express something the component genuinely needs, the agent proposes the new token or variant and waits, rather than hard-coding a local value. Most drift enters systems exactly there — a reasonable local exception that never made it back into the system. Making the proposal explicit turns that moment into a design decision instead of an accident.

Narration for this slide

When you brief a new component against an existing system, most of the brief is constraints. Name what it should extend or compose, and what it must not duplicate. Point at the semantic tokens it consumes and forbid raw values outright. List the states the system considers mandatory — hover, focus, disabled, loading, error, empty. Then ask for a plan before any code: the anatomy, the props, the variants, and anything the agent thinks the system is missing. That last part matters. If a token or variant genuinely does not exist, the agent should propose it and wait — not quietly invent a one-off that becomes next quarter's drift.

Slide 6 of 13 · 16:9

Documentation generated alongside the component, not after it

Docs go stale because they are written from memory, later. Generate them from the source, in the same run as the change.

  • Props and variants read from the type definitions, not recalled from memory
  • Do / don't guidance mined from real usages across the product
  • Tokens consumed, accessibility behaviour, and known drift stated explicitly
  • Storybook stories or MDX pages updated in the same run as the source change
  • Sections without evidence marked as gaps, not padded with plausible text

Treat documentation as a build artifact derived from the source — refreshed on demand, reviewed by humans before it ships.

Slide notes

Documentation debt is the most universal design system complaint, and it exists for a structural reason: docs are written by hand, after the fact, from memory, and the system keeps moving. The component gains a prop or loses a variant and the page describing it quietly becomes a description of the component as it was a year ago. The agent changes the economics because the honest sources — the component source, the type definitions, the token references, and the real usages across the product — are all things it can read at scale.

The per-section sourcing is what makes generated docs trustworthy. Props tables come from the types, so they cannot disagree with the code. Usage guidance comes from how teams actually use the component — the patterns they repeat, the props they consistently misuse, the workarounds that signal a missing variant. Accessibility notes come from what the source actually does, with gaps flagged rather than papered over. The school's component documentation generation workflow scales this to a whole library with one agent per component and a consistency pass over the drafts; in this module the unit is one component, in the same run as the change that touched it.
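
To make "props tables come from the types" concrete, here is a deliberately small sketch that derives table rows from a TypeScript interface. The `StatusChipProps` interface is hypothetical, and a line-based regular expression is only a stand-in: it handles simple one-line properties, and a real workflow would read the types with the TypeScript compiler or let the agent read the source directly.

```python
import re

# Hypothetical component types, invented for illustration.
SOURCE = """
export interface StatusChipProps {
  status: 'ok' | 'warning' | 'escalated';
  size?: 'compact' | 'default';
  label: string;
}
"""

# Matches simple one-line properties such as "size?: 'compact' | 'default';".
PROP_LINE = re.compile(r"^\s*(\w+)(\?)?:\s*(.+?);\s*$")


def props_table(source: str) -> list[tuple[str, str, str]]:
    """Return (name, type, required) rows for each property line in the interface."""
    rows = []
    for line in source.splitlines():
        match = PROP_LINE.match(line)
        if match:
            name, optional, type_text = match.groups()
            rows.append((name, type_text, "no" if optional else "yes"))
    return rows


for name, type_text, required in props_table(SOURCE):
    print(f"{name} | {type_text} | required: {required}")
```

Because the rows are derived rather than recalled, adding or removing a prop in the source changes the table on the next run.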

Two cautions. First, write the doc template before generating anything — it is the contract that keeps fifty pages from reading like fifty authors. Second, generated docs that nobody reviews are just stale docs with a newer date. The human read-through before publishing is part of the workflow, not an optional extra.

Narration for this slide

Documentation goes stale for a structural reason: it is written later, by hand, from memory, while the system keeps moving. The agent changes that, because the honest sources are all readable. Props come from the type definitions, so they cannot disagree with the code. Do and don't guidance comes from real usages across the product — how teams actually combine the pieces. Tokens, accessibility behaviour, and known drift get stated explicitly, and anything without evidence is marked as a gap rather than padded with plausible prose. The rule is simple: the documentation updates in the same run as the component, and a human reads it before it ships. Docs become a build artifact, not a backlog item.

Slide 7 of 13 · 16:9

Audits: the four layers of drift, by severity

Most audits fail by inspecting only one layer. Severity keeps the team fixing what protects users before what polishes pixels.

Severity | Layer | What it looks like
P0 | Meaning | The same visual pattern communicates different product states
P0 | Accessibility | Contrast, labels, focus, or keyboard behaviour blocks use
P1 | Behaviour | Loading, disabled, error, or responsive states work differently
P1 | Implementation | A copied local component bypasses the shared one
P2 | Tokens | Raw values or one-off variables replace semantic tokens
P3 | Presentation | Spacing, radius, icon size, or label case varies without harming use

Drift that changes meaning or blocks use gets fixed first. Border radius can wait.

Slide notes

Drift rarely starts as a dramatic redesign. It starts as a one-off colour, a copied component, a missing focus state, a density decision that never made it back into the system. An agent is well suited to finding it because the work is mostly patient reading: scanning files for raw values, comparing the same component across routes, checking which states exist where. But without a frame, the output collapses into vague commentary — buttons are inconsistent, spacing needs polish — which is not an audit.

The four layers come from the school's design system audit article and they matter because the same visible inconsistency can have different causes and different owners. Meaning drift — a warning badge that is orange on one page and red on another — teaches users two different languages and is a design decision to resolve. Behaviour drift is usually an engineering fix. Implementation drift, the copied local component, is a consolidation refactor. Presentation drift is polish debt. Severity stops the team spending a day on radius while a warning state is unreadable on mobile.

Scope the audit narrowly: one component family across the routes where it appears, not the whole product. And distinguish intentional variation from accidental drift — a compact table chip legitimately differs from a detail-page chip, but the semantic state, the colour token, and the accessible name should still come from the system. Known exceptions belong in the audit packet so the agent does not try to normalise a useful difference.

Narration for this slide

Audits are where the agent earns its keep on an existing system, but only with a frame. Drift happens in four layers. Meaning drift: the same visual pattern says different things on different pages — that is the most damaging and the least visible in a code search. Accessibility drift blocks people outright. Behaviour drift: states that work differently across surfaces. Implementation drift: a copied local component bypassing the shared one. And presentation drift — spacing, radius, label case — which matters least. Severity exists so the team fixes meaning and access first and treats polish as polish. And scope it tight: one component family across its real usages, not the whole product at once.

Slide 8 of 13 · 16:9

An audit prompt that produces findings, not opinions

The audit asks for evidence and severity, and explicitly forbids fixing anything during the audit itself.

Design-system audit prompt (excerpt)
Audit status chips for design-system drift.

Read first: DESIGN.md, tokens/, the shared StatusChip source,
every route that renders a status chip, and the screenshots
in examples/screenshots/status-chip-audit/.

Check: semantic meaning, token usage, shared-component usage,
accessibility states, responsive behaviour, copy consistency,
and any documented intentional exceptions.

For each finding report: evidence, severity (P0–P3), affected
files or screenshots, likely source of drift, recommended owner,
and whether the fix needs human design review.

Do not rewrite any code during the audit.

A finding names the file, the token, and the consequence. 'Status chips are inconsistent' is a symptom, not a finding.

Slide notes

Three properties of this prompt do most of the work. It names its inputs — the source of truth, the token files, the shared component, the routes, and the screenshots — so the agent is judging against the actual system rather than a general sense of good practice. It asks for findings with evidence, severity, and a recommended owner, which is the difference between an audit and a list of complaints. And it forbids fixing during the audit, because an agent that quietly rewrites the system while auditing it has just created a new round of undocumented change.
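
One way to hold the agent to that reporting shape is to treat each finding as a record with the fields the prompt asks for. The sketch below is an assumption about how a team might store findings, not a format the school prescribes; the field names and sample findings are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    summary: str
    evidence: str
    severity: str  # "P0" to "P3", as in the severity table
    affected: list[str] = field(default_factory=list)
    likely_source: str = ""
    owner: str = ""
    needs_design_review: bool = False


def triage(findings: list[Finding]) -> list[Finding]:
    """Order findings so P0 comes first; ties keep their original order."""
    return sorted(findings, key=lambda f: int(f.severity[1:]))


# Hypothetical findings, invented for illustration.
findings = [
    Finding("Chip radius differs between routes", "screenshot comparison", "P3"),
    Finding("Escalated chip uses raw red", "hex literal in route styles", "P2", owner="engineering"),
    Finding("Same orange means two states", "dashboard vs detail", "P0", needs_design_review=True),
]
for f in triage(findings):
    print(f.severity, f.summary)
```

A finding that cannot fill the evidence field is a complaint, and the record makes that visible.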

Contrast finding quality explicitly. 'Status chips are inconsistent' invites subjective rework. 'The escalated chip on the project detail route uses raw red while the dashboard uses the semantic warning token' names the file, the token, the meaning at stake, and a mechanical fix. The same applies to the supporting checks: a small verification script that flags raw hex values, arbitrary spacing, and one-off shadows in component code gives the agent something objective to report against, and gives the next run immediate feedback before a human spends attention on it.
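
A verification script of that kind can be very small. The sketch below is one possible shape, with regular expressions that are illustrative assumptions rather than a complete rule set; a real project would tune them and add its documented exceptions.

```python
import re

# Patterns are illustrative assumptions; a real project would tune them.
CHECKS = {
    "raw hex colour": re.compile(r"#[0-9a-fA-F]{3,8}\b"),
    "arbitrary spacing": re.compile(r"\b(?:margin|padding|gap)[a-z-]*\s*:\s*[^;]*\b\d+px"),
    # Flags any box-shadow whose value is neither a CSS variable nor "none".
    "one-off shadow": re.compile(r"box-shadow\s*:\s*(?!\s|var\(|none)"),
}


def scan(path: str, text: str) -> list[str]:
    """Return one 'path:line: rule' string per violation found."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in CHECKS.items():
            if pattern.search(line):
                hits.append(f"{path}:{number}: {rule}")
    return hits


# Hypothetical component styles, invented for illustration.
sample = """.chip {
  color: #d92d20;
  padding: 3px 7px;
  border-radius: var(--radius-control);
  box-shadow: 0 1px 2px rgba(0, 0, 0, 0.2);
}"""
for hit in scan("StatusChip.css", sample):
    print(hit)
```

Each hit names a file, a line, and a rule, which is the level of specificity a finding needs. The script cannot judge meaning, so it supports the audit rather than replacing it.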

Screenshots belong in the inputs because plenty of drift is invisible to a code search: mobile wrapping, disabled-state contrast, focus rings, density inside data tables. Capture desktop and mobile, and the states that matter, and ask the agent to describe observable differences before it recommends anything. Findings then become a fix plan — quick fixes, consolidation refactors, token changes, documentation updates, and the conflicts that need a human decision — which is the bridge to the next slide on refactors.

Narration for this slide

Here is the shape of an audit prompt that produces findings rather than opinions. It names its inputs: the design notes, the token files, the shared component, every route that uses it, and screenshots of the real states. It lists what to check — meaning, tokens, shared-component usage, accessibility, responsive behaviour, copy, and known exceptions. It asks for evidence, severity, affected files, a likely cause, and an owner for every finding. And it ends with the most important line: do not rewrite any code during the audit. Audits find drift. Deciding what becomes a system rule is design work, and it happens after, with the evidence in front of you.

Slide 9 of 13 · 16:9

Refactors and migrations the team never had time for

The long tail of system maintenance is exactly the work agents are suited to: mechanical, repetitive, and verifiable.

  • Consolidating copied components back into the shared one, usage by usage
  • Migrating raw values and old variable names to the current token set
  • Backfilling missing states and accessible names across existing components
  • Updating stale stories, examples, and docs after an API change
  • Run in small reviewable passes — never as one heroic big-bang change

The audit produces the work list. The refactor burns it down in passes small enough to review honestly.

Slide notes

Every design system carries a backlog of known, unglamorous work: the three local copies of the badge component, the legacy colour variables still referenced in older routes, the components that never got a loading state, the Storybook pages describing last year's API. Teams do not skip this work because it is hard; they skip it because it is tedious and never urgent. That profile — mechanical, repetitive, verifiable — is precisely what an agent handles well, and the audit findings from the previous step are the natural work list.

The discipline that keeps it safe is pass size. A migration run that touches four hundred files in one diff cannot be reviewed honestly; it gets skimmed and approved, which defeats the purpose of having a gate. Decompose by component family, by route group, or by token category, keep each pass on its own branch, and let the verification evidence — the token check, the type check, before-and-after screenshots — accompany every pass. The agent does not get bored on pass nine, which is the actual advantage over doing this by hand.
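
Decomposition can itself be mechanical. As a sketch, the function below groups affected files by component family and caps the size of each pass. The `components/<family>/` directory layout, the file paths, and the cap of 25 files are assumptions for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_passes(paths: list[str], max_files: int = 25) -> list[tuple[str, list[str]]]:
    """Group files by component-family directory, then cap each pass at max_files."""
    by_family: dict[str, list[str]] = defaultdict(list)
    for path in sorted(paths):
        parts = PurePosixPath(path).parts
        # Assumption: the family is the directory directly under components/.
        family = parts[parts.index("components") + 1] if "components" in parts else "other"
        by_family[family].append(path)

    passes = []
    for family in sorted(by_family):
        files = by_family[family]
        for start in range(0, len(files), max_files):
            passes.append((family, files[start:start + max_files]))
    return passes


# Hypothetical file list, invented for illustration.
paths = [
    "src/components/badge/Badge.tsx",
    "src/components/badge/LegacyBadge.tsx",
    "src/components/badge/Badge.stories.tsx",
    "src/components/card/Card.tsx",
    "src/routes/settings/page.tsx",
]
for family, files in plan_passes(paths, max_files=2):
    print(family, len(files))
```

Each tuple becomes one branch and one review, with its own token check, type check, and screenshots attached.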

Be explicit about what the agent must not decide on its own: deleting a token, renaming a semantic token, or changing what a status colour means are system decisions with product-wide consequences. The agent can propose them, with the usage data to support the proposal, and the system owner decides. That boundary is the subject of the next slide.

Narration for this slide

Every system has a maintenance backlog the team never gets to: copied components that should fold back into the shared one, old colour variables still referenced in legacy routes, components missing their loading or disabled states, docs describing last year's API. This work is mechanical, repetitive, and verifiable — which is exactly the profile an agent handles well, and the audit you just ran produces the work list. The discipline is pass size. Break the migration into small runs you can genuinely review, each on its own branch, each with its own evidence. The agent does not get bored on pass nine. You stay the one who decides what the system means.

Slide 10 of 13 · 16:9

Review gates for system changes: who approves what

System changes ripple. The gates make sure each kind of change is approved by the person accountable for its blast radius.

  • New or renamed semantic tokens — system owner approves; the agent only proposes
  • Shared component API changes — design and engineering review the diff together
  • New variants or states — design review against real product need, not completeness
  • Documentation and stories — a human reads generated docs before they publish
  • Bulk refactors and migrations — approved as a plan first, then reviewed per pass

The agent can prepare every one of these changes. It should not be the one deciding any of them.

Slide notes

Review gates exist because the blast radius of system changes is larger and less visible than the diff suggests. A token rename touches every surface that consumes it; a prop change breaks usages the agent may not have found; a new variant becomes something other teams build against the day it ships. The gate question is not whether to review but who is accountable for each kind of change, and writing that down once — in CLAUDE.md or the system's contributing notes — is what keeps approvals consistent when the volume of agent-prepared changes goes up.

The pattern across every row is the same: the agent prepares, a human decides. For token changes the agent can supply the usage counts and the proposed name; the system owner decides whether the system needs the token at all. For API changes the agent can list every affected usage; design and engineering decide whether the trade-off is worth it. For bulk migrations the plan is approved before any pass runs, and each pass is still reviewed, because plan approval is not a blank cheque.
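
Writing the gates down once can be as simple as a lookup table. The sketch below restates the slide's rows as data; the key names and sign-off labels are invented for this example, and a team would substitute its own roles.

```python
# Change types and sign-offs follow the slide; the key names are invented for this sketch.
GATES = {
    "semantic_token": ["system owner"],
    "component_api": ["design review", "engineering review"],
    "variant_or_state": ["design review"],
    "documentation": ["human read-through"],
    "bulk_migration": ["plan approval", "per-pass review"],
}


def required_signoffs(change_types: list[str]) -> list[str]:
    """Collect every sign-off a change needs; unknown change types are refused."""
    signoffs: list[str] = []
    for change in change_types:
        if change not in GATES:
            raise ValueError(f"no gate defined for change type: {change}")
        for signoff in GATES[change]:
            if signoff not in signoffs:
                signoffs.append(signoff)
    return signoffs


print(required_signoffs(["semantic_token", "component_api"]))
```

Refusing unknown change types is a deliberate choice: a change that fits no gate should prompt a conversation about who owns it, not slip through by default.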

There is a volume effect worth naming honestly: agents make system changes cheap to produce, which means more of them arrive at the gates. If the gates are vague, they become a bottleneck or get skipped, and either failure undoes the benefit. Tight scopes, evidence attached to every change, and clear ownership per change type are what keep review fast enough to sustain.

Narration for this slide

System changes need clearer gates than one-off screens, because the blast radius is bigger than the diff. So write down who approves what. New or renamed semantic tokens: the system owner decides — the agent only proposes, with usage data attached. Shared component API changes: design and engineering review together, because both inherit the consequences. New variants: reviewed against a real product need, not against completeness. Generated documentation: a human reads it before it publishes. Bulk migrations: the plan is approved first, and every pass is still reviewed. The agent can prepare all of this work. It should not be the one deciding any of it.

Slide 11 of 13 · 16:9

Worked example: one component from request to documented release

A SegmentedControl for a dashboard filter bar, run through the loop against an existing token and component library.

Loop step | What actually happened
Read the system | Restated the radius, spacing, and focus tokens and the existing Tabs and Button anatomy
Plan | Proposed anatomy, props, and states; flagged that no selected-state token existed and proposed one
Design review gate | Token proposal accepted, one variant cut as speculative, naming corrected to match the library
Build + document | Component, stories, and MDX page in one run; props from types, guidance from filter-bar usages
Verify | Token check and type check passed; screenshots of all states; one focus-ring contrast issue flagged
Release | Focus ring fixed, system owner merged; token addition recorded in DESIGN.md

The plan surfaced the missing token and the review cut the speculative variant — both before any code existed.

Slide notes

This trace is a representative composite of the runs behind the book and the school's workflows rather than a benchmark, and it is worth saying that to the room. The task was deliberately ordinary: a segmented control for a dashboard filter bar, in a project that already had a token set, a component library, and conventions in CLAUDE.md.

The two most valuable moments both happened before any code existed. The plan surfaced a genuine gap — there was no semantic token for a selected control surface — and proposed one instead of hard-coding a value, which is the behaviour the brief explicitly asked for. The review gate then did its job in both directions: it accepted the token proposal, cut a 'pill' variant the agent had added speculatively, and corrected the component name to match the library's conventions. Each of those corrections cost a sentence at that point and would have cost a rework round after generation.

The verification step is the other half to dwell on. The token and type checks passed, but the screenshot review caught a focus-ring contrast problem on the selected state. That is a reminder that mechanical checks do not cover everything, and that the evidence exists so a human can look at the right things quickly. In total the run took about an hour of designer attention spread across the gates, which is roughly what writing the documentation alone used to take.

Narration for this slide

Let's trace one run. The task: a segmented control for a dashboard filter bar, inside an existing system. The agent read the tokens and the related components first, then planned — and the plan flagged that no selected-state token existed, proposing one instead of inventing a value. At the review gate the proposal was accepted, a speculative variant was cut, and the naming was corrected. Then one run produced the component, the stories, and the documentation page, with props read from the types and guidance from real filter-bar usages. Verification passed the token and type checks and caught a focus-ring contrast issue in the screenshots. Fix, merge, and the new token recorded in DESIGN.md. About an hour of designer attention, spent almost entirely at the gates.

Slide 12 of 13 · 16:9

Exercise: audit one corner of your own system

Pick one component family in a real project and run a scoped audit. Findings only — no fixes in this exercise.

  • Choose one component family used in several places: status chips, buttons, cards, or form fields
  • Assemble the packet: token files, the shared component source, the routes that use it, and screenshots of real states
  • Run the audit prompt from this module, asking for evidence, severity, and a recommended owner per finding
  • Classify each finding as meaning, behaviour, implementation, or presentation drift
  • Pick the one P0 or P1 finding you would fix first, and note which review gate the fix would pass through

Keep the findings file. Module 6 turns this audit into a repeatable skill you can run on a schedule.

Slide notes

This exercise is scoped to stay honest within an hour: one component family, findings only, no fixes. The most common mistake is choosing too broad a surface — 'audit our design system' produces vague output and an unreviewable wall of text. Status chips, buttons, cards, or form fields across three or four routes is the right size; participants without a formal design system should pick whatever pattern is most repeated in their product, because the audit will reveal the de facto system they actually have.

The packet-assembly step is half the learning. Collecting the token files, the shared source, the usage routes, and the screenshots forces participants to discover what their system does and does not make available to an agent — missing token files, undocumented exceptions, components that exist only as copies. Those discoveries are findings in their own right even before the agent runs.

The classification and prioritisation steps keep the output anchored in the four-layer frame from earlier in the module, and the final step — naming the gate the first fix would pass through — connects the audit back to the governance slide. The findings file is deliberately kept: Module 6 uses it as the seed for a SKILL.md, turning this one-off audit into a procedure that runs on a schedule.
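For participants who want the findings file to be sortable as well as readable, one possible shape for a finding is sketched below. The field names, the P0 to P3 scale, and the sample findings are assumptions for illustration; only the four layers and the idea of picking a P0 or P1 first come from the exercise.

```python
from dataclasses import dataclass
from typing import Optional

# The four layers from earlier in the module, most damaging first.
LAYERS = ("meaning", "behaviour", "implementation", "presentation")


@dataclass(frozen=True)
class Finding:
    summary: str
    layer: str      # one of LAYERS
    severity: str   # "P0" to "P3" (assumed scale)
    evidence: str   # file path, route, or screenshot name
    owner: str      # recommended owner

    def __post_init__(self) -> None:
        if self.layer not in LAYERS:
            raise ValueError(f"Unknown layer: {self.layer!r}")


def first_fix(findings: list[Finding]) -> Optional[Finding]:
    """Pick the P0 or P1 finding to fix first; ties go to the deeper layer."""
    urgent = [f for f in findings if f.severity in ("P0", "P1")]
    if not urgent:
        return None
    return min(urgent, key=lambda f: (f.severity, LAYERS.index(f.layer)))


if __name__ == "__main__":
    # Hypothetical findings for a status-chip audit.
    findings = [
        Finding("Chip radius differs on settings page", "presentation",
                "P3", "screens/settings-mobile.png", "product designer"),
        Finding("Amber chip means 'pending' on one route, 'at risk' on another",
                "meaning", "P0", "routes/orders vs routes/projects",
                "system owner"),
        Finding("Local copy of the chip bypasses the shared component",
                "implementation", "P1", "routes/billing", "engineering"),
    ]
    print(first_fix(findings).summary)
```

Rejecting an unknown layer at construction time keeps the classification step honest: a finding that fits none of the four layers is worth discussing rather than filing under a new label.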

Narration for this slide

Time to run this on your own work. Pick one component family in a real project — status chips, buttons, cards, form fields — something used in several places. Assemble the packet: the token files, the shared component source, the routes that use it, and screenshots of the real states, including mobile. Run the audit prompt from this module and ask for evidence, severity, and an owner for every finding — no fixes yet. Then classify each finding by layer, and pick the one P0 or P1 you would fix first, noting which review gate that fix would pass through. Keep the findings file. In Module 6 it becomes a skill you can run on a schedule.

Slide 13 of 13 · 16:9

Summary, and what comes next

  • System work starts with reading: tokens, shared components, and the conventions in CLAUDE.md and DESIGN.md
  • New components are briefed as constraints — semantic tokens, mandatory states, and a plan before code
  • Documentation is generated from the source in the same run, and reviewed by a human before it ships
  • Audits report drift in four layers with evidence and severity; refactors burn the findings down in small passes
  • Review gates name who approves tokens, APIs, variants, docs, and migrations — the agent prepares, humans decide

Module 6 takes the procedures you ran by hand here — audits, documentation, asset production — and turns them into skills that run repeatably.

Slide notes

Recap by walking the loop rather than the bullets in isolation: read the system, build inside its rules, gate the change, document and verify in the same run, release deliberately, and feed what you learned back into the tokens and CLAUDE.md. The compounding effect is the real argument for doing system work this way — every pass leaves the system better specified and harder to drift from, which makes the next pass cheaper.

Be honest about the constraint that runs through the whole module: agents make system changes cheap to produce, and the only thing that keeps that from becoming cheap chaos is the combination of narrow scopes, evidence attached to every change, and named owners at every gate. Teams that adopt the generation without the governance end up with a faster-moving mess.

Preview Module 6 concretely. The audit packet and prompt from the exercise are exactly the kind of procedure that belongs in a SKILL.md: written once, loaded on demand, run on a schedule, with the report and evidence format already agreed. Documentation refreshes and asset production runs follow the same pattern. The shift in the next module is from doing the work well once to making the work repeatable.

Narration for this slide

Let's close the module. System work starts with reading — tokens, components, conventions — because an agent that has not read the system will reinvent it. New components are briefed as constraints, with a plan before code and proposals instead of one-off values. Documentation comes out of the same run, generated from the source and read by a human before it ships. Audits report drift in four layers, with evidence and severity, and refactors burn the findings down in passes small enough to review. And the gates name who approves what: the agent prepares, humans decide. In Module 6 we take these procedures and turn them into skills — repeatable runs you can schedule instead of re-explaining. See you there.

Module transcript
Module 5, narrated slide by slide

Slide 1: The system is the highest-leverage place for an agent

Welcome to Module 5. So far you have used Claude Code to produce things — converted sections, working prototypes. This module points the agent at the design system itself: the tokens, the shared components, the documentation, and the maintenance work that never gets staffed. The leverage is obvious once you see it. Fix one screen and you have fixed one screen. Fix a token, a component, or its documentation, and the fix reaches everything built on top of it. The risk scales the same way, which is why this module is as much about review gates and evidence as it is about generation. Let's start with how the agent reads a system.

Slide 2: Reading the existing system: tokens, components, conventions

Every system task starts the same way: the agent reads before it builds. That means the token files — primitives, semantic tokens, component tokens, and the generated CSS the application actually uses. It means the shared components: their source, their props, their existing variants and states. And it means the conventions — naming, structure, the rules you have already written into CLAUDE.md or DESIGN.md. Skip this step and the agent will reinvent your system politely, one plausible generic component at a time. So the first line of any system brief is simple: read what exists, and tell me what you found, before you change anything.

Slide 3: Decisions the agent can use vs vibes it has to guess

Here is the difference between guidance an agent can use and guidance it has to guess. 'Use our blue palette' tells it almost nothing — it will pick a plausible blue and move on. 'Use color.action.primary only for primary actions' names a token, carries a meaning, and can be checked by a script. The same goes for spacing, radius, and status colours: encode the decision, not the adjective. And notice the last row — the best instructions ask for evidence up front: run the token check, attach the screenshots. Agents are good at following explicit constraints and bad at guessing taste. Write your system down in the form they are good at.
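What "can be checked by a script" might look like is sketched below: a scan that flags raw hex colours used outside token definitions. It assumes the tokens are published as CSS custom properties, so that color.action.primary becomes --color-action-primary, and the sample stylesheet is invented for illustration. A real project may well use a different token pipeline or an existing lint rule.

```python
import re

# Matches 3, 4, 6, or 8 digit hex colours such as #fff or #1d4ed8.
HEX_COLOUR = re.compile(
    r"#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})\b"
)


def find_raw_colours(css: str) -> list[tuple[int, str]]:
    """Return (line number, value) for each hex colour used outside a token.

    Lines that declare a custom property (starting with '--') are skipped,
    because token definitions are where raw values are supposed to live.
    This is a rough sketch: an id selector such as '#abc' would also match.
    """
    violations = []
    for number, line in enumerate(css.splitlines(), start=1):
        if line.strip().startswith("--"):
            continue
        for match in HEX_COLOUR.finditer(line):
            violations.append((number, match.group(0)))
    return violations


# Hypothetical stylesheet: one rule uses the token, one hard-codes a near miss.
SAMPLE = """\
:root {
  --color-action-primary: #1d4ed8;
}
.button-primary {
  background: var(--color-action-primary);
}
.promo-banner {
  background: #1d4fd8;
}
"""

if __name__ == "__main__":
    for number, value in find_raw_colours(SAMPLE):
        print(f"line {number}: raw colour {value}; use a semantic token")
```

The hard-coded value in the sample differs from the token by one digit, which is the kind of drift that survives a visual review and fails a mechanical one.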

Slide 4: The component library loop

Here is the loop this module is built around. The agent reads the system first — tokens, components, conventions. It builds the new component inside those rules, on a scratch branch. Then a human gate: the system designer reviews the diff before anything else happens. After approval, the agent documents the component in the same run — props from the types, do and don't guidance from real usages, the Storybook page updated too — and then verifies with evidence: token audit, type check, screenshots of every state. Release is your decision. And the dashed line matters most: drift findings go back into the tokens and CLAUDE.md, so every pass leaves the system stronger.

Slide 5: Building a new component inside the system's rules

When you brief a new component against an existing system, most of the brief is constraints. Name what it should extend or compose, and what it must not duplicate. Point at the semantic tokens it consumes and forbid raw values outright. List the states the system considers mandatory — hover, focus, disabled, loading, error, empty. Then ask for a plan before any code: the anatomy, the props, the variants, and anything the agent thinks the system is missing. That last part matters. If a token or variant genuinely does not exist, the agent should propose it and wait — not quietly invent a one-off that becomes next quarter's drift.

Slide 6: Documentation generated alongside the component, not after it

Documentation goes stale for a structural reason: it is written later, by hand, from memory, while the system keeps moving. The agent changes that, because the honest sources are all readable. Props come from the type definitions, so they cannot disagree with the code. Do and don't guidance comes from real usages across the product — how teams actually combine the pieces. Tokens, accessibility behaviour, and known drift get stated explicitly, and anything without evidence is marked as a gap rather than padded with plausible prose. The rule is simple: the documentation updates in the same run as the component, and a human reads it before it ships. Docs become a build artifact, not a backlog item.

Slide 7: Audits: the four layers of drift, by severity

Audits are where the agent earns its keep on an existing system, but only with a frame. Drift happens in four layers. Meaning drift: the same visual pattern says different things on different pages, which makes it the most damaging kind and the least visible in a code search. Behaviour drift: states that work differently across surfaces. Implementation drift: a copied local component bypassing the shared one. And presentation drift (spacing, radius, label case), which matters least. Accessibility is not a fifth layer but a severity question: a failure that blocks people outright ranks at the top wherever it appears. Severity exists so the team fixes meaning and access first and treats polish as polish. And scope it tight: one component family across its real usages, not the whole product at once.

Slide 8: An audit prompt that produces findings, not opinions

Here is the shape of an audit prompt that produces findings rather than opinions. It names its inputs: the design notes, the token files, the shared component, every route that uses it, and screenshots of the real states. It lists what to check — meaning, tokens, shared-component usage, accessibility, responsive behaviour, copy, and known exceptions. It asks for evidence, severity, affected files, a likely cause, and an owner for every finding. And it ends with the most important line: do not rewrite any code during the audit. Audits find drift. Deciding what becomes a system rule is design work, and it happens after, with the evidence in front of you.

Slide 9: Refactors and migrations the team never had time for

Every system has a maintenance backlog the team never gets to: copied components that should fold back into the shared one, old colour variables still referenced in legacy routes, components missing their loading or disabled states, docs describing last year's API. This work is mechanical, repetitive, and verifiable — which is exactly the profile an agent handles well, and the audit you just ran produces the work list. The discipline is pass size. Break the migration into small runs you can genuinely review, each on its own branch, each with its own evidence. The agent does not get bored on pass nine. You stay the one who decides what the system means.

Slide 10: Review gates for system changes: who approves what

System changes need clearer gates than one-off screens, because the blast radius is bigger than the diff. So write down who approves what. New or renamed semantic tokens: the system owner decides — the agent only proposes, with usage data attached. Shared component API changes: design and engineering review together, because both inherit the consequences. New variants: reviewed against a real product need, not against completeness. Generated documentation: a human reads it before it publishes. Bulk migrations: the plan is approved first, and every pass is still reviewed. The agent can prepare all of this work. It should not be the one deciding any of it.
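Writing the gates down can be as literal as a small lookup table kept next to the system. The sketch below is one possible shape, not a prescribed format: the change-type keys and the role names are assumptions, and where the module does not name an approver the role is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    approvers: tuple[str, ...]  # every listed role must sign off
    condition: str              # what the reviewers look for


# Role names are placeholders; the agent never appears as an approver.
REVIEW_GATES = {
    "semantic_token": Gate(("system owner",),
                           "proposal arrives with usage data attached"),
    "component_api": Gate(("design", "engineering"),
                          "both disciplines review together"),
    "new_variant": Gate(("system owner",),
                        "justified by a real product need"),
    "generated_docs": Gate(("human reader",),
                           "read in full before publishing"),
    "bulk_migration": Gate(("system owner",),
                           "plan approved first, every pass still reviewed"),
}


def gate_for(change_type: str) -> Gate:
    """Look up the gate; an unlisted change type is a reason to stop and ask."""
    try:
        return REVIEW_GATES[change_type]
    except KeyError:
        raise KeyError(
            f"No gate defined for {change_type!r}: stop and ask a human"
        ) from None


def is_approved(change_type: str, sign_offs: set[str]) -> bool:
    """True only when every required role is among the sign-offs."""
    return all(role in sign_offs for role in gate_for(change_type).approvers)


if __name__ == "__main__":
    print(is_approved("component_api", {"design"}))
    print(is_approved("component_api", {"design", "engineering"}))
```

Failing loudly on an unknown change type is deliberate: a change nobody anticipated is exactly the one that should not slip through on a default.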

Slide 11: Worked example: one component from request to documented release

Let's trace one run. The task: a segmented control for a dashboard filter bar, inside an existing system. The agent read the tokens and the related components first, then planned — and the plan flagged that no selected-state token existed, proposing one instead of inventing a value. At the review gate the proposal was accepted, a speculative variant was cut, and the naming was corrected. Then one run produced the component, the stories, and the documentation page, with props read from the types and guidance from real filter-bar usages. Verification passed the token and type checks and caught a focus-ring contrast issue in the screenshots. Fix, merge, and the new token recorded in DESIGN.md. About an hour of designer attention, spent almost entirely at the gates.

Slide 12: Exercise: audit one corner of your own system

Time to run this on your own work. Pick one component family in a real project — status chips, buttons, cards, form fields — something used in several places. Assemble the packet: the token files, the shared component source, the routes that use it, and screenshots of the real states, including mobile. Run the audit prompt from this module and ask for evidence, severity, and an owner for every finding — no fixes yet. Then classify each finding by layer, and pick the one P0 or P1 you would fix first, noting which review gate that fix would pass through. Keep the findings file. In Module 6 it becomes a skill you can run on a schedule.

Slide 13: Summary, and what comes next

Let's close the module. System work starts with reading — tokens, components, conventions — because an agent that has not read the system will reinvent it. New components are briefed as constraints, with a plan before code and proposals instead of one-off values. Documentation comes out of the same run, generated from the source and read by a human before it ships. Audits report drift in four layers, with evidence and severity, and refactors burn the findings down in passes small enough to review. And the gates name who approves what: the agent prepares, humans decide. In Module 6 we take these procedures and turn them into skills — repeatable runs you can schedule instead of re-explaining. See you there.