Slide 1 — The system is the highest-leverage place for an agent
Welcome to Module 5. So far you have used Claude Code to produce things — converted sections, working prototypes. This module points the agent at the design system itself: the tokens, the shared components, the documentation, and the maintenance work that never gets staffed. The leverage is obvious once you see it. Fix one screen and you have fixed one screen. Fix a token, a component, or its documentation, and the fix reaches everything built on top of it. The risk scales the same way, which is why this module is as much about review gates and evidence as it is about generation. Let's start with how the agent reads a system.
Slide 2 — Reading the existing system: tokens, components, conventions
Every system task starts the same way: the agent reads before it builds. That means the token files — primitives, semantic tokens, component tokens, and the generated CSS the application actually uses. It means the shared components: their source, their props, their existing variants and states. And it means the conventions — naming, structure, the rules you have already written into CLAUDE.md or DESIGN.md. Skip this step and the agent will reinvent your system politely, one plausible generic component at a time. So the first line of any system brief is simple: read what exists, and tell me what you found, before you change anything.
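To make the three token layers concrete, here is a minimal sketch of how component tokens point at semantic tokens, which point at primitives. The token names, values, and the `{alias}` reference syntax are all assumptions for illustration; your own token files may use a different format.

```python
# Hypothetical token layers; a real system would load these from its token files.
PRIMITIVES = {"blue.600": "#1d4ed8", "space.4": "16px"}
SEMANTIC = {"color.action.primary": "{blue.600}", "space.inset.md": "{space.4}"}
COMPONENT = {"button.primary.background": "{color.action.primary}"}


def resolve(name: str, *layers: dict[str, str]) -> str:
    """Follow {alias} references across the layers until a raw value is reached."""
    merged: dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    seen = {name}
    value = merged[name]
    while value.startswith("{") and value.endswith("}"):
        target = value[1:-1]
        if target in seen:
            raise ValueError(f"circular alias involving {target!r}")
        seen.add(target)
        value = merged[target]  # a KeyError here means a dangling alias
    return value


def inventory(*layers: dict[str, str]) -> dict[str, str]:
    """Report every token with its resolved value: the 'tell me what you found' step."""
    names = [name for layer in layers for name in layer]
    return {name: resolve(name, *layers) for name in names}


if __name__ == "__main__":
    for token, value in inventory(PRIMITIVES, SEMANTIC, COMPONENT).items():
        print(f"{token} -> {value}")
```

An inventory like this is the kind of summary worth asking the agent for before it changes anything: it shows which decisions already exist and where each value really comes from.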
Slide 3 — Decisions the agent can use vs vibes it has to guess
Here is the difference between guidance an agent can use and guidance it has to guess. 'Use our blue palette' tells it almost nothing — it will pick a plausible blue and move on. 'Use color.action.primary only for primary actions' names a token, carries a meaning, and can be checked by a script. The same goes for spacing, radius, and status colours: encode the decision, not the adjective. And notice the last row — the best instructions ask for evidence up front: run the token check, attach the screenshots. Agents are good at following explicit constraints and bad at guessing taste. Write your system down in the form they are good at.
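"Can be checked by a script" is worth seeing once. The sketch below is one possible token check, not a standard tool: it flags raw hex colours in CSS declarations so that every colour has to arrive through a token. The sample CSS, the custom property names, and the function name are hypothetical.

```python
import re

# Matches raw hex colours such as #fff or #1d4ed8.
RAW_HEX = re.compile(r"#[0-9a-fA-F]{3,8}\b")

# Hypothetical stylesheet: two declarations bypass the tokens.
SAMPLE_CSS = """\
.button-primary { background: var(--color-action-primary); }
.chip-warning { background: #f59e0b; }
.card { border-color: #E5E7EB; padding: var(--space-inset-md); }
"""


def find_raw_colors(css: str) -> list[tuple[int, str]]:
    """Return (line number, value) for each raw hex colour found.

    Heuristic: only the text after the first colon on a line is scanned,
    so an id selector at the start of a rule is not mistaken for a colour.
    """
    findings = []
    for number, line in enumerate(css.splitlines(), start=1):
        _, _, values = line.partition(":")
        for match in RAW_HEX.finditer(values):
            findings.append((number, match.group()))
    return findings


if __name__ == "__main__":
    for number, value in find_raw_colors(SAMPLE_CSS):
        print(f"line {number}: raw colour {value}, use a semantic token instead")
```

A line-based scan like this is deliberately crude. It catches the common case and gives the agent something concrete to run and attach as evidence; a production check would parse the stylesheet properly.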
Slide 4 — The component library loop
Here is the loop this module is built around. The agent reads the system first — tokens, components, conventions. It builds the new component inside those rules, on a scratch branch. Then a human gate: the system designer reviews the diff before anything else happens. After approval, the agent documents the component in the same run — props from the types, do and don't guidance from real usages, the Storybook page updated too — and then verifies with evidence: token audit, type check, screenshots of every state. Release is your decision. And the dashed line matters most: drift findings go back into the tokens and CLAUDE.md, so every pass leaves the system stronger.
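If it helps to see the loop as something stricter than a diagram, here is a small sketch of it as an ordered list of stages where the human decisions cannot be skipped. The stage names and the `advance` function are illustrative assumptions, not part of Claude Code or any other tool.

```python
# The loop from the slide, as ordered stages (names are illustrative).
STAGES = ["read", "build", "review", "document", "verify", "release"]

# Stages that end in a human decision: the diff review after building,
# and the release call after verification.
NEEDS_HUMAN_SIGN_OFF = {"review", "verify"}


def advance(current: str, human_approved: bool = False) -> str:
    """Return the next stage, refusing to pass a gate without a human decision."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    if current in NEEDS_HUMAN_SIGN_OFF and not human_approved:
        raise PermissionError(f"'{current}' needs a human decision before moving on")
    return STAGES[min(index + 1, len(STAGES) - 1)]


if __name__ == "__main__":
    stage = "read"
    stage = advance(stage)                       # read -> build
    stage = advance(stage)                       # build -> review
    stage = advance(stage, human_approved=True)  # review -> document
    print(stage)
```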
Slide 5 — Building a new component inside the system's rules
When you brief a new component against an existing system, most of the brief is constraints. Name what it should extend or compose, and what it must not duplicate. Point at the semantic tokens it consumes and forbid raw values outright. List the states the system considers mandatory — hover, focus, disabled, loading, error, empty. Then ask for a plan before any code: the anatomy, the props, the variants, and anything the agent thinks the system is missing. That last part matters. If a token or variant genuinely does not exist, the agent should propose it and wait — not quietly invent a one-off that becomes next quarter's drift.
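The constraints in a brief can be written down as data and checked. A minimal sketch, assuming the mandatory states listed above and a plan with four sections; the section names and helper functions are hypothetical.

```python
# The states named in the brief as mandatory for this system.
MANDATORY_STATES = {"hover", "focus", "disabled", "loading", "error", "empty"}

# What the plan must cover before any code is written (section names are assumptions).
PLAN_SECTIONS = ["anatomy", "props", "variants", "proposals"]


def missing_states(implemented: set[str]) -> list[str]:
    """Mandatory states the component does not cover yet, in alphabetical order."""
    return sorted(MANDATORY_STATES - set(implemented))


def plan_gaps(plan: dict) -> list[str]:
    """Plan sections that are absent.

    An empty 'proposals' list is fine: it means the agent looked and found
    nothing missing. A missing key means it never looked.
    """
    return [section for section in PLAN_SECTIONS if section not in plan]


if __name__ == "__main__":
    print(missing_states({"hover", "focus", "disabled"}))
    print(plan_gaps({"anatomy": "track + segments", "props": [], "variants": []}))
```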
Slide 6 — Documentation generated alongside the component, not after it
Documentation goes stale for a structural reason: it is written later, by hand, from memory, while the system keeps moving. The agent changes that, because the honest sources are all readable. Props come from the type definitions, so they cannot disagree with the code. Do and don't guidance comes from real usages across the product — how teams actually combine the pieces. Tokens, accessibility behaviour, and known drift get stated explicitly, and anything without evidence is marked as a gap rather than padded with plausible prose. The rule is simple: the documentation updates in the same run as the component, and a human reads it before it ships. Docs become a build artifact, not a backlog item.
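Here is a rough sketch of "props come from the type definitions". It reads a TypeScript interface as plain text and emits a props table. The interface, its props, and the regex are assumptions for illustration; a line-based regex only handles simple one-line props, and a real pipeline would use a proper TypeScript parser.

```python
import re

# Hypothetical type definition for a segmented control.
SEGMENTED_CONTROL_TYPES = """
interface SegmentedControlProps {
  options: string[];
  value: string;
  disabled?: boolean;
  onChange: (value: string) => void;
}
"""

# name, optional marker, type, for props written on a single line.
PROP_LINE = re.compile(r"^\s*(\w+)(\?)?:\s*(.+?);\s*$")


def props_table(source: str) -> list[dict]:
    """Extract one row per prop: name, type, and whether it is required."""
    rows = []
    for line in source.splitlines():
        match = PROP_LINE.match(line)
        if match:
            name, optional, prop_type = match.groups()
            rows.append({"name": name, "type": prop_type, "required": optional is None})
    return rows


def to_markdown(rows: list[dict]) -> str:
    """Render the rows as a table. Union types containing '|' would need escaping."""
    lines = ["| Prop | Type | Required |", "| --- | --- | --- |"]
    for row in rows:
        required = "yes" if row["required"] else "no"
        lines.append(f"| `{row['name']}` | `{row['type']}` | {required} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown(props_table(SEGMENTED_CONTROL_TYPES)))
```

Because the table is derived from the same file the component compiles against, regenerating it in the same run keeps the two from drifting apart.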
Slide 7 — Audits: the four layers of drift, by severity
Audits are where the agent earns its keep on an existing system, but only with a frame. Drift happens in four layers. Meaning drift: the same visual pattern says different things on different pages — that is the most damaging and the least visible in a code search. Accessibility drift blocks people outright. Behaviour drift: states that work differently across surfaces. Implementation drift: a copied local component bypassing the shared one. And presentation drift — spacing, radius, label case — which matters least. Severity exists so the team fixes meaning and access first and treats polish as polish. And scope it tight: one component family across its real usages, not the whole product at once.
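Sorting findings by layer is simple enough to sketch. The ranking below follows the order in which the drift categories were just described, most damaging first; the finding fields and example findings are hypothetical.

```python
# Drift categories in the order described above, most damaging first.
LAYER_ORDER = ["meaning", "accessibility", "behaviour", "implementation", "presentation"]


def triage(findings: list[dict]) -> list[dict]:
    """Order findings so meaning and access problems come before polish.

    sorted() is stable, so findings within one layer keep their original order.
    """
    return sorted(findings, key=lambda finding: LAYER_ORDER.index(finding["layer"]))


if __name__ == "__main__":
    # Hypothetical findings for one component family (status chips).
    findings = [
        {"layer": "presentation", "summary": "label case differs on the billing page"},
        {"layer": "meaning", "summary": "amber chip means 'pending' on one page, 'at risk' on another"},
        {"layer": "implementation", "summary": "copied chip component bypasses the shared one"},
        {"layer": "accessibility", "summary": "status conveyed by colour alone"},
    ]
    for finding in triage(findings):
        print(f"{finding['layer']}: {finding['summary']}")
```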
Slide 8 — An audit prompt that produces findings, not opinions
Here is the shape of an audit prompt that produces findings rather than opinions. It names its inputs: the design notes, the token files, the shared component, every route that uses it, and screenshots of the real states. It lists what to check — meaning, tokens, shared-component usage, accessibility, responsive behaviour, copy, and known exceptions. It asks for evidence, severity, affected files, a likely cause, and an owner for every finding. And it ends with the most important line: do not rewrite any code during the audit. Audits find drift. Deciding what becomes a system rule is design work, and it happens after, with the evidence in front of you.
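One way to hold the agent to "findings, not opinions" is to give every finding a fixed shape and reject the ones that arrive incomplete. A minimal sketch: the field names mirror what the prompt asks for, while the class, the example values, and the file path are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One audit finding, carrying everything the prompt asks for."""
    summary: str
    evidence: str = ""
    severity: str = ""
    affected_files: list[str] = field(default_factory=list)
    likely_cause: str = ""
    owner: str = ""


REQUIRED_FIELDS = ["evidence", "severity", "affected_files", "likely_cause", "owner"]


def missing_fields(finding: Finding) -> list[str]:
    """Names of required fields left empty; an empty list means the finding is complete."""
    return [name for name in REQUIRED_FIELDS if not getattr(finding, name)]


if __name__ == "__main__":
    # Hypothetical finding with no cause and no owner yet.
    incomplete = Finding(
        summary="Warning chip uses a raw hex value",
        evidence="screenshot of the billing page, mobile width",
        severity="P1",
        affected_files=["routes/billing/StatusChip.tsx"],
    )
    print(missing_fields(incomplete))
```

A finding that fails this check goes back to the agent for evidence, not forward to a fix.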
Slide 9 — Refactors and migrations the team never had time for
Every system has a maintenance backlog the team never gets to: copied components that should fold back into the shared one, old colour variables still referenced in legacy routes, components missing their loading or disabled states, docs describing last year's API. This work is mechanical, repetitive, and verifiable — which is exactly the profile an agent handles well, and the audit you just ran produces the work list. The discipline is pass size. Break the migration into small runs you can genuinely review, each on its own branch, each with its own evidence. The agent does not get bored on pass nine. You stay the one who decides what the system means.
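Pass size is easy to enforce mechanically. The sketch below finds files that still reference an old colour variable and splits them into small review passes. The variable name, file paths, and default pass size are assumptions; it works on an in-memory mapping of path to file text so it stays self-contained.

```python
def plan_passes(files: dict[str, str], old_variable: str, pass_size: int = 3) -> list[list[str]]:
    """Group affected files into passes small enough to review one branch at a time.

    Plain substring matching is used, so searching for '--brand-blue' would also
    match '--brand-blue-dark'. Tighten the match before relying on it.
    """
    if pass_size < 1:
        raise ValueError("pass_size must be at least 1")
    affected = sorted(path for path, text in files.items() if old_variable in text)
    return [affected[i:i + pass_size] for i in range(0, len(affected), pass_size)]


if __name__ == "__main__":
    # Hypothetical legacy routes; only three still use the old variable.
    legacy = {
        "routes/billing/page.css": "a { color: var(--old-brand-blue); }",
        "routes/reports/page.css": "h1 { color: var(--old-brand-blue); }",
        "routes/settings/page.css": "a { color: var(--color-action-primary); }",
        "routes/admin/page.css": ".link { color: var(--old-brand-blue); }",
    }
    for number, batch in enumerate(plan_passes(legacy, "--old-brand-blue", pass_size=2), start=1):
        print(f"pass {number}: {batch}")
```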
Slide 10 — Review gates for system changes: who approves what
System changes need clearer gates than one-off screens, because the blast radius is bigger than the diff. So write down who approves what. New or renamed semantic tokens: the system owner decides — the agent only proposes, with usage data attached. Shared component API changes: design and engineering review together, because both inherit the consequences. New variants: reviewed against a real product need, not against completeness. Generated documentation: a human reads it before it publishes. Bulk migrations: the plan is approved first, and every pass is still reviewed. The agent can prepare all of this work. It should not be the one deciding any of it.
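"Write down who approves what" can be taken literally. Here is a sketch of the gates as a lookup table. The change-type keys are illustrative, and where the narration does not name a specific approver the table uses a generic "human reviewer" as a placeholder assumption.

```python
# Who must approve each kind of system change.
REQUIRED_APPROVERS = {
    "semantic_token": {"system owner"},
    "component_api": {"design", "engineering"},
    # Assumption: no specific approver is named for these, only that a human decides.
    "new_variant": {"human reviewer"},
    "generated_docs": {"human reviewer"},
    "bulk_migration_plan": {"human reviewer"},
    "bulk_migration_pass": {"human reviewer"},
}


def outstanding_approvals(change_type: str, approvals: set[str]) -> set[str]:
    """Return the approvers still needed. The agent's own sign-off never counts."""
    if change_type not in REQUIRED_APPROVERS:
        # Unknown kinds of change are blocked until someone adds a gate for them.
        raise KeyError(f"no gate defined for {change_type!r}")
    human_approvals = approvals - {"agent"}
    return REQUIRED_APPROVERS[change_type] - human_approvals


if __name__ == "__main__":
    print(outstanding_approvals("component_api", {"design", "agent"}))
```

Raising on an unknown change type is a deliberate choice here: a change nobody has written a gate for should stop, not slip through.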
Slide 11 — Worked example: one component from request to documented release
Let's trace one run. The task: a segmented control for a dashboard filter bar, inside an existing system. The agent read the tokens and the related components first, then planned, and the plan flagged that no selected-state token existed, proposing one instead of inventing a value. At the review gate the proposal was accepted, a speculative variant was cut, and the naming was corrected. Then one run produced the component, the stories, and the documentation page, with props read from the types and guidance from real filter-bar usages. Verification passed the token and type checks and caught a focus-ring contrast issue in the screenshots. The contrast issue was fixed, the change was merged, and the new token was recorded in DESIGN.md. In total, about an hour of designer attention, spent almost entirely at the gates.
Slide 12 — Exercise: audit one corner of your own system
Time to run this on your own work. Pick one component family in a real project — status chips, buttons, cards, form fields — something used in several places. Assemble the packet: the token files, the shared component source, the routes that use it, and screenshots of the real states, including mobile. Run the audit prompt from this module and ask for evidence, severity, and an owner for every finding — no fixes yet. Then classify each finding by layer, and pick the one P0 or P1 you would fix first, noting which review gate that fix would pass through. Keep the findings file. In Module 6 it becomes a skill you can run on a schedule.
Slide 13 — Summary, and what comes next
Let's close the module. System work starts with reading — tokens, components, conventions — because an agent that has not read the system will reinvent it. New components are briefed as constraints, with a plan before code and proposals instead of one-off values. Documentation comes out of the same run, generated from the source and read by a human before it ships. Audits report drift in four layers, with evidence and severity, and refactors burn the findings down in passes small enough to review. And the gates name who approves what: the agent prepares, humans decide. In Module 6 we take these procedures and turn them into skills — repeatable runs you can schedule instead of re-explaining. See you there.