Agentic Design School
Design workflow
Intermediate

Design System Audit at Scale

A workflow that fans out one agent per component or screen to find token drift, spacing inconsistency, duplicated patterns, and undocumented variants, then aggregates everything into a prioritized drift report.

Orchestration: dynamic workflow fan-out

Typical run: 1–2 hours per audit run

Last reviewed: 2026-06-02

View the code samples on GitHub

Section 1

Why audit a design system with agents

Design systems drift quietly. A component gets a one-off padding override during a deadline. A team copies a card and adjusts the border radius. A rebrand updates the Figma library but only half of the code. None of these moments feel like a problem on their own, and that is exactly why nobody catches them.

Auditing by hand does not scale. Reading sixty component folders, every screen that consumes them, and every place a hex code or pixel value was typed instead of a token takes weeks of designer and engineer time. Most teams do it once, produce a slide deck, and never repeat it.

Agents change the economics. One agent can read a single component and its usages with full attention. Sixty agents can read sixty components in parallel. The audit becomes something you run quarterly, or after every rebrand, instead of a heroic one-time project.

Section 2

When to reach for this workflow

Reach for this workflow when the system is large enough that nobody holds the whole picture: roughly twenty components or more, multiple consuming surfaces, or more than one team contributing.

It is most valuable right after a rebrand, before a token migration, when a marketing site and a product start to look like cousins instead of siblings, or when a design system team needs evidence to argue for consolidation work.

If the system has fewer than ten components and one team, a focused review session is faster. The fan-out pays off when the reading work, not the judgment work, is the bottleneck.

  • After a rebrand, to measure how much of the new direction actually shipped.
  • Before a token migration, to know what you are migrating from.
  • When multiple teams consume the same system and consistency complaints start to appear.
  • When the design system team needs a prioritized backlog instead of anecdotes.
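The sizing guidance above can be written down as a small decision helper, which is handy if a planning script should suggest an approach. This is a sketch that encodes only the thresholds stated in this section. `recommendApproach` and its field names are hypothetical, and the 10 to 19 component range with one team and one surface is left as a judgment call because the guidance does not settle it.

```javascript
// Sketch of this section's sizing heuristic; not a rule from any tool.
// Roughly 20+ components, several consuming surfaces, or several teams favor
// the fan-out. Under 10 components with one team favors a focused review.
function recommendApproach({ components, teams = 1, surfaces = 1 }) {
  if (components >= 20 || surfaces > 1 || teams > 1) return "fan-out"
  if (components < 10) return "focused-review"
  // 10 to 19 components, one team, one surface: the guidance leaves this open.
  return "judgment-call"
}

console.log(recommendApproach({ components: 60, teams: 3, surfaces: 2 })) // "fan-out"
console.log(recommendApproach({ components: 8 }))                           // "focused-review"
console.log(recommendApproach({ components: 14 }))                          // "judgment-call"
```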

Section 3

The orchestration pattern: dynamic workflow fan-out

This audit runs as a Claude Code dynamic workflow. Instead of one agent reading the whole codebase in a single context window, Claude writes a small JavaScript orchestration script that runs in the background. The script holds the component list, dispatches one subagent per component, and collects their findings into script variables rather than into Claude's own context.

That separation is the whole point. A sixty-component audit produces far more raw output than fits in one conversation. In a dynamic workflow, the intermediate results live in the script, up to sixteen agents run concurrently, and a single run can dispatch up to a thousand agents. The run is resumable if it is interrupted, so a long audit does not start over from zero.

You trigger a dynamic workflow by including the word "workflow" in your prompt, or by running with /effort ultracode. Once the orchestration works, save it to .claude/workflows/ so the team can rerun the audit later as a slash command instead of re-describing the whole process.
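Keeping at most sixteen agents in flight is a small scheduling problem. Below is a minimal sketch of a worker pool an orchestration script could use, assuming the dispatch call returns a promise. `runWithLimit` and `fakeAgent` are hypothetical names, the fake agent only simulates work, and the real workflow runtime may enforce its own limit regardless of what the script does.

```javascript
// Sketch: call task(item, index) for every item with at most `limit` calls in flight.
// results[i] always belongs to items[i], whatever order the calls finish in.
async function runWithLimit(items, limit, task) {
  const results = new Array(items.length)
  let next = 0
  async function worker() {
    while (next < items.length) {
      const i = next++ // claimed synchronously, so no two workers take the same item
      results[i] = await task(items[i], i)
    }
  }
  const workerCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}

// Hypothetical stand-in for a subagent dispatch: waits briefly, tracks concurrency.
let inFlight = 0
let peak = 0
async function fakeAgent(dir) {
  inFlight++
  peak = Math.max(peak, inFlight)
  await new Promise((resolve) => setTimeout(resolve, 5))
  inFlight--
  return `report for ${dir}`
}

const dirs = Array.from({ length: 60 }, (_, i) => `src/components/C${i}/`)
runWithLimit(dirs, 16, fakeAgent).then((reports) => {
  console.log(reports.length, peak) // 60 16
})
```

Unlike fixed batches, a pool starts the next agent as soon as any running agent finishes, so one slow component does not leave the other fifteen slots idle.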

Diagram: Audit fan-out structure
1. Enumerate components
2. Build audit packets
3. Fan out agents
4. Collect findings
5. Deduplicate and rank
6. Write drift report

One orchestrating script enumerates components, fans out per-component agents, and merges findings into a single report.

Section 4

Step 1: define the source of truth

An audit without a reference is just a list of opinions. Before any agent runs, write down what the system is supposed to be: the token set, the spacing scale, the typography ramp, the documented variants of each component, and where that documentation lives.

If the source of truth is a Figma library, export the token values or connect a Figma MCP server so agents can read the published styles. If the source of truth is a tokens.json or Style Dictionary build, point the agents at the built output, not the source files, so they compare against what actually ships.
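Agents compare more reliably against a flat list of token paths and values than against a nested file. Below is a minimal sketch of that flattening step, assuming a nested JSON file whose leaves are either plain values (as in a nested build output) or objects with a `value` or `$value` key (as in Style Dictionary source files and the W3C design tokens draft format). `flattenTokens` and `indexByValue` are hypothetical helpers, and the sample token values are illustrative.

```javascript
// Sketch: flatten a nested token file into { "path.to.token": "value" }.
// Leaves may be plain values, or objects with a `value` or `$value` key.
function flattenTokens(tree, prefix = []) {
  const out = {}
  for (const [key, node] of Object.entries(tree)) {
    if (key.startsWith("$")) continue // group metadata such as $type or $description
    const path = [...prefix, key].join(".")
    const isObject = node !== null && typeof node === "object"
    if (isObject && ("value" in node || "$value" in node)) {
      const raw = node.$value ?? node.value
      // Composite tokens (e.g. typography objects) are kept as JSON text.
      out[path] = typeof raw === "object" ? JSON.stringify(raw) : String(raw)
    } else if (isObject) {
      Object.assign(out, flattenTokens(node, [...prefix, key]))
    } else {
      out[path] = String(node) // plain leaf, as in a nested build output
    }
  }
  return out
}

// Reverse index: which token paths does a literal value correspond to?
function indexByValue(flat) {
  const byValue = new Map()
  for (const [path, value] of Object.entries(flat)) {
    const key = value.toLowerCase()
    if (!byValue.has(key)) byValue.set(key, [])
    byValue.get(key).push(path)
  }
  return byValue
}

const flat = flattenTokens({
  color: { action: { primary: { value: "#2456C4" } } },
  space: { $type: "dimension", 4: { $value: "16px" }, 6: "24px" },
})
console.log(Object.keys(flat)) // [ 'color.action.primary', 'space.4', 'space.6' ]
console.log(indexByValue(flat).get("#2456c4")) // [ 'color.action.primary' ]
```

The reverse index is what lets an agent, or the aggregation step, say "this literal should be color.action.primary" instead of only "this is hard-coded".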

Ambiguity in the reference becomes noise in the findings. If the team has not decided whether 6px spacing is allowed, say so in the brief, and the agents will flag it as a question instead of a violation.
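The distinction between an open question and a violation can live in the reference itself. Below is a minimal sketch, assuming the brief lists the spacing scale and any undecided values in px. `classifySpacing`, the `openQuestions` field, and the example scale are hypothetical. Undecided values come back as P3 questions, which matches the severity scale the auditor agent uses for questions to the team.

```javascript
// Sketch: classify one spacing value (in px) against the documented scale.
// Values the team has not decided on come back as questions, not violations.
function classifySpacing(px, { scale, openQuestions = [] }) {
  if (scale.includes(px)) return { status: "ok" }
  if (openQuestions.includes(px)) return { status: "question", severity: "P3" }
  // Nearest scale value, to make the suggested fix concrete.
  const nearest = scale.reduce((best, s) =>
    Math.abs(s - px) < Math.abs(best - px) ? s : best
  )
  return { status: "violation", nearest }
}

// Hypothetical reference: the team has not decided whether 6px is allowed.
const reference = { scale: [4, 8, 12, 16, 24, 32, 48], openQuestions: [6] }
console.log(classifySpacing(16, reference)) // { status: 'ok' }
console.log(classifySpacing(6, reference))  // { status: 'question', severity: 'P3' }
console.log(classifySpacing(18, reference)) // { status: 'violation', nearest: 16 }
```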

Section 5

Step 2: enumerate the audit units

The fan-out needs a list. For a component library, the list is usually one entry per component folder. For a consuming surface like a marketing site, the list is one entry per page or template. Keep each unit small enough that one agent can read all of it carefully.

The orchestration script builds this list with a glob, not by hand, so the audit stays accurate as the system grows. Each entry carries the paths the agent should read and any component-specific notes, such as known exceptions.
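Below is a minimal sketch of turning glob output into audit units, assuming components live under `src/components/<Name>/`. `buildAuditUnits` and the shape of the `notes` object are hypothetical. Because the list is derived from file paths, a new component folder shows up in the next run without anyone editing the audit.

```javascript
// Sketch: turn a flat file list (for example, glob output) into one audit unit
// per component folder, each carrying its paths and any known exceptions.
function buildAuditUnits(files, notes = {}) {
  const units = new Map()
  for (const file of files) {
    const match = file.match(/^src\/components\/([^/]+)\//)
    if (!match) continue // not inside a component folder
    const name = match[1]
    if (!units.has(name)) {
      units.set(name, { name, paths: [], notes: notes[name] ?? [] })
    }
    units.get(name).paths.push(file)
  }
  // Sorted so runs are easy to compare.
  return [...units.values()].sort((a, b) => a.name.localeCompare(b.name))
}

const units = buildAuditUnits(
  [
    "src/components/Card/Card.tsx",
    "src/components/Button/Button.tsx",
    "src/components/Button/Button.css",
    "src/utils/format.ts",
  ],
  { Button: ["Icon-only variant is a known exception"] }
)
console.log(units.map((u) => `${u.name} (${u.paths.length})`)) // [ 'Button (2)', 'Card (1)' ]
```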

Audit workflow prompt
Run this as a workflow.

Audit our design system for drift against the documented tokens and component specs.

Reference: design-tokens/tokens.json and docs/components/*.md
Scope: every folder under src/components and every page under apps/marketing/pages

For each component or page, dispatch one agent that reports:
- Hard-coded colors, spacing, radii, or font sizes that should be tokens
- Spacing values outside the 4px scale
- Variants that exist in code but not in the docs, and vice versa
- Patterns that duplicate an existing component

Each finding needs file path, line, evidence, severity (P0-P3), and a suggested fix.
Aggregate everything into audit/drift-report.md, grouped by component, with a summary table of counts by severity. Do not change any code.

Section 6

Step 3: the orchestration script

Claude writes the orchestration script for you when you ask for a workflow, but it helps to know what a good one looks like so you can review it. The script enumerates the units, dispatches agents in batches, and keeps every finding in a local variable until the final report is written.

The sketch below uses an agent() pseudo-API to show the shape. Treat it as a sketch, not a library you install: the real script Claude generates will use the workflow runtime available in your version of Claude Code.

Dynamic workflow sketch (orchestration script)
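import { mkdir, readFile, writeFile } from "node:fs/promises"
import { glob } from "glob"

// Sketch: agent(prompt, options) stands in for the dispatch call your workflow
// runtime provides. It is assumed to start one subagent and resolve with that
// subagent's final reply as text. It is not an npm package and is not defined here.

// Sorted so batches, and therefore the report, come out in a stable order.
const components = (await glob("src/components/*/")).sort()
const tokens = await readFile("design-tokens/tokens.json", "utf8")

const findings = []
const batchSize = 16 // matches the stated limit of 16 concurrent agents

// Tag each finding with its component; keep unparseable replies visible
// as P3 findings instead of crashing the whole run.
function parseReport(dir, text) {
  try {
    return JSON.parse(text).map((f) => ({ component: dir, ...f }))
  } catch {
    return [{ component: dir, category: "agent-error", severity: "P3",
      evidence: "Reply was not a JSON array", suggestedFix: "Rerun this component" }]
  }
}

// Hypothetical renderer: one Markdown bullet per finding, in the order given.
function renderReport(items) {
  const lines = ["# Design system drift report", "", `Total findings: ${items.length}`, ""]
  for (const f of items) {
    lines.push(`- ${f.severity} ${f.component} ${f.file ?? "?"}:${f.line ?? "?"} ` +
      `[${f.category}] ${f.evidence} -> ${f.suggestedFix}`)
  }
  return lines.join("\n") + "\n"
}

for (let i = 0; i < components.length; i += batchSize) {
  const batch = components.slice(i, i + batchSize)
  const reports = await Promise.all(
    batch.map((dir) =>
      agent(
        "You are auditing one design system component for drift.\n" +
          "Component folder: " + dir + "\n" +
          "Token reference:\n" + tokens + "\n" +
          "Report hard-coded values, off-scale spacing, undocumented variants, and duplicated patterns.\n" +
          "Return JSON: [{ file, line, category, severity, evidence, suggestedFix }]",
        { model: "sonnet" }
      )
    )
  )
  reports.forEach((text, j) => findings.push(...parseReport(batch[j], text)))
}

// "P0" < "P1" < "P2" < "P3" as strings, so this puts the most severe findings first.
const ranked = [...findings].sort((a, b) => String(a.severity).localeCompare(String(b.severity)))
await mkdir("audit", { recursive: true })
await writeFile("audit/drift-report.md", renderReport(ranked))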

Section 7

Step 4: give each subagent a narrow job

The per-component agent should not know or care about the rest of the audit. It reads one folder, compares it against the reference, and returns structured findings. Narrow agents produce findings that are easy to merge; broad agents produce essays that are not.
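Narrow jobs are also easy to cross-check. A cheap deterministic pass over the same files can list literal hex colors and px values with line numbers, which gives you something to spot-check the agent's token-drift findings against. This is a sketch: `findHardCodedValues` is a hypothetical helper, and its regular expression only catches `#rgb`, `#rrggbb`, and `px` literals, so `rgb()`, `rem`, and values inside theme objects will be missed.

```javascript
// Sketch: list hex colors and px values written as literals in one file, with
// 1-based line numbers. `knownValues` holds the lowercased literal values the
// built tokens resolve to. A literal that equals a token value is still drift
// (it should reference the token); matchesToken only tells you which fix to suggest.
function findHardCodedValues(source, knownValues = new Set()) {
  const pattern = /#(?:[0-9a-fA-F]{3}){1,2}\b|\b\d+(?:\.\d+)?px\b/g
  const hits = []
  source.split("\n").forEach((text, index) => {
    for (const match of text.matchAll(pattern)) {
      const value = match[0].toLowerCase()
      hits.push({ line: index + 1, value, matchesToken: knownValues.has(value) })
    }
  })
  return hits
}

const sample = [
  "const Button = styled.button`",
  "  background: #2D6CDF;",
  "  padding: 10px 18px;",
  "`",
].join("\n")
const tokenValues = new Set(["#2456c4", "16px"])
console.log(findHardCodedValues(sample, tokenValues).map((h) => `${h.line}:${h.value}`))
// [ '2:#2d6cdf', '3:10px', '3:18px' ]
```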

Defining the auditor as a reusable subagent in .claude/agents/ keeps the behavior consistent across runs and lets you tune the instructions once instead of in every prompt.

.claude/agents/component-auditor.md
---
name: component-auditor
description: Audits a single design system component folder for token drift, off-scale spacing, undocumented variants, and duplicated patterns. Read-only.
tools: Read, Grep, Glob
---

You audit one component folder against the design system reference you are given.

Report only what you can point to in a file. For every finding include:
- file path and line number
- category: token-drift, spacing, undocumented-variant, duplicate-pattern
- severity: P0 (breaks brand or accessibility), P1 (visible inconsistency), P2 (off-spec but subtle), P3 (question for the team)
- evidence: the exact value found and the expected token or scale value
- suggestedFix: one sentence, no code changes

Do not modify any files. Do not comment on code quality unrelated to visual consistency.
Return findings as a JSON array and nothing else.
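Because the auditor returns JSON and nothing else, the orchestration script can check each reply against the schema before merging it and set aside anything malformed for a human to look at. Below is a minimal sketch; `findingProblems` and `parseAuditorReply` are hypothetical helpers, and the category and severity lists are copied from the agent definition above.

```javascript
// Sketch: check one finding against the schema the component-auditor is asked for.
// Returns a list of problems; an empty list means the finding is usable.
const CATEGORIES = new Set(["token-drift", "spacing", "undocumented-variant", "duplicate-pattern"])
const SEVERITIES = new Set(["P0", "P1", "P2", "P3"])

function findingProblems(f) {
  if (f === null || typeof f !== "object") return ["not an object"]
  const problems = []
  if (typeof f.file !== "string" || f.file === "") problems.push("missing file")
  if (!Number.isInteger(f.line) || f.line < 1) problems.push("missing or invalid line")
  if (!CATEGORIES.has(f.category)) problems.push(`unknown category: ${f.category}`)
  if (!SEVERITIES.has(f.severity)) problems.push(`unknown severity: ${f.severity}`)
  if (typeof f.evidence !== "string" || f.evidence.trim() === "") problems.push("missing evidence")
  if (typeof f.suggestedFix !== "string" || f.suggestedFix.trim() === "") problems.push("missing suggestedFix")
  return problems
}

// Parse a raw reply; keep valid findings and set the rest aside with reasons.
function parseAuditorReply(text) {
  let parsed
  try {
    parsed = JSON.parse(text)
  } catch {
    return { valid: [], rejected: [{ raw: text, problems: ["reply is not JSON"] }] }
  }
  if (!Array.isArray(parsed)) {
    return { valid: [], rejected: [{ raw: text, problems: ["reply is not a JSON array"] }] }
  }
  const valid = []
  const rejected = []
  for (const f of parsed) {
    const problems = findingProblems(f)
    if (problems.length === 0) valid.push(f)
    else rejected.push({ raw: f, problems })
  }
  return { valid, rejected }
}

const reply = JSON.stringify([
  { file: "src/components/Button/Button.tsx", line: 42, category: "token-drift", severity: "P1",
    evidence: "#2D6CDF; expected color.action.primary (#2456C4)", suggestedFix: "Use color.action.primary." },
  { file: "src/components/Button/Button.tsx", category: "style", severity: "high", evidence: "", suggestedFix: "" },
])
const { valid, rejected } = parseAuditorReply(reply)
console.log(valid.length, rejected.length) // 1 1
```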

Section 8

Step 5: aggregate into a drift report

Raw findings from sixty agents are not a report. The aggregation step deduplicates repeated patterns, groups findings by component and category, counts severities, and surfaces the ten issues that explain most of the drift.
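Below is a minimal sketch of the deduplicate, count, and group steps, assuming each finding already carries a `component` field added by the orchestration script. `aggregate` is a hypothetical helper, and it only removes exact repeats (same file, line, and category); grouping near-duplicates such as forked components still needs an agent or a human.

```javascript
// Sketch of the aggregation step over findings shaped like the auditor output
// plus a `component` field.
function aggregate(findings) {
  // 1. Drop exact repeats: the same file, line, and category reported twice
  //    (for example, a shared file read by two component agents).
  const seen = new Set()
  const unique = findings.filter((f) => {
    const key = `${f.file}:${f.line}:${f.category}`
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })

  // 2. Count severities for the summary table.
  const bySeverity = { P0: 0, P1: 0, P2: 0, P3: 0 }
  for (const f of unique) bySeverity[f.severity] = (bySeverity[f.severity] ?? 0) + 1

  // 3. Group by component, then by category, for the per-component sections.
  const byComponent = {}
  for (const f of unique) {
    byComponent[f.component] ??= {}
    byComponent[f.component][f.category] ??= []
    byComponent[f.component][f.category].push(f)
  }
  return { total: unique.length, bySeverity, byComponent }
}

const summary = aggregate([
  { component: "Button", file: "Button.tsx", line: 42, category: "token-drift", severity: "P1" },
  { component: "Button", file: "Button.tsx", line: 42, category: "token-drift", severity: "P1" },
  { component: "Card", file: "Card.tsx", line: 7, category: "spacing", severity: "P2" },
])
console.log(summary.total, summary.bySeverity) // 2 { P0: 0, P1: 1, P2: 1, P3: 0 }
```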

The report should open with the summary a design system lead can read in two minutes: total findings, counts by severity, the worst components, and the most common categories. Detail follows below, one section per component, so owners can jump to their own area.

Table: Drift report structure
1. Summary: total findings, counts by severity, top drift categories
2. Hotspots: the five components or pages with the most P0-P1 findings
3. Token drift: hard-coded values mapped to the token they should use
4. Spacing: values outside the documented scale, grouped by component
5. Undocumented variants: variants in code missing from docs, and documented variants missing from code
6. Duplicates: patterns that re-implement an existing component

The report leads with a summary and keeps per-component detail below it.
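The hotspot list is the one part of the summary that needs ranking rather than counting. Below is a minimal sketch, assuming findings shaped like the auditor output plus a `component` field; `hotspots` is a hypothetical helper, and ties break alphabetically so the list stays stable between runs.

```javascript
// Sketch: the components or pages with the most P0-P1 findings, most first.
function hotspots(findings, limit = 5) {
  const counts = new Map()
  for (const f of findings) {
    if (f.severity !== "P0" && f.severity !== "P1") continue
    counts.set(f.component, (counts.get(f.component) ?? 0) + 1)
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([component, count]) => ({ component, count }))
}

const example = [
  { component: "Button", severity: "P0" },
  { component: "Button", severity: "P1" },
  { component: "Card", severity: "P1" },
  { component: "Modal", severity: "P2" },
  { component: "Badge", severity: "P1" },
]
console.log(hotspots(example, 2).map((h) => `${h.component}: ${h.count}`))
// [ 'Button: 2', 'Badge: 1' ]
```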

Section 9

Case study: a 60-component B2B system after a rebrand

A B2B platform rebranded: new palette, new radius scale, new type ramp. The design system team updated the core tokens and the twelve most-used components, then ran this audit to see what was left.

The fan-out dispatched 64 agents, one per component folder, and finished in about 70 minutes. It returned 312 findings, including 41 hard-coded colors from the old palette, 57 spacing values off the new 4px scale, 19 undocumented variants, and 9 components that still imported the legacy radius constants. Eleven of the hard-coded colors were in disabled and hover states that had been skipped during the rebrand sprint.

The team turned the report into a three-sprint cleanup backlog, prioritized by the severity counts. A second audit run after the cleanup showed P0 and P1 findings dropping from 88 to 6, which became the evidence in the rebrand completion review.

Section 10

Case study: a marketing site that drifted from the product

A marketing site had been built by an external agency from the same Figma library as the product, then maintained separately for two years. Visitors moving from the homepage to the app noticed the difference; nobody could say exactly what it was.

The audit ran per page rather than per component: 28 pages, 28 agents, finished in under an hour. Findings included three different blue values used as the primary action color, button padding ranging from 10px to 18px across pages, two competing card patterns, and a heading scale that had quietly gained two extra sizes.

Because the report named exact values and files, the agency could fix the top twenty findings in a single sprint without a redesign discussion. The team now reruns the saved workflow before each quarterly campaign push.

Section 11

Case study: three teams, three forked buttons

A mobile app had one Button component in the design system and, as the audit discovered, two more in feature folders. Each fork had been created to add a loading spinner or an icon slot under deadline pressure, then kept growing.

The per-component agents flagged the duplicates because each fork re-declared the same base styles with small differences: one used a 44px touch target, one 40px, one 48px. The aggregation step grouped them as a single duplicate-pattern finding with all three file paths and a comparison of their props.

The fix was not mechanical, and the workflow did not pretend it was. The report recommended consolidation as a P1 finding and listed the props each fork supported, which gave the three teams a concrete agenda for one consolidation meeting instead of a quarter of intermittent debate.

Section 12

Good vs bad audit findings

The value of the audit lives in the precision of each finding. A vague finding creates a discussion; a precise finding creates a fix. Review a sample of agent findings before trusting the full report, and tighten the subagent instructions if the samples read like opinions.

Table: Finding quality comparison

  • Bad: Buttons are inconsistent across the app
  • Good: P1: Button.tsx line 42 uses #2D6CDF; the primary action token color.action.primary is #2456C4

  • Bad: Spacing feels random on the pricing page
  • Good: P2: pricing.tsx uses 18px and 22px gaps; the documented scale is 16px and 24px

  • Bad: There are too many card components
  • Good: P1: PromoCard and OfferTile re-implement Card with different radii (12px vs 16px); consolidate on Card with a variant prop

A useful finding names the file, the value found, the expected value, and the impact.

Section 13

What this audit cannot prove

The audit finds drift; it does not decide what the system should be. If two teams genuinely need different button densities, that is a design decision, and the report should carry it as a question rather than a violation.

Static reading also misses issues that only appear at runtime: computed styles, theme switching, and third-party widgets. Pair this audit with a visual QA pass on key screens if rendered output matters for the decision.

Finally, an agent cannot judge whether a variant is undocumented because the docs are behind or because the variant should not exist. Humans triage that distinction during review of the report.

  • It cannot decide intent: whether an exception is drift or a deliberate divergence.
  • It cannot see runtime-only styling without a rendering step.
  • It cannot prioritize against roadmap pressure; severity is an input to planning, not a plan.
  • It should never change code during the audit run; fixes are a separate, reviewed pass.

Section 14

Save it as a reusable workflow

Once a run produces a report the team trusts, save the orchestration to .claude/workflows/ so the next audit is one command instead of one afternoon of prompting. Rerunning the same workflow after each cleanup pass turns the drift report into a trend line, which is far more persuasive than a single snapshot.

The cleanup itself can borrow the same fan-out shape, but keep it as a separate run with write access and human review on every diff. Auditing and fixing in the same pass makes both harder to trust.
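Turning two reports into a trend line only needs the severity counts from each run. Below is a minimal sketch, assuming you saved a `{ P0, P1, P2, P3 }` count object from each report; `compareRuns` and `formatTrend` are hypothetical helpers, and the counts in the example are illustrative, not taken from the case studies.

```javascript
// Sketch: per-severity change between two audit runs.
function compareRuns(before, after) {
  const severities = ["P0", "P1", "P2", "P3"]
  return severities.map((s) => {
    const was = before[s] ?? 0
    const now = after[s] ?? 0
    return { severity: s, before: was, after: now, change: now - was }
  })
}

// One line per severity, with an explicit sign on the change.
function formatTrend(rows) {
  return rows
    .map((r) => `${r.severity}: ${r.before} -> ${r.after} (${r.change >= 0 ? "+" : ""}${r.change})`)
    .join("\n")
}

console.log(formatTrend(compareRuns({ P0: 5, P1: 20, P2: 40 }, { P0: 0, P1: 3, P2: 41, P3: 2 })))
// P0: 5 -> 0 (-5)
// P1: 20 -> 3 (-17)
// P2: 40 -> 41 (+1)
// P3: 0 -> 2 (+2)
```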

Design system audit workflow
1. Define the source of truth: tokens, scales, and documented variants.
2. Enumerate audit units with a glob: components, pages, or templates.
3. Build one audit packet per unit with paths and the reference.
4. Fan out one read-only auditor agent per unit, up to 16 at a time.
5. Collect structured findings in the orchestration script, not in chat.
6. Deduplicate, group by component, and rank by severity.
7. Write the drift report with a two-minute summary on top.
8. Review with humans, plan fix passes, and rerun the saved workflow to measure progress.

Diagram: Audit and cleanup loop
1. Define reference
2. Run audit workflow
3. Review drift report
4. Plan fix passes
5. Apply fixes
6. Rerun audit
7. Compare trend, which feeds the next cycle

The audit is most useful as a repeating loop that measures whether cleanup work actually reduced drift.


Further reading

For deeper reading, see The Agentic Designer and Claude Code for Designers.
