Section 1
Drift is usually quiet
Design-system drift rarely starts as a dramatic redesign. It starts as a one-off color, a copied component, a custom radius, a missing focus state, or a table density decision that never made it back into the system.
The problem is not only visual inconsistency. Drift changes product meaning. A warning badge that is orange in one place and red in another teaches users two different languages. A disabled button that looks like a secondary button creates uncertainty. A copied modal that omits focus management turns a design-system shortcut into an accessibility regression.
Agents can help because they can scan files, compare screenshots, search for raw values, and turn scattered inconsistencies into structured findings. But they need a strong audit frame. Without it, the output becomes vague commentary: buttons are inconsistent, spacing needs polish, colors vary. That is not an audit. An audit needs evidence, severity, ownership, and a decision path.
Section 2
Define the audit surface
Do not ask an agent to audit the whole product at once unless the product is tiny. A useful audit starts with a narrow surface: status chips, dashboard tables, settings forms, navigation, empty states, dialogs, or the onboarding flow.
The audit surface should be small enough that the agent can inspect every relevant file and state, but important enough that fixing it changes real product quality. A single component family used in many places is often a better target than an entire page category.
The best scope names the component family, the routes where it appears, the source of truth, the screenshots to compare, and the allowed output. The agent should find and classify drift. It should not silently redesign the system while it audits.
- Good scope: audit status chips across dashboard, project detail, mobile queue, and settings pages.
- Weak scope: audit our app design and make it consistent.
- Good output: prioritized findings with evidence, affected files, and recommended owner.
- Weak output: broad notes about needing cleaner spacing and more consistent colors.
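One way to keep a scope this concrete is to write it down as data that travels with the prompt. The sketch below is a minimal TypeScript version under stated assumptions: the `AuditScope` interface, the `checkScope` helper, and every route path except `/projects/[id]` are hypothetical and do not belong to any tool.

```typescript
// Hypothetical shape for an audit scope; the field names are illustrative only.
interface AuditScope {
  componentFamily: string; // one component family or product surface
  routes: string[]; // where the family appears
  sourceOfTruth: string[]; // DESIGN.md, token files, shared components
  screenshots: string[]; // images the agent should compare
  allowedOutput: "findings" | "findings-and-fix-plan"; // deliberately no "code changes"
}

// Returns a list of problems; an empty list means every required part is present.
function checkScope(scope: AuditScope): string[] {
  const problems: string[] = [];
  if (scope.componentFamily.trim() === "") problems.push("Name one component family or surface.");
  if (scope.routes.length === 0) problems.push("List the routes where the component appears.");
  if (scope.sourceOfTruth.length === 0) problems.push("Point to at least one source of truth.");
  if (scope.screenshots.length === 0) problems.push("Attach screenshots to compare.");
  return problems;
}

// Hypothetical scope for the "good scope" example above.
const statusChipScope: AuditScope = {
  componentFamily: "status chips",
  routes: ["/dashboard", "/projects/[id]", "/queue", "/settings"],
  sourceOfTruth: ["DESIGN.md", "tokens/colors.css", "src/components/status-chip.tsx"],
  screenshots: ["examples/screenshots/status-chip-audit/dashboard-desktop.png"],
  allowedOutput: "findings",
};

console.log(checkScope(statusChipScope)); // []
```

Leaving code changes out of `allowedOutput` on purpose encodes the rule that the agent classifies drift and does not redesign while it audits.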
Audit flow: Scope → Collect files → Collect screenshots → Inspect tokens → Compare states → Prioritize fixes → Assign owner
A useful audit moves from scope to evidence, then turns drift into prioritized findings and a fix plan.
Section 3
The four layers of drift
Most design-system audits fail because they only inspect one layer. A file search can find raw hex colors, but it cannot tell whether the UI still communicates the right priority. A screenshot review can see inconsistent cards, but it may miss that two implementations use different components under the hood.
Ask the agent to inspect drift in four layers: meaning, behavior, implementation, and presentation. Meaning drift changes what users understand. Behavior drift changes how the component works. Implementation drift changes whether the system can maintain it. Presentation drift changes polish, rhythm, and visual consistency.
- Meaning: the same visual pattern communicates different product states.
- Behavior: keyboard, focus, loading, disabled, or error states differ.
- Implementation: a local copy bypasses shared components or tokens.
- Presentation: spacing, radius, icon size, label case, or density varies.
The same visible inconsistency can have different causes and different owners.
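Because one visible symptom can sit on more than one layer, it can help to record the layers for each observation and derive owners from them. In this sketch the `defaultOwner` mapping is an assumption for illustration, not a rule from the audit frame, and `ownersFor` is a hypothetical helper.

```typescript
// The four drift layers named in the text.
type DriftLayer = "meaning" | "behavior" | "implementation" | "presentation";

// Hypothetical default owners per layer; adjust to your team.
const defaultOwner: Record<DriftLayer, string> = {
  meaning: "design", // what users understand
  behavior: "engineering", // how the component works
  implementation: "engineering", // whether the system can maintain it
  presentation: "design", // polish, rhythm, visual consistency
};

interface Observation {
  description: string;
  layers: DriftLayer[]; // one visible symptom can sit on several layers
}

// Every owner an observation touches, deduplicated, in first-seen order.
function ownersFor(obs: Observation): string[] {
  return Array.from(new Set(obs.layers.map((layer) => defaultOwner[layer])));
}

const escalatedColor: Observation = {
  description: "Escalated chip is red on project detail but orange on the dashboard",
  layers: ["meaning", "implementation"],
};

console.log(ownersFor(escalatedColor)); // [ 'design', 'engineering' ]
```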
Section 4
Case study: status chips drift
A product has status chips for active, blocked, escalated, paused, and completed work. Over time, different pages implement them differently. Some use uppercase labels. Some use pill radius. Some use raw colors. Some omit icons. Some skip disabled and loading states. Mobile compresses the label in one route but wraps it in another.
The agent audit should not simply say the chips are inconsistent. It should identify where the inconsistency lives, whether it is visual, semantic, behavioral, or implementation-level, and what should be fixed first.
The team should also decide which differences are intentional. A compact table chip may need less padding than a detail-page chip, but the semantic state, accessible name, color token, and icon logic should still come from the system. An audit is not a demand that every instance look identical. It is a process for separating intentional variation from accidental drift.
Screenshot board panels: dashboard status chip, settings status chip, mobile status chip, disabled chip, escalated chip, component source.
A screenshot board helps the agent compare the same component across product surfaces before writing findings.
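You can make the split between intentional variation and accidental drift explicit by listing which fields must come from the system and which may adapt to context. This sketch compares chip instances against a per-state reference. The token names, icon names, and the `accidentalDrift` helper are hypothetical.

```typescript
// Hypothetical description of one rendered chip.
interface ChipInstance {
  route: string;
  state: string; // semantic state, e.g. "escalated"
  accessibleName: string;
  colorToken: string;
  icon: string | null;
  padding: string; // density: allowed to vary by context
}

// Fields that must come from the system for a given state.
type SystemFields = Pick<ChipInstance, "accessibleName" | "colorToken" | "icon">;
const SYSTEM_KEYS: (keyof SystemFields)[] = ["accessibleName", "colorToken", "icon"];

// Reference values per state; token and icon names are made up.
const systemSpec: Record<string, SystemFields> = {
  escalated: { accessibleName: "Escalated", colorToken: "--status-warning", icon: "alert" },
  blocked: { accessibleName: "Blocked", colorToken: "--status-danger", icon: "stop" },
};

// Reports system-owned fields that differ. Padding is never compared, so a
// compact table chip is not reported as drift.
function accidentalDrift(chip: ChipInstance): string[] {
  const spec = systemSpec[chip.state];
  if (spec === undefined) return [`${chip.route}: unknown state "${chip.state}"`];
  return SYSTEM_KEYS.filter((key) => chip[key] !== spec[key]).map(
    (key) => `${chip.route}: ${key} is ${String(chip[key])}, expected ${String(spec[key])}`,
  );
}

const tableChip: ChipInstance = {
  route: "/dashboard", state: "escalated", accessibleName: "Escalated",
  colorToken: "--status-warning", icon: "alert", padding: "2px 6px",
};
const detailChip: ChipInstance = {
  route: "/projects/[id]", state: "escalated", accessibleName: "Escalated",
  colorToken: "#dc2626", icon: null, padding: "4px 10px",
};

console.log(accidentalDrift(tableChip)); // []
console.log(accidentalDrift(detailChip)); // colorToken and icon differ; padding is ignored
```

The two chips use different padding. Only the detail chip is reported, because its color and icon bypass the system.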
Section 5
Build an audit packet
An audit packet gives the agent everything it needs without forcing it to rediscover context. Keep the packet in the repo so future audits can compare against past findings.
The packet should include source-of-truth files, screenshots, route list, component inventory, token references, and any known exceptions. If a designer intentionally approved a variant, write that down. Otherwise the agent may try to normalize a useful difference.
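A small completeness check can catch a packet that lacks context before the agent starts. This is a sketch with assumptions. Where each item lives is assumed: for example, that the route list sits in `scope.md` and known exceptions in `source-notes.md`. `missingPacketItems` is a hypothetical helper that takes a list of repo paths instead of reading the file system. The paths follow the example layout below.

```typescript
// Packet items from the text, each mapped to a path prefix where this sketch
// assumes it lives. Adjust the prefixes to your repository.
const packetItems: { item: string; prefix: string }[] = [
  { item: "source-of-truth files", prefix: "DESIGN.md" },
  { item: "token references", prefix: "tokens/" },
  { item: "screenshots", prefix: "examples/screenshots/" },
  { item: "component inventory", prefix: "agent-workflows/design-system-audit/component-inventory.md" },
  { item: "route list", prefix: "agent-workflows/design-system-audit/scope.md" },
  { item: "known exceptions", prefix: "examples/screenshots/status-chip-audit/source-notes.md" },
];

// Returns the packet items with no matching path in the given list.
function missingPacketItems(repoPaths: string[]): string[] {
  return packetItems
    .filter(({ prefix }) => !repoPaths.some((path) => path.startsWith(prefix)))
    .map(({ item }) => item);
}

const repoPaths = [
  "DESIGN.md",
  "AGENTS.md",
  "tokens/colors.css",
  "src/components/status-chip.tsx",
  "examples/screenshots/status-chip-audit/settings.png",
];

const missing = missingPacketItems(repoPaths);
console.log(missing); // component inventory, route list, known exceptions
```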
my-product/
├── DESIGN.md
├── AGENTS.md
├── tokens/
│   ├── colors.css
│   ├── spacing.css
│   └── radius.css
├── src/components/
│   ├── status-chip.tsx
│   └── ui/
├── examples/screenshots/
│   └── status-chip-audit/
│       ├── dashboard-desktop.png
│       ├── dashboard-mobile.png
│       ├── settings.png
│       ├── disabled-state.png
│       └── source-notes.md
└── agent-workflows/
    └── design-system-audit/
        ├── scope.md
        ├── component-inventory.md
        ├── findings.md
        ├── fix-plan.md
        └── follow-up.md
Section 6
Search code with a design-system lens
Agents are good at repetitive code searches, but the search terms need to match the kinds of drift you expect. Ask for semantic token usage, raw values, copied component names, local utility classes, and state-specific code paths.
For Tailwind projects, raw classes are not automatically wrong. A product may use utilities as the design language. The audit should distinguish semantic token violations from normal composition. A raw `rounded-full` on a badge might be correct if the design system defines pills that way. A raw hex color inside a route is usually more suspicious.
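A scan can encode that distinction by attaching to each pattern the places where a match is acceptable. The sketch assumes files are already loaded as `{ path, text }` pairs. It also assumes raw colors are legitimate only in `tokens/` and palette classes such as `bg-orange-100` are acceptable inside `src/components/`. The regular expressions are deliberately rough and would need tuning for a real codebase. The route file path in the usage is hypothetical.

```typescript
interface SourceFile { path: string; text: string; }
interface Hit { path: string; line: number; kind: string; match: string; }

// Each pattern lists path prefixes where a match is legitimate.
const patterns: { kind: string; regex: RegExp; allowedIn: string[] }[] = [
  // Raw colors belong in token files only.
  { kind: "raw color", regex: /#[0-9a-fA-F]{3,8}\b|rgba?\(|hsla?\(/g, allowedIn: ["tokens/"] },
  // Arbitrary Tailwind values such as w-[37px] are suspicious everywhere.
  { kind: "arbitrary value", regex: /\b(?:w|h|p|px|py|m|gap)-\[[^\]]+\]/g, allowedIn: [] },
  // Palette classes are fine inside shared components, suspicious elsewhere.
  { kind: "token bypass", regex: /\b(?:text|bg|border)-(?:red|orange|yellow|green|blue)-\d{2,3}\b/g, allowedIn: ["src/components/"] },
];

function scan(files: SourceFile[]): Hit[] {
  const hits: Hit[] = [];
  for (const file of files) {
    file.text.split("\n").forEach((lineText, index) => {
      for (const p of patterns) {
        if (p.allowedIn.some((prefix) => file.path.startsWith(prefix))) continue;
        // match() with a /g regex returns every match (or null) and resets lastIndex.
        for (const match of lineText.match(p.regex) ?? []) {
          hits.push({ path: file.path, line: index + 1, kind: p.kind, match });
        }
      }
    });
  }
  return hits;
}

const files: SourceFile[] = [
  { path: "src/components/status-chip.tsx", text: '<span className="rounded-full bg-orange-100 text-orange-800">' },
  { path: "src/routes/project-detail.tsx", text: '<span className="rounded-full text-red-600" style={{ color: "#dc2626" }}>' },
];

for (const hit of scan(files)) console.log(`${hit.path}:${hit.line} ${hit.kind} ${hit.match}`);
// src/routes/project-detail.tsx:1 raw color #dc2626
// src/routes/project-detail.tsx:1 token bypass text-red-600
```

`rounded-full` is never flagged. The palette classes in the shared chip pass because they live in `src/components/`.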
Search for:
- raw colors: "#", "rgb(", "hsl("
- one-off sizing: arbitrary Tailwind values like w-[, h-[, p-[
- duplicated components: StatusChip, Badge, Pill, Tag, StateLabel
- token bypasses: text-red, bg-orange, border-blue outside shared components
- missing states: disabled, loading, error, aria-label, aria-describedby
- local exceptions: comments or props that explain intentional variation
Section 7
Compare screenshots for behavior, not just appearance
A screenshot board should compare the same component across routes and states. Desktop-only screenshots are not enough. Many design-system failures appear at the edges: mobile wrapping, disabled state contrast, empty-state copy, focus rings, density inside data tables, or icon alignment when labels change length.
Ask the agent to describe observable differences before it recommends fixes. This keeps the audit anchored in evidence and prevents the model from turning preferences into findings.
- Capture at least one desktop and one mobile viewport for each high-use surface.
- Capture normal, hover or focus, disabled, loading, empty, and error states where relevant.
- Ask for observable differences first, then severity, then recommended fix.
- Separate intentional responsive adaptation from accidental layout damage.
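These capture rules expand naturally into a checklist of surface × viewport × state. This hypothetical helper lists only the states marked relevant for each surface. The file-naming scheme is an assumption, chosen so the screenshot board sorts by surface.

```typescript
type Viewport = "desktop" | "mobile";
type UiState = "normal" | "hover-or-focus" | "disabled" | "loading" | "empty" | "error";

interface Surface {
  name: string;
  states: UiState[]; // only the states that are relevant for this surface
}

interface Capture { surface: string; viewport: Viewport; state: UiState; file: string; }

// At least one desktop and one mobile viewport for every surface.
const VIEWPORTS: Viewport[] = ["desktop", "mobile"];

function captureChecklist(surfaces: Surface[]): Capture[] {
  const list: Capture[] = [];
  for (const surface of surfaces) {
    const slug = surface.name.toLowerCase().replace(/\s+/g, "-");
    for (const viewport of VIEWPORTS) {
      for (const state of surface.states) {
        list.push({ surface: surface.name, viewport, state, file: `${slug}-${viewport}-${state}.png` });
      }
    }
  }
  return list;
}

const checklist = captureChecklist([
  { name: "Dashboard", states: ["normal", "hover-or-focus", "disabled"] },
  { name: "Project detail", states: ["normal", "loading", "error"] },
]);

console.log(checklist.length); // 12 (2 surfaces × 2 viewports × 3 states)
console.log(checklist[0].file); // dashboard-desktop-normal.png
```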
Section 8
Use severity to protect product meaning
A design-system audit should not treat all drift equally. Some drift breaks meaning. Some breaks accessibility. Some only creates polish debt. Severity keeps the team from spending a day tuning border radius while a warning state is unreadable on mobile.
- The same state means different things on different pages.
- Contrast, labels, focus, or keyboard behavior blocks use.
- Loading, disabled, error, or responsive state works differently.
- A copied local implementation bypasses the shared component.
- Raw values or one-off variables replace semantic tokens.
- Spacing, radius, icon size, or label case differs without harming use.
Severity helps the team fix drift in the order that protects users and product meaning.
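Once each finding is tagged with the kind of drift it represents, ordering the list is mechanical. The concern names, the numeric scale, and the P0 to P3 mapping below are assumptions for illustration. They loosely follow the order of the list above, from meaning and accessibility down to polish.

```typescript
// Hypothetical concern tags, loosely following the severity list above.
type Concern = "meaning" | "accessibility" | "state-behavior" | "local-copy" | "raw-token" | "polish";

// Assumed mapping to P0..P3; lower numbers are fixed first.
const severityOf: Record<Concern, number> = {
  meaning: 0,
  accessibility: 0,
  "state-behavior": 1,
  "local-copy": 2,
  "raw-token": 2,
  polish: 3,
};

interface RankedFinding { title: string; concern: Concern; }

// Returns a sorted copy; equal severities keep their input order because
// Array.prototype.sort is stable in ES2019 and later.
function bySeverity(findings: RankedFinding[]): RankedFinding[] {
  return [...findings].sort((a, b) => severityOf[a.concern] - severityOf[b.concern]);
}

const ranked = bySeverity([
  { title: "Chip radius differs in settings", concern: "polish" },
  { title: "Warning chip unreadable on mobile", concern: "accessibility" },
  { title: "Local chip copy on project detail", concern: "local-copy" },
]);

for (const f of ranked) console.log(`P${severityOf[f.concern]} ${f.title}`);
// P0 Warning chip unreadable on mobile
// P2 Local chip copy on project detail
// P3 Chip radius differs in settings
```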
Section 9
Good vs bad findings
A bad finding names a symptom without evidence. A good finding tells the team what changed, where it appears, why it matters, and what kind of fix is appropriate.
The wording matters because audits often create downstream engineering tickets. A ticket that says "make chips consistent" invites subjective rework. A ticket that says "replace the local escalated status chip on `/projects/[id]` with the shared `StatusChip` because it uses a conflicting color token" is actionable.
- Bad: Status chips are inconsistent. Good: Escalated chip uses raw red on project detail while dashboard uses semantic warning token.
- Bad: Buttons need polish. Good: Secondary destructive action uses primary button styling in billing settings, increasing risk.
- Bad: Mobile looks off. Good: At 390px, status label wraps before icon, changing scan rhythm in queue table.
Audit findings should be specific enough to become fix tickets without losing design context.
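The properties of a good finding can be checked before it becomes a ticket. In this sketch the `Finding` fields and the vague-wording pattern are hypothetical. The check only flags a missing location, missing evidence, a missing impact, and a symptom phrase with nothing behind it.

```typescript
type FixKind = "quick fix" | "component refactor" | "token change" | "documentation update" | "human decision";

interface Finding {
  what: string; // what changed
  where: string[]; // routes, files, or screenshots
  evidence: string[]; // token names, screenshot paths, code references
  whyItMatters: string; // effect on users or product meaning
  fixKind: FixKind;
}

// Phrases that name a symptom without saying what changed (illustrative list).
const VAGUE = /\b(inconsistent|needs? polish|looks? off)\b/i;

function ticketProblems(f: Finding): string[] {
  const problems: string[] = [];
  if (f.where.length === 0) problems.push("No location: name the route, file, or screenshot.");
  if (f.evidence.length === 0) problems.push("No evidence attached.");
  if (f.whyItMatters.trim() === "") problems.push("No impact stated.");
  if (VAGUE.test(f.what) && f.evidence.length === 0) problems.push(`Symptom only: "${f.what}"`);
  return problems;
}

const bad: Finding = { what: "Status chips are inconsistent", where: [], evidence: [], whyItMatters: "", fixKind: "human decision" };
const good: Finding = {
  what: "Escalated chip uses raw red on project detail while dashboard uses semantic warning token",
  where: ["/projects/[id]", "/dashboard"],
  evidence: ["examples/screenshots/status-chip-audit/dashboard-desktop.png"],
  whyItMatters: "The same state reads as two different severities.",
  fixKind: "component refactor",
};

console.log(ticketProblems(bad).length); // 4
console.log(ticketProblems(good).length); // 0
```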
Section 10
Audit prompt
The audit prompt should ask for findings, not fixes. Fixing comes after prioritization. The agent should report uncertainty instead of inventing a system rule.
Audit this product surface for design-system drift.
Scope: [component family or product surface]
Inputs:
- DESIGN.md
- AGENTS.md
- token files
- shared component source files
- route files where the component appears
- screenshots of each usage and state
Check:
1. semantic meaning
2. token usage
3. shared component usage
4. accessibility states
5. responsive behavior
6. copy and label consistency
7. intentional exceptions
Output:
- finding
- evidence
- severity
- affected files or screenshots
- likely source of drift
- recommended fix owner
- whether the fix needs human design review
Do not rewrite code during the audit.
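Generating the prompt from code or a template file means every audit run asks for the same checks and output fields. This hypothetical `buildAuditPrompt` helper rebuilds the prompt above from a surface name and an input list.

```typescript
const CHECKS = [
  "semantic meaning", "token usage", "shared component usage", "accessibility states",
  "responsive behavior", "copy and label consistency", "intentional exceptions",
];

const OUTPUT_FIELDS = [
  "finding", "evidence", "severity", "affected files or screenshots",
  "likely source of drift", "recommended fix owner", "whether the fix needs human design review",
];

// Builds the audit prompt line by line; the closing instruction is always last.
function buildAuditPrompt(surface: string, inputs: string[]): string {
  return [
    "Audit this product surface for design-system drift.",
    `Scope: ${surface}`,
    "Inputs:",
    ...inputs.map((input) => `- ${input}`),
    "Check:",
    ...CHECKS.map((check, i) => `${i + 1}. ${check}`),
    "Output:",
    ...OUTPUT_FIELDS.map((field) => `- ${field}`),
    "Do not rewrite code during the audit.",
  ].join("\n");
}

const prompt = buildAuditPrompt("status chips", [
  "DESIGN.md", "AGENTS.md", "token files", "shared component source files",
  "route files where the component appears", "screenshots of each usage and state",
]);
console.log(prompt);
```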
Section 11
Turn findings into a fix plan
A design-system audit is useful only if findings become a fix plan. Group findings into quick fixes, component refactors, token changes, documentation updates, and human decisions.
Do not let the agent quietly rewrite the design system during an audit. Audits find drift. Designers decide which changes become system rules. Engineers decide how the shared implementation should change. The agent can prepare the evidence and draft the plan, but the ownership decision belongs to the team.
- Quick fixes: replace raw token, restore missing label, correct copied class.
- Component refactors: consolidate duplicate implementations or add a missing variant.
- Token changes: add semantic tokens only when the current system cannot express a real need.
- Documentation updates: record approved variants and responsive behavior.
- Human decisions: resolve conflicts where the system itself is unclear.
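The five buckets can be turned into a draft `fix-plan.md` for the team to review. The `toFixPlan` helper and the owner strings are hypothetical. The helper only groups and formats findings, so every decision stays with people.

```typescript
type FixGroup = "quick fix" | "component refactor" | "token change" | "documentation update" | "human decision";

// Buckets in the order they appear in the plan.
const GROUP_ORDER: FixGroup[] = ["quick fix", "component refactor", "token change", "documentation update", "human decision"];

interface PlannedFinding { title: string; group: FixGroup; owner: string; }

// Formats findings as a draft plan grouped by bucket; empty buckets are skipped.
function toFixPlan(findings: PlannedFinding[]): string {
  const lines: string[] = [];
  for (const group of GROUP_ORDER) {
    const items = findings.filter((f) => f.group === group);
    if (items.length === 0) continue;
    lines.push(`## ${group} (${items.length})`);
    for (const f of items) lines.push(`- ${f.title} [owner: ${f.owner}]`);
  }
  return lines.join("\n");
}

const plan = toFixPlan([
  { title: "Restore aria-label on mobile queue chip", group: "quick fix", owner: "engineering" },
  { title: "Is 'paused' a warning or a neutral state?", group: "human decision", owner: "design" },
  { title: "Replace local escalated chip on /projects/[id] with StatusChip", group: "component refactor", owner: "engineering" },
]);
console.log(plan);
```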
Section 12
Reusable audit workflow
Use this workflow when a component family has spread across the product and the team needs evidence before refactoring. The goal is not to make everything identical. The goal is to make the system intentional again.
1. Pick one component family or product surface.
2. Collect source-of-truth files: DESIGN.md, tokens, shared components.
3. List every route and local component where the pattern appears.
4. Capture screenshots across desktop, mobile, and important states.
5. Ask the agent for observable drift only.
6. Classify each finding by meaning, behavior, implementation, or presentation.
7. Assign severity and owner.
8. Approve the fix plan before implementation.
9. Fix in passes: P0/P1 first, then consolidation, then polish.
10. Recapture screenshots and update the design-system notes.
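Steps 8 and 9 can be enforced in whatever script or checklist drives the work. Nothing is scheduled until the plan is approved, and items then fall into three passes. The pass rules here are an assumption: severity 0 or 1 goes first, consolidation work next, and everything else last. `schedulePasses` is a hypothetical helper.

```typescript
type Pass = "P0/P1" | "consolidation" | "polish";

interface PlanItem {
  title: string;
  severity: 0 | 1 | 2 | 3; // P0..P3
  consolidation: boolean; // merges duplicate implementations into the shared component
}

function schedulePasses(items: PlanItem[], planApproved: boolean): Record<Pass, string[]> {
  // Step 8: no implementation work before the plan is approved.
  if (!planApproved) throw new Error("Approve the fix plan before implementation.");
  const passes: Record<Pass, string[]> = { "P0/P1": [], consolidation: [], polish: [] };
  for (const item of items) {
    // Step 9: P0/P1 first, then consolidation, then polish.
    if (item.severity <= 1) passes["P0/P1"].push(item.title);
    else if (item.consolidation) passes.consolidation.push(item.title);
    else passes.polish.push(item.title);
  }
  return passes;
}

const passes = schedulePasses(
  [
    { title: "Warning chip contrast on mobile", severity: 0, consolidation: false },
    { title: "Merge three local chip copies into StatusChip", severity: 2, consolidation: true },
    { title: "Unify label case", severity: 3, consolidation: false },
  ],
  true,
);
console.log(passes);
```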

