Agentic Design School
Design workflow
Foundation

Accessibility Sweep

A codebase-wide accessibility audit that combines static checks with agent judgment, ranks findings by severity, and applies fixes in passes that are re-verified before anything is called done.

Orchestration: Subagent fan-out with severity gates

Typical run: 45–90 minutes per sweep

Last reviewed: 2026-06-02


Section 1

Why accessibility needs a sweep, not a checklist

Most teams do not ship inaccessible interfaces on purpose. They ship them one component at a time: an icon button without a label here, a low-contrast badge there, a modal that traps focus on one page and forgets to on the next. The problems accumulate quietly until an audit, a customer complaint, or a procurement requirement forces the issue.

A sweep treats accessibility as a property of the whole surface, not of individual tickets. It reads every route and component, applies the same checks everywhere, and produces one ranked list instead of scattered annotations.

Agents are well suited to this because the work is mostly careful reading at scale: checking that labels exist and make sense, that focus order matches visual order, that color is not the only signal. Tools catch the mechanical part; agents catch the part that needs judgment about meaning.

Section 2

When to reach for this workflow

Run a sweep before an external audit, before a public sector or enterprise procurement deadline, after a large UI change, or as a recurring health check on flows where failure is expensive, like checkout and onboarding.

It is a Foundation workflow because the structure is simple: check, judge, rank, fix, re-check. You do not need a large design system or a mature token setup to benefit from it.

  • A WCAG AA or AAA target with a date attached.
  • A flow with known complaints: keyboard users getting stuck, screen reader output that makes no sense.
  • A redesign or rebrand that changed colors and components faster than anyone could check them.
  • A recurring monthly or per-release sweep to stop regressions from piling up.

Section 3

Static checks and agent judgment do different jobs

Automated checkers like axe-core are precise about what they can detect: contrast ratios, missing alt attributes, invalid ARIA, form fields without programmatic labels. They are also famously incomplete; common estimates put automated coverage at well under half of WCAG criteria.

Agents cover much of the rest. An agent can judge whether the accessible name actually describes the control, whether the focus order follows the task order rather than an accident of DOM order, whether an error message tells the user what to do next, and whether a custom dropdown behaves like the native pattern it imitates.

The sweep uses both, in that order. Static results give the agents concrete anchors and keep them from re-deriving what a tool already proved. Agent review then covers semantics, keyboard paths, and meaning.

Table: Check coverage split

1. axe-core: Contrast ratios, missing labels, invalid ARIA, landmark structure
2. Playwright: Keyboard walks, focus traps, visible focus indicators per route
3. Agent review: Whether names and instructions make sense, focus order vs task order, color-only signals
4. Human review: Severity disputes, intentional exceptions, sign-off on the fix plan

Use tools for what they measure well and agents for what needs judgment.

Section 4

The orchestration pattern: subagent fan-out with severity gates

The sweep runs as a Claude Code dynamic workflow. When you include the word workflow in the prompt, or run with /effort ultracode, Claude writes a JavaScript orchestration script that runs in the background. The script holds the route list and the static check results in its own variables, dispatches one subagent per route or component group, and merges what comes back. Intermediate output stays in the script, not in Claude's context, which is what lets a whole-codebase sweep fit in one run.

Up to sixteen subagents run concurrently and a run can dispatch up to a thousand, so even a large app can be swept in under an hour and a half. The run is resumable, and once it works you save it to .claude/workflows/ as a reusable command for the next release.
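
To make the fan-out pattern concrete, here is a minimal sketch of a bounded-concurrency dispatcher. It is not the script Claude generates; it only illustrates the idea of holding the route list in a variable, keeping at most sixteen tasks in flight, and merging what comes back. The names fanOut and reviewRoute are hypothetical, and reviewRoute is a stub standing in for a real subagent call.

```javascript
// Hypothetical sketch: run one async task per item with at most `limit` in flight.
async function fanOut(items, worker, limit = 16) {
  const results = new Array(items.length)
  let next = 0

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function runLane() {
    while (next < items.length) {
      const index = next++
      try {
        results[index] = { item: items[index], ok: true, value: await worker(items[index]) }
      } catch (error) {
        // A failed route is recorded rather than aborting the whole sweep.
        results[index] = { item: items[index], ok: false, error: String(error) }
      }
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane())
  await Promise.all(lanes)
  return results
}

// Stub reviewer (assumption): a real run would dispatch a subagent here.
async function reviewRoute(route) {
  return [{ route, severity: "P2", evidence: "placeholder finding" }]
}

fanOut(["/checkout", "/checkout/payment", "/dashboard"], reviewRoute, 16).then((results) => {
  const reviewed = results.filter((r) => r.ok).length
  console.log(reviewed + " of " + results.length + " routes reviewed")
})
```

Recording failures per route instead of throwing is a design choice that suits a resumable run: the routes that failed can be retried without repeating the ones that succeeded.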

The severity gate is the human part of the pattern. Findings are ranked, and nothing below the gate gets fixed until everything above it is fixed and re-verified. That keeps the agent from polishing alt text while the checkout button is still unreachable by keyboard.

Diagram: Accessibility sweep loop

Step 1: Run static checks
Step 2: Fan out route agents
Step 3: Merge and rank findings
Step 4: Severity gate review
Step 5: Fix pass
Step 6: Re-verify
Step 7: Next gate (feeds next cycle)

The sweep is a loop: findings are fixed in severity order and every fix pass ends with re-verification.

Section 5

Step 1: capture static evidence first

Start with a script that walks the routes you care about, runs axe-core on each one, and performs a basic keyboard walk: tab through the page, record the focus order, and note any element that receives focus but shows no visible indicator.

This script is infrastructure, not throwaway code. The same script re-runs after every fix pass, so the team can see numbers move instead of trusting that things feel better.

scripts/a11y-scan.mjs (Playwright + axe-core)
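import { chromium } from "playwright"
import { AxeBuilder } from "@axe-core/playwright"
import { mkdir, writeFile } from "node:fs/promises"

const baseUrl = "http://localhost:3000"
const routes = ["/checkout", "/checkout/payment", "/checkout/review", "/dashboard", "/forms/benefit-application"]
const maxTabStops = 40
const results = []

const browser = await chromium.launch()

try {
  const page = await browser.newPage()

  for (const route of routes) {
    await page.goto(baseUrl + route, { waitUntil: "networkidle" })

    // Static checks: WCAG 2.0, 2.1, and 2.2 rules at levels A and AA.
    const axe = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa", "wcag22aa"])
      .analyze()

    // Keyboard walk: press Tab repeatedly and record what receives focus.
    const focusOrder = []
    for (let i = 0; i < maxTabStops; i++) {
      await page.keyboard.press("Tab")
      focusOrder.push(
        await page.evaluate(() => {
          const el = document.activeElement
          if (!el || el === document.body) return { element: "none", hasIndicator: false }
          const label = (el.getAttribute("aria-label") || el.textContent || "").trim().slice(0, 40)
          // Rough heuristic: only outlines and box shadows count as an indicator here,
          // so a false value means "review by hand", not "confirmed failure".
          const style = getComputedStyle(el)
          const hasOutline = style.outlineStyle !== "none" && parseFloat(style.outlineWidth) > 0
          return { element: `${el.tagName} ${label}`.trim(), hasIndicator: hasOutline || style.boxShadow !== "none" }
        })
      )
    }

    results.push({ route, violations: axe.violations, focusOrder })
  }
} finally {
  // Close the browser even if a route fails to load.
  await browser.close()
}

// Create the output directory if it is missing, then write the results.
await mkdir("a11y", { recursive: true })
await writeFile("a11y/scan-results.json", JSON.stringify(results, null, 2))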
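
Because the same script re-runs after every fix pass, it helps to compare two runs directly. The sketch below assumes you keep a copy of the previous results before re-running (the scan script overwrites its output) and that each entry has the route and violations fields written above, with each axe violation carrying a nodes array. The function names are hypothetical.

```javascript
// Count affected elements per route: one axe violation can list several nodes.
function countViolations(scan) {
  const counts = {}
  for (const { route, violations } of scan) {
    counts[route] = violations.reduce((sum, violation) => sum + violation.nodes.length, 0)
  }
  return counts
}

// Compare two scan runs and report the change per route.
function compareScans(before, after) {
  const previous = countViolations(before)
  const current = countViolations(after)
  const routes = [...new Set([...Object.keys(previous), ...Object.keys(current)])]
  return routes.map((route) => ({
    route,
    before: previous[route] ?? 0,
    after: current[route] ?? 0,
    delta: (current[route] ?? 0) - (previous[route] ?? 0),
  }))
}

// Illustrative data only, shaped like the scan script's output.
const beforeFix = [{ route: "/checkout", violations: [{ id: "color-contrast", nodes: [{}, {}, {}] }] }]
const afterFix = [{ route: "/checkout", violations: [] }]
console.table(compareScans(beforeFix, afterFix))
```

A negative delta means the fix pass removed violations on that route; a positive one is a regression worth looking at before the next gate opens.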

Section 6

Step 2: the sweep prompt

With the scan results on disk, the workflow prompt asks for the fan-out. Each route agent receives the static results for its route plus the relevant source files, and returns structured findings with severity and a proposed fix.

Accessibility sweep workflow prompt
Run this as a workflow.

Do an accessibility sweep of the app against WCAG 2.2 AA.

Inputs: a11y/scan-results.json (axe-core violations and keyboard focus order per route) and the source under src/.

For each route in the scan results, dispatch one agent that:
- Confirms each axe violation in the source and locates the responsible component
- Reviews focus order against the visual and task order
- Checks accessible names, error messages, and instructions for meaning, not just presence
- Flags color-only status signals and missing focus indicators

Each finding needs: route, file, WCAG criterion, severity (P0 blocks the task or fails AA on a core flow, P1 serious barrier, P2 friction, P3 advisory), evidence, and a concrete fix.

Merge findings into a11y/findings.md ranked by severity. Do not change any code in this run.
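
The merge step the prompt asks for can be pictured as a sort followed by a render. This is a sketch under assumptions, not the workflow's actual output format: the property names (route, file, criterion, severity, evidence, fix) mirror the fields listed in the prompt, and the layout of the rendered file is invented for illustration.

```javascript
const SEVERITY_ORDER = ["P0", "P1", "P2", "P3"]

// Flatten per-route finding arrays and sort by severity, then by route.
function rankFindings(perRouteFindings) {
  return perRouteFindings
    .flat()
    .sort(
      (x, y) =>
        SEVERITY_ORDER.indexOf(x.severity) - SEVERITY_ORDER.indexOf(y.severity) ||
        x.route.localeCompare(y.route)
    )
}

// Render ranked findings as text, one section per severity tier that has findings.
function renderFindings(findings) {
  const lines = ["# Accessibility findings", ""]
  for (const severity of SEVERITY_ORDER) {
    const tier = findings.filter((finding) => finding.severity === severity)
    if (tier.length === 0) continue
    lines.push(`## ${severity} (${tier.length})`, "")
    for (const finding of tier) {
      lines.push(`- ${finding.route} | ${finding.file} | WCAG ${finding.criterion}`)
      lines.push(`  Evidence: ${finding.evidence}`)
      lines.push(`  Fix: ${finding.fix}`)
    }
    lines.push("")
  }
  return lines.join("\n")
}

// Illustrative input: one array of findings per route agent.
const ranked = rankFindings([
  [{ route: "/dashboard", file: "Dashboard.tsx", criterion: "2.4.3", severity: "P0", evidence: "tab order skips filters", fix: "reorder DOM" }],
  [{ route: "/checkout", file: "Badge.tsx", criterion: "1.4.1", severity: "P2", evidence: "status shown by color only", fix: "add text label" }],
])
console.log(renderFindings(ranked))
```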

Section 7

Step 3: a reusable route reviewer subagent

Defining the reviewer once in .claude/agents/ keeps every route held to the same standard, in this run and the next one. The reviewer is read-only; fixes happen in a later pass under review.

.claude/agents/a11y-reviewer.md
---
name: a11y-reviewer
description: Reviews one route for accessibility against WCAG 2.2 AA, combining axe-core results with judgment about names, focus order, and keyboard paths. Read-only.
tools: Read, Grep, Glob
---

You review one route. You receive the axe-core violations and recorded focus order for that route, plus access to the source.

For every finding report: route, file and line, WCAG criterion, severity (P0-P3), evidence, and a concrete fix a developer could apply without further research.

Separate findings a tool proved (cite the axe rule) from findings based on your judgment (say so explicitly).
Do not report style preferences. Do not modify files. Return findings as a JSON array and nothing else.
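
Asking for a JSON array and nothing else only pays off if the orchestration checks what it receives. The following sketch shows one way to do that; the required property names are assumptions derived from the fields the reviewer is told to report, and parseReviewerOutput is a hypothetical helper, not part of any Claude Code API.

```javascript
const SEVERITIES = new Set(["P0", "P1", "P2", "P3"])
const REQUIRED_FIELDS = ["route", "file", "criterion", "severity", "evidence", "fix"]

// Parse one reviewer's raw output and separate usable findings from rejects.
function parseReviewerOutput(raw) {
  let parsed
  try {
    parsed = JSON.parse(raw)
  } catch {
    // Prose around the JSON, or truncated output, ends up here.
    return { findings: [], rejected: [{ reason: "output is not valid JSON" }] }
  }
  if (!Array.isArray(parsed)) {
    return { findings: [], rejected: [{ reason: "output is not a JSON array" }] }
  }

  const findings = []
  const rejected = []
  for (const item of parsed) {
    const missing = REQUIRED_FIELDS.filter(
      (key) => typeof item?.[key] !== "string" || item[key].trim() === ""
    )
    if (missing.length > 0) {
      rejected.push({ item, reason: "missing: " + missing.join(", ") })
    } else if (!SEVERITIES.has(item.severity)) {
      rejected.push({ item, reason: "unknown severity " + item.severity })
    } else {
      findings.push(item)
    }
  }
  return { findings, rejected }
}
```

Keeping the rejects, with a reason, gives you something concrete to feed back when tightening the reviewer instructions.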

Section 8

Step 4: severity gates before any fix

The merged findings list is reviewed by a human before fixing starts. The review is fast because the format is consistent: confirm the P0s are real, demote anything the team has an approved exception for, and approve the first fix pass.

Fixes then happen in severity order, one gate at a time. The P0 pass is applied, the scan script re-runs, and only when the P0 list is empty does the P1 pass begin. This is slower on paper and faster in practice, because it prevents the common failure where a hundred small fixes land at once and three of them break something.

  • P0: blocks task completion or fails AA on a core flow; fix before anything else.
  • P1: serious barrier with a workaround; fix in the same release.
  • P2: friction or inconsistency; schedule into normal work.
  • P3: advisory or best practice; fix opportunistically.

Diagram: Severity gate fix loop

Step 1: Review ranked findings
Step 2: Confirm or demote P0s
Step 3: Approve fix pass
Step 4: Apply tier fixes
Step 5: Re-run scan script
Step 6: Tier clear, next gate (feeds next cycle)

Each severity tier is fixed and re-verified before the next tier opens.
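
The gate rule is small enough to state in code. This sketch assumes each finding carries a severity and a hypothetical status field with the values "open", "verified", or "exception"; the list of approved tiers stands in for the human sign-off, which the code does not replace.

```javascript
const TIERS = ["P0", "P1", "P2", "P3"]

// The only tier that may be worked on: the most severe one that still has open findings.
function nextOpenTier(findings) {
  const tier = TIERS.find((candidate) =>
    findings.some((finding) => finding.severity === candidate && finding.status === "open")
  )
  return tier ?? null
}

// Findings that may be fixed now, given which tiers a human has approved.
function fixableNow(findings, approvedTiers) {
  const tier = nextOpenTier(findings)
  if (tier === null || !approvedTiers.includes(tier)) return []
  return findings.filter((finding) => finding.severity === tier && finding.status === "open")
}

// Illustrative state: one P0 is still open, so the P1 stays closed even though it is approved.
const state = [
  { id: 1, severity: "P0", status: "verified" },
  { id: 2, severity: "P0", status: "open" },
  { id: 3, severity: "P1", status: "open" },
]
console.log(fixableNow(state, ["P0", "P1"]).map((finding) => finding.id))
```

A finding only leaves the "open" state after the re-run scan confirms the fix or a reviewer records an exception, which is what keeps a lower tier from opening early.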

Section 9

Case study: a checkout flow failing AA contrast

An e-commerce team rebranded to a lighter palette and started failing conversion benchmarks on mobile. The sweep covered the five checkout routes with five agents and finished in about 50 minutes.

It found 23 contrast failures, including the primary Place order button at 2.9:1 against its background and helper text on the payment form at 3.2:1, both below the 4.5:1 AA threshold for normal text. It also found that the discount code error was announced only by turning the field border red.

The P0 pass replaced the off-palette button color with the accessible token variant and added a text error message tied to the field with aria-describedby. The re-run scan showed zero AA contrast violations on checkout, and the fix shipped within the same week as the sweep.
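
The ratios in this case study come from the WCAG contrast formula, which compares the relative luminance of two colors. A minimal sketch of that calculation follows; axe-core performs its own, more thorough version, including resolving the real background behind an element, so treat this as a way to sanity-check a single color pair rather than a replacement for the scan.

```javascript
// Parse a 6-digit hex color such as "#767676" into [r, g, b] in the 0-255 range.
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex)
  if (!match) throw new Error("expected a 6-digit hex color, got " + hex)
  const value = parseInt(match[1], 16)
  return [(value >> 16) & 255, (value >> 8) & 255, value & 255]
}

// WCAG relative luminance: linearize each sRGB channel, then weight and sum.
function relativeLuminance(hex) {
  const [r, g, b] = hexToRgb(hex).map((channel) => {
    const c = channel / 255
    return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4
  })
  return 0.2126 * r + 0.7152 * g + 0.0722 * b
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground, background) {
  const l1 = relativeLuminance(foreground)
  const l2 = relativeLuminance(background)
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05)
}

// AA requires 4.5:1 for normal text and 3:1 for large text (WCAG 1.4.3).
function passesAA(foreground, background, largeText = false) {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5)
}

console.log(contrastRatio("#767676", "#ffffff").toFixed(2), passesAA("#767676", "#ffffff"))
```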

Section 10

Case study: a data-dense dashboard with broken keyboard order

An analytics dashboard had grown by accretion: filter rail, KPI cards, a virtualized table, and a side panel, each added by a different developer. Keyboard users reported getting lost; nobody could reproduce it reliably.

The recorded focus order made the problem visible. Tab order ran header, side panel, KPI cards, then the filter rail, then the table, because the side panel was rendered early in the DOM and pulled forward. The route agent flagged it as P0 against the meaningful sequence and focus order criteria, along with 14 icon-only buttons whose accessible names were the icon file names.

The fix pass reordered the DOM to match the task order, removed two positive tabindex values, and gave the icon buttons real labels. The re-verified focus walk matched the visual order in 38 of 40 tab stops, and the remaining two were documented as intentional.
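
Two checks from this case study are easy to automate once the focus walk is recorded: comparing the recorded order against the order the task expects, and finding positive tabindex values in source. The sketch below is illustrative; the expected order is something the team writes down by hand, the region names are made up to resemble this dashboard, and the regular expression is a rough scan that a route agent would still confirm in context.

```javascript
// Compare recorded tab stops to the expected order, position by position.
function compareFocusOrder(recorded, expected) {
  const total = Math.max(recorded.length, expected.length)
  const mismatches = []
  for (let i = 0; i < total; i++) {
    if (recorded[i] !== expected[i]) {
      mismatches.push({ stop: i + 1, expected: expected[i] ?? null, actual: recorded[i] ?? null })
    }
  }
  return { matched: total - mismatches.length, total, mismatches }
}

// Find positive tabindex values in HTML or JSX source text (rough pattern match).
function findPositiveTabindex(source) {
  const pattern = /tabindex\s*=\s*["'{]?\s*([1-9]\d*)/gi
  return [...source.matchAll(pattern)].map((match) => Number(match[1]))
}

// Hypothetical orders for a dashboard like the one described above.
const recorded = ["header", "side panel", "KPI cards", "filter rail", "table"]
const expected = ["header", "filter rail", "KPI cards", "table", "side panel"]
const report = compareFocusOrder(recorded, expected)
console.log(`${report.matched} of ${report.total} tab stops match`, report.mismatches)
```

A position-by-position comparison is deliberately strict: one element pulled forward shows up as several mismatches, which matches how the problem feels to a keyboard user.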

Section 11

Case study: a public sector form facing an external audit

A government benefits application form had a statutory accessibility audit scheduled in six weeks. The team ran the sweep as a rehearsal across the nine-step form.

The agents found 61 issues: required fields indicated only by an asterisk in the label color, error summaries that did not move focus or link to the failing fields, a date picker unusable without a mouse, and missing fieldset grouping on radio questions. Eleven were ranked P0.

Fixes were applied in three gated passes over two weeks, each followed by a re-run of the scan script and a manual screen reader spot check. The external audit later reported four minor advisories and no blocking failures, and the team kept the saved workflow as their pre-release gate.
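
For the error summary finding, one common pattern is a summary block that links each message to its field and can receive focus after a failed submit. The sketch below builds that markup as a string so it can be read on its own; the element id, the heading text, and the function names are assumptions, not the team's actual fix, and the page would still need to move focus to the summary after rendering it.

```javascript
// Escape text before placing it in HTML.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }
  return String(text).replace(/[&<>"']/g, (character) => entities[character])
}

// Build an error summary whose items link to the failing fields.
// tabindex="-1" lets a script move focus to the summary after submit.
function renderErrorSummary(errors) {
  if (errors.length === 0) return ""
  const items = errors.map(
    ({ fieldId, message }) => `    <li><a href="#${escapeHtml(fieldId)}">${escapeHtml(message)}</a></li>`
  )
  return [
    '<div id="error-summary" role="alert" tabindex="-1">',
    "  <h2>There is a problem</h2>",
    "  <ul>",
    ...items,
    "  </ul>",
    "</div>",
  ].join("\n")
}

// Attributes for the failing field, pointing at its inline error message.
function fieldErrorAttributes(fieldId) {
  return `aria-invalid="true" aria-describedby="${escapeHtml(fieldId)}-error"`
}

console.log(renderErrorSummary([{ fieldId: "dob-day", message: "Enter the day you were born" }]))
console.log(fieldErrorAttributes("dob-day"))
```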

Section 12

Good vs bad sweep findings

A sweep is only as useful as its weakest finding. Vague findings get argued with; precise findings get fixed. Spot-check the output before the severity gate review and tighten the reviewer instructions if findings drift toward generalities.

Table: Finding quality comparison

1. Bad: Contrast is poor in places
2. Good: P0: PlaceOrderButton.tsx uses #8FB4FF on white, 2.9:1; needs 4.5:1 (WCAG 1.4.3); use color.action.primary
3. Bad: The dashboard is hard to use with a keyboard
4. Good: P0: SidePanel renders before FilterRail in Dashboard.tsx, so tab order skips filters (WCAG 2.4.3); reorder and remove tabindex=1
5. Bad: Add more ARIA
6. Good: P1: Error summary on step 3 lists errors but does not link to fields or receive focus on submit (WCAG 3.3.1)

A fix-ready finding names the criterion, the location, the evidence, and the fix.

Section 13

What a sweep cannot prove

Passing the sweep does not make a product accessible; it makes it free of the failures the sweep can detect. Real assistive technology behavior, especially across screen reader and browser combinations, still needs hands-on testing, and usability for disabled users still needs research with disabled users.

Severity is also a judgment call that belongs to people. An agent can say a finding fails a criterion; whether that blocks a release, given the audience and the deadline, is a decision the team owns.

  • It cannot replace testing with actual screen readers and real users.
  • It cannot certify formal conformance; that requires a documented human evaluation.
  • It cannot judge cognitive load, plain language quality, or content structure beyond surface checks.
  • It cannot decide release trade-offs; the severity gate is where humans decide.

Section 14

The reusable sweep workflow

Save the working orchestration to .claude/workflows/ and run it before each release or audit. The scan script, the reviewer subagent, and the findings format stay stable, so each run is comparable with the last and the trend is visible.

Accessibility sweep workflow
1. List the routes and flows in scope, starting with the ones where failure is most expensive.
2. Run the scan script: axe-core results and keyboard focus order per route.
3. Fan out one read-only reviewer agent per route with its scan results and source files.
4. Merge findings, ranked P0-P3, with criterion, evidence, and a concrete fix each.
5. Hold a severity gate review: confirm P0s, record approved exceptions.
6. Apply fixes one severity tier at a time.
7. Re-run the scan script and re-verify after every fix pass.
8. Save the workflow and rerun it each release to catch regressions early.
