Section 1
Why design QA belongs on the pull request
Design quality usually degrades one merge at a time. A spacing value gets hardcoded because the deadline was tight, a button picks up an off-palette hex, a new empty state ships without anyone capturing it on a phone. Each individual diff looks reasonable to the engineer who reviews it, because engineers review diffs for logic, not for hierarchy and rhythm.
Putting design QA on the pull request moves the review to the moment the change is cheapest to fix and the context is freshest. The branch exists, the preview builds, the author is still thinking about the screen. A finding raised here is a thirty-second conversation; the same finding raised after release is a ticket, a triage meeting, and a regression risk.
The problem has always been cost. Nobody can manually review every UI branch at three widths with a token check and an accessibility pass. An agent can, as long as the procedure is written down once and saved where the whole team can run it. That is what this workflow is: design QA turned into a command.
Section 2
When to reach for this workflow
This is an Advanced workflow because it touches several systems at once — the build, the branch diff, browser capture, token rules, and accessibility checks — and because the value comes from making it a standing gate rather than a one-off review. Set it up once, then let every UI branch pay the ten-to-twenty-minute cost automatically.
It assumes you already have a runnable preview (a dev server or a build command), a way to tell which screens a branch touches, and at least an informal set of design token rules. If you have none of those, start with the accessibility sweep workflow instead and come back to this one when the basics are in place.
- A team merging UI changes weekly or faster, where small regressions accumulate quietly.
- A design system team that wants downstream apps to stay on tokens without manually policing them.
- An agency or consultancy that needs a consistent quality bar on handover branches before a client sees them.
- Any repository where engineers approve UI pull requests without a designer in the loop.
Section 3
What the workflow checks, and what it deliberately skips
The scope is the diff, not the product. The workflow only captures and reviews screens that the branch actually changes, which is what keeps the run inside twenty minutes and keeps the findings relevant to the author. A full-surface review is a different workflow with a different cadence.
On those changed screens it does four things: captures the branch and main at the same widths, compares them, checks the changed components for token discipline, and runs accessibility checks on the same routes. Everything else — copy quality, product logic, performance budgets — is out of scope on purpose, because a gate that checks everything blocks everything.
Checked:
- Branch vs main captures of changed screens at 390, 768, and 1440 pixels
- Hardcoded colors, spacing, radii, and font sizes in changed components
- axe-core run plus a keyboard pass on the changed routes
- Empty, loading, error, and disabled states the diff introduces or alters
Skipped:
- Copy, product logic, performance, and screens the branch does not touch
The gate checks what the branch changed and nothing else.
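To make the capture scope concrete, here is a minimal sketch of how the changed screens might expand into individual capture jobs, one per target, state, and width. The helper names (`slugify`, `buildCapturePlan`) and the file path layout are assumptions for illustration, not something the workflow prescribes.

```javascript
// Hypothetical helper: turns a route into a file-safe slug.
// "/checkout/payment" becomes "checkout-payment"; "/" becomes "root".
function slugify(route) {
  return route.replace(/^\/+|\/+$/g, "").replace(/[^a-zA-Z0-9]+/g, "-") || "root";
}

// Hypothetical helper: one capture job per target, state, and width.
// The qa/captures/<target>/<slug>[-state]-<width>.png layout is an assumption.
function buildCapturePlan(screens, widths, targets = ["branch", "main"]) {
  const jobs = [];
  for (const screen of screens) {
    for (const target of targets) {
      for (const state of screen.states) {
        for (const width of widths) {
          const suffix = state === "default" ? "" : "-" + state;
          jobs.push({
            route: screen.route,
            target,
            state,
            width,
            path: `qa/captures/${target}/${slugify(screen.route)}${suffix}-${width}.png`,
          });
        }
      }
    }
  }
  return jobs;
}

const plan = buildCapturePlan(
  [{ route: "/checkout/payment", states: ["default", "card-error", "loading"] }],
  [390, 768, 1440]
);
console.log(plan.length); // 2 targets x 3 states x 3 widths = 18
console.log(plan[0].path); // qa/captures/branch/checkout-payment-390.png
```

One screen with three states already produces eighteen captures, which is why scoping to the diff matters for run time.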
Section 4
The orchestration pattern: a saved workflow command per branch
Under the hood this runs as a Claude Code dynamic workflow. When you include the word workflow in the prompt, or run with /effort ultracode, Claude writes a JavaScript orchestration script and runs it in the background. The script is what holds the moving parts: the list of changed files, the routes mapped from those files, the capture results from both branches, and the merged findings. None of that intermediate output passes through Claude's own context, which is why a branch that touches eight screens still fits in one run.
The script dispatches subagents for the parallel parts — one capture-and-diff agent per changed screen, one token reviewer, one accessibility reviewer — with up to sixteen running concurrently and up to a thousand per run, far more than a single pull request will ever need. Subagent definitions live in .claude/agents/*.md so every branch is reviewed by the same reviewers with the same instructions. The run is resumable, so a flaky preview build does not throw away the captures that already finished.
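The concurrency cap is easier to picture with a small example. The sketch below is a generic bounded fan-out in plain JavaScript, not the scheduler the workflow runtime actually uses; `runWithLimit` and `fakeAgent` are hypothetical names introduced only for illustration.

```javascript
// Runs async task functions with at most `limit` in flight at once and
// returns their results in the original order.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in for a subagent call: resolves after a short delay.
const fakeAgent = (route) =>
  new Promise((resolve) => setTimeout(() => resolve("captured " + route), 10));

const routes = ["/checkout/payment", "/settings/billing"];
runWithLimit(routes.map((route) => () => fakeAgent(route)), 16).then((out) => {
  console.log(out); // [ 'captured /checkout/payment', 'captured /settings/billing' ]
});
```

With a limit of sixteen and a handful of screens, every per-screen agent starts immediately; the cap only matters on unusually wide branches.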
The part that makes this a team practice rather than a personal trick is saving it. Run the workflow once from a written prompt, open /workflows, and press s on the run to save it into .claude/workflows/ in the repository. From then on it is a slash command — /design-qa — that anyone on the team can run on any branch, and that lives in version control next to the code it reviews. Save it to ~/.claude/workflows/ instead if you want a personal copy that follows you across projects.
Diagram: detect changed screens, build branch and main previews, capture and diff per screen, run token and a11y checks, merge findings into a PR comment, author fixes on branch, re-run /design-qa, which feeds the next cycle.
The gate is a loop: findings go back to the author, the fix lands on the branch, and the same command re-verifies before merge.
Section 5
Step 1: map the diff to screens
The first thing the orchestration script does is ask git which files the branch changed and translate that into routes. The translation is project-specific — file-based routing makes it nearly free, component changes need a small mapping file — but it only has to be written once, and it is the step that keeps the whole run scoped.
Keep a routes map checked into the repository: a JSON file that says which routes render which components. The script reads it, intersects it with the diff, and produces the screen list everything downstream uses. When the map misses a component, the workflow says so in the findings instead of silently skipping it, which is how the map improves over time.
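The intersection step might look roughly like the sketch below. It assumes the map has the shape of the routes map example in this step, and that component files live under `src/components/`; the function name `mapDiffToScreens` and the `isComponent` rule are hypothetical and would need adapting per project.

```javascript
// Hypothetical mapping step. `changedFiles` is the newline-separated output of
// `git diff main...HEAD --name-only`; `map` is the parsed routes map.
function mapDiffToScreens(
  changedFiles,
  map,
  // Assumed convention: components are .tsx/.jsx files under src/components/.
  isComponent = (file) => /^src\/components\/.+\.(tsx|jsx)$/.test(file)
) {
  const changed = new Set(
    changedFiles.split("\n").map((line) => line.trim()).filter(Boolean)
  );

  // Keep only routes that render at least one changed component.
  const screens = map.routes
    .map((entry) => ({
      ...entry,
      changedComponents: entry.components.filter((c) => changed.has(c)),
    }))
    .filter((entry) => entry.changedComponents.length > 0);

  // Changed components that no route claims: reported, never silently dropped.
  const mapped = new Set(map.routes.flatMap((entry) => entry.components));
  const unmapped = [...changed].filter((f) => isComponent(f) && !mapped.has(f));

  return { screens, unmapped };
}

const map = {
  routes: [
    {
      route: "/checkout/payment",
      components: ["src/components/checkout/PaymentForm.tsx", "src/components/ui/Button.tsx"],
      states: ["default", "card-error", "loading"],
    },
    {
      route: "/settings/billing",
      components: ["src/components/settings/BillingPanel.tsx", "src/components/ui/Badge.tsx"],
      states: ["default", "empty", "past-due"],
    },
  ],
  widths: [390, 768, 1440],
};
const diff = "src/components/ui/Badge.tsx\nsrc/components/ui/Tooltip.tsx\nREADME.md\n";

const { screens, unmapped } = mapDiffToScreens(diff, map);
console.log(screens.map((s) => s.route)); // [ '/settings/billing' ]
console.log(unmapped); // [ 'src/components/ui/Tooltip.tsx' ]
```

Returning the unmapped files alongside the screen list is what lets the findings call out a gap in the map instead of hiding it.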
{
  "routes": [
    {
      "route": "/checkout/payment",
      "components": ["src/components/checkout/PaymentForm.tsx", "src/components/ui/Button.tsx"],
      "states": ["default", "card-error", "loading"]
    },
    {
      "route": "/settings/billing",
      "components": ["src/components/settings/BillingPanel.tsx", "src/components/ui/Badge.tsx"],
      "states": ["default", "empty", "past-due"]
    }
  ],
  "widths": [390, 768, 1440]
}
Section 6
Step 2: the workflow prompt
This is the prompt a designer or engineer pastes into Claude Code the first time, before it is saved as a command. It names the inputs, fixes the widths, and ends with the gate: report, do not fix. The wording matters less than the constraints — scoped to the diff, evidence per finding, no code changes in this run.
Run this as a workflow. Do a design QA review of the current branch against main, scoped to the screens this branch changes.
Steps:
1. Use git diff main...HEAD --name-only and qa/routes-map.json to list the changed routes and their states.
2. Build or start the preview for this branch and for main (two ports are fine).
3. For each changed route, dispatch one agent that captures both branches at 390, 768, and 1440 px (including the listed states), diffs them, and reports observable differences with the capture paths as evidence.
4. Dispatch one token reviewer over the changed component files: flag hardcoded colors, spacing, radii, and font sizes that have a token equivalent, citing file and line.
5. Dispatch one accessibility reviewer that runs axe-core on the changed routes on the branch preview and does a keyboard pass on any new or changed interactive elements.
6. Merge everything into qa/pr-findings.md: severity P0-P3, evidence, user impact, and a concrete fix per finding, plus a short summary suitable for pasting as a PR comment with gh pr comment.
Do not change any code in this run. Report only.
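The prompt asks for P0-P3 severities but leaves it to the accessibility reviewer to decide how axe-core results translate into them. One possible convention is sketched below. The `violations`, `impact`, `id`, `help`, and `nodes[].target` fields are part of axe-core's results object; the `IMPACT_TO_SEVERITY` table and the `axeToFindings` helper are assumptions for illustration, and the sample results object is hand-written rather than real output.

```javascript
// Assumed convention, not part of axe-core: P0 is left for humans to assign.
const IMPACT_TO_SEVERITY = { critical: "P1", serious: "P1", moderate: "P2", minor: "P3" };

// Flattens axe-core violations into one finding per affected element.
function axeToFindings(route, axeResults) {
  return axeResults.violations.flatMap((violation) =>
    violation.nodes.map((node) => ({
      severity: IMPACT_TO_SEVERITY[violation.impact] ?? "P3",
      rule: violation.id,
      route,
      target: node.target.join(" "),
      summary: violation.help,
    }))
  );
}

// Hand-written sample in the shape of an axe-core results object.
const sampleResults = {
  violations: [
    {
      id: "button-name",
      impact: "critical",
      help: "Buttons must have discernible text",
      nodes: [{ target: ["#confirm-dialog .close"] }],
    },
  ],
};

console.log(axeToFindings("/settings/billing", sampleResults));
```

Whatever table a team settles on, writing it into the reviewer's instructions keeps severities comparable from one run to the next.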
Section 7
Step 3: what the orchestration script looks like
You do not write the orchestration script — Claude writes it when the workflow runs — but it helps to know roughly what it is doing, because that is what you are approving when you save it. The sketch below is illustrative pseudo-code of the shape, not a file you copy into the repository.
The important property is visible in the sketch: the captures, diffs, and per-screen findings live in script variables and on disk, and only the merged summary comes back to the conversation. That is the difference between a workflow and pasting twelve screenshots into chat.
// Sketch only. Claude generates the real script when the workflow runs.
// sh, read, agent, and mapDiffToRoutes are placeholder helpers in this sketch.
// Changed files on the branch, intersected with the checked-in routes map.
const changed = await sh("git diff main...HEAD --name-only")
const map = JSON.parse(await read("qa/routes-map.json"))
const screens = mapDiffToRoutes(changed, map)
// One capture-and-diff agent per changed screen, run in parallel.
const diffs = await Promise.all(
  screens.map((screen) =>
    agent("Capture " + screen.route + " on branch and main at 390/768/1440, diff, report observable differences with capture paths.", { model: "sonnet" })
  )
)
// Token and accessibility reviewers over the changed components and routes.
const tokens = await agent("Review changed components for hardcoded values that should be tokens: " + screens.flatMap(s => s.components).join(", "), { model: "sonnet" })
const a11y = await agent("Run axe-core and a keyboard pass on the branch preview for: " + screens.map(s => s.route).join(", "), { model: "sonnet" })
// Only the merged report goes back to the conversation.
const report = await agent("Merge these findings into qa/pr-findings.md, ranked P0-P3, with a paste-ready PR comment summary.", { model: "opus", input: { diffs, tokens, a11y } })
return report
Section 8
Step 4: the reviewers live in .claude/agents/
The token reviewer and the screen-diff reviewer are defined once as subagents so every branch gets the same standard; the token reviewer's definition is shown here as an example. They are read-only by design; the author fixes their own branch.
---
name: token-reviewer
description: Reviews changed component files for design token discipline. Flags hardcoded colors, spacing, radii, shadows, and font sizes that have a token equivalent. Read-only.
tools: Read, Grep, Glob
---
You review only the files you are given. The token source of truth is tokens/ (or the theme file named in the prompt).
For every finding report: file and line, the hardcoded value, the token that should replace it, and severity (P1 if it is a brand color or core spacing value on a primary surface, P2 otherwise, P3 if no exact token exists and a new one may be needed).
If a value has no reasonable token equivalent, say so instead of inventing a mapping. Do not report style preferences. Do not modify files.
Return findings as a JSON array and nothing else.
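The reviewer is an agent, but the mechanical part of its job can be pictured as a simple scan. The sketch below is a hypothetical pre-check, not how the agent works internally: it finds hex colors and px values in a source string and looks each one up in a flat token table. The function name, the token table shape, and the `space.4` token are assumptions; deciding whether something is P1 is left to the reviewer, so the sketch only distinguishes P2 (exact token exists) from P3 (no exact token).

```javascript
// Hypothetical pre-check: flags hex colors and px values, citing file and line.
function findHardcodedValues(file, source, tokens) {
  // Index tokens by lowercased value so "#D9531E" and "#d9531e" match.
  const byValue = new Map(
    Object.entries(tokens).map(([name, value]) => [String(value).toLowerCase(), name])
  );
  // 6-digit hex, 3-digit hex, or a px length such as 8px or 1.5px.
  const pattern = /#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b|\b\d+(?:\.\d+)?px\b/g;

  const findings = [];
  source.split("\n").forEach((text, index) => {
    for (const match of text.matchAll(pattern)) {
      const value = match[0];
      const token = byValue.get(value.toLowerCase()) ?? null;
      findings.push({ file, line: index + 1, value, token, severity: token ? "P2" : "P3" });
    }
  });
  return findings;
}

// Illustrative token table; "space.4" is a made-up name.
const tokens = { "color.feedback.warning": "#D9531E", "space.4": "16px" };
const source = 'const style = {\n  color: "#E4572E",\n  gap: "16px",\n};';

console.log(findHardcodedValues("src/components/ui/Badge.tsx", source, tokens));
// Line 2: "#E4572E" has no exact token (P3). Line 3: "16px" maps to space.4 (P2).
```

A scan like this cannot tell a brand color on a primary surface from an incidental one, which is the judgment the agent instructions above ask for.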
Section 9
Step 5: save it as /design-qa
After the first successful run, open /workflows in Claude Code, select the run, and press s to save it. Save into .claude/workflows/ inside the repository so the command is versioned and shared: from that point the whole team runs /design-qa on any branch instead of re-pasting the prompt. A personal copy in ~/.claude/workflows/ is useful for consultants who move between client repositories and want the same gate everywhere.
Saving is also where you edit. The saved workflow is a file you can read and adjust — change the widths, add a state, tighten the token rules — and the change applies to everyone's next run. Treat it like any other piece of shared infrastructure: review changes to it, and note in the PR comment which version of the gate produced the findings.
- Run once from the written prompt, with the word workflow in it or via /effort ultracode.
- Open /workflows, select the run, press s, save to .claude/workflows/design-qa.
- Commit the saved workflow and the agents/ definitions so the gate is versioned with the code.
- From then on: check out the branch, run /design-qa, paste the comment with gh pr comment.
# After the run writes qa/pr-findings.md
gh pr comment --body-file qa/pr-findings.md
# Or attach it to a review instead of a plain comment
gh pr review --comment --body-file qa/pr-findings.md
Diagram: run the written prompt, open /workflows and save, commit to .claude/workflows/, commit reviewer agents, check out a UI branch, run /design-qa, post findings with gh.
One successful run from the written prompt becomes a versioned slash command the whole team gates branches with.
Section 10
Case study: a product team gating UI pull requests
A B2B product team of four engineers and one designer adopted the gate after a quarter in which three visual regressions reached customers, including a pricing table that lost its column alignment on tablets. The designer could not review every branch; the engineers did not know what to look for.
They saved /design-qa with their own routes map covering 31 screens and made it a soft requirement: any PR labeled ui needs the findings comment before merge. Over the first six weeks the gate ran on 47 pull requests, averaging 14 minutes per run, and raised 9 P1 findings — the largest being a refactor that silently dropped the focus ring from every dialog close button — plus 41 P2 token violations.
The number they track is escapes: visual regressions found after merge dropped from roughly one a week to one in the entire six-week period, and that one was on a screen missing from the routes map, which is exactly the failure mode the workflow reports rather than hides.
Section 11
Case study: a design system team checking downstream apps
A design system team shipping a minor release of their component library wanted to know what it would do to the three product apps that consume it, before those teams found out the hard way. They reused the same saved workflow with one change: the branch under review was the dependency bump branch the release bot opens in each app.
Across the three apps the gate captured 58 changed screens. Most diffs were the intended ones — the new focus treatment, the adjusted disabled opacity — and the report listed them as expected changes rather than findings. It also caught two real problems: a date picker in the billing app that overflowed its container at 390 pixels because the new minimum width interacted with a local override, and a downstream app that had pinned an internal spacing token the release renamed.
Both issues went into the release notes with the capture paths attached, and the consuming teams fixed them before upgrading. The system team now runs the gate on every release candidate against all downstream apps, which takes a little under an hour for the three of them and has replaced the previous practice of asking in a channel whether anything looks weird.
Section 12
Case study: an agency QA pass on client handover branches
A digital agency hands work to client engineering teams as a branch plus a walkthrough call. After one handover where the client found eleven small inconsistencies in their own QA — and read each one as a sign of carelessness — the agency added /design-qa as the last step before any handover branch is shared.
On the next engagement, a marketing site rebuild with 14 templates, the pre-handover run took 19 minutes and surfaced 2 P1 and 13 P2 findings, mostly hardcoded brand colors in components built early in the project before the token file stabilized, plus one hero image with no alt text on the press template. The team fixed everything above P3 in an afternoon.
The findings file itself became part of the deliverable: the agency now includes the final clean run in the handover packet as evidence the branch was checked, and clients have started asking for the saved workflow so their own teams can keep running the same gate after the engagement ends.
Section 13
Good vs bad findings comments
The findings comment is read by the branch author under time pressure, so its quality decides whether the gate gets respected or routed around. A vague comment generates a debate; a precise comment generates a commit. Spot-check the first few runs and tighten the reviewer instructions if findings drift toward generalities.
Bad: The new badge color looks off-brand
Good: P1: Badge.tsx line 24 uses #E4572E, not in tokens; closest token is color.feedback.warning (#D9531E); evidence qa/captures/branch/billing-1440.png

Bad: Mobile spacing changed somewhere on checkout
Good: P2: PaymentForm gap dropped from 16 to 8 px below 768 px, fields now visually grouped incorrectly; diff qa/diffs/checkout-payment-390.png

Bad: Check accessibility on the new dialog
Good: P1: New ConfirmDialog close button has no accessible name (axe button-name) and is skipped on Tab; add aria-label and remove tabindex=-1
A merge-ready finding names the evidence, the impact, and the fix.
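Spot-checking can be partly mechanical. The sketch below assumes findings are objects with `severity`, `evidence`, `impact`, and `fix` string fields, which is one plausible shape rather than a format the workflow defines; `missingFields` is a hypothetical helper that lists what a finding still lacks.

```javascript
// Returns the names of required fields that are absent, empty, or malformed.
function missingFields(finding) {
  const required = ["severity", "evidence", "impact", "fix"];
  const missing = required.filter(
    (key) => typeof finding[key] !== "string" || finding[key].trim() === ""
  );
  // A severity that is present must also be one of P0 to P3.
  if (!missing.includes("severity") && !/^P[0-3]$/.test(finding.severity)) {
    missing.push("severity");
  }
  return missing;
}

const vague = { summary: "Check accessibility on the new dialog" };
const precise = {
  severity: "P1",
  evidence: "axe button-name on the ConfirmDialog close button",
  impact: "Close button has no accessible name and is skipped on Tab",
  fix: "Add aria-label and remove tabindex=-1",
};

console.log(missingFields(vague)); // [ 'severity', 'evidence', 'impact', 'fix' ]
console.log(missingFields(precise)); // []
```

A check like this only confirms the fields are filled in; whether the evidence actually supports the claim still needs a human read.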
Section 14
What the gate cannot prove
A clean run means the branch did not regress the things the gate checks on the screens the routes map knows about. It does not mean the design is good, the flow makes sense, or the change should ship. Severity disputes, intentional deviations from the system, and the call on whether a P1 blocks the merge all belong to people — usually the designer and the branch author in a short conversation the comment makes faster.
The gate also inherits the limits of its tools: visual diffs are noisy when animations or live data are involved, automated checks such as axe-core can test only a subset of WCAG success criteria, and the routes map only knows the screens someone added to it. Treat the workflow as a way to make human review cheap and consistent, not as a substitute for it.
- It cannot judge whether a deliberate visual change is an improvement; it can only show the change.
- It cannot prove accessibility; it can only catch the failures automated checks and a keyboard pass detect.
- It cannot see screens missing from the routes map, and it should report the gap rather than stay silent.
- It cannot decide whether to block a merge; the severity gate is where humans decide.
Section 15
The reusable PR gate workflow
Save the workflow, commit the agents, keep the routes map current, and let the command do the repetitive part. The eight steps below are the whole loop; everything else in this article is detail on how to make each step trustworthy.
1. Map the branch diff to changed routes and states using qa/routes-map.json.
2. Build or start previews for the branch and for main.
3. Fan out one capture-and-diff agent per changed screen at 390, 768, and 1440 px.
4. Run the token reviewer over the changed component files.
5. Run the accessibility reviewer (axe-core plus keyboard pass) on the changed routes.
6. Merge findings into qa/pr-findings.md, ranked P0-P3, with evidence and a fix per finding.
7. Post the comment with gh pr comment; the author fixes the branch; humans decide on disputes.
8. Re-run /design-qa on the updated branch and merge when no P0 or P1 remains.
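Steps 6 and 8 of the loop can be sketched as a small ranking and summary helper. This is an illustration under assumptions: findings are objects with a `severity` of P0 to P3, and `summarizeFindings` and its `needsDecision` flag are hypothetical names. The flag only reports that a P0 or P1 remains; whether that blocks the merge stays a human call.

```javascript
const SEVERITY_ORDER = ["P0", "P1", "P2", "P3"];

// Ranks findings by severity and builds a one-line headline for the PR comment.
function summarizeFindings(findings) {
  const counts = { P0: 0, P1: 0, P2: 0, P3: 0 };
  for (const finding of findings) {
    if (!SEVERITY_ORDER.includes(finding.severity)) {
      throw new Error("Unknown severity: " + finding.severity);
    }
    counts[finding.severity] += 1;
  }
  // Array.prototype.sort is stable, so findings keep their order within a severity.
  const ranked = [...findings].sort(
    (a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity)
  );
  const headline =
    "Design QA: " + SEVERITY_ORDER.map((s) => counts[s] + " " + s).join(", ");
  return { ranked, counts, headline, needsDecision: counts.P0 + counts.P1 > 0 };
}

const summary = summarizeFindings([
  { severity: "P2", summary: "PaymentForm gap dropped from 16 to 8 px below 768 px" },
  { severity: "P1", summary: "ConfirmDialog close button has no accessible name" },
  { severity: "P2", summary: "Hardcoded font size in BillingPanel" },
]);

console.log(summary.headline); // Design QA: 0 P0, 1 P1, 2 P2, 0 P3
console.log(summary.needsDecision); // true
```

Putting the headline at the top of the findings comment gives the author the state of the gate before they read a single finding.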