Slide 1 — Design Review on Every PR
Welcome to the final module of the course. Everything so far — critique dimensions, heuristic evaluations, regression evidence, accessibility and content review — has been a practice you run when you decide to. This module is about the end state: a design review that runs automatically on every pull request that touches the interface. Not a full review of the product on every merge — a narrow, mechanical review of exactly what the branch changed, with screenshot evidence, severity levels, and a clear rule for when a human designer must look. By the end you will be able to draft the review rules for your own codebase. Let's start with why the pull request is the right place.
Slide 2 — Review every change, not every quarter
Here is the problem this module solves. Design quality almost never fails in one big decision; it degrades one merge at a time. A spacing value gets hardcoded because the deadline was tight. A button picks up a colour that is not in the tokens. A new empty state ships without anyone looking at it on a phone. Each diff looks fine to the engineer reviewing it, because engineers review for logic, not for visual hierarchy or token discipline. Catch the issue while the branch is open and it is a thirty-second conversation. Catch it after release and it is a ticket, a triage meeting, and a regression risk. The pull request is where the fix is cheapest. That is where the review belongs.
Slide 3 — What the gate checks — and what it deliberately ignores
So what does a per-PR design review actually check? Four things, all scoped to the screens the branch changed. A visual diff: branch versus main, captured at three widths. Token usage: hardcoded colours, spacing, radii, and font sizes in the changed components. Accessibility: automated checks plus a keyboard pass on the changed routes. And states: any empty, loading, error, or disabled state the diff introduces or alters. Just as important is what it deliberately ignores — copy quality, product logic, performance, and every screen the branch does not touch. That is not laziness. A gate that checks everything blocks everything, and a blocked team routes around the gate. The exclusions are what keep it alive.
Slide 4 — Token, component, and anti-pattern checks as gates
The mechanical checks are where the gate earns its consistency. A token reviewer reads the changed component files and flags raw colours, spacing, radii, and font sizes that have a token equivalent — citing the file, the line, and the token that should replace it. Component rules catch one-off variants and bypassed primitives. Anti-pattern rules encode the things your team has explicitly named and banned. Here is the part that makes it a team practice rather than a personal trick: these reviewers are files in the repository, versioned next to the code they review. Every branch gets the same reviewers with the same instructions, and changing the standard is itself a reviewed change.
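As a rough illustration of the token reviewer described here, the sketch below scans a changed file for raw hex colours that have a token equivalent and cites the file, the line, and the replacement token. The token table, the token names, and the function name are all hypothetical; a real reviewer would load tokens from the team's own source and cover spacing, radii, and font sizes too.

```python
import re
from dataclasses import dataclass

# Hypothetical token table: raw hex value -> token name. A real team would
# load this from its design-token source rather than hardcode it.
COLOR_TOKENS = {
    "#1a73e8": "color.action.primary",
    "#d93025": "color.feedback.error",
}

# Matches six-digit hex colours only; other notations are out of scope here.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


@dataclass
class TokenFinding:
    file: str
    line: int
    raw_value: str
    suggested_token: str


def find_raw_colors(file: str, source: str) -> list[TokenFinding]:
    """Flag hex colours that have a token equivalent, citing file and line."""
    findings = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        for match in HEX_COLOR.finditer(text):
            raw = match.group(0).lower()
            token = COLOR_TOKENS.get(raw)
            if token is not None:  # only flag values that have a token
                findings.append(TokenFinding(file, line_no, raw, token))
    return findings
```

Colours with no token equivalent are deliberately not flagged in this sketch; under the severity scheme in this module, those would be closer to a note for the design-system owner than a violation by the author.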
Slide 5 — Screenshot evidence attached to the PR automatically
Every finding in the review comes with evidence attached. The run captures the changed screens on the branch and on main, at the same widths, including the states listed for those screens in the routes map (error, empty, loading, disabled), and attaches the visual diffs to the findings. That changes the conversation. A comment that says the badge colour looks off-brand starts a debate. A comment that says this file, this line, this hex value, here is the token it should be, and here is the capture: that starts a commit. The standard to hold is simple: the author should be able to act on the comment without opening the design tool, the staging site, or a meeting.
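One way to picture an actionable comment is as a small record rendered to text. The sketch below is an assumption about shape only: the `Finding` fields, the capture file names, and the wording of the comment are illustrative, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    file: str
    line: int
    raw_value: str
    suggested_token: str
    # Paths to the branch and main captures; the naming is an assumption.
    captures: list[str] = field(default_factory=list)


def render_comment(f: Finding) -> str:
    """Render a finding so the author can act on it without leaving the PR."""
    lines = [
        f"{f.file}:{f.line} uses raw value {f.raw_value}.",
        f"Replace with token: {f.suggested_token}.",
    ]
    if f.captures:
        lines.append("Evidence: " + ", ".join(f.captures))
    else:
        # A finding without evidence is called out so the gap is visible.
        lines.append("Evidence: none captured (report as a gap).")
    return "\n".join(lines)
```

For example, `render_comment(Finding("Badge.tsx", 42, "#1a73e8", "color.action.primary", ["badge-branch-375.png", "badge-main-375.png"]))` yields three lines: the location and value, the replacement token, and the two captures.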
Slide 6 — The PR design review pipeline
Here is the whole pipeline in one picture. A pull request opens and the previews build. The agent scopes the run to the diff using a routes map, then runs the automated checks (tokens, components, anti-patterns, accessibility) and captures the changed screens on both branches as evidence. Everything is combined into a review summary posted on the pull request: severity, evidence, and a fix per finding. The designer decision gate sits after that, and most pull requests never need it. When they do, the questions are human ones: intentional deviation or regression, block or approve. The author fixes the branch, the same command re-verifies, and the change merges. And the dashed line matters most: recurring findings become rules in the harness, so the gate gets quieter over time, not louder.
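The scoping step is the one that keeps the run narrow, so it is worth sketching. The example below assumes the routes map is a simple mapping from file globs to the routes that render those files; the paths and routes are invented, and your own map may carry states and widths as well.

```python
from fnmatch import fnmatch

# Hypothetical routes map: file glob -> routes (screens) that render it.
ROUTES_MAP = {
    "src/components/Dialog/*": ["/settings", "/checkout"],
    "src/pages/checkout/*": ["/checkout"],
}


def scope_to_diff(changed_files, routes_map):
    """Return (routes to review, changed files the map does not cover)."""
    routes, unmapped = set(), []
    for path in changed_files:
        hits = [
            route
            for pattern, mapped_routes in routes_map.items()
            if fnmatch(path, pattern)
            for route in mapped_routes
        ]
        if hits:
            routes.update(hits)
        else:
            # Not silently skipped: the gate reports this as a gap.
            unmapped.append(path)
    return sorted(routes), unmapped
```

Returning the unmapped files alongside the routes is a deliberate choice in this sketch: a changed file the map cannot place is exactly the kind of gap the gate should report rather than hide. In practice you would filter the diff to interface files first.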
Slide 7 — Severity levels: block, warn, note
Every finding gets one of three severity levels, and three is the right number. Block means the merge stops: broken accessibility on an interactive element, a regression on a primary flow, a brand colour violated on a primary surface. The author fixes it, or a designer explicitly waives it with a reason on the record. Warn means fix it on the branch or file a follow-up with a named owner — token violations off the primary path, a layout shift at one width. Note never blocks anything: it is the gate telling the design-system owner that a token is missing or the routes map has a gap. Severity is not a grade. It is a routing mechanism — it decides who has to look, and when.
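Because severity is a routing mechanism, it can be written down as a small function. This is a sketch under assumptions: the finding kinds are made-up labels for the examples just given, and defaulting anything unlisted to warn is a choice of this example, not a rule from the module.

```python
from enum import Enum


class Severity(Enum):
    BLOCK = "block"  # merge stops until fixed or explicitly waived
    WARN = "warn"    # fix on the branch or file a follow-up with an owner
    NOTE = "note"    # never blocks; information for the design-system owner


def classify(kind: str, primary: bool = False) -> Severity:
    """Map a finding kind to a severity.

    `primary` marks a finding on a primary flow or primary surface.
    The kind labels are hypothetical names for this sketch.
    """
    if kind == "a11y_interactive":
        return Severity.BLOCK
    if kind in {"visual_regression", "brand_color"} and primary:
        return Severity.BLOCK
    if kind in {"missing_token", "routes_map_gap"}:
        return Severity.NOTE
    # Everything else, such as an off-path token violation or a layout
    # shift at one width, defaults to warn in this sketch.
    return Severity.WARN
```

When you draft your own rules, the useful part is naming the actual surfaces that count as primary in your product; the function body is the easy bit.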
Slide 8 — Escalation: when a human designer must look
The point of the gate is that most pull requests do not need a designer at all — the author reads the findings, fixes them, and merges. So the escalation rules need to be unambiguous about the exceptions. A designer must look when a block-level finding is disputed or needs waiving. When the author says a visual change is intentional — because the gate can show the change but cannot judge whether it is an improvement. When the branch adds genuinely new screens or components with no baseline. When the gate itself reports gaps it could not check. And when severity is contested. Make it a named person with a response-time expectation, not a channel where findings go to be ignored. That is the difference between a working gate and a decorative one.
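The five escalation triggers are unambiguous enough to encode. The sketch below assumes the review run can report each condition as a boolean; the field names are illustrative, and who the named designer is stays a team decision outside the code.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    # Field names are illustrative; each mirrors one escalation rule.
    block_disputed_or_waiver_needed: bool = False
    change_marked_intentional: bool = False
    new_ui_without_baseline: bool = False
    gate_reported_gaps: bool = False
    severity_contested: bool = False


def escalation_reasons(state: ReviewState) -> list[str]:
    """Return the reasons a designer must look; an empty list means none."""
    rules = [
        (state.block_disputed_or_waiver_needed,
         "block-level finding disputed or needs a waiver"),
        (state.change_marked_intentional,
         "author marked a visual change as intentional"),
        (state.new_ui_without_baseline,
         "new screens or components with no baseline"),
        (state.gate_reported_gaps,
         "gate reported gaps it could not check"),
        (state.severity_contested,
         "severity is contested"),
    ]
    return [reason for triggered, reason in rules if triggered]
```

An empty list is the common case: the author reads the findings, fixes them, and merges without a designer in the loop.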
Slide 9 — Avoiding bureaucracy: budgets for noise and false positives
Now the part that decides whether this survives contact with a real team. A design gate fails socially before it fails technically — noise, latency, and pedantry kill it long before a missed regression does. So set a noise budget: if more than about one finding in five is a false positive or a triviality, fix the reviewers before you add anything new. Keep the run inside twenty minutes. Start as a soft requirement on UI-labelled pull requests, not a hard CI failure on day one. Let authors mark intentional changes instead of arguing with a robot. Turn every recurring finding into a rule, so the gate gets quieter over time. And review the gate itself monthly. It is a service to the team, not a tollbooth.
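The two numeric budgets here, one finding in five and twenty minutes, are easy to check mechanically. A minimal sketch, assuming someone on the team labels each finding as noisy (false positive or triviality) or not; the constant and function names are hypothetical.

```python
# Budgets taken from the narration; treat them as starting points to tune.
NOISE_BUDGET_ONE_IN = 5   # more than one noisy finding in five is over budget
MAX_RUN_MINUTES = 20


def over_noise_budget(total_findings: int, noisy_findings: int) -> bool:
    """True if noisy findings exceed one in five of all findings."""
    if total_findings == 0:
        return False
    # Integer comparison avoids floating-point edge cases at exactly 20%.
    return noisy_findings * NOISE_BUDGET_ONE_IN > total_findings


def within_time_budget(run_minutes: float) -> bool:
    """True if a run finished inside the time budget."""
    return run_minutes <= MAX_RUN_MINUTES
```

If `over_noise_budget` returns true for the review period, the rule from this slide applies: fix the reviewers before adding any new checks.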
Slide 10 — Measuring whether the practice is improving anything
How do you know the gate is worth its cost? The headline number is escapes — regressions of the kinds the gate checks for that still reach users after merge. That is what the gate exists to reduce, so measure it before and after. Around it, track the catch profile and fix rate, the cost in run minutes and designer minutes, the waiver count, and the trend in recurring findings. That last one is the subtle signal: a healthy gate gets quieter over time, because recurring findings become harness rules and fixed components. If the gate catches the same issues at the same rate forever, it is catching things, but nobody is learning. And keep these as team numbers — never individual scorecards.
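Two of these measures reduce to simple arithmetic: the escape rate before and after, and whether recurring findings are trending down. The sketch below assumes weekly counts are available; comparing the earlier half of the period with the later half is just one crude way to read a trend, chosen here for brevity.

```python
def escapes_per_week(escapes: int, weeks: int) -> float:
    """Average number of escaped regressions per week over a period."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return escapes / weeks


def is_getting_quieter(recurring_per_week: list[int]) -> bool:
    """True if the later half of the period has fewer recurring findings
    than the earlier half. With an odd number of weeks the middle week
    is ignored."""
    half = len(recurring_per_week) // 2
    if half == 0:
        return False
    earlier = sum(recurring_per_week[:half])
    later = sum(recurring_per_week[-half:])
    return later < earlier
```

Run both over team-level counts only. Nothing in this sketch takes an author as input, which matches the rule that these are never individual scorecards.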
Slide 11 — Worked example: six weeks of PR reviews on one team
Let's look at six weeks of this running on a real team: four engineers, one designer, and a previous quarter in which three visual regressions reached customers. They saved the review as a team command, built a routes map covering thirty-one screens, and made the findings comment a soft requirement on UI pull requests. Forty-seven pull requests went through it, about fourteen minutes each. Nine block-level findings — the biggest was a refactor that silently dropped the focus ring from every dialog close button — and forty-one warn-level token violations, mostly fixed the same day. Escapes went from about one a week to one in six weeks. And that one escape was on a screen missing from the routes map — a gap the gate had already reported. Not magic. Just a quieter, better-evidenced default.
Slide 12 — Exercise: draft the PR review rules for your codebase
Time to draft your own gate. One page, for your real repository. First, the checks: which token rules, component rules, and anti-patterns the gate enforces, and the widths it captures. Second, severity: what blocks a merge in your product, what warns, what is just noted — name the actual surfaces. Third, escalation: who looks at disputes and intentional changes, and how quickly. Fourth, the budgets: maximum run time, the false-positive rate you will tolerate, and what triggers a review of the gate itself. Finally, sketch the routes map for one product area. Then take the page to the engineers who will live with it, and have it reviewed like any other design artifact — before the gate exists, not after.
Slide 13 — Summary, and where to go next
Let's close the module, and with it the course. Design quality degrades one merge at a time, so the review moved to the merge: a gate on every pull request that checks the diff — visual changes, tokens, accessibility, states — and deliberately ignores the rest. Three severities route the work, humans decide only what needs deciding, and recurring findings become rules so the gate gets quieter over time. Remember what a clean run means: the branch did not regress what the gate checks. It does not mean the design is good — that judgment stayed with you through all five modules. From here, Design Systems for Agents deepens the rules this gate enforces, and Orchestrating Design Agent Teams scales the reviewers across a portfolio. Thanks for taking the course.