Slide 1 — Screenshot to Implementation Parity
Welcome to Module 4. The last two modules were about exploring — turning references into briefs, running several directions at once. This module is about the opposite moment: the design already exists, it has been approved, and the implementation has to match it. The trap here is that agents are very good at producing something that looks roughly right and then declaring success. So the theme of this module is simple: parity is measured, not asserted. We will set up a workflow where the reference, the implementation, and the differences between them sit side by side as evidence, and the loop only ends when a gate passes. Let's start with the setup.
Slide 2 — The parity setup: reference, spec, capture, diff
Here is the setup that stops drift. Four artifacts. The reference image, at full resolution, with the viewport it was designed at written down. A spec the agent extracts from that image before writing any code: regions, grid, type scale, spacing rhythm, colour roles. You review the spec — that is a ten-minute read of a text file, and it is where misinterpretations get caught cheaply. Then a scripted capture of the build, taken the same way every round so comparisons are like for like. And finally the diff: a parity report that names observable mismatches, instead of an overall impression. Spec before the build, gate after it. Those two fixes carry the whole workflow.
Slide 3 — The parity check loop
This is the loop the rest of the module fills in. On the left, the two inputs: the reference with its spec — that is your contract — and a scripted capture of the build, taken the same way every round. Both feed the parity check, where the agent compares them side by side and logs observable mismatches: region, value, severity, and the smallest patch that would fix it. Then the verdict gate. Pass, or pass with accepted differences, goes to your sign-off, and the report stays with the build. Fail goes to a patch pass — one layer per round — a recapture, and another check. The agent runs the comparison and the fixes. You own the spec, the accepted differences, and the signature.
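If it helps to see the verdict gate as code, here is a minimal sketch in Python. The field names are hypothetical, and the rule that every mismatch a human has not accepted blocks the gate is an assumption made for this sketch; the module itself only fixes the three verdicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    """One logged difference. Field names are hypothetical, chosen for this sketch."""
    region: str             # where in the layout, e.g. "plan card"
    observed: str           # value measured in the build capture
    expected: str           # value stated in the spec
    severity: str           # "P0" (task-breaking) through "P3" (judgment call)
    accepted: bool = False  # True once a human has signed off the difference


def gate(mismatches: list[Mismatch]) -> tuple[str, list[Mismatch]]:
    """Return (verdict, blocking items).

    Assumed rule: every mismatch a human has not accepted blocks the gate.
    """
    blocking = [m for m in mismatches if not m.accepted]
    if blocking:
        return "fail", blocking
    if mismatches:
        return "pass with accepted differences", []
    return "pass", []
```

The point of the sketch is the ownership split: the agent fills in the mismatches, but only a human flips `accepted`.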
Slide 4 — What agents miss: the predictable failure classes
Agents fail at screenshot builds in predictable ways, and predictable is good — it means you can check for them. Spacing drift: padding relaxes, gaps loosen, and the density that made the design work evaporates. Type drift: sizes that are off the scale, fallback fonts, weights that flatten the hierarchy. State gaps: the screenshot shows one moment of one state, so hover, focus, loading, empty, and error either go missing or get invented. Hierarchy drift: secondary things get louder. System drift: one-off components and made-up values instead of your tokens. And responsive drift: mobile stacks in DOM order instead of task order. None of this is bad luck. It is the pull towards generic, and the spec and the gate are how you resist it.
Slide 5 — Tokens and components are the parity backbone
Here is what keeps parity from turning into pixel worship. Every value in the reference goes through a mapping step, and there are only three outcomes. It maps cleanly to an existing token — that is most rows. It snaps to the nearest step in your scale — a 17-pixel body becomes your 16-pixel token, and that difference shows up later as an accepted difference in the report, visible and signed for. Or it needs a decision — the reference grid is 1280, yours is 1200; there is no matching elevation token. Those go to a human, because they are system questions, not build questions. And this mapping is exactly what makes the workflow safe on competitor-inspired references: the structure and density carry over, the brand does not.
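The three outcomes of the mapping step can be sketched as a small Python function. The token names, the scale values, and the 2-pixel snap tolerance are all assumptions for illustration; your own system decides how far a value may sit from a token before it becomes a human decision.

```python
# Hypothetical type scale in pixels; substitute your own tokens.
TYPE_SCALE = {"text-sm": 14, "text-base": 16, "text-lg": 20, "text-xl": 24}


def map_to_token(value_px, scale, snap_tolerance_px=2):
    """Classify a reference value as 'exact', 'snapped', or 'decision'.

    Returns (outcome, token_name, delta_px), where delta_px is
    token value minus reference value. snap_tolerance_px is an assumed cutoff.
    """
    # Find the token whose value is closest to the reference value.
    name, token_px = min(scale.items(), key=lambda item: abs(item[1] - value_px))
    delta = token_px - value_px
    if delta == 0:
        return ("exact", name, 0)
    if abs(delta) <= snap_tolerance_px:
        # Snapped values should later appear as accepted differences in the report.
        return ("snapped", name, delta)
    # Too far from any token: a system question for a human, not a build question.
    return ("decision", None, None)
```

With this scale, a 17-pixel body snaps to `text-base` with a delta of -1, while a 32-pixel value is returned as a decision because no token is close enough.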
Slide 6 — One gap per round: keeping the loop convergent
Now the discipline that makes the loop converge instead of oscillate. The temptation is to report everything at once (spacing, colours, a missing state, a font issue) and let the agent fix it all. What you get is a build that wobbles: four things fixed, two new things broken, and a different list next round. So: one layer per round. Structure and ordering come first, because everything else depends on them. Spacing rhythm and type come next. Colour, borders, and states come last. Recapture with the same script after every pass and re-run the check. Most screens converge in two or three passes. If yours does not, stop patching: the problem is almost always in the spec, not the build.
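The layer ordering can be sketched as a function that picks what the next patch pass is allowed to touch. The layer labels and the dictionary shape are hypothetical; the only thing taken from the module is the order.

```python
# Patch order from the narration: structure, then spacing and type, then colour and states.
# The label strings are hypothetical names for those three layers.
LAYER_ORDER = ("structure", "spacing_and_type", "colour_and_states")


def next_patch_pass(open_mismatches):
    """Return (layer, items) for the earliest layer with open mismatches, or None.

    Each mismatch is a dict with at least a "layer" key (assumed shape).
    """
    unknown = {m["layer"] for m in open_mismatches} - set(LAYER_ORDER)
    if unknown:
        raise ValueError(f"unknown layer(s): {sorted(unknown)}")
    for layer in LAYER_ORDER:
        items = [m for m in open_mismatches if m["layer"] == layer]
        if items:
            # Everything in later layers waits for a later round.
            return layer, items
    return None
```

Anything outside the returned layer stays in the report but out of the patch instruction, which is what keeps each round small.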
Slide 7 — The parity check prompt
Here is the prompt that runs the gate. A few lines do all the work. Observable mismatches only — the agent has to point at a region and a value, not offer an impression. Each mismatch gets a severity, from P0 task-breaking down to P3 judgment call, and the smallest patch that would fix it — smallest, so a padding issue does not turn into a rebuilt section. Differences that exist because you deliberately mapped to your own tokens get listed as accepted, not buried. And then the two lines that keep it honest: do not patch anything yet, and end with a verdict — pass, pass with accepted differences, or fail with the blocking items. Comparison first, correction after, with you in between.
Slide 8 — Good vs bad parity output
Here is the difference between a report you can act on and one you cannot. The build closely matches the reference — that sentence sounds like progress and tells you nothing you can check, fix, or disagree with. Compare it with: P1, card padding is 24 pixels, the spec says 32, and the consequence is that the price block compresses and the emphasised plan loses its weight. That is checkable, it is assignable to the spacing pass, and you can even disagree with it and accept the difference deliberately — which is a design decision, not an oversight. The gate is only as good as the output you accept through it. The first time you get the weak version, send it back.
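One way to make "send it back" mechanical is to check the shape of the report before you read it. This is a minimal sketch under assumed field names; the module does not prescribe a report format, only that each mismatch carries a region, values, a severity, and a smallest patch, and that the report ends in a verdict.

```python
# Assumed report schema; the key names are hypothetical.
REQUIRED_FIELDS = ("region", "observed", "expected", "severity", "patch")
SEVERITIES = {"P0", "P1", "P2", "P3"}
VERDICTS = {"pass", "pass with accepted differences", "fail"}


def report_problems(report):
    """Return a list of reasons to send a parity report back (empty if actionable)."""
    problems = []
    verdict = report.get("verdict")
    findings = report.get("findings", [])
    if verdict not in VERDICTS:
        problems.append("missing or unknown verdict")
    if verdict == "fail" and not findings:
        problems.append("fail verdict without blocking items")
    for i, finding in enumerate(findings):
        # A finding is only checkable if every required field has content.
        missing = [k for k in REQUIRED_FIELDS if not str(finding.get(k) or "").strip()]
        if missing:
            problems.append(f"finding {i}: missing {', '.join(missing)}")
        elif finding["severity"] not in SEVERITIES:
            problems.append(f"finding {i}: unknown severity {finding['severity']}")
    return problems
```

A report that only says the build closely matches the reference fails this check on the verdict alone, while the card padding finding, written out with its region, both values, a severity, and a patch, passes.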
Slide 9 — When parity is the wrong goal
One caution before the worked example: parity is not always the right goal. Sometimes the reference has a known flaw, such as a contrast failure or a missing empty state. Sometimes it predates your design system. Sometimes it is competitor-inspired, and parity should only ever cover structure and density, never the brand. Departing from the reference in those cases is good design. The discipline is to do it deliberately: write the improvement into the spec as a decision, let the gate measure against the amended spec, and let the difference appear in the accepted list with a reason. What you must not do is drift away from the reference by accident and discover in a stakeholder review that nobody can explain the differences. The gate measures the target. Choosing the target is your job.
Slide 10 — Responsive parity: the breakpoints the reference never showed
Here is the gap that most often survives review: responsiveness. Your reference is one image at one width — usually 1440. Everything below that width is a decision the image cannot show: what stacks, what collapses, what the user must still see first. If the spec does not make those decisions, the agent will, and its default is to stack things in DOM order — the blocks survive, the task priority does not. So the spec gets a short responsive section, and the capture script includes desktop, tablet, and mobile on every round. In one traced run, the gate caught a summary panel dropping below the fold at 1280 only because 1280 was in the script. The rule is blunt: if a width is not captured, it is not checked.
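The blunt rule can be turned into a check that runs before the gate does. In this sketch the width lists are hypothetical: 1440 and 1280 come from the narration, while the tablet and mobile values are assumptions standing in for your own breakpoints.

```python
# Widths the spec's responsive section makes decisions about (assumed values
# for tablet and mobile; 1440 and 1280 appear in the narration).
SPEC_WIDTHS = [1440, 1280, 768, 390]


def unchecked_widths(spec_widths, captured_widths):
    """Return widths the spec decides on but the capture script never shoots.

    Anything in this list is a width where parity is asserted, not measured.
    """
    return sorted(set(spec_widths) - set(captured_widths), reverse=True)
```

For example, `unchecked_widths(SPEC_WIDTHS, [1440, 768, 390])` returns `[1280]`, which is exactly the width where the summary panel dropped below the fold in the traced run.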
Slide 11 — Worked example: a marketing page to parity in four rounds
Let's trace a real run. An approved 1440-pixel pricing page, an existing component library, a deadline. Round one: the agent extracts the spec, and a ten-minute human review catches that the middle plan's emphasis is a border plus a background shift, not just a badge — one comment that would have been an expensive argument later. Round two: the build passes structure but the gate fails it with nine mismatches — 18-pixel body against the 16-pixel token, card padding compressed, a generic accordion instead of the system one. Round three fixes spacing and type and swaps the component. Round four: pass with accepted differences — one shadow mapped to the nearest elevation token. About two hours, and the review was three named decisions, not a thread of adjectives.
Slide 12 — Exercise: run one parity round on an existing implementation
Your turn. Take a screen that has already been built from a design — something with a real reference, a Figma frame or the old page it replaced. Write a minimal spec from that reference: regions, type scale, spacing rhythm, colour roles, with measurements where you can get them. Capture the implementation at the reference width and at one mobile width. Run the parity check prompt, and insist on severities, values, and a verdict. Then do the part that matters: sort the findings into real mismatches, acceptable token-mapping differences, and gaps that exist because the spec never decided. One round is enough. Keep the report — Module 5 turns this into a repeatable sweep.
Slide 13 — Summary, and the bridge to visual QA loops
Let's close the module. Parity is measured, not asserted: the reference, a spec extracted from it, scripted captures of the build, and a diff with severities and a verdict. The spec review before the build is the cheapest correction you will make; the gate after it is the only honest exit. The drift is predictable — spacing, type, states, hierarchy, system, responsive — so the check looks for exactly that. Reference values map to your tokens, and departures are recorded decisions, not accidents. And the loop converges because each round fixes one layer. The limits are real too: the gate proves observable parity at captured widths, nothing more. In Module 5 we widen the lens — from one screen at build time to a visual QA matrix that sweeps routes, states, and breakpoints on every change. See you there.