Agentic Design School

Section 01

Why screenshot builds drift

Handing an agent a screenshot and asking it to build the page usually produces something that looks roughly right and reads completely wrong up close. The layout regions are there, but the type scale flattens, the spacing rhythm loosens, and the colors become approximations of the brand instead of the brand.

The drift happens because the agent builds from an impression of the image rather than a specification of it. Nothing in the loop forces the build to be measured against the reference before the agent declares success.

This workflow adds two structural fixes: a spec extraction step before any code is written, and a parity gate after the build that compares captured screenshots of the implementation back to the reference and reports observable mismatches. The agent iterates until the gate passes, or it escalates what it cannot resolve to a human.

Section 02

When to reach for it

Use this when a static image is the clearest available statement of design intent: an approved Figma frame, an exported mockup, a concept screenshot, or a slide that someone has already signed off on.

It is not the right workflow when the design only exists as a vague idea, or when the reference is another company's product that you intend to copy wholesale rather than translate into your own system.

An approved Figma frame needs to become a production page.
A concept screenshot needs to be implemented with the team's own tokens and components.
A mockup made for a deck needs to become a real, responsive screen.
A page rebuilt during a migration must keep visual parity with the old one.

Section 03

The orchestration pattern: a single-agent loop with a gate

This workflow stays with one agent in one session, because the steps are sequential and each one depends on the output of the last. There is nothing to parallelize: the spec informs the build, the build produces the captures, the captures feed the parity check, and the parity check decides whether to loop.

If the same parity loop later needs to run across many screens at once, lift it into a dynamic workflow: a JavaScript script Claude writes that orchestrates one build-loop subagent per screen in the background, keeps intermediate spec and parity results in script variables instead of the main context, and runs up to 16 agents concurrently with as many as 1,000 per run. Workflows are resumable, can be saved to .claude/workflows/ as a reusable slash command, and are triggered by including the word workflow in the prompt or by running /effort ultracode.

For a single screen, the discipline that matters is the gate, not the orchestration. The agent must not be allowed to claim parity without showing the comparison.

Diagram: Parity build loop

Step 1: Collect reference
Step 2: Extract spec
Step 3: Map to tokens
Step 4: Build
Step 5: Capture build
Step 6: Parity check
Step 7: Patch or sign off (a patch feeds the next cycle)

The loop only exits through the parity gate or through an explicit human decision.
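
The control flow of the loop can be sketched in a few lines. This is a minimal illustration, not a real agent API: the injected `build`, `capture`, `check`, and `patch` functions and the `maxIterations` cap are hypothetical names, and real steps would be asynchronous. What it shows is that there are only two ways out: a non-failing verdict from the gate, or a handoff to a human.

```javascript
// Hypothetical sketch of the single-agent loop with a gate.
// build, capture, check, and patch are injected placeholders, not a real API.
function runParityLoop({ build, capture, check, patch, maxIterations = 5 }) {
  build()
  let report = null
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    // Every iteration recaptures and re-runs the gate; nothing is assumed fixed.
    report = check(capture())
    if (report.verdict !== "FAIL") {
      return { exit: "gate", iterations: iteration, report }
    }
    // Only patch if another gate run will follow.
    if (iteration < maxIterations) patch(report.blocking)
  }
  // The loop never declares success on its own; it hands the last report to a human.
  return { exit: "human-decision", iterations: maxIterations, report }
}
```

The iteration cap is an assumption added for the sketch; the point is that running out of attempts produces an escalation with the last report attached, never a silent pass.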

Section 04

Step 1: collect the reference and the constraints

Gather the reference image at full resolution, the viewport it represents, and any constraints the image cannot show: the component library, the token set, accessibility rules, content that must be real rather than placeholder, and how the screen should behave at widths the reference does not cover.

Write the constraints down before the build starts. Most parity arguments later turn out to be missing constraints, not agent mistakes.

Reference image and the viewport width it was designed at.
Component library and design tokens the build must use.
Real content, states, and data the screenshot fakes or omits.
Responsive expectations for widths outside the reference.
Accessibility requirements that pixels alone cannot show.
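
One way to make "write the constraints down" enforceable is to refuse to start the build while any item on the checklist is empty. The sketch below assumes a plain constraints object; the field names are hypothetical and simply mirror the list above.

```javascript
// Hypothetical field names that mirror the Step 1 checklist.
const REQUIRED_CONSTRAINTS = [
  "referenceImage",
  "referenceViewportWidth",
  "componentLibrary",
  "tokenSet",
  "realContent",
  "responsiveExpectations",
  "accessibilityRequirements",
]

// Returns the names of constraints that are absent or empty.
function missingConstraints(constraints) {
  return REQUIRED_CONSTRAINTS.filter((key) => {
    const value = constraints[key]
    if (value === undefined || value === null) return true
    if (typeof value === "string") return value.trim() === ""
    if (Array.isArray(value)) return value.length === 0
    return false
  })
}
```

An empty result means the build can start; anything else is a question for a human before the first line of code.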

Section 05

Step 2: extract a spec from the reference

Before any code, the agent describes the reference as a specification: layout regions, the grid, the type scale, the spacing rhythm, the color roles, and the components it can identify. The spec is reviewed by a human before the build, which is far cheaper than reviewing a finished page.

Force measurements where they can be measured and ranges where they cannot. A spec that says generous padding will drift; a spec that says 24px vertical rhythm with 32px section gaps will not.

parity-spec-template.md

# Parity spec: <screen name>

## Reference
- Source: <figma frame link or image path>
- Designed at: 1440px wide

## Layout regions
- Header: full width, 72px tall, logo left, nav right
- Hero: two columns 7/5, 96px top padding
- Pricing grid: 3 equal cards, 24px gap, middle card emphasized

## Type scale
- Display: 56/64, semibold
- H2: 32/40, semibold
- Body: 16/24, regular
- Caption: 13/20, medium, muted

## Spacing rhythm
- Section vertical padding: 96px desktop, 56px mobile
- Card internal padding: 32px
- Stack gaps: 8 / 16 / 24 only

## Color roles (mapped to our tokens)
- Page background: surface.default
- Card background: surface.raised
- Emphasized card border: brand.primary
- Body text: text.primary; captions: text.muted

## Components identified
- Button (primary, ghost), Badge, PricingCard, FAQ accordion

## Not visible in reference (decisions needed)
- Hover and focus states, loading state, 390px layout order
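
The "measurements, not adjectives" rule can be partly checked before the human review. The following is a rough sketch under stated assumptions: the vague-word list is illustrative rather than exhaustive, and the check only looks at bullet lines in a spec shaped like the template above. It flags a bullet that uses a vague word without any measured value or ratio on the same line.

```javascript
// Illustrative word list; extend it for your own team's habits.
const VAGUE_WORDS = ["generous", "roughly", "comfortable", "tight", "loose", "large", "small", "nice"]

// Flags spec bullets that use a vague word and carry no measurement.
function findVagueSpecLines(specText) {
  const vague = new RegExp("\\b(" + VAGUE_WORDS.join("|") + ")\\b", "i")
  // A unit value such as 24px, or a ratio such as 56/64 or 7/5, counts as measured.
  const measured = /\d+(\.\d+)?\s*(px|rem|em|%)|\b\d+\s*\/\s*\d+\b/
  return specText
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ text }) => text.startsWith("- ") && vague.test(text) && !measured.test(text))
}
```

A line such as "- Card padding: generous" is flagged, while "- Hero: roughly 96px top padding" is not, because a hedged measurement is still a measurement a reviewer can argue with.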

Section 06

Step 3: map to your own tokens, not the screenshot's pixels

The goal is parity of intent, not a pixel clone. Colors map to the team's semantic tokens, type sizes snap to the nearest step in the existing scale, and spacing snaps to the existing rhythm. Where the reference cannot be expressed in the current system, the spec records the gap as a decision for a human rather than inventing a new one-off value.

This mapping step is what makes the workflow safe to use on competitor-inspired references. The structure and density carry over; the brand does not.

Table: Mapping decisions

Maps cleanly: Reference #1A1A2E body text maps to text.primary.
Snaps to scale: Reference 17px body snaps to the 16px body token.
Needs a decision: Reference uses a 12-column 1280px grid; our system is 1200px.
Needs a decision: Reference shadow has no equivalent elevation token.
Out of scope: Reference photography style is a brand question, not a build question.

Every reference value either maps to an existing token or becomes a recorded decision.
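
The snapping rule can be sketched as a small function. This is an illustration only: the `tolerance` threshold (how far a reference value may sit from the nearest step before it becomes a human decision) is an assumption, not something the workflow prescribes, and the scale values in the usage lines are taken from the example spec.

```javascript
// Snap a measured reference value to the nearest step of an existing scale.
// tolerance is a hypothetical threshold; beyond it the gap is recorded, not invented away.
function snapToScale(value, scale, tolerance = 2) {
  const nearest = scale.reduce((best, step) =>
    Math.abs(step - value) < Math.abs(best - value) ? step : best
  )
  const delta = Math.abs(nearest - value)
  return delta <= tolerance
    ? { status: "mapped", value: nearest, delta }
    : { status: "needs-decision", value: null, nearest, delta }
}

const typeScale = [13, 16, 32, 56]
console.log(snapToScale(17, typeScale)) // 17px body snaps to the 16px step
console.log(snapToScale(24, typeScale)) // no step within tolerance: recorded as a decision
```

Returning a "needs-decision" result instead of the nearest value keeps the function from quietly creating the one-off values the mapping step is meant to prevent.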

Section 07

Step 4: build against the spec

The agent builds the screen from the spec, not from re-inspecting the image on every decision. The spec is the contract; the image is the evidence it was derived from. Building from the spec keeps the implementation inside the component library and keeps later parity arguments resolvable.

Keep the build prompt narrow: this screen, this spec, these components, real content where it exists, and no redesigning of areas the spec already settles.

Build prompt

Build the pricing page from parity-spec.md. Use only existing components from src/components and tokens from tokens.css. Follow the layout regions, type scale, and spacing rhythm in the spec exactly. Use the real plan names and prices from content/pricing.json. Where the spec lists an open decision, choose the most conservative option and note it in build-notes.md. Do not invent new tokens, new components, or new colors. Once the build runs locally, stop and wait for the parity check before making further changes.

Section 08

Step 5: capture the build the same way the reference was made

Capture the implementation at the reference viewport first, then at the other standard widths. The capture must be scripted so the parity check compares like with like on every iteration of the loop.

capture-build.mjs

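// Captures the running build at the reference viewport and the other standard widths.
import { chromium } from "playwright"

// Route to capture, passed as the first CLI argument; defaults to the pricing page.
const route = process.argv[2] ?? "/pricing"
const viewports = [
  { name: "reference-1440", width: 1440, height: 1000 },
  { name: "tablet-768", width: 768, height: 1024 },
  { name: "mobile-390", width: 390, height: 900 },
]

const browser = await chromium.launch()
try {
  const page = await browser.newPage()

  for (const viewport of viewports) {
    // Resize before navigating so the page lays out at the target width from the start.
    await page.setViewportSize({ width: viewport.width, height: viewport.height })
    await page.goto("http://localhost:3000" + route, { waitUntil: "networkidle" })
    // Full-page capture, so regions below the fold are part of the comparison.
    await page.screenshot({ path: "parity/build/" + viewport.name + ".png", fullPage: true })
  }
} finally {
  // Close the browser even if a capture fails, so the script does not hang.
  await browser.close()
}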

Section 09

Step 6: run the parity gate

The parity check compares the captured build against the reference and the spec, and reports observable mismatches only: a region in the wrong place, a type size off the scale, a gap that breaks the rhythm, a color that left its role. Each mismatch gets a severity and a proposed patch.

The gate passes when no P0 or P1 mismatches remain and every P2 has either been fixed or accepted by a human. Differences that come from deliberate token mapping are listed as accepted, not hidden.

Parity check prompt

Compare parity/reference/desktop.png with parity/build/reference-1440.png against parity-spec.md. Report only observable mismatches. For each: region, what differs, measured or estimated values, severity (P0 task-breaking, P1 hierarchy or rhythm, P2 polish, P3 judgment), and the smallest patch that would resolve it. List separately any differences that are expected results of token mapping and should be accepted. Do not patch anything yet. End with a verdict: PASS, PASS WITH ACCEPTED DIFFERENCES, or FAIL with the blocking items.
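
The pass rule can also be written down as code, so the verdict is computed from the mismatch list and not asserted in prose. This is a sketch under assumptions: the mismatch shape (`severity`, `acceptedByHuman`) is hypothetical, the input is taken to be the mismatches still open after the latest run, and P3 items are treated as non-blocking, which the text above implies but does not state.

```javascript
// Compute the gate verdict from the mismatches still open after the latest run.
// Assumption: P3 (judgment) items never block the gate.
function gateVerdict(openMismatches, acceptedMappingDifferences = []) {
  const blocking = openMismatches.filter((m) =>
    m.severity === "P0" ||
    m.severity === "P1" ||
    (m.severity === "P2" && !m.acceptedByHuman)
  )
  if (blocking.length > 0) return { verdict: "FAIL", blocking }

  const accepted = openMismatches.filter((m) => m.severity === "P2" && m.acceptedByHuman)
  const hasAccepted = accepted.length > 0 || acceptedMappingDifferences.length > 0
  return {
    verdict: hasAccepted ? "PASS WITH ACCEPTED DIFFERENCES" : "PASS",
    blocking: [],
  }
}
```

Keeping accepted token-mapping differences as an explicit argument means they show up in the verdict, which matches the rule that accepted differences are listed, not hidden.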

Section 10

Step 7: patch in passes until the gate passes

Apply patches in small passes, recapture, and re-run the gate. Structure and hierarchy first, then spacing and type, then color and polish. Most screens converge in two or three iterations; a screen that does not converge usually has a spec problem, not a build problem.

Pass 1: layout regions, ordering, and responsive structure.
Pass 2: type scale and spacing rhythm.
Pass 3: color roles, borders, shadows, and states.
After each pass: recapture and re-run the parity check.

Diagram: Patch and recapture loop

Step 1: Patch layout and structure
Step 2: Recapture at all widths
Step 3: Re-run the parity check
Step 4: Patch spacing and type
Step 5: Patch color and states
Step 6: Exit on a PASS verdict (anything else feeds the next cycle)

Each pass fixes one layer, recaptures, and re-runs the gate; the loop ends on a passing verdict or a spec question for a human.
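
The pass ordering can be sketched as a grouping step over the parity report. The `layer` field and its three values are hypothetical labels for this illustration; the parity check prompt above does not emit them, so the agent or a human would need to tag each mismatch first.

```javascript
// Fixed pass order: structure first, then type and spacing, then color and polish.
const PASS_ORDER = ["structure", "type-and-spacing", "color-and-polish"]

// Group mismatches into ordered passes, skipping layers with nothing to fix.
function planPasses(mismatches) {
  return PASS_ORDER
    .map((layer) => ({ layer, items: mismatches.filter((m) => m.layer === layer) }))
    .filter((pass) => pass.items.length > 0)
}
```

Each returned pass is followed by a recapture and a fresh gate run, so a color fix is never evaluated against a layout that is still moving.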

Section 11

Case study: pricing page from an approved Figma frame

A team had an approved 1440px pricing page frame and a deadline. The spec extraction took the agent one pass and a ten-minute human review, which caught that the middle plan's emphasis was a border plus background shift, not just a badge.

The first build passed structure but failed the gate with 9 mismatches, among them: body text set at 18px instead of the 16px token, card padding of 24px against a 32px spec, and an FAQ accordion built from a generic component instead of the design-system one. Two patch passes later the gate returned PASS WITH ACCEPTED DIFFERENCES; the only accepted item was a shadow mapped to the nearest elevation token.

Total time was about two hours, and the review conversation was about three named decisions instead of a long thread of screenshots and adjectives.

Section 12

Case study: dashboard from a competitor-inspired concept

A product designer liked the density and panel structure of a competitor-style concept screenshot but needed it rebuilt entirely in the team's own tokens and components. The spec deliberately recorded only structure: regions, column behavior, table density, and chart placement, with every color and type value mapped to the team's system before the build started.

The parity gate was scoped to structure and density rather than color, which kept the check honest about what parity meant here. The first build dropped one summary panel below the fold at 1280px; the gate caught it, the patch reflowed the grid, and the second run passed. The shipped dashboard reads as the team's product, with the concept's information density intact.

Section 13

Case study: conference-talk mockup, the night before

A speaker had a static mockup of a demo screen and one evening to make it real. The spec extraction was deliberately rough, with ranges instead of measurements, and the gate threshold was set to P0 and P1 only.

The loop ran three times in about 90 minutes. The gate caught two issues that would have shown on stage: the headline wrapped to three lines at the projector's 1280px width, and the success state used a gray with too little contrast to read. The speaker accepted four P2 mismatches, presented the next morning, and converted the spec into a real ticket the following week.

Section 14

Good vs bad parity output

A weak parity report says the build is close to the design. A strong one names regions, values, severities, and the smallest patch, and it never lets enthusiasm substitute for evidence.

Table: Parity report quality comparison

Bad: The build closely matches the reference
Good: PASS WITH ACCEPTED DIFFERENCES: 0 P0, 0 P1, 2 accepted token-mapping differences listed below

Bad: Spacing is a bit off in the cards
Good: P1: Card internal padding is 24px; spec requires 32px, which compresses the price block and weakens the emphasized plan

Bad: Fonts look different
Good: P1: Body copy renders at 18px/28px; spec maps body to the 16px/24px token, breaking vertical rhythm in the FAQ section

The gate only works if its output is specific enough to act on.
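
A report entry can be checked for specificity before anyone acts on it. The sketch below assumes each mismatch is recorded as an object with hypothetical fields for the things a strong report names: severity, region, observed value, expected value, and smallest patch. It returns whatever is missing.

```javascript
// Hypothetical mismatch shape: { severity, region, observed, expected, patch }.
// Returns the names of the fields a reviewer would still have to ask about.
function reportGaps(mismatch) {
  const gaps = []
  if (!/^P[0-3]$/.test(mismatch.severity ?? "")) gaps.push("severity")
  for (const field of ["region", "observed", "expected", "patch"]) {
    const value = mismatch[field]
    if (typeof value !== "string" || value.trim() === "") gaps.push(field)
  }
  return gaps
}
```

An entry like "Spacing is a bit off in the cards" fails every check here, while the padding example above fills all five fields.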

Section 15

Limits: what the gate cannot prove

The parity gate proves the build matches the reference in observable ways at the captured widths. It cannot prove the reference design works for users, that the content strategy is right, or that interaction details invented for states the screenshot never showed are correct.

Humans own the spec review, the accepted-differences list, the decisions the reference left open, and the call on whether parity is even the right goal for a given screen.

It cannot validate hover, focus, motion, or loading behavior the reference does not show.
It cannot prove accessibility; semantics and keyboard checks are separate work.
It cannot decide whether to follow the reference or improve on it; that is a human call recorded in the spec.
It cannot judge widths and states that were never captured.

Section 16

Reusable parity workflow

Once the loop works for your stack, save the prompt as a reusable command and keep the spec and parity templates in the repo. The deliverable is always the same pair: a built screen and a parity report a designer can sign.

Screenshot-to-implementation parity workflow

1. Collect the reference image, its viewport, and the written constraints.
2. Extract a parity spec: regions, grid, type scale, spacing rhythm, color roles, components.
3. Map every reference value to existing tokens; record gaps as human decisions.
4. Build from the spec using existing components, with no invented tokens.
5. Capture the build at the reference viewport and standard widths.
6. Run the parity check and record mismatches with severity and smallest patch.
7. Patch in passes (structure, then rhythm and type, then color) and recapture each time.
8. Exit only on PASS or PASS WITH ACCEPTED DIFFERENCES signed off by a human.
