Section 1
A screenshot is evidence, not a full specification
Screenshots are powerful because they make visual intent concrete. They show hierarchy, density, rhythm, copy, and component relationships better than a paragraph can.
But a screenshot is also incomplete. It does not show responsive behavior, interaction states, accessibility semantics, data edge cases, performance constraints, or why the design made a particular tradeoff.
The agent needs the screenshot and the interpretation layer around it: what matters, what can change, what must not change, how unknown states should be handled, and how the implementation will be judged.
Section 2
Annotate before building
The best screenshot workflow starts with annotation. Ask the agent to describe the visible structure before it writes code. If it cannot name the layout, hierarchy, and states, it is not ready to implement.
Annotation turns the image into a working brief. The agent should identify regions, spacing relationships, typography hierarchy, interactive elements, data assumptions, component candidates, and unknowns. This is where the designer can correct interpretation before code exists.
- Name layout regions: navigation, header, filters, content, side panels, tables, modals, and footers.
- Name hierarchy: what reads first, second, and third.
- Name interaction states: loading, empty, error, disabled, hover, focus, expanded, selected.
- Name responsive assumptions: what stacks, what collapses, what remains visible first.
- Name design-system mapping: which existing components and tokens should be used.
Figure (stages shown: Component map, Implementation plan, Built UI, Screenshot QA): The screenshot becomes useful when the agent converts it into named layout regions, states, constraints, and checks.
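One way to make the annotation pass reviewable is to give it a fixed shape. The sketch below is a minimal illustration, not a prescribed format: the `ScreenshotAnnotation` type, its field names, the `annotationGaps` helper, and the component names in the example are all hypothetical. It flags an annotation that has not yet named layout, hierarchy, or states, or that leaves a region without a component candidate.

```typescript
// Hypothetical shape for an annotation pass; every name here is illustrative.
type InteractionState =
  | "loading" | "empty" | "error" | "disabled"
  | "hover" | "focus" | "expanded" | "selected";

interface ScreenshotAnnotation {
  regions: string[];                     // e.g. "navigation", "invoice table"
  hierarchy: string[];                   // what reads first, second, third
  states: InteractionState[];            // states the agent has named
  responsiveNotes: string[];             // what stacks, collapses, stays visible
  componentMap: Record<string, string>;  // region -> existing component
  unknowns: string[];                    // open questions for the designer
}

// Returns the reasons an annotation is not ready; an empty list means
// it can go to the designer for approval or correction.
function annotationGaps(a: ScreenshotAnnotation): string[] {
  const gaps: string[] = [];
  if (a.regions.length === 0) gaps.push("no layout regions named");
  if (a.hierarchy.length === 0) gaps.push("no hierarchy named");
  if (a.states.length === 0) gaps.push("no interaction states named");
  for (const region of a.regions) {
    if (a.componentMap[region] === undefined) {
      gaps.push(`no component mapped for "${region}"`);
    }
  }
  return gaps;
}

// Example draft: "SettingsNav" is a made-up component name.
const draftAnnotation: ScreenshotAnnotation = {
  regions: ["navigation", "invoice table"],
  hierarchy: ["billing summary", "plan controls", "invoice table"],
  states: [],
  responsiveNotes: [],
  componentMap: { navigation: "SettingsNav" },
  unknowns: ["Who can cancel?"],
};

const draftGaps = annotationGaps(draftAnnotation);
```

For this draft, `draftGaps` lists the missing interaction states and the unmapped invoice table, which is the kind of correction that is cheap to make before code exists.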
Section 3
Case study: billing settings from a reference
A product team wants to rebuild a billing settings page from a reference screenshot. The screenshot shows a left navigation rail, a billing summary, plan controls, payment method, invoice table, and cancellation link.
A weak implementation prompt asks the agent to recreate the screenshot. That usually produces an attractive approximation with missing states, wrong spacing, invented component behavior, and fragile responsive layout.
A stronger workflow asks the agent to identify the information architecture, map each region to existing components, list required states, explain responsive behavior, and implement only after the plan is approved. The goal is not pixel worship. The goal is to preserve the design decisions that make the screen work.
Figure (panels: Reference: navigation rail; Reference: billing summary; Implementation: component mapping; Implementation: invoice table; QA: mobile order; QA: empty and error states): The implementation should be compared against reference structure, not only against overall visual similarity.
Section 4
Separate visual traits from product rules
Screenshots contain two kinds of information: visual traits and product rules. Visual traits include density, spacing, hierarchy, and rhythm. Product rules include permissions, billing states, invoice availability, payment-method validation, and destructive-action safeguards.
An agent can infer visual traits from a screenshot, but it should not invent product rules. If the screenshot shows a cancellation link, the implementation still needs to know who can cancel, what confirmation is required, what happens after cancellation, and how errors are handled.
- Visual trait: Invoice table uses compact rows and muted metadata. Product rule: Invoices may be unavailable while billing sync is pending.
- Visual trait: Cancellation link is visually secondary and separated. Product rule: Cancellation requires confirmation and owner permission.
- Visual trait: Plan summary appears before payment method. Product rule: Plan controls are disabled during pending downgrade.
A screenshot can guide appearance, but product rules need explicit context.
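The split between what is known and what the screenshot leaves open can be recorded explicitly so the agent has no reason to guess. The sketch below assumes a hypothetical `ProductRule` record and an `unknownsList` helper; neither comes from any library, and the entries reuse the cancellation questions raised above. Output of this kind could feed a file such as billing-unknowns.md.

```typescript
// Hypothetical record for a product rule; names are illustrative.
interface ProductRule {
  area: string;                      // e.g. "cancellation"
  statement: string;                 // the confirmed rule or the open question
  status: "confirmed" | "unknown";   // unknown rules must not be invented
}

// Formats every unknown rule as a list line for the designer to answer.
function unknownsList(rules: ProductRule[]): string[] {
  return rules
    .filter((r) => r.status === "unknown")
    .map((r) => `- ${r.area}: ${r.statement}`);
}

const billingRules: ProductRule[] = [
  {
    area: "cancellation",
    statement: "Requires confirmation and owner permission",
    status: "confirmed",
  },
  {
    area: "cancellation",
    statement: "What happens after cancellation?",
    status: "unknown",
  },
  {
    area: "cancellation",
    statement: "How are errors handled?",
    status: "unknown",
  },
];

const openQuestions = unknownsList(billingRules);
```

Keeping confirmed and unknown rules in one list makes it visible when an implementation plan still depends on answers nobody has given.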
Section 5
Project file structure
Keep screenshots, annotations, implementation notes, and QA output in the project. Otherwise the next agent run has to rediscover the same context and may repeat the same mistakes.
A screenshot implementation packet should make the reference inspectable and the decisions traceable. The packet should also record what the screenshot does not define.
my-product/
├── DESIGN.md
├── AGENTS.md
├── src/
│   └── app/settings/billing/
│       ├── page.tsx
│       └── components/
├── examples/
│   └── screenshots/
│       ├── billing-reference-desktop.png
│       ├── billing-reference-mobile.png
│       ├── billing-reference-annotations.md
│       └── billing-unknowns.md
├── agent-workflows/
│   └── screenshot-implementation/
│       ├── brief.md
│       ├── component-map.md
│       ├── state-inventory.md
│       ├── implementation-plan.md
│       └── qa-report.md
└── scripts/
    └── capture-billing-screenshots.js
Section 6
The annotation prompt
Before implementation, ask for an annotation pass. The output should be a structured interpretation that a designer can approve or correct.
Inspect these screenshots before building.

Inputs:
- DESIGN.md
- AGENTS.md
- billing-reference-desktop.png
- billing-reference-mobile.png
- existing settings components

Return:
1. visible regions
2. hierarchy and scan path
3. component candidates
4. spacing and density observations
5. typography and copy observations
6. required interaction states
7. responsive behavior
8. unknown product rules
9. risks if implemented from screenshot alone

Do not write code yet.
Section 7
The implementation prompt
The implementation prompt should tell the agent to inspect before building. It should also make clear that visual similarity is not enough: the result must use existing components, preserve states, and pass screenshot review.
Implement this screen from screenshot evidence.

Inputs:
- DESIGN.md
- AGENTS.md
- examples/screenshots/billing-reference-desktop.png
- examples/screenshots/billing-reference-mobile.png
- examples/screenshots/billing-reference-annotations.md
- existing settings components

First output a plan:
1. visible regions
2. reusable components
3. missing states
4. responsive behavior
5. risks or ambiguities

After approval, implement the page.

Rules:
- Use existing components before creating new ones.
- Do not invent colors or spacing outside DESIGN.md.
- Preserve interaction states: loading, empty, error, disabled, focus.
- Do not invent product rules; use placeholders and mark unknowns.
- Capture desktop and mobile screenshots after implementation.
- Report mismatches between reference and implementation.
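The rule about preserving interaction states is easier to enforce when the states are part of the type rather than a reminder in a prompt. The sketch below models the data-driven states of the invoice table as a discriminated union. The `Invoice` fields, the `InvoiceTableState` type, and the placeholder copy are assumptions for illustration; disabled and focus belong to individual controls and are not modeled here.

```typescript
// Hypothetical invoice shape; the real fields depend on the product's API.
interface Invoice {
  id: string;
  amountCents: number;
  issuedOn: string; // ISO date string
}

// One variant per data state the invoice table has to render.
type InvoiceTableState =
  | { kind: "loading" }
  | { kind: "empty" }
  | { kind: "error"; message: string }
  | { kind: "ready"; invoices: Invoice[] };

// Placeholder copy only; final wording should come from the design, not the agent.
function describeInvoiceTable(state: InvoiceTableState): string {
  switch (state.kind) {
    case "loading":
      return "Loading invoices";
    case "empty":
      return "No invoices yet";
    case "error":
      return `Could not load invoices: ${state.message}`;
    case "ready":
      return `${state.invoices.length} invoice(s)`;
    default: {
      // If a new variant is added and not handled above, this assignment
      // stops compiling, so a state cannot be dropped silently.
      const unhandled: never = state;
      return unhandled;
    }
  }
}
```

The `never` check is the design choice that matters: a state added to the inventory later forces a compile error until the table handles it.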
Section 8
What agents commonly get wrong
Screenshot implementation failures are predictable. Agents often preserve the obvious layout while losing the subtle reasons the design works. They may copy the visible card arrangement but miss table density. They may match colors while changing hierarchy. They may stack mobile content in DOM order even when the design requires task order.
The fix is to ask for explicit comparison after implementation. The agent should compare the built page against the annotated intent, not only against the raw screenshot.
- Spacing drift: row heights, section gaps, and card padding become more comfortable than the reference.
- Hierarchy drift: secondary actions become too loud or primary actions move below supporting content.
- State gaps: empty, error, loading, disabled, focus, and long-content states are missing.
- System drift: the agent creates local components instead of using shared shadcn or product components.
- Responsive drift: mobile preserves visual stacking but loses task priority.
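Spacing drift is the easiest of these failures to check mechanically. The sketch below assumes someone has already measured the same dimensions in the reference and in the built page; the `SpacingMeasurement` type, the `spacingDrift` helper, the 2px default tolerance, and the sample values are all hypothetical.

```typescript
// A dimension measured in both the reference and the built page.
interface SpacingMeasurement {
  name: string;        // e.g. "invoice row height"
  referencePx: number; // value taken from the reference
  builtPx: number;     // value taken from the implementation
}

interface SpacingDrift {
  name: string;
  deltaPx: number; // positive means the build is looser than the reference
}

// Returns only the measurements that differ by more than the tolerance.
function spacingDrift(
  measurements: SpacingMeasurement[],
  tolerancePx: number = 2, // assumed threshold; tune per design system
): SpacingDrift[] {
  return measurements
    .map((m) => ({ name: m.name, deltaPx: m.builtPx - m.referencePx }))
    .filter((d) => Math.abs(d.deltaPx) > tolerancePx);
}

// Sample values are made up for illustration.
const drift = spacingDrift([
  { name: "invoice row height", referencePx: 40, builtPx: 48 },
  { name: "section gap", referencePx: 24, builtPx: 25 },
]);
```

Here only the row height is reported, with a positive delta, which matches the pattern of rows becoming more comfortable than the reference.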
Section 9
What to compare in QA
A screenshot QA pass should compare more than pixels. It should compare intent. The reference might have a particular reading order, density, or contrast relationship that a simple visual match misses.
Ask the agent to report observable mismatches, likely causes, and safe fixes. It should separate objective mismatches from subjective polish questions.
- Do primary, secondary, and destructive actions read in the same order?
- Does the implementation preserve row rhythm and section spacing?
- Are loading, empty, error, disabled, hover, and focus states accounted for?
- Does mobile preserve the user job rather than just stacking blocks?
- Does the implementation use tokens and components from the product system?
- Do labels, focus order, contrast, and landmarks support real use?
Screenshot QA works best when the agent checks structure and behavior, not only appearance.
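A QA report is easier to act on when each mismatch carries its likely cause and safe fix, and when objective mismatches are kept apart from polish questions. The sketch below is one possible shape, with hypothetical names (`QaMismatch`, `planFixes`). The pass order follows the one this guide's reusable workflow gives for fixing mismatches: task order, layout, states, polish.

```typescript
type FixPass = "task-order" | "layout" | "states" | "polish";

// Hypothetical report entry; field names are illustrative.
interface QaMismatch {
  observed: string;     // what differs between reference and build
  likelyCause: string;  // the agent's best explanation
  safeFix: string;      // a change that should not break anything else
  pass: FixPass;        // which fix pass the mismatch belongs to
  objective: boolean;   // false for subjective polish questions
}

const PASS_ORDER: FixPass[] = ["task-order", "layout", "states", "polish"];

// Objective mismatches become an ordered fix list; subjective ones
// are returned separately as questions for the designer.
function planFixes(report: QaMismatch[]): {
  fixes: QaMismatch[];
  questions: QaMismatch[];
} {
  const fixes = report
    .filter((m) => m.objective)
    .slice()
    .sort((a, b) => PASS_ORDER.indexOf(a.pass) - PASS_ORDER.indexOf(b.pass));
  const questions = report.filter((m) => !m.objective);
  return { fixes, questions };
}
```

Called with a mixed report, `planFixes` puts a task-order mismatch ahead of a missing state and leaves anything subjective for a human to decide.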
Section 10
Good vs bad screenshot prompts
The weak prompt treats the screenshot as a picture to copy. The strong prompt treats the screenshot as evidence to interpret.
- Weak: "Recreate this screenshot exactly". Strong: "Preserve hierarchy, density, and task order; map regions to existing components".
- Weak: "Make it look like this". Strong: "Use this screenshot as evidence for layout; do not copy brand colors or invent product rules".
- Weak: "Build the page from the image". Strong: "Annotate first, list unknown states, then wait for approval before implementation".
A useful screenshot prompt names what to preserve, what to infer, and what not to invent.
Section 11
Reusable workflow
Use this whenever a screenshot becomes a build reference. The output should be a page, a QA report, and a reusable packet for future runs.
1. Collect reference screenshots at relevant widths.
2. Add DESIGN.md, AGENTS.md, component source, and product rules.
3. Ask the agent to annotate the screenshot before coding.
4. Correct the annotation and approve the implementation plan.
5. Implement using existing components and tokens.
6. Capture desktop and mobile screenshots.
7. Compare built UI against reference structure and annotated intent.
8. Fix mismatches in passes: task order, layout, states, polish.
9. Record the QA report and remaining unknowns.
10. Keep the packet so future agents do not rediscover the same context.
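Whether the packet was actually kept can be checked before the next agent run. The sketch below assumes the file layout from the project structure section and takes the list of existing paths as input, so it stays independent of any file-system API; `REQUIRED_PACKET_FILES` and `missingPacketFiles` are hypothetical names.

```typescript
// Paths follow the example project tree in this guide; adjust to the real repo.
const REQUIRED_PACKET_FILES: string[] = [
  "examples/screenshots/billing-reference-desktop.png",
  "examples/screenshots/billing-reference-mobile.png",
  "examples/screenshots/billing-reference-annotations.md",
  "examples/screenshots/billing-unknowns.md",
  "agent-workflows/screenshot-implementation/brief.md",
  "agent-workflows/screenshot-implementation/component-map.md",
  "agent-workflows/screenshot-implementation/state-inventory.md",
  "agent-workflows/screenshot-implementation/implementation-plan.md",
  "agent-workflows/screenshot-implementation/qa-report.md",
];

// Returns the required packet files that are absent from the given paths.
function missingPacketFiles(existingPaths: string[]): string[] {
  const present = new Set<string>();
  for (const p of existingPaths) present.add(p);
  return REQUIRED_PACKET_FILES.filter((p) => !present.has(p));
}
```

A run that ends with a non-empty result, for example a missing qa-report.md, has produced a page but not yet a reusable packet.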

