Agentic Design School

Section 01

Why prototype in code at all

Some design questions cannot be answered with a clickable mockup. Anything that depends on real interaction — typing into a configurator and watching a price change, dragging a date range and seeing a chart redraw, feeling how a navigation pattern behaves with real content depth — needs something that actually runs. Those are exactly the questions usability tests are best at answering and design tools are worst at simulating.

Coded prototypes used to be too expensive to justify for a single test session. With an agent doing the assembly from a harness the team already owns — its tokens, its components, its sample data — the cost drops to a half day, which changes the calculation: the prototype becomes the cheapest honest way to put the question in front of users.

One framing matters before anything gets built: the prototype is throwaway evidence, not production code. It exists to answer a question on a date, and its value is the answer, not the artifact. The companion article on prototype-first, production-later covers why holding that line makes both the prototype and the eventual production work better.

Projects to inspect

Prototype first, production later: Why prototypes should stay throwaway evidence, and how to keep prototype code from leaking into production.

Section 02

When to reach for this workflow

Reach for it when there is a real session on the calendar — usability tests, customer interviews, a stakeholder decision meeting — and the question on the table is interactive enough that a static mockup would be a guess about the answer.

It is a Foundation workflow: one agent, a tight loop, and a clear stop condition. The discipline is in the scoping, not the orchestration.

- Usability tests or customer interviews scheduled within days, not weeks.
- A design question that depends on interaction, real data shape, or content depth.
- Two or three genuinely different directions the team cannot choose between on taste alone.
- A pitch or stakeholder review where seeing the idea behave is more persuasive than seeing it drawn.

Section 03

The harness: what you need before the sprint starts

The half-day estimate assumes a harness exists: a small repository the agent can build inside, with the design tokens, a usable subset of the component library, realistic sample data, and a dev server that starts with one command. The harness is what keeps a prototype on-brand and on-system without anyone polishing it; the agent assembles from parts instead of inventing them.

If no harness exists, building a minimal one is the first sprint's real output — a Vite or Next.js app with the tokens imported, ten or fifteen core components, and one sample dataset shaped like real product data. Every sprint after that starts from it.

Table: Prototype harness contents

| # | Part | What it provides |
|---|------|------------------|
| 1 | Tokens | The product's design tokens (or the client's brand tokens) wired into the styling layer |
| 2 | Components | A subset of the design system the agent can compose without asking |
| 3 | Sample data | Realistic data shaped like production: names, prices, dates, edge-case lengths |
| 4 | Dev server | One command to run, one URL to share in a testing session |
| 5 | Index page | A plain page linking each variant with the question it exists to answer |
| 6 | Capture script | A small Playwright script for screenshots used in the visual QA pass |

The harness is owned by the team and reused across sprints; the prototypes inside it are disposable.
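A minimal sketch of a pre-sprint readiness check for the harness parts listed above. The file paths and the `dev` script name are assumptions made for illustration; the article does not prescribe a layout, so substitute whatever your harness actually uses.

```javascript
// Hypothetical harness layout; adjust the paths to match your own repository.
const REQUIRED_PARTS = {
  tokens: "src/tokens.css",
  components: "src/components/index.js",
  sampleData: "src/data/sample.json",
  indexPage: "index.html",
  captureScript: "scripts/capture.js",
};

// Returns the names of harness parts whose file is absent from `presentFiles`.
function missingHarnessParts(presentFiles, required = REQUIRED_PARTS) {
  const present = new Set(presentFiles);
  return Object.entries(required)
    .filter(([, path]) => !present.has(path))
    .map(([part]) => part);
}

// The dev server is a script rather than a file, so it is checked in package.json.
function hasDevCommand(packageJson) {
  return Boolean(packageJson.scripts && packageJson.scripts.dev);
}

const files = ["src/tokens.css", "src/data/sample.json", "index.html"];
console.log(missingHarnessParts(files)); // [ 'components', 'captureScript' ]
console.log(hasDevCommand({ scripts: { dev: "vite" } })); // true
```

Running a check like this before the sprint starts keeps a missing harness part from surfacing halfway through the half day.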

Section 04

Scope to the riskiest assumption

The most common way a prototype sprint fails is by prototyping the whole feature. The brief should name the single riskiest assumption — the thing that, if wrong, changes the design — and the prototype should cover the happy path through that assumption and nothing else. Settings screens, authentication, and edge cases are stubbed or skipped unless they are the question.

Writing the assumption down also gives the sprint its stop condition. The prototype is done when a test participant can encounter the assumption and react to it, not when the screen looks finished.

prototype-brief-template.md

# Prototype brief: {name}

## The question
The design question this prototype exists to answer, in one sentence.

## The riskiest assumption
What we believe that, if wrong, changes the design. The prototype must let participants encounter this.

## The happy path
The single flow a participant walks: entry point, 3–6 steps, end state. Nothing else gets built.

## Variants
Only where the design question genuinely differs. For each: what changes, and what stays the same.

## Out of scope
Explicitly: auth, settings, error handling, responsive breakpoints not used in the session, real backend.

## Session context
Who will use it, on what device, on what date, moderated or not.

## Harness
Repo path, tokens source, components available, sample data file.

## Disposal
This prototype is throwaway evidence. It is archived after the session readout; no code is promoted to production.
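Because the brief drives everything downstream, it can help to check that no section was left empty before the sprint starts. The sketch below is one possible approach: it assumes the brief keeps the template's `## ` headings unchanged, and the function names and the draft text are illustrative rather than part of any tool.

```javascript
// Section headings taken from the brief template above.
const REQUIRED_SECTIONS = [
  "The question",
  "The riskiest assumption",
  "The happy path",
  "Variants",
  "Out of scope",
  "Session context",
  "Harness",
  "Disposal",
];

// Splits a markdown brief into { heading: body } using its "## " headings.
function parseBrief(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      current = heading[1];
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  for (const key of Object.keys(sections)) {
    sections[key] = sections[key].join("\n").trim();
  }
  return sections;
}

// Lists required sections that are missing or left empty.
function incompleteSections(markdown) {
  const sections = parseBrief(markdown);
  return REQUIRED_SECTIONS.filter((name) => !sections[name]);
}

const draft = [
  "# Prototype brief: pricing configurator",
  "## The question",
  "Do customers understand usage-based pricing when the total responds to their inputs?",
  "## The riskiest assumption",
  "",
  "## The happy path",
  "Pick a plan, set three usage sliders, read the monthly estimate.",
].join("\n");

// Logs the six sections still to fill in, starting with "The riskiest assumption".
console.log(incompleteSections(draft));
```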

Section 05

The orchestration pattern: one agent, a loop, then fan-out

The sprint runs as a Claude Code dynamic workflow. Including the word workflow in the prompt, or running with /effort ultracode, makes Claude write a JavaScript orchestration script that runs in the background. For most of the sprint that script drives a single build agent through a loop: build the happy path, capture screenshots, compare against the brief, fix, repeat — with intermediate captures and notes held in script variables rather than in Claude's context.

The fan-out happens only after the happy path holds. The script dispatches one agent per variant — two or three at most, well inside the sixteen-concurrent and one-thousand-per-run limits — each starting from the working base and changing only what the variant's question requires. Variants that differ in everything teach nothing; the script enforces the constraint by passing each variant agent the same base and a one-line statement of what may change.
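The constraint described above (same base, a one-line statement of what may change, two or three variants at most) can be enforced before any agent is dispatched. This is a sketch under assumptions: the base path, the slugs, and the prompt wording are hypothetical, and the real orchestration script is whatever Claude generates for the run.

```javascript
// Upper bound taken from the "two or three at most" guidance in the text.
const MAX_VARIANTS = 3;

// Builds one prompt per variant. Every prompt names the same base and carries
// a single-line statement of what may change; anything else is rejected up front.
function buildVariantPrompts(basePath, variants) {
  if (variants.length === 0 || variants.length > MAX_VARIANTS) {
    throw new Error(`Expected 1 to ${MAX_VARIANTS} variants, got ${variants.length}`);
  }
  return variants.map(({ slug, changes }) => {
    const statement = changes.trim();
    if (statement === "" || statement.includes("\n")) {
      throw new Error(`Variant "${slug}" needs a one-line statement of what changes`);
    }
    return (
      `Copy ${basePath} into variants/${slug}. ` +
      `Change only: ${statement}. Everything else stays identical to the base.`
    );
  });
}

const prompts = buildVariantPrompts("prototypes/harness/base", [
  { slug: "breakdown-default", changes: "show the price breakdown by default" },
  { slug: "breakdown-on-demand", changes: "reveal the price breakdown on demand" },
]);
console.log(prompts.join("\n"));
```

Rejecting a multi-line change statement is a blunt check, but it catches the common failure where a variant quietly grows into a second design.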

The run is resumable, which matters on a half-day clock, and the working orchestration is saved to .claude/workflows/ in the project (or ~/.claude/workflows/ for personal use) so the next sprint starts from a known-good /command. The build and QA subagent definitions live in .claude/agents/ as markdown files.

Diagram: Prototype sprint loop

Step 1: Write the brief and riskiest assumption
Step 2: Build the happy path from the harness
Step 3: Capture and compare against the brief
Step 4: Fix and loop until it holds
Step 5: Fan out 2–3 variants from the base
Step 6: Visual QA each variant
Step 7: Package behind the index page (feeds the next cycle)

The build loop runs until the happy path matches the brief, then fans out variants, QAs each, and packages the set.

Section 06

Step 1: the workflow prompt

The prompt names the brief, the harness, the variants, and the packaging, and it states the disposal rule so the agent does not over-engineer. Everything the agent needs to decide is in the brief; everything it should not decide is out of scope.

Prototype sprint workflow prompt

Run this as a workflow.

Build the prototype described in prototypes/briefs/pricing-configurator.md inside the harness at prototypes/harness/.

Phase 1 — happy path: build the single flow in the brief using only the harness components, tokens, and sample data. After each build pass, run the capture script (npm run proto:capture) and compare the screenshots against the brief. Loop until the flow matches the brief and a participant could complete it without hitting a dead end. Do not build anything listed as out of scope.

Phase 2 — variants: from the working base, build the variants listed in the brief. Each variant changes only what its question requires; everything else stays identical to the base.

Phase 3 — QA and packaging: run a visual QA pass on the base and each variant at the session's device width. Fix anything that blocks the happy path; log polish issues without fixing them. Update prototypes/index.html so each variant is linked with the question it exists to answer.

This is a throwaway prototype for a testing session, not production code: no tests, no auth, no error handling beyond what the flow needs, and no refactoring of the harness.

Section 07

Step 2: a sketch of the orchestration

As with all dynamic workflows, Claude writes the orchestration script when the run starts; the sketch below only shows the shape so the loop and the fan-out are easy to picture. The brief, the capture results, and each variant's status live in the script, not in the conversation.

Dynamic workflow sketch (illustrative pseudo-API, not runnable)

// Sketch of the orchestration Claude generates for this workflow.
// agent(prompt, options) dispatches a subagent and resolves with its output.

const brief = await agent("Read prototypes/briefs/pricing-configurator.md and return the happy path, variants, and out-of-scope list as JSON.", { model: "haiku" })

let status = { done: false, notes: "" }
for (let pass = 0; pass < 4 && !status.done; pass++) {
  status = JSON.parse(
    await agent(
      "Build or fix the happy path in prototypes/harness per the brief. Previous QA notes: " + status.notes +
      ". Then run npm run proto:capture, compare captures against the brief, and return { done, notes } as JSON.",
      { model: "sonnet" }
    )
  )
}

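// Fan out only if the happy path held within the pass budget.
if (!status.done) return { passes: status, variants: [] }

const variants = JSON.parse(brief).variants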
const results = await Promise.all(
  variants.map((v) =>
    agent(
      "Copy the base flow into variants/" + v.slug + " and change only: " + v.changes + ". Run the capture script and return a one-line status.",
      { model: "sonnet" }
    )
  )
)

await agent("Run visual QA on the base and each variant at 390px and 1280px; fix happy-path blockers only; update prototypes/index.html.", { model: "sonnet" })

return { passes: status, variants: results }
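To make the control flow concrete, here is a runnable variant of the same shape with the agent stubbed out. It is a sketch, not the script Claude generates: `runSprint` and `makeStubAgent` are hypothetical names, the agent is an injected async function rather than a real dispatch call, and the stop when the happy path never holds is an explicit choice made here to match the rule that variants start from a working base.

```javascript
// Runs the build loop, then fans out variants only if the happy path held.
// `agent` is any async function (prompt) => string; it is injected so the
// control flow can be exercised without a real agent runtime.
async function runSprint(agent, variants, maxPasses = 4) {
  let status = { done: false, notes: "" };
  let passes = 0;
  while (passes < maxPasses && !status.done) {
    status = JSON.parse(await agent(`build: previous notes: ${status.notes}`));
    passes += 1;
  }
  if (!status.done) {
    return { passes, done: false, variants: [] }; // stop: nothing to fan out from
  }
  const results = await Promise.all(
    variants.map((v) => agent(`variant: ${v.slug}: change only ${v.changes}`))
  );
  return { passes, done: true, variants: results };
}

// Stub agent: the happy path "holds" on the second build pass.
function makeStubAgent() {
  let builds = 0;
  return async (prompt) => {
    if (prompt.startsWith("build:")) {
      builds += 1;
      return JSON.stringify({ done: builds >= 2, notes: `pass ${builds}` });
    }
    return `ok ${prompt.split(":")[1].trim()}`;
  };
}

runSprint(makeStubAgent(), [
  { slug: "a", changes: "the breakdown placement" },
  { slug: "b", changes: "the breakdown trigger" },
]).then((result) => console.log(result));
// { passes: 2, done: true, variants: [ 'ok a', 'ok b' ] }
```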

Section 08

Step 3: a prototype builder subagent

The builder definition encodes the rules that keep prototypes cheap: harness parts only, happy path only, and no production habits. Defining it once in .claude/agents/ means every sprint inherits the same discipline.

.claude/agents/prototype-builder.md

---
name: prototype-builder
description: Builds throwaway interactive prototypes inside the team's prototype harness, scoped to the happy path in the brief. Uses harness tokens, components, and sample data only.
tools: Read, Grep, Glob, Write, Edit, Bash
---

You build prototypes for testing sessions, not products.

Rules:
- Use only the components, tokens, and sample data already in the harness. Do not install new dependencies without being asked.
- Build exactly the happy path in the brief. Anything in the out-of-scope list is stubbed with a plain placeholder.
- Hard-code what the session will not exercise. Realistic sample data beats a real backend.
- No tests, no abstractions for reuse, no refactoring of the harness itself.
- Keep each variant's diff minimal: change only what the variant's question requires.
- After building, run the capture script and report what a participant would see at the session's device width.
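The rule about hard-coding what the session will not exercise might look like this for a pricing configurator. It is a sketch with invented sample data: the plan names, rates, and meter names are made up for illustration and stand in for a backend the prototype never needs. Amounts are kept in integer cents so the displayed total does not pick up floating-point noise.

```javascript
// Invented sample data standing in for a pricing backend; not real prices.
const PLANS = {
  starter: { baseCents: 2900, rateCents: { seats: 500, apiCallsThousands: 200, storageGb: 10 } },
  team: { baseCents: 9900, rateCents: { seats: 400, apiCallsThousands: 100, storageGb: 8 } },
};

// Returns the monthly estimate plus the per-meter breakdown the UI can show or hide.
function estimate(planName, usage) {
  const plan = PLANS[planName];
  if (!plan) throw new Error(`Unknown plan: ${planName}`);
  const breakdown = Object.entries(plan.rateCents).map(([meter, rate]) => ({
    meter,
    cents: rate * (usage[meter] || 0),
  }));
  const totalCents = breakdown.reduce((sum, line) => sum + line.cents, plan.baseCents);
  return { totalCents, breakdown };
}

// Formats integer cents as a dollar amount for display.
const formatUsd = (cents) => `$${(cents / 100).toFixed(2)}`;

const result = estimate("starter", { seats: 12, apiCallsThousands: 50, storageGb: 200 });
console.log(formatUsd(result.totalCents)); // $209.00
```

A function like this is all the "pricing engine" a session needs: the sliders call it on every change, and both breakdown variants read from the same return value.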

Section 09

Step 4: visual QA before the session, polish never

The QA pass exists to protect the session, not the craft. A participant who hits a dead end, a button that does nothing, or text overflowing its container will spend the session reacting to the bug instead of the design question. Those get fixed. A spacing inconsistency or an off-tone empty state gets logged and ignored, because the prototype dies after the readout anyway.

Run the same capture-and-compare loop used during the build, once per variant, at the device width the session will actually use. The full visual QA workflow on this site goes deeper; for a prototype, the blocker-only subset is enough.
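A capture script for this pass can be small. The sketch below assumes a URL layout of `/variants/<slug>/`, a `captures/` output folder, a viewport height of 900, and a dev server on port 5173; none of these come from the article. The Playwright calls (`chromium.launch`, `browser.newPage`, `page.goto`, `page.screenshot`) are standard, and the module is loaded lazily so the plan can be built and inspected without Playwright installed.

```javascript
// Builds the list of screenshots to take: every variant at every session width.
// The URL layout and output folder are assumptions for illustration.
function buildCapturePlan(baseUrl, slugs, widths) {
  return slugs.flatMap((slug) =>
    widths.map((width) => ({
      url: `${baseUrl}/variants/${slug}/`,
      width,
      path: `captures/${slug}-${width}.png`,
    }))
  );
}

// Takes the screenshots with Playwright. Requires `npm install playwright`
// and a running dev server; not called automatically in this sketch.
async function captureAll(plan) {
  const { chromium } = require("playwright");
  const browser = await chromium.launch();
  try {
    for (const shot of plan) {
      const page = await browser.newPage({ viewport: { width: shot.width, height: 900 } });
      await page.goto(shot.url, { waitUntil: "networkidle" });
      await page.screenshot({ path: shot.path, fullPage: true });
      await page.close();
    }
  } finally {
    await browser.close();
  }
}

const plan = buildCapturePlan("http://localhost:5173", ["base", "breakdown-default"], [1280]);
// Logs the two capture paths: captures/base-1280.png and captures/breakdown-default-1280.png
console.log(plan.map((shot) => shot.path));
// To take the screenshots with the dev server running: captureAll(plan)
```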

- Walk the happy path end to end in each variant at the session's device width.
- Fix anything that blocks completion or visibly breaks: dead ends, broken interactions, unreadable text.
- Log polish issues without fixing them; attach the log to the readout if it is useful evidence.
- Confirm the index page links every variant and states the question each one answers.

Diagram: Blocker-only QA pass per variant

1. Walk the happy path
2. Capture at session width
3. Spot dead ends and breaks
4. Fix blockers only
5. Log polish without fixing
6. Confirm the index page

Each variant gets the same short pass at the session's device width: blockers get fixed, polish gets logged, and the prototype stays cheap.
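The fix-or-log decision can be written down as a rule so it is applied the same way to every variant. This is a sketch under assumptions: the finding shape (`kind`, `onHappyPath`, `note`) and the kind names are hypothetical, chosen to mirror the blocker types named above.

```javascript
// Finding kinds that stop a participant from completing or reading the flow.
const BLOCKER_KINDS = new Set(["dead-end", "broken-interaction", "unreadable-text"]);

// Splits QA findings into blockers to fix now and everything else to log and leave.
function triage(findings) {
  const fix = [];
  const log = [];
  for (const finding of findings) {
    const blocks = finding.onHappyPath && BLOCKER_KINDS.has(finding.kind);
    (blocks ? fix : log).push(finding);
  }
  return { fix, log };
}

const { fix, log } = triage([
  { kind: "unreadable-text", onHappyPath: true, note: "slider label overflows at 1280px" },
  { kind: "spacing", onHappyPath: true, note: "uneven gap above the breakdown" },
  { kind: "dead-end", onHappyPath: false, note: "settings link goes nowhere (out of scope)" },
]);
console.log(fix.map((f) => f.note)); // [ 'slider label overflows at 1280px' ]
console.log(log.length); // 2
```

Note that the out-of-scope dead end is logged rather than fixed: a break only counts as a blocker if a participant walking the happy path can reach it.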

Section 10

Case study: a pricing configurator the day before customer interviews

A SaaS team had six customer interviews booked and an unresolved argument about whether customers would understand usage-based pricing if they could see the price respond to their own inputs. A static mockup could not test that; the reaction depended on typing real numbers and watching the total move.

The sprint ran the afternoon before the first interview. The happy path — pick a plan, set three usage sliders, see the monthly estimate update with a breakdown — was working in the harness after two build-loop passes, about ninety minutes. Two variants followed: one showing the breakdown by default, one revealing it on demand. Visual QA caught a slider label that overflowed at the 1280-pixel laptop width the interviews would use.

Five of six customers completed the flow unprompted. The breakdown-by-default variant produced noticeably fewer trust questions, which settled the argument, and the prototype was archived the following week, unpromoted, exactly as the brief said it would be.

Section 11

Case study: three navigation variants for a usability test

A product team was redesigning the navigation of a data-heavy admin product and had three real candidates: a fixed sidebar, a top bar with a mega-menu, and a hybrid that collapsed by section. The arguments were circular because every participant in them was imagining different content depth.

The sprint built one base — the product's real information architecture, real section names, sample data with the actual longest labels — and fanned out the three navigation treatments as variants that shared everything else. The constraint that variants differ only in the navigation was the whole value: when test participants got lost, the team could attribute it to the pattern rather than to incidental differences.

Eight moderated sessions later, the hybrid variant won on task completion for the deep sections and lost slightly on first-click confidence; the team shipped the hybrid with the top bar's labeling. The three prototypes were linked from one index page during the sessions and archived with the research readout afterward.

Section 12

Case study: an agency concept prototype from the client's real brand tokens

An agency pitching a service-portal concept had two days and a client review meeting. Static concept boards were the safe option; the team instead spent half a day of that time on a coded prototype, because the concept's selling point was how the portal behaved, not how it looked in a still.

The harness was assembled first: the client's published brand tokens — colors, type scale, radii — wired into the agency's standard prototype shell, plus sample data drawn from the client's public help-center categories. The happy path covered one service request from start to confirmation; one variant swapped the confirmation step for a live status timeline, which was the concept's riskiest idea.

In the review, the client's team passed a laptop around the table and walked the flow themselves, in their own brand, with their own category names. The agency won the engagement, and the prototype's only afterlife was a screen recording in the case study deck — the production build started from the design system contract, not from the prototype code.

Section 13

Good vs bad sprint outputs

A prototype sprint goes wrong in two directions: under-built, so the session trips over gaps that were supposed to be in scope, or over-built, so half a day becomes three and the prototype starts to smell like a product. The brief is the instrument that catches both; compare the output against it, not against taste.

Table: Sprint output quality comparison

| Bad | Good |
|-----|------|
| A polished marketing-grade screen for step one, with steps two and three unbuilt the night before the session | All five happy-path steps clickable end to end, hard-coded data, visible seams in places the session never touches |
| Five variants that differ in layout, copy, color, and flow all at once | Two variants identical except for the one element the design question is about |
| A prototype with auth, a database, and unit tests that the team is now reluctant to throw away | A folder of throwaway code behind an index page, archived with the research readout after the session |

Judge the prototype against the brief and the session, not against production standards.
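The second row of the comparison can be checked mechanically if each variant is described as a small set of properties. The sketch below assumes such a description exists; the property names and values are hypothetical, and a real check might compare changed files instead.

```javascript
// Returns the keys whose values differ between the base and a variant description.
function differingKeys(base, variant) {
  const keys = new Set([...Object.keys(base), ...Object.keys(variant)]);
  return [...keys].filter(
    (key) => JSON.stringify(base[key]) !== JSON.stringify(variant[key])
  );
}

// A variant passes when it differs from the base only in the element under test.
function isolatesQuestion(base, variant, elementUnderTest) {
  const changed = differingKeys(base, variant);
  return changed.length === 1 && changed[0] === elementUnderTest;
}

const base = { layout: "two-column", copy: "v1", color: "brand", navigation: "sidebar" };
const good = { ...base, navigation: "top-bar" };
const bad = { ...base, navigation: "top-bar", copy: "v2", color: "dark" };

console.log(isolatesQuestion(base, good, "navigation")); // true
console.log(differingKeys(base, bad)); // [ 'copy', 'color', 'navigation' ]
```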

Section 14

What a prototype sprint cannot prove

A prototype that tests well proves that participants could complete the flow and how they reacted to the riskiest assumption, in a session, with sample data. It does not prove the feature is feasible at production scale, that the data model holds, or that the design works for the segments who were not in the room.

The throwaway rule is also a human decision the workflow can only recommend. Someone with authority has to say, after a good session, that the prototype still gets archived — because the moment prototype code starts shipping, the next sprint inherits production caution and stops being cheap.

- It cannot validate feasibility, performance, or the data model behind the interaction.
- It cannot generalize beyond the participants and tasks in the session.
- It cannot make the keep-or-throw-away decision; a human owns the disposal rule.
- It cannot replace the production design process; the prototype answers a question, the system ships the answer.

Section 15

The reusable sprint workflow

Save the orchestration to .claude/workflows/ and keep the harness healthy between sprints: refresh its tokens and components when the system changes, so the next half day begins with a running start instead of a setup morning.

Interactive prototype sprint workflow

1. Write the brief: the question, the riskiest assumption, the happy path, and the out-of-scope list.
2. Confirm the harness runs: tokens, components, sample data, dev server, capture script.
3. Build the happy path with the builder agent; loop capture-compare-fix until it matches the brief.
4. Fan out 2–3 variants from the working base, each changing only what its question requires.
5. Run a blocker-only visual QA pass per variant at the session's device width.
6. Package everything behind the index page, with each variant labeled by its question.
7. Run the session; capture findings against the riskiest assumption, not against the prototype.
8. Archive the prototype with the readout. Save the workflow; the next sprint reuses the harness, not the code.
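Step 6 above can be sketched as a small generator for the index page. This is an illustration under assumptions: the `variants/<slug>/` link layout, the variant names, and the questions are hypothetical, and a hand-written HTML file serves equally well for two or three variants.

```javascript
// Escapes text for safe inclusion in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders a plain index page: one link per variant, labeled with its question.
function renderIndex(title, variants) {
  const items = variants
    .map(
      (v) =>
        `  <li><a href="variants/${encodeURIComponent(v.slug)}/">${escapeHtml(v.name)}</a>: ${escapeHtml(v.question)}</li>`
    )
    .join("\n");
  return [
    "<!doctype html>",
    `<title>${escapeHtml(title)}</title>`,
    `<h1>${escapeHtml(title)}</h1>`,
    "<ul>",
    items,
    "</ul>",
    "",
  ].join("\n");
}

const html = renderIndex("Pricing configurator prototypes", [
  {
    slug: "breakdown-default",
    name: "Breakdown by default",
    question: "Does seeing the breakdown up front reduce trust questions?",
  },
  {
    slug: "breakdown-on-demand",
    name: "Breakdown on demand",
    question: "Do customers look for the breakdown when it is hidden?",
  },
]);
console.log(html);
// To write it out: require("node:fs").writeFileSync("prototypes/index.html", html)
```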

