Section 1
Why one draft quietly becomes the design
When an agent produces a single concept quickly, that concept tends to win by default. It arrived first, it looks finished, and the cost of asking for genuinely different alternatives feels higher than nudging the one on screen. Six weeks later the team is polishing a direction nobody actually chose.
Asking one agent for three options in one prompt does not fix this. The options share a context window, so they converge: the same layout with three coats of paint. And asking the same agent to critique its own favorite produces praise with a few softening caveats.
This workflow fixes both problems structurally. Each direction is drafted by a separate agent that never sees the others, and the critique stage is written into the orchestration script, so it happens whether or not anyone remembers to ask for it.
Section 2
When to reach for it
Use this when the decision is expensive to reverse and the solution space is genuinely open: a new flow, a redesign, a density or information-architecture choice, a visual refresh. The output is a decision artifact for a design lead, not production code.
It pays off in cases like these:
- Redesigning a core flow where the team disagrees about the right approach.
- Choosing a structural direction, such as data density or navigation model.
- A visual or brand refresh with hard legal and accessibility constraints.
- Any decision likely to be relitigated later unless the alternatives were examined on the record.
Skip it when the direction is already constrained to one viable answer, or when the question is small enough that exploring it costs more than just building it.
Section 3
The orchestration pattern: parallel drafts, adversarial critique
This runs as a dynamic workflow. Including the word workflow in the prompt makes Claude Code write a JavaScript orchestration script that runs in the background: it fans out one drafting agent per direction, waits for all of them, then fans out critique agents, and only the final comparison report comes back into the main conversation. Intermediate drafts and critiques live in script variables and on disk, not in Claude's context, which is what keeps the directions independent.
The script can run up to 16 agents concurrently and up to 1,000 in a run, so five directions plus a full cross-critique matrix is comfortably inside the budget. Workflows are resumable, so a long exploration interrupted after drafting can pick up at the critique stage. Save the working script to .claude/workflows/ and it becomes a reusable slash command; /effort ultracode is the explicit way to ask for this orchestration depth.
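To make the budget claim concrete, here is a minimal sketch of the arithmetic. It assumes one drafting agent per direction, one critique per ordered pair of directions, and a single comparison agent, which is how the script later in this guide is laid out. The helper name `agentBudget` and the two constants are illustrative; the limits are simply the figures quoted above.

```javascript
// Limits as quoted in the text; treat them as assumptions about the runtime.
const MAX_CONCURRENT_AGENTS = 16
const MAX_AGENTS_PER_RUN = 1000

// Hypothetical helper: count the agents one exploration run would launch.
function agentBudget(directionCount) {
  const drafts = directionCount
  // Full cross-critique matrix: each direction is reviewed from every other stance.
  const critiques = directionCount * (directionCount - 1)
  const comparison = 1
  const total = drafts + critiques + comparison
  return {
    drafts,
    critiques,
    comparison,
    total,
    // Minimum number of concurrent batches the critique stage needs.
    critiqueBatches: Math.ceil(critiques / MAX_CONCURRENT_AGENTS),
    withinRunLimit: total <= MAX_AGENTS_PER_RUN,
  }
}

const budget = agentBudget(5)
console.log(budget.total, budget.critiqueBatches, budget.withinRunLimit) // 26 2 true
```

Under these assumptions, five directions need 26 agents in total, and the 20 critiques take at least two concurrent batches.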
The structural guarantee is the point: critique is not a follow-up question a human might forget to ask. It is a stage in the script, with named reviewers and a fixed rubric, and the run does not finish without it.
[Diagram: Write the brief, then Assign design stances, then Draft directions in parallel, then Adversarial cross-critique, then Score against rubric, then Human decision, then Record the decision.]
The brief fans out, the critiques cross over, and a human makes the call.
Section 4
Step 1: write one brief, then assign stances
Every drafting agent receives the same brief: the user, the job, the constraints, the brand rules, and the anti-patterns. What differs is the stance: a one-paragraph design position the direction must commit to, such as progressive disclosure, expert density, or guided linear flow.
Stances are how you guarantee the directions differ in structure rather than in styling. If you cannot name three stances that a reasonable designer might defend, the problem may not need an exploration.
# Brief: onboarding flow redesign

## User and job
New workspace admins setting up their first project. They need to reach a working project with at least one teammate invited, in one sitting.

## Constraints
- Must work on mobile web; native apps are out of scope.
- Uses existing design tokens and the current component library.
- Legal: data-residency choice must appear before any content is created.
- Accessibility: WCAG 2.2 AA; no information conveyed by color alone.

## Anti-patterns
- No more than one optional step; the current flow has four and completion is 41%.
- No fake progress indicators.

## Stances (one per direction agent)
- Direction A: guided linear flow, one decision per screen.
- Direction B: single-page setup with progressive disclosure.
- Direction C: template-first; pick a working example, then customize.
Section 5
Step 2: define the drafting agent
The drafting agent produces a direction package, not a finished design: a concept statement, the screen-by-screen structure, the key interaction decisions, what the direction deliberately sacrifices, and a rough effort note. A consistent package format is what makes the directions comparable later.
---
name: direction-author
description: Drafts one complete design direction from a brief and an assigned stance. Used in concept exploration workflows; never sees other directions.
tools: Read, Write, Glob
---
You draft exactly one design direction. Rules:
- Commit fully to the assigned stance, even where it creates tradeoffs. Name the tradeoffs.
- Stay inside the brief's constraints and anti-patterns; flag any conflict explicitly.
- Output a direction package: concept statement (under 120 words), screen-by-screen structure, key interaction decisions, what this direction sacrifices, effort estimate (S/M/L), and open questions.
- Do not reference, imagine, or hedge against other directions.
- Write the package to the path you are given, in markdown.
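Because comparability depends on every package having the same parts, a small check before the critique stage can catch an incomplete draft. The sketch below is one possible way to do that, not part of the generated workflow. It assumes each part of the package appears as its own markdown heading; the heading names in `REQUIRED_SECTIONS` and the helper `missingSections` are hypothetical, derived from the output rule in the agent definition.

```javascript
// Hypothetical heading names, derived from the agent's output rule.
const REQUIRED_SECTIONS = [
  "Concept statement",
  "Screen-by-screen structure",
  "Key interaction decisions",
  "Sacrifices",
  "Effort estimate",
  "Open questions",
]

// Return the required sections that have no matching markdown heading.
function missingSections(packageMarkdown) {
  const headings = packageMarkdown
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim().toLowerCase())
  return REQUIRED_SECTIONS.filter(
    (section) => !headings.some((heading) => heading.startsWith(section.toLowerCase()))
  )
}

// Illustrative draft that omits two sections.
const draft = [
  "# Direction A",
  "## Concept statement",
  "One decision per screen.",
  "## Screen-by-screen structure",
  "## Key interaction decisions",
  "## Effort estimate",
  "M",
].join("\n")

const missing = missingSections(draft)
console.log(missing) // [ 'Sacrifices', 'Open questions' ]
```

A package with missing sections could be sent back to its author agent before any critique agent reads it.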
Section 6
Step 3: the orchestration script
The script fans out the drafting agents in parallel, then builds the critique assignments so that every direction is reviewed by agents that did not write it. Each critique agent argues against the direction it reviews: its job is to find where the direction fails the brief, not to balance praise and concerns.
// Sketch of the orchestration script Claude Code generates for this workflow.
// agent(prompt, options) runs a subagent and resolves with its final answer.
import { readFile, writeFile } from "node:fs/promises"

const brief = await readFile("exploration/brief.md", "utf8")
const stances = ["guided-linear", "progressive-disclosure", "template-first"]

await Promise.all(
  stances.map((stance) =>
    agent(
      "Use the direction-author agent rules. Brief follows. Your stance is " + stance +
        ". Write your direction package to exploration/directions/" + stance + ".md.\n\n" + brief,
      { model: "opus" }
    )
  )
)

// Adversarial critique: every direction is reviewed against the others by a non-author.
const critiques = []
for (const target of stances) {
  for (const reviewerStance of stances.filter((s) => s !== target)) {
    critiques.push(
      agent(
        "Use the direction-critic agent rules. Critique exploration/directions/" + target +
          ".md against exploration/brief.md and scoring-rubric.md, from the perspective of someone who believes the " +
          reviewerStance + " direction is stronger. Score every rubric criterion 1-5 with evidence.",
        { model: "sonnet" }
      )
    )
  }
}
const critiqueResults = await Promise.all(critiques)
await writeFile("exploration/critiques.md", critiqueResults.join("\n\n---\n\n"))

const comparison = await agent(
  "Read exploration/directions and exploration/critiques.md. Produce a comparison report: rubric scores per direction, the strongest argument against each, constraint violations, and the open questions a human must decide. Do not pick a winner.",
  { model: "opus" }
)
await writeFile("exploration/comparison.md", comparison)
Section 7
Step 4: score against an explicit rubric
Critique without a rubric collapses into taste, and agents are very good at producing confident taste. The rubric names the criteria, weights them, and requires evidence from the brief or the direction package for every score.
Keep the rubric short enough that a human can hold it in their head while reading the comparison report. Five to seven criteria is usually right.
# Scoring rubric

Score each criterion 1-5. Every score must cite evidence from the direction package or the brief. A 3 means adequate; reserve 5 for directions that make the criterion a strength.

| Criterion | Weight | What a 5 looks like |
|---|---|---|
| Job completion | 3 | A first-time admin reaches a working project with a teammate invited, with the fewest unforced decisions |
| Constraint fit | 3 | No conflicts with legal, accessibility, or platform constraints; conflicts are disqualifying, not a low score |
| Clarity of hierarchy | 2 | Each screen has one obvious primary action and the structure explains itself |
| Effort to ship | 2 | Reuses existing components and patterns; estimate S or M with believable reasoning |
| Differentiation | 1 | The direction is structurally distinct from the current product and from the other stances |
| Risk honesty | 1 | The package names what the direction sacrifices and where it could fail |
[Diagram: Read direction package, then Apply each rubric criterion, then Cite evidence per score, then Apply criterion weights, then Flag constraint violations, then Feed the comparison report.]
Every critique walks the same scoring path, so the comparison report ranks evidence rather than confidence.
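The weighting step can be sketched as a weighted mean on the same 1-5 scale. The weights below are copied from the rubric table. The criterion keys, the function name `weightedScore`, and the choice of a weighted mean (rather than a weighted sum) are assumptions made for illustration. Constraint violations disqualify a direction instead of lowering its score, as the rubric states.

```javascript
// Weights copied from the rubric table; the keys are hypothetical identifiers.
const RUBRIC_WEIGHTS = {
  jobCompletion: 3,
  constraintFit: 3,
  clarityOfHierarchy: 2,
  effortToShip: 2,
  differentiation: 1,
  riskHonesty: 1,
}

// Weighted mean on the 1-5 scale; any constraint violation disqualifies.
function weightedScore(scores, constraintViolations = []) {
  if (constraintViolations.length > 0) {
    return { disqualified: true, violations: constraintViolations, score: null }
  }
  let weightedSum = 0
  let totalWeight = 0
  for (const [criterion, weight] of Object.entries(RUBRIC_WEIGHTS)) {
    const value = scores[criterion]
    if (!Number.isFinite(value) || value < 1 || value > 5) {
      throw new Error("Missing or out-of-range score for " + criterion)
    }
    weightedSum += value * weight
    totalWeight += weight
  }
  return { disqualified: false, violations: [], score: weightedSum / totalWeight }
}

// Illustrative scores, not results from a real run.
const exampleScores = {
  jobCompletion: 5,
  constraintFit: 4,
  clarityOfHierarchy: 4,
  effortToShip: 3,
  differentiation: 3,
  riskHonesty: 4,
}
const scored = weightedScore(exampleScores)
const rejected = weightedScore(exampleScores, ["Content creation precedes the data-residency choice"])
console.log(scored.score, rejected.disqualified) // 4 true
```

Throwing on a missing criterion is deliberate in this sketch: a silently skipped criterion would make two directions look comparable when they were scored on different rubrics.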
Section 8
Step 5: the human decision
The comparison report deliberately does not pick a winner. It puts the scored directions, the strongest argument against each, and the unresolved questions in front of a design lead, who chooses, blends, or sends a direction back for another pass.
Record the decision and the reasons next to the direction packages. The rejected directions are part of the value: the next time someone asks why the product is not template-first, the answer is on file with scores and evidence.
What the comparison report contains:
- Weighted rubric scores for every direction with one-line evidence
- The single best argument against each direction, taken from the critiques
- Any legal, accessibility, or platform violations, which disqualify rather than score
- Specific elements worth carrying across directions, named per element
- Decisions the brief left unresolved that the human must settle
The report is built for a decision meeting, not for reading cover to cover.
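Each direction receives more than one critique, so the report needs some way to combine them. The sketch below averages each criterion across reviewers and collects every flagged violation. It assumes the critiques have already been parsed into objects; the shape `{ target, reviewerStance, scores, violations }` and the function `aggregateCritiques` are hypothetical, since the workflow script only returns each critique as text. It also assumes every critique scores every criterion.

```javascript
// Hypothetical critique shape:
// { target, reviewerStance, scores: { criterion: 1-5 }, violations: [string] }
function aggregateCritiques(critiques) {
  const byTarget = new Map()
  for (const critique of critiques) {
    if (!byTarget.has(critique.target)) {
      byTarget.set(critique.target, { sums: {}, reviews: 0, violations: [] })
    }
    const entry = byTarget.get(critique.target)
    entry.reviews += 1
    entry.violations.push(...critique.violations)
    for (const [criterion, value] of Object.entries(critique.scores)) {
      entry.sums[criterion] = (entry.sums[criterion] ?? 0) + value
    }
  }
  // Convert per-criterion sums into means across reviewers.
  const report = {}
  for (const [target, entry] of byTarget) {
    const meanScores = {}
    for (const [criterion, sum] of Object.entries(entry.sums)) {
      meanScores[criterion] = sum / entry.reviews
    }
    report[target] = { reviews: entry.reviews, meanScores, violations: entry.violations }
  }
  return report
}

// Illustrative values, not results from a real run.
const report = aggregateCritiques([
  { target: "guided-linear", reviewerStance: "progressive-disclosure",
    scores: { jobCompletion: 5, effortToShip: 2 }, violations: [] },
  { target: "guided-linear", reviewerStance: "template-first",
    scores: { jobCompletion: 4, effortToShip: 3 }, violations: [] },
  { target: "template-first", reviewerStance: "guided-linear",
    scores: { jobCompletion: 3, effortToShip: 4 },
    violations: ["Content creation precedes the data-residency choice"] },
])
console.log(report["guided-linear"].meanScores) // { jobCompletion: 4.5, effortToShip: 2.5 }
```

Averaging is only one option. A team that wants the harshest reading could take the minimum per criterion instead, which fits the adversarial intent of the critique stage.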
Section 9
Case study: onboarding flow redesign
A B2B team with a 41% onboarding completion rate ran the exploration with the three stances above. Drafting took 22 minutes; the full critique matrix of six reviews took another 15.
The critiques did real work. The guided-linear direction scored highest on job completion but its critics showed it required eight screens to satisfy the legal data-residency step, against four in the brief's spirit. The single-page direction scored well on effort but two critiques flagged that its disclosure pattern hid the teammate invitation, the one step most correlated with retention. The template-first direction drew the strongest objection: templates implied content creation before the data-residency choice, a constraint violation rather than a preference.
The lead chose guided-linear, pulled the template gallery in as the final step, and cut two optional screens the critique had marked as low value. The decision record, including the rejected directions, took one page.
Section 10
Case study: choosing a data-density direction
An analytics product needed to settle a long-running argument about density before a navigation rework. Four stances were drafted: expert-dense tables, progressive drill-down, dashboard-of-summaries, and a hybrid with a density toggle.
The adversarial stage was decisive in an unexpected way. The hybrid direction, which the team had assumed would win as the safe compromise, took the lowest weighted score: three separate critiques showed it doubled the design and QA surface for every future feature, and its own package admitted the toggle existed because the direction would not commit. The expert-dense direction scored highest on job completion for the product's actual daily users, with its accessibility risks itemized and judged fixable.
The team chose expert density, recorded the toggle idea as explicitly rejected, and reported that the decision meeting took 40 minutes instead of the third workshop it had been heading toward.
Section 11
Case study: brand refresh with legal and accessibility constraints
A marketing team explored five visual directions for a brand refresh, with the brief carrying two hard constraints: an industry rule about how comparative claims could be displayed, and WCAG 2.2 AA contrast on all text over imagery.
The critique stage killed two directions early. One relied on light typography over photography and could not meet contrast without abandoning its core idea; another's hero pattern placed comparative claims in a layout the legal reviewer agent flagged as non-compliant in three of its five templates. Both were eliminated before any human spent time falling in love with them.
The remaining three went to the creative director with scores and objections attached. The chosen direction borrowed its color system from one of the eliminated ones, a blend the comparison report had explicitly suggested.
Section 12
Good vs bad exploration output
A weak exploration produces three skins of the same idea and a critique that praises everything. A strong one produces directions that disagree with each other, critiques that draw blood, and a decision a team can stand behind a quarter later.
- Weak: Three directions that share one layout with different accent colors
- Strong: Three directions with different screen counts, different first decisions, and named sacrifices
- Weak: Critique: all directions are strong, with minor polish opportunities
- Strong: Critique: Direction B hides the teammate invite behind disclosure; the brief's retention goal makes this a 2 on job completion
- Weak: Recommendation: blend the best of all three
- Strong: Comparison report: scores, strongest objection per direction, two named blend candidates, and four open questions for the lead
The test is whether a director could make a confident call from the report alone.
Section 13
Limits: what the exploration cannot decide
Agents can generate distinct directions and argue about them honestly when the structure forces it. They cannot know which tradeoff the organization should accept, how much appetite exists for a riskier direction, or what users will actually do. Scores are inputs to a human decision, not the decision.
The rubric is also a human responsibility. If the criteria or weights are wrong, the workflow will efficiently rank directions against the wrong standard.
- It cannot replace usability testing; a winning direction is still a hypothesis.
- It cannot settle taste or brand questions; it can only make the options and tradeoffs explicit.
- It cannot detect constraints the brief never mentioned.
- Scores between directions are comparable within one run, not across different briefs or runs.
Section 14
Reusable exploration workflow
Save the working script to .claude/workflows/ and the agent definitions to .claude/agents/, and the exploration becomes a command the team runs whenever a decision is big enough to deserve real alternatives.
1. Write one brief: user, job, constraints, brand rules, anti-patterns.
2. Name 3-5 design stances that a reasonable designer could defend.
3. Run as a dynamic workflow: one direction-author agent per stance, drafted in parallel and in isolation.
4. Cross-assign direction-critic agents so every direction is reviewed only by non-authors.
5. Score every direction against the explicit rubric, with evidence required for each score.
6. Generate a comparison report with scores, strongest objections, blend candidates, and open questions.
7. A human picks, blends, or sends a direction back for another pass.
8. Record the decision and the rejected directions next to the packages.