Agentic Design School
Research workflow
Foundation

Deep Design Research

A workflow for using Claude Code's bundled /deep-research dynamic workflow to answer design questions with cross-checked, cited evidence: competitive teardowns, pattern research, accessibility standards, and platform conventions.

Orchestration: Dynamic workflow with cross-checked sources

Typical run: 20–45 minutes per question

Last reviewed: 2026-06-02


Section 1

Why deep research belongs in design work

Designers answer research questions every week without calling them research. How do competitors handle empty states? What does WCAG actually require for a date picker? Is there real evidence behind the pricing-page pattern everyone copies? These questions usually get answered by twenty browser tabs and a hunch.

The problem with the twenty-tab method is not speed; it is confidence. The first three results shape the answer, claims get repeated without sources, and nobody can later explain why the team believed something. When a stakeholder challenges the decision, the research has to be redone from scratch.

Deep research with an agent fixes the confidence problem before the speed problem. The workflow fans out searches across angles you chose, fetches the sources, cross-checks claims against each other, and returns a cited report. Claims that do not survive cross-checking get filtered out instead of quietly surviving into the design brief.

Section 2

When to reach for this workflow

Use deep research when the answer matters beyond this sprint, when claims will be repeated to stakeholders, or when the question touches standards and conventions that have a correct answer. A quick chat question is fine for trivia. A cited report is what you need before a redesign, a pattern decision, or an accessibility commitment.

Skip it when the question is about your own users or your own product data. Web research can tell you what competitors do and what standards require. It cannot tell you what your users need; that is a job for interviews, analytics, and the user research synthesis workflow.

  • Competitive teardowns: how a set of named products handle a specific moment or pattern.
  • Pattern research: what variations of a pattern exist and what evidence supports each.
  • Standards research: what WCAG, platform HIGs, or regulation actually require, with exact references.
  • Convention research: what users likely expect because of the platforms they already use.

Section 3

How the dynamic workflow runs

Claude Code's /deep-research command is a bundled dynamic workflow. A dynamic workflow is a JavaScript script that Claude writes and then runs in the background; the script orchestrates subagents, and intermediate results stay in script variables instead of flooding Claude's context. That is what lets one research run hold dozens of fetched pages without the main session losing the thread.

Workflows can run up to 16 agents concurrently and up to 1,000 agents in a single run, and a run is resumable if it gets interrupted. You trigger workflow mode by including the word workflow in your prompt or by running /effort ultracode; /deep-research is the bundled research workflow command, and it requires the WebSearch tool to be enabled.
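To make the concurrency cap concrete, here is a minimal sketch of a bounded fan-out in plain JavaScript. It only illustrates the idea of "at most N agents in flight"; it is not Claude Code's scheduler, and mapWithLimit, limit, and worker are hypothetical names chosen for this example.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` tasks in flight.
// Illustrates a concurrency cap; this is NOT Claude Code's actual scheduler.
async function mapWithLimit(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims one item at a time until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start no more runners than there are items.
  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

With a limit of 16 and seven angles, all seven would start at once; with forty angles, the remaining twenty-four would wait for a free slot.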

Once you have a research prompt you like, you can save the whole thing to .claude/workflows/ as a reusable slash command, so the team runs the same research procedure every time instead of rewriting the prompt.

Diagram: Deep research stages
1. Frame the question
2. Scope angles
3. Fan out searches
4. Fetch sources
5. Cross-check claims
6. Filter weak claims
7. Cited report

One question becomes parallel angles, fetched sources, cross-checked claims, and a cited report.

Section 4

Write the research question first

The single biggest quality lever is the question. A vague question produces a survey of everything; a sharp question produces an answer the team can act on. Name the products, the moment in the experience, the user, and what decision the research will inform.

A good research question also says what would change your mind. If nothing could, you are looking for confirmation rather than research, and the agent will happily provide it.

Research question for the empty-states teardown
Research question: How do Linear, Asana, Monday.com, ClickUp, and Notion handle empty states in their main dashboard and project views?

Decision this informs: We are redesigning the analytics dashboard for first-run and zero-data situations. We need to know which empty-state patterns are common, which are praised or criticized in published teardowns and reviews, and what each product actually shows on first run.

What counts as evidence: official product documentation, changelogs, published teardowns with screenshots, and reviews from the last 3 years. Marketing copy alone does not count as evidence of what the product does.

What would change our mind: evidence that guided setup checklists outperform sample data, or vice versa, in published case studies.

Section 5

Scope the angles, do not let the agent guess them

The fan-out is where research breadth comes from, so name the angles yourself. For a competitive teardown the angles are usually the products plus a synthesis angle. For a standards question the angles are the standard itself, the platform conventions, and known criticism or edge cases.

Five to eight angles is the practical range. Fewer than four and the cross-checking has nothing to push against; more than ten and you spend your review time on duplication.

  • One angle per named competitor or platform.
  • One angle for the standard or guideline text itself, fetched from the primary source.
  • One angle for published criticism, failures, or counter-examples.
  • One angle for recency: changelogs and announcements from the last 12 months.
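If you want the angle guidance above as a quick pre-flight check, a sketch like the following would do. The thresholds (fewer than four, five to eight, more than ten) come from this section; the function name checkAngles and its return shape are assumptions made for illustration.

```javascript
// Hypothetical helper: check an angle list against the ranges suggested in this section.
// Thresholds come from the text; the return shape is an assumption.
function checkAngles(angles) {
  // Treat angles that differ only in case or surrounding spaces as duplicates.
  const unique = [...new Set(angles.map((angle) => angle.trim().toLowerCase()))];
  const warnings = [];
  if (unique.length < angles.length) warnings.push("duplicate angles");
  if (unique.length < 4) {
    warnings.push("too few angles to cross-check");
  } else if (unique.length > 10) {
    warnings.push("too many angles; expect duplication");
  } else if (unique.length < 5 || unique.length > 8) {
    warnings.push("outside the practical 5-8 range");
  }
  return { count: unique.length, ok: warnings.length === 0, warnings };
}
```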

Section 6

The workflow prompt

This is the prompt to give Claude Code. Including the word workflow, or using /deep-research directly, tells Claude to orchestrate the run as a background dynamic workflow rather than answering inline.

Deep design research workflow prompt
Run this as a deep research workflow.

Question: How do Linear, Asana, Monday.com, ClickUp, and Notion handle empty states in their dashboard and project views?

Angles to research in parallel:
1. Linear empty states and first-run experience
2. Asana empty states and onboarding checklists
3. Monday.com empty states and template galleries
4. ClickUp empty states and sample data
5. Notion empty states and template pickers
6. Published UX teardowns or pattern-library entries about empty states in project tools
7. Evidence about empty-state effectiveness: case studies, A/B results, NN/g articles

For each angle: search the web, fetch the strongest 3-5 sources, and extract concrete claims with the URL they came from.

Cross-check: a claim about what a product does must be supported by at least two independent sources, or by one primary source (official docs or changelog). Drop claims that fail this test and list them separately as unverified.

Output: a report with one section per product, a comparison table of empty-state patterns, a list of verified claims with citations, a list of dropped claims, and open questions we should answer with our own usability testing.
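Because the prompt above has a fixed shape, it can be assembled from a few fields instead of being retyped. The sketch below is one way to do that; buildResearchPrompt and its parameter names are hypothetical, and the wording is copied from the prompt in this section rather than from any Claude Code API.

```javascript
// Hypothetical helper: assemble a research prompt from named fields.
// Field names are assumptions; the wording mirrors the prompt in this section.
function buildResearchPrompt({ question, angles, sourcesPerAngle = "3-5" }) {
  if (!question || !Array.isArray(angles) || angles.length === 0) {
    throw new Error("question and at least one angle are required");
  }
  // Number the angles so the report can refer back to them.
  const numbered = angles.map((angle, i) => `${i + 1}. ${angle}`).join("\n");
  return [
    "Run this as a deep research workflow.",
    `Question: ${question}`,
    `Angles to research in parallel:\n${numbered}`,
    `For each angle: search the web, fetch the strongest ${sourcesPerAngle} sources, ` +
      "and extract concrete claims with the URL they came from.",
    "Cross-check: a claim about what a product does must be supported by at least two " +
      "independent sources, or by one primary source (official docs or changelog). " +
      "Drop claims that fail this test and list them separately as unverified.",
  ].join("\n\n");
}
```

The cross-checking rule is baked into the template on purpose, so a designer who changes the question and angles cannot accidentally drop it.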

Section 7

What the orchestration looks like under the hood

You do not have to write the orchestration script; Claude writes it when the workflow starts. But it helps to know roughly what it does, because that is where the cross-checking lives. The sketch below uses an agent() pseudo-API to show the shape. It is a sketch, not literal Claude Code internals.

Dynamic workflow sketch (pseudo-code)
// Sketch of the orchestration script Claude generates for a deep research run.
// agent(prompt, options) launches a subagent and returns its result.

const angles = [
  "Linear empty states and first-run experience",
  "Asana empty states and onboarding checklists",
  "Monday.com empty states and template galleries",
  "ClickUp empty states and sample data",
  "Notion empty states and template pickers",
  "Published teardowns of empty states in project tools",
  "Evidence on empty-state effectiveness",
]

// Fan out: one research agent per angle. Results stay in this variable,
// not in the main conversation context.
const angleReports = await Promise.all(
  angles.map((angle) =>
    agent(
      "Search the web for: " + angle + ". Fetch the 3-5 strongest sources. " +
        "Return a list of claims, each with the exact source URL and a short quote.",
      { model: "sonnet" }
    )
  )
)

// Cross-check: a separate agent classifies every claim against all angle reports.
const crossChecked = await agent(
  "Here are claims from " + angles.length + " research angles. Mark each claim as verified " +
    "(2+ independent sources or 1 primary source), unverified, or contradicted. " +
    "List contradictions explicitly.\n\n" + JSON.stringify(angleReports),
  { model: "opus" }
)

// Synthesis: only verified claims reach the findings; the rest go to the appendix.
const report = await agent(
  "Write the empty-states research report using only verified claims. " +
    "Cite every claim. Include a dropped-claims appendix.\n\n" + crossChecked,
  { model: "opus" }
)

return report
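The cross-check rule itself is simple enough to write down as plain code. The sketch below is an illustration under stated assumptions, not what the workflow runs: the claim shape ({ text, sources: [{ url, primary }] }) is hypothetical, and "independent" is approximated as "different hostname", which is weaker than real independence. It does not detect contradictions; that still needs a reader or an agent.

```javascript
// Hypothetical data shape: { text, sources: [{ url, primary }] }.
// "Independent" is approximated as "different hostname" (an assumption);
// two sites can still share one origin, so treat the result as a first pass.
function classifyClaim(claim) {
  const hosts = new Set();
  let hasPrimary = false;
  for (const source of claim.sources ?? []) {
    hosts.add(new URL(source.url).hostname.replace(/^www\./, ""));
    if (source.primary) hasPrimary = true;
  }
  // Rule from the prompt: one primary source, or two or more independent sources.
  return hasPrimary || hosts.size >= 2 ? "verified" : "unverified";
}

// Split a claim list into the report body and the dropped-claims appendix.
function splitClaims(claims) {
  const verified = [];
  const dropped = [];
  for (const claim of claims) {
    (classifyClaim(claim) === "verified" ? verified : dropped).push(claim);
  }
  return { verified, dropped };
}
```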

Section 8

Step by step through one run

Frame the question and write down the decision it informs. Choose the angles. Run /deep-research or paste the workflow prompt. While it runs, do something else; a serious run takes 20 to 45 minutes because fetching and cross-checking real pages takes time.

When the report comes back, read the dropped-claims appendix first. It tells you what the internet asserts but cannot support, which is often the most useful page in the whole report. Then read the verified claims and spot-check three to five citations yourself by opening the URLs.
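One practical risk with spot-checking is that all the checks land in the first section of the report. A small helper can spread them out. This is a sketch with a hypothetical name, pickSpotChecks, and it assumes the citations are available as a simple array in report order.

```javascript
// Hypothetical helper: pick `count` citations spread evenly across the report,
// so spot-checks are not all taken from the first section.
function pickSpotChecks(citations, count = 5) {
  if (citations.length <= count) return [...citations];
  const step = citations.length / count; // greater than 1, so indices never repeat
  return Array.from({ length: count }, (_, i) => citations[Math.floor(i * step)]);
}
```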

Finally, turn the report into a research packet the team can brief from: the question, the verified claims, the comparison table, the open questions, and the decision the team made because of it.

Diagram: Deep research review loop
Step 1: Run workflow
Step 2: Read dropped claims
Step 3: Spot-check citations
Step 4: Build packet
Step 5: Decide
Step 6: Log open questions (feeds next cycle)

Each report goes through human review before it becomes a packet, and open questions feed the next run.

Section 9

Case study: empty states across five competitors

A team redesigning an analytics dashboard ran the empty-states question across Linear, Asana, Monday.com, ClickUp, and Notion before sketching anything. The run took 34 minutes, fetched 41 sources, and produced 28 verified claims and 9 dropped ones.

The verified findings were specific: four of the five products show a guided setup checklist rather than a blank canvas, three pre-populate with sample or template data, and only one relies on an illustration with a single call to action. Two claims the team had assumed true, including one about a competitor auto-importing data on first run, were dropped because no source after 2023 supported them.

The packet shaped the redesign directly. The team chose a checklist-plus-sample-data first run and skipped a debate about illustrations entirely, because the evidence showed illustrations were the least common pattern among the products their users already use.

Section 10

Case study: pricing-page patterns and conversion claims

A second team researched pricing-page patterns before a pricing redesign. This question is a trap for shallow research because conversion claims spread fast and are rarely sourced. The cross-checking step is what makes the workflow worth running here.

The run found 14 widely repeated claims about pricing pages. Only 6 survived cross-checking; the rest traced back to a single blog post quoting an experiment with no published data, or to articles citing each other in a circle. The familiar claim that highlighting a recommended tier lifts conversion by a precise percentage was dropped, because every source led back to one vendor case study.

The surviving claims were modest but usable: anchoring with a recommended tier is near-universal among the researched competitors, annual-versus-monthly toggles are standard, and the strongest published evidence concerned clarity of what is included per tier rather than visual treatments. The team designed for clarity first and treated the rest as things to test, not truths to inherit.
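The two failure modes in this case study, many articles leading back to one origin and articles citing each other in a circle, can both be found by following citations until they stop. The sketch below shows the idea. The input format (a Map from each source to the sources it cites for the claim) and the function names are hypothetical, and a source that cites nothing is assumed to be an origin.

```javascript
// Hypothetical input: `cites` is a Map from each source to the sources it cites for a claim.
// A source with no outgoing citation is treated as an origin (an assumption).
function findOrigins(cites, start) {
  const origins = new Set();
  const seen = new Set();
  const stack = [start];
  while (stack.length > 0) {
    const source = stack.pop();
    if (seen.has(source)) continue; // already visited; this also breaks citation circles
    seen.add(source);
    const cited = cites.get(source) ?? [];
    if (cited.length === 0) origins.add(source);
    for (const next of cited) stack.push(next);
  }
  return origins;
}

// Collect the distinct origins behind all sources that repeat a claim.
// An empty result means the sources only cite each other in a circle.
function independentOrigins(cites, sources) {
  const all = new Set();
  for (const source of sources) {
    for (const origin of findOrigins(cites, source)) all.add(origin);
  }
  return [...all].sort();
}
```

Three articles that all resolve to one origin count as one source under the cross-check rule, not three.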

Section 11

Case study: standards before designing a date picker

A third team needed to design a date picker for a booking flow and ran a standards question rather than a competitive one: what do WCAG, the ARIA Authoring Practices, and the iOS and Android platform guidelines require or recommend for date input, and where do they conflict?

The report came back with the relevant WCAG success criteria cited by number, the ARIA dialog date-picker pattern, both platforms' preference for native pickers on mobile, and a clearly flagged tension: several accessibility practitioners recommend a plain text input with format hints as the most accessible baseline, with the calendar widget as an enhancement.

Because the citations pointed at primary sources, the accessibility lead could verify the load-bearing claims in ten minutes. The team shipped a text-first input with a calendar enhancement and documented the WCAG criteria it satisfies, which later shortened an enterprise customer's accessibility review.

Section 12

Good vs bad research output

The difference between a useful report and a confident-sounding one is traceability. Every claim that will influence a design decision needs a source you could open and check, and the report should be honest about what it could not verify.

Table: Research output quality comparison

1. Bad: Most SaaS products use onboarding checklists for empty states
2. Good: 4 of the 5 researched products show a setup checklist on first run, with citations to current docs or teardowns for each

3. Bad: Highlighting the recommended plan increases conversion by 36%
4. Good: The 36% claim traces to a single vendor case study from 2019 and was dropped; anchoring a recommended tier is common but its effect is unproven in public sources

5. Bad: WCAG requires a calendar widget to be keyboard accessible
6. Good: WCAG 2.1 SC 2.1.1 requires keyboard operability for any widget used; the ARIA APG date picker dialog pattern documents the expected keyboard behavior, with links to both

A useful report makes claims you can check and names the ones it dropped.

Section 13

Limits: what this workflow cannot prove

Deep research describes the public web, not your users. It can tell you what competitors ship and what standards require; it cannot tell you whether your users will understand your design, and it cannot generate statistical claims about your product. Treat any quantitative claim from the web as context, not as a forecast for your situation.

Cross-checking reduces repeated misinformation, but it cannot detect a claim that is wrong everywhere it appears, and it inherits whatever bias exists in what gets published in English on the public web. Paywalled research, internal data, and non-English sources are systematically underrepresented.

Humans still own the judgment calls: whether the evidence is strong enough to act on, whether the comparison set was the right one, and whether following a convention is actually right for your product. The report informs the decision; it is not the decision.

  • Cannot replace primary research with your own users.
  • Cannot validate conversion or behavior claims for your product without your own testing.
  • Cannot see paywalled, internal, or recently changed product surfaces reliably.
  • Cannot decide whether to follow or break a convention; that is a design judgment.

Section 14

Make it a reusable team command

Once the prompt structure works, save it to .claude/workflows/ so the whole team runs research the same way. The saved workflow becomes a slash command; designers fill in the question and angles, and the cross-checking rules come along for free.

Use this checklist as the spine of the saved workflow and of every run you do by hand.

Deep design research workflow
1. Write the research question, the decision it informs, and what would change your mind.
2. Choose 5-8 angles: competitors, the primary standard, criticism, and recency.
3. Run /deep-research (or a saved workflow) with the question and angles.
4. Read the dropped-claims appendix before the findings.
5. Spot-check 3-5 citations by opening the sources yourself.
6. Build the research packet: question, verified claims, comparison table, open questions.
7. Record the decision the team made and why.
8. Log open questions as candidates for usability testing or the next research run.
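If the team stores research packets as data, a small completeness check keeps half-finished packets from circulating. The sketch below assumes a hypothetical packet shape whose field names are taken from steps 6 and 7 of the checklist; nothing here is a Claude Code format.

```javascript
// Hypothetical packet shape based on the checklist above; field names are assumptions.
const REQUIRED_FIELDS = [
  "question",
  "verifiedClaims",
  "comparisonTable",
  "openQuestions",
  "decision",
];

// Return the names of required fields that are absent or empty.
function missingPacketFields(packet) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = packet[field];
    // Empty strings and empty arrays count as missing; objects pass if present.
    return value == null || value.length === 0;
  });
}
```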


Further reading

For deeper reading, see The Agentic Designer and Claude Code for Designers.
