Agentic Design School
Research workflow
Intermediate

User Research Synthesis

A workflow for turning a folder of interview transcripts, support tickets, or survey answers into coded themes, evidence-linked insights, and an opportunity map, with one agent per transcript and a challenge agent hunting for over-claims.

Orchestration: Dynamic workflow with per-transcript agents

Typical run: 1–2 hours for 10–15 transcripts

Last reviewed: 2026-06-02


Section 1

Why synthesis is where research goes to die

Most teams are good at collecting research and bad at digesting it. Twelve interviews produce nine hours of recordings, a folder of transcripts, and a synthesis session that gets postponed twice and then compressed into an afternoon of sticky notes. The themes that emerge are real, but they lean on whoever talked loudest in the room and whichever interview happened most recently.

The slow part is not insight; it is coding: reading every transcript carefully, tagging what each participant actually said, and keeping the link between a claim and the person who made it. That is exactly the kind of patient, repetitive reading that agents do well and humans do unevenly at the end of a long week.

This workflow uses agents for the coding and the first-pass synthesis, and keeps humans for what the data means. The output is a set of themes where every insight links back to participant IDs and verbatim quotes, plus an explicit list of where the evidence is thin.

Section 2

When to reach for this workflow

Use it when you have ten or more transcripts, a few hundred support tickets, or a survey with long free-text answers, and the team needs themes it can defend. It works for interview studies, diary studies, churn calls, and ticket mining alike, because all of them reduce to the same job: code the raw material, merge the codes into themes, and keep the evidence attached.

Do not use it as a substitute for reading. The lead researcher should still read at least three full transcripts before the run, and spot-read afterwards. The agent accelerates coding; it does not replace familiarity with the data.

  • 10–15 interview transcripts that need coding within a week.
  • Large ticket or survey exports where manual coding would take days.
  • Mixed evidence (interviews plus tickets plus survey comments) that needs one merged theme set.
  • Re-coding an old study against a new codebook to compare across time.

Section 3

Privacy comes before any agent sees the data

Anonymize before the agents touch anything. Replace names with participant IDs, and strip employers, email addresses, and any health or financial details that are not the subject of the study. Keep the mapping between IDs and real identities in a separate file that never enters the agent's working folder.

Check your consent forms. If participants consented to recordings being analyzed by your team, decide explicitly whether automated analysis is within that consent and whether your tooling agreement keeps the data out of model training. This is a research-ethics decision a human owns, not a default to drift into.

anonymise-transcripts.mjs (run before any agent work)
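import { mkdir, readdir, readFile, writeFile } from "node:fs/promises"

// Reads transcripts from ./raw and writes anonymized copies to ./transcripts.
// The name-to-ID mapping in participant-names.json is only read here; keep that
// file OUT of the agent workspace.
const replacements = [
  { pattern: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi, token: "[EMAIL]" },
  // No leading \b: it would fail to match before a "+" country-code prefix.
  { pattern: /(?<!\w)\+?\d[\d\s().-]{7,}\d\b/g, token: "[PHONE]" },
]

const names = JSON.parse(await readFile("./participant-names.json", "utf8"))
// e.g. { "Sarah Chen": "P01", "Miguel Torres": "P02", ... }

// Escape regex metacharacters so names are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")

// Make sure the output folder exists before writing into it.
await mkdir("./transcripts", { recursive: true })

const files = (await readdir("./raw")).filter((f) => f.endsWith(".txt"))
for (const file of files) {
  let text = await readFile("./raw/" + file, "utf8")
  for (const [name, id] of Object.entries(names)) {
    // Replace the full name first, then the first name as a whole word only,
    // so a first name like "Sam" does not rewrite "Sample".
    const firstName = name.split(" ")[0]
    text = text.replace(new RegExp("\\b" + escapeRegExp(name) + "\\b", "g"), id)
    text = text.replace(new RegExp("\\b" + escapeRegExp(firstName) + "\\b", "g"), id)
  }
  for (const { pattern, token } of replacements) {
    text = text.replace(pattern, token)
  }
  await writeFile("./transcripts/" + file, text)
}

console.log("Anonymized " + files.length + " transcripts into ./transcripts")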
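Name replacement is easy to get almost right, so a leak check on the anonymized output is a cheap second pass. The sketch below is a minimal example, assuming the same name-to-ID shape as participant-names.json; findLeaks and escapeRegExp are hypothetical helpers, not part of the script above, and the sample text is invented.

```javascript
// Hypothetical helper: scan anonymized text for anything that still looks
// like a participant name or an email address.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

function findLeaks(text, names) {
  const leaks = []
  for (const fullName of Object.keys(names)) {
    // Check the full name and each part of it (first name, surname).
    const candidates = new Set([fullName, ...fullName.split(/\s+/)])
    for (const candidate of candidates) {
      const pattern = new RegExp("\\b" + escapeRegExp(candidate) + "\\b", "i")
      if (pattern.test(text)) leaks.push(candidate)
    }
  }
  // Same email shape the anonymization script looks for.
  if (/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i.test(text)) {
    leaks.push("[email address]")
  }
  return leaks
}

// Invented sample: a surname and an email slipped through.
const names = { "Sarah Chen": "P01", "Miguel Torres": "P02" }
console.log(findLeaks("P01 said Chen's team emailed p01@example.com", names))
// flags "Chen" and the email address
```

An empty result does not prove the transcript is clean; it only means the known names and obvious email shapes are gone. Employers, health details, and nicknames still need a human read.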

Section 4

The orchestration pattern: one agent per transcript

This runs as a Claude Code dynamic workflow: a JavaScript script that Claude writes and runs in the background, orchestrating subagents while intermediate results stay in script variables rather than in Claude's own context. That separation matters here, because fifteen transcripts would otherwise blow past any usable context window and the later transcripts would be read more carelessly than the first.

Each transcript gets its own coding agent working from the same shared codebook, so every participant gets the same quality of attention. Workflows can run up to 16 agents concurrently and up to 1,000 in a run, and a run is resumable, which matters when ticket mining involves hundreds of small coding jobs. You trigger the workflow by including the word workflow in the prompt or via /effort ultracode, and you can save the finished prompt to .claude/workflows/ as a reusable command like /synthesize.

After the per-transcript pass, a synthesis agent merges codes into themes, and a separate challenge agent goes looking for disconfirming evidence and over-claims before anything reaches the team.

Diagram: Synthesis pipeline stages

1. Anonymize
2. Agree codebook
3. Code per transcript
4. Merge themes
5. Challenge pass
6. Human review
7. Opportunity map

Anonymized transcripts are coded in parallel, merged into themes, then challenged before humans review.

Section 5

Build the codebook before the run

The codebook is the contract between you and the coding agents. Start with the codes your research questions imply, leave room for emergent codes, and define each code in one sentence with an example quote. A vague codebook produces themes that sound like the codebook instead of the data.

Keep it short. Eight to fifteen codes is workable; thirty codes means every quote matches something and nothing is learned.

codebook.md (excerpt for the churn study)
# Codebook: B2B churn interviews, June 2026

Apply codes to verbatim quotes only. A quote may carry multiple codes.
If nothing fits, propose a new code prefixed NEW- and define it.

## ONB-CONFUSION
Participant describes not understanding what to do during setup or first use.
Example: "I honestly didn't know which workspace I was supposed to be in."

## VALUE-GAP
Participant says the product did not deliver the outcome they bought it for.
Example: "We never got the reporting our exec team was promised."

## CHAMPION-LEFT
The internal champion changed roles or left, and usage decayed afterwards.

## PRICE-TRIGGER
Price or renewal terms named as the proximate reason for leaving, not the root cause.

## WORKAROUND
Participant describes building spreadsheets or scripts to do what the product should do.

## SUPPORT-FRICTION
Slow, repetitive, or dead-end support experiences.
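The codebook rules above (a one-sentence definition, an example quote, eight to fifteen codes) can be checked mechanically before a run. The following is a rough sketch, assuming the codebook uses "## CODE" headings and "Example:" lines as in the excerpt; parseCodebook and lintCodebook are hypothetical names.

```javascript
// Hypothetical helper: parse the "## CODE" sections of a codebook.
function parseCodebook(markdown) {
  const codes = []
  let current = null
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim()
    const heading = line.match(/^##\s+(\S+)$/)
    if (heading) {
      current = { code: heading[1], definition: "", example: "" }
      codes.push(current)
    } else if (current && line.startsWith("Example:")) {
      current.example = line.slice("Example:".length).trim()
    } else if (current && line !== "") {
      // Any other non-empty line under a heading is treated as definition text.
      current.definition = (current.definition + " " + line).trim()
    }
  }
  return codes
}

// Hypothetical helper: warn about codebooks that break the rules in the text.
function lintCodebook(codes, { min = 8, max = 15 } = {}) {
  const warnings = []
  if (codes.length < min || codes.length > max) {
    warnings.push("Expected " + min + "-" + max + " codes, found " + codes.length)
  }
  for (const c of codes) {
    if (!c.definition) warnings.push(c.code + ": missing definition")
    if (!c.example) warnings.push(c.code + ": missing example quote")
  }
  return warnings
}
```

Run against the excerpt above, a check like this would flag the four codes that have a definition but no example quote, which is a useful prompt to add one from a transcript you have read yourself.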

Section 6

The workflow prompt

Point the workflow at the anonymized folder and the codebook, and be explicit about the two rules that protect the output: quotes must be verbatim with participant IDs, and the agent must never invent or paraphrase a quote and present it as one.

User research synthesis workflow prompt
Run this as a workflow.

Input: ./transcripts contains 12 anonymized churn interview transcripts (P01-P12). ./codebook.md is the shared codebook.

Stage 1 - Coding: For each transcript, launch one agent that reads only that transcript and the codebook. It returns coded excerpts: { participant, code, verbatim_quote, line_reference }. Quotes must be copied exactly from the transcript. If a quote cannot be found verbatim in the file, it must not be reported. Propose NEW- codes where nothing fits.

Stage 2 - Synthesis: Merge the coded excerpts into themes. Each theme needs: a one-sentence claim, the codes it draws on, the participants who support it (by ID), 2-4 verbatim quotes, and a count of how many of the 12 participants it covers.

Stage 3 - Challenge: A separate agent reviews the themes against all coded excerpts and flags: themes supported by fewer than 3 participants, quotes that do not appear verbatim in the source transcripts, claims that generalize beyond the sample, and any disconfirming evidence the synthesis ignored.

Output: themes.md, evidence-table.md (every quote with participant ID and transcript line), challenges.md, and opportunities.md mapping themes to candidate product opportunities. Do not include any opportunity that is not linked to at least one theme.
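The verbatim rule in Stage 1 is one of the few parts of this prompt that plain code can enforce without another agent. Below is a minimal sketch, assuming excerpts use the field names from the prompt; checkExcerpt is a hypothetical helper, and treating line wraps as insignificant is an assumption you may want to tighten.

```javascript
// Collapse runs of whitespace so a quote that wraps across lines still matches.
function normalizeWhitespace(s) {
  return s.replace(/\s+/g, " ").trim()
}

// Hypothetical helper: does the excerpt's quote appear verbatim in the transcript?
function checkExcerpt(excerpt, transcriptText) {
  const quote = normalizeWhitespace(excerpt.verbatim_quote || "")
  if (!quote) return { ok: false, reason: "empty quote" }
  const haystack = normalizeWhitespace(transcriptText)
  if (!haystack.includes(quote)) {
    return { ok: false, reason: "quote not found verbatim" }
  }
  return { ok: true, reason: "" }
}
```

The comparison is case-sensitive and punctuation-sensitive on purpose: a quote that only matches after "tidying" is an altered quote.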

Section 7

What the orchestration script roughly does

As with deep research, Claude writes the orchestration script itself. The sketch below shows the shape with an agent() pseudo-API so you know where the guardrails sit. It is illustrative, not the literal generated code.

Dynamic workflow sketch (pseudo-code)
import fs from "node:fs"

// agent() is a placeholder for launching a subagent; it is not a real API.
// ES module syntax is used so that top-level await works.
const transcripts = fs.readdirSync("./transcripts").filter((f) => f.endsWith(".txt"))
const codebook = fs.readFileSync("./codebook.md", "utf8")

// Stage 1: one coding agent per transcript, launched in parallel.
const coded = await Promise.all(
  transcripts.map((file) =>
    agent(
      "You are coding one interview transcript against a codebook.\n" +
        codebook +
        "\nTranscript file: " + file + "\n" +
        "Return JSON: [{ participant, code, verbatim_quote, line_reference }]. " +
        "Quotes must appear verbatim in the transcript. Never paraphrase into quotes.",
      { model: "sonnet" }
    )
  )
)

// Stage 2: synthesis agent merges codes into themes with evidence attached.
const themes = await agent(
  "Merge these coded excerpts into themes. Every theme must list supporting " +
    "participant IDs, verbatim quotes, and a coverage count out of " +
    transcripts.length + ".\n\n" + JSON.stringify(coded),
  { model: "opus" }
)

// Stage 3: challenge agent looks for over-claims and missing disconfirming evidence.
const challenges = await agent(
  "Act as a skeptical second researcher. Check each theme against the coded " +
    "excerpts. Flag thin evidence (<3 participants), quotes not found verbatim, " +
    "over-generalization, and ignored disconfirming evidence.\n\nThemes:\n" +
    themes + "\n\nExcerpts:\n" + JSON.stringify(coded),
  { model: "opus" }
)

// Create the output folder if needed, then write the two review files.
fs.mkdirSync("./output", { recursive: true })
fs.writeFileSync("./output/themes.md", themes)
fs.writeFileSync("./output/challenges.md", challenges)
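One guardrail the sketch leaves implicit is what happens when a coding agent returns something that is not the requested JSON. A possible shape for that check follows, assuming each agent reply is a JSON string and using the field names from the workflow prompt; parseCodedExcerpts is a hypothetical helper.

```javascript
// Hypothetical helper: parse one coding agent's reply and keep only
// well-formed excerpts, counting the ones that were dropped.
function parseCodedExcerpts(reply) {
  let data
  try {
    data = JSON.parse(reply)
  } catch {
    return { excerpts: [], rejected: 0, error: "reply was not valid JSON" }
  }
  if (!Array.isArray(data)) {
    return { excerpts: [], rejected: 0, error: "reply was not a JSON array" }
  }
  const required = ["participant", "code", "verbatim_quote", "line_reference"]
  const excerpts = data.filter(
    (e) =>
      e !== null &&
      typeof e === "object" &&
      required.every((key) => e[key] !== undefined && e[key] !== "")
  )
  return { excerpts, rejected: data.length - excerpts.length, error: null }
}
```

A non-zero rejected count or an error is a reason to rerun that one transcript, not to patch the excerpt by hand.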

Section 8

Step by step through one study

Anonymize the raw material and check consent. Read three transcripts yourself and draft the codebook. Run the workflow and let the coding, synthesis, and challenge stages finish; for 10 to 15 transcripts the run usually lands inside an hour, and your total time including review is one to two hours.

Review the challenge file before the themes. Then spot-check evidence: pick five quotes at random from the evidence table and confirm they exist verbatim in the named transcript. If even one is fabricated or altered, treat the whole evidence table as suspect and rerun the coding stage with a stricter prompt.
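Picking the five quotes by eye tends to favor the memorable ones, so it helps to let code draw the sample. This is a small sketch, assuming the evidence table has been loaded into an array of rows; sampleForSpotCheck is a hypothetical helper, and the injectable random function exists only to make the draw testable.

```javascript
// Hypothetical helper: draw n distinct rows from the evidence table.
function sampleForSpotCheck(rows, n = 5, random = Math.random) {
  const pool = rows.slice() // copy so the caller's array is not reordered
  const count = Math.min(n, pool.length)
  // Partial Fisher-Yates shuffle: only the first `count` positions are randomized.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i))
    const tmp = pool[i]
    pool[i] = pool[j]
    pool[j] = tmp
  }
  return pool.slice(0, count)
}
```

Each sampled row is then checked by a person against the named transcript; the code only decides which rows get checked.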

Finish by doing the human work: decide which themes matter, write the opportunity map with the team, and record which themes were set aside as thin. The agent's coverage counts make that conversation concrete instead of rhetorical.
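Coverage counts are simple enough to recompute from the coded excerpts, which gives the team an independent check on what the synthesis agent reported. The sketch below counts distinct participants per code and applies the fewer-than-three threshold from the challenge stage; coverageByCode is a hypothetical helper, and themes that draw on several codes would need their participant sets merged first.

```javascript
// Hypothetical helper: distinct participants per code, with a thin-evidence flag.
function coverageByCode(excerpts, totalParticipants, thinBelow = 3) {
  const participantsByCode = new Map()
  for (const { code, participant } of excerpts) {
    if (!participantsByCode.has(code)) participantsByCode.set(code, new Set())
    participantsByCode.get(code).add(participant)
  }
  const rows = []
  for (const [code, set] of participantsByCode) {
    rows.push({
      code,
      participants: [...set].sort(),
      coverage: set.size + " of " + totalParticipants,
      thin: set.size < thinBelow,
    })
  }
  // Strongest coverage first.
  rows.sort((a, b) => b.participants.length - a.participants.length)
  return rows
}
```

Counting participants rather than excerpts matters: one talkative participant with ten quotes under a code is still coverage of one.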

Diagram: Synthesis review loop

1. Code transcripts
2. Merge themes
3. Challenge pass
4. Spot-check quotes
5. Team review
6. Opportunity map (feeds next cycle)

The challenge pass and the human spot-check sit between coding and anything the team acts on.

Section 9

Case study: 12 churn interviews for a B2B product

A product team ran 12 exit interviews with churned accounts and used this workflow to synthesize them in an afternoon instead of the usual two weeks. The coding stage produced 214 coded excerpts; synthesis proposed 9 themes; the challenge agent demoted 3 of them.

The strongest theme was not price, even though sales believed it was. Eight of twelve participants described a value gap traced to onboarding: the reporting feature they bought the product for was never configured, and by renewal time nobody internal knew how. Price appeared in seven interviews but, in five of them, only after the value gap was described, which the coding made visible because PRICE-TRIGGER and VALUE-GAP quotes sat side by side with line references.
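The ordering claim in this case study (price named only after the value gap) is the kind of thing line references make checkable. Here is an illustrative sketch, assuming excerpts carry a numeric line_reference and the two code names from the codebook; priceFollowsValueGap is a hypothetical helper, and it is not how the team in the case study did the analysis.

```javascript
// Hypothetical helper: for each participant, compare the first line where
// PRICE-TRIGGER appears with the first line where VALUE-GAP appears.
function priceFollowsValueGap(excerpts) {
  const firstLine = {}
  for (const { participant, code, line_reference } of excerpts) {
    const key = participant + "|" + code
    if (firstLine[key] === undefined || line_reference < firstLine[key]) {
      firstLine[key] = line_reference
    }
  }
  const participants = [...new Set(excerpts.map((e) => e.participant))].sort()
  const result = { priceNamed: [], priceAfterValueGap: [] }
  for (const p of participants) {
    const price = firstLine[p + "|PRICE-TRIGGER"]
    const gap = firstLine[p + "|VALUE-GAP"]
    if (price === undefined) continue // price never came up for this participant
    result.priceNamed.push(p)
    if (gap !== undefined && gap < price) result.priceAfterValueGap.push(p)
  }
  return result
}
```

Order of mention is evidence, not proof of cause; the output is a list of participants whose transcripts are worth rereading side by side.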

One demoted theme is just as instructive: synthesis initially claimed customers wanted an integration with a specific CRM, but the challenge pass showed only two participants mentioned it and one was speculating about a colleague's needs. The team kept it as a question for the next study rather than a roadmap item.

Section 10

Case study: 400 support tickets mined for onboarding friction

A second team pointed the same pattern at a Zendesk export of roughly 400 tickets tagged onboarding. Tickets are shorter than interviews, so the workflow batched them 20 per coding agent, which kept the run to about 70 minutes including the challenge pass.
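Batching short items is a one-function job. The sketch below shows the idea with the batch size of 20 from this case study; chunk is a hypothetical helper and the ticket IDs are placeholders, not the team's data.

```javascript
// Hypothetical helper: split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer")
  }
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Placeholder ticket IDs T0001..T0400.
const tickets = Array.from({ length: 400 }, (_, i) => "T" + String(i + 1).padStart(4, "0"))
const batches = chunk(tickets, 20)
console.log(batches.length) // 20
```

Each batch then becomes the input for one coding agent, with ticket IDs kept on every excerpt so the evidence trail survives the batching.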

The merged themes quantified what the team had only sensed: 31 percent of coded tickets involved confusion between inviting a teammate and transferring ownership, and 18 percent involved an empty dashboard after a completed import, where the data had landed in a different workspace. Both came with ticket IDs attached, so designers could read the original wording instead of trusting a summary.

The challenge pass flagged a real bias: tickets only represent people who bothered to write in, and the workspace-confusion numbers could not be read as a rate across all new users. The team paired the findings with product analytics before sizing the fix, and the redesign of the invite flow cut tickets in that category by roughly half over the following quarter.

Section 11

Case study: a diary study about a mobile banking app

A third team ran a two-week diary study with 9 participants logging every interaction with their banking app, producing about 340 short entries plus exit interviews. Diary entries are fragmentary, so the codebook leaned on moments (checking before a purchase, payday, an unexpected charge) rather than features.

Synthesis surfaced a pattern the team had not designed for: 6 of 9 participants opened the app most often in a checking-before-spending moment that lasted under ten seconds, and several described anxiety when the balance was slow to load or buried behind a promotion interstitial. The evidence table linked each claim to dated diary entries, which let the team see the pattern repeat across days rather than relying on what participants remembered in the exit interview.
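Dated entries make "repeats across days" something you can count. A minimal sketch follows, assuming each coded diary entry carries a participant ID, an ISO date string, and a code; daysPerParticipant and the CHECK-BEFORE-SPEND code name are hypothetical.

```javascript
// Hypothetical helper: on how many distinct days did each participant log a given moment?
function daysPerParticipant(entries, moment) {
  const days = new Map()
  for (const { participant, date, code } of entries) {
    if (code !== moment) continue
    if (!days.has(participant)) days.set(participant, new Set())
    days.get(participant).add(date)
  }
  const result = {}
  for (const [participant, set] of days) {
    result[participant] = set.size
  }
  return result
}
```

A participant with one entry on one day and a participant with entries on nine different days both "mentioned" the moment, but only the second shows a habit.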

The opportunity map prioritized a glanceable balance state over the planned budgeting feature for that quarter. The challenge file noted, correctly, that 9 participants cannot establish how common the behavior is in the user base, so the team validated the prevalence with analytics before committing the roadmap.

Section 12

Good vs bad synthesis output

The single test that separates trustworthy synthesis from plausible fiction is traceability. Every insight should survive the question: which participants, which quotes, which lines? An agent under pressure to be helpful will otherwise produce themes that sound like your product strategy and quotes nobody ever said.

Table: Synthesis quality comparison

Bad: Users are frustrated with onboarding and want a simpler experience
Good: 8 of 12 participants (P01, P03-P07, P10, P12) described not configuring reporting during onboarding; 4 verbatim quotes attached with line references

Bad: One customer said: the onboarding was a complete nightmare from start to finish (no participant ID, not in any transcript)
Good: P04, line 211: "I honestly didn't know which workspace I was supposed to be in"

Bad: Customers churn primarily because of pricing
Good: Price was named in 7 of 12 interviews, but in 5 it followed a described value gap; evidence table lists both codes per participant

Bad: Users want a Salesforce integration
Good: 2 of 12 participants mentioned a CRM integration, one speculatively; flagged by the challenge pass as thin evidence

Traceable insights can be checked against transcripts; invented themes cannot.
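The traceability question (which participants, which quotes, which lines) can be turned into a structural check on each theme. This is a sketch under an assumed theme shape, { claim, participants, quotes: [{ participant, verbatim_quote, line_reference }] }, which the source does not prescribe; traceabilityProblems is a hypothetical helper.

```javascript
// Hypothetical helper: list what a theme is missing before it can be traced to evidence.
function traceabilityProblems(theme) {
  const problems = []
  if (!Array.isArray(theme.participants) || theme.participants.length === 0) {
    problems.push("no participant IDs")
  }
  const quotes = Array.isArray(theme.quotes) ? theme.quotes : []
  if (quotes.length === 0) problems.push("no quotes")
  quotes.forEach((q, i) => {
    const label = "quote " + (i + 1)
    if (!q.participant) problems.push(label + ": no participant ID")
    if (!q.verbatim_quote) problems.push(label + ": empty quote text")
    if (!Number.isInteger(q.line_reference)) problems.push(label + ": no line reference")
  })
  return problems
}
```

Passing this check only means the references are present; whether each quote actually exists in the transcript is still a job for the verbatim check and the human spot-check.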

Section 13

Limits: what this workflow cannot prove

Qualitative synthesis describes patterns in a small sample; it does not produce statistics. Coverage counts like eight of twelve are honest descriptions of the data, not estimates of how common something is among all users, and the workflow should never be used to generate percentage claims for a board deck without quantitative backing.

Agents inherit the biases in the sample and add some of their own: they over-merge similar codes, they prefer tidy narratives, and without the verbatim rule they will smooth quotes into something nobody said. The challenge pass reduces these failures; it does not eliminate them, which is why the human spot-check of random quotes is a fixed step, not an optional one.

Humans own the ethics and the meaning. Consent, anonymization, whether a theme justifies a roadmap change, and how findings are represented to stakeholders are decisions the researcher makes and signs.

  • Cannot turn 12 interviews into statistical claims about the user base.
  • Cannot detect sample bias on its own; it only analyzes who you talked to.
  • Cannot decide consent and privacy questions; those are settled before the run.
  • Cannot judge which opportunity is worth building; it can only show the evidence.

Section 14

The reusable synthesis workflow

Save the prompt, the codebook template, and the agent definitions in the repo so the next study starts from a known-good setup. Saved to .claude/workflows/, the whole thing becomes a /synthesize command the team can rerun on every study with a consistent evidence standard.

User research synthesis workflow
1. Anonymize transcripts and confirm consent covers automated analysis.
2. Read 3 transcripts yourself and draft the codebook.
3. Run the workflow: one coding agent per transcript against the shared codebook.
4. Merge codes into themes with participant IDs, verbatim quotes, and coverage counts.
5. Run the challenge pass for thin evidence, over-claims, and disconfirming data.
6. Spot-check 5 random quotes against the source transcripts.
7. Review themes with the team and build the opportunity map from supported themes only.
8. Archive the codebook, evidence table, and decisions for the next study.
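Step 7 (build the opportunity map from supported themes only) can be backed by a short check before the map is shared. The sketch below assumes each opportunity lists the IDs of the themes it rests on; unlinkedOpportunities, the theme IDs, and the opportunity titles are all hypothetical.

```javascript
// Hypothetical helper: return titles of opportunities that rest on no supported theme.
function unlinkedOpportunities(opportunities, supportedThemeIds) {
  const supported = new Set(supportedThemeIds)
  return opportunities
    .filter((o) => !(o.themes || []).some((id) => supported.has(id)))
    .map((o) => o.title)
}

// Invented example: T7 was set aside as thin, so only T1 and T2 count as supported.
const opportunities = [
  { title: "Guided reporting setup", themes: ["T1"] },
  { title: "CRM integration", themes: ["T7"] },
  { title: "Dark mode", themes: [] },
]
console.log(unlinkedOpportunities(opportunities, ["T1", "T2"]))
// the two opportunities without a supported theme: "CRM integration", "Dark mode"
```

Anything the check returns either gets linked to a supported theme with evidence or moves to the list of questions for the next study.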


