Section 1
Why synthesis is where research goes to die
Most teams are good at collecting research and bad at digesting it. Twelve interviews produce nine hours of recordings, a folder of transcripts, and a synthesis session that gets postponed twice and then compressed into an afternoon of sticky notes. The themes that emerge are real, but they lean on whoever talked loudest in the room and whichever interview happened most recently.
The slow part is not insight but coding: reading every transcript carefully, tagging what each participant actually said, and keeping the link between a claim and the person who made it. That is exactly the kind of patient, repetitive reading that agents do well and humans do unevenly at the end of a long week.
This workflow uses agents for the coding and the first-pass synthesis, and keeps humans for what the data means. The output is a set of themes where every insight links back to participant IDs and verbatim quotes, plus an explicit list of where the evidence is thin.
Section 2
When to reach for this workflow
Use it when you have ten or more transcripts, a few hundred support tickets, or a survey with long free-text answers, and the team needs themes it can defend. It works for interview studies, diary studies, churn calls, and ticket mining alike, because all of them reduce to the same job: code the raw material, merge the codes into themes, and keep the evidence attached.
Do not use it as a substitute for reading. The lead researcher should still read at least three full transcripts before the run, and spot-read afterwards. The agent accelerates coding; it does not replace familiarity with the data.
- 10–15 interview transcripts that need coding within a week.
- Large ticket or survey exports where manual coding would take days.
- Mixed evidence (interviews plus tickets plus survey comments) that needs one merged theme set.
- Re-coding an old study against a new codebook to compare across time.
Section 3
Privacy comes before any agent sees the data
Anonymize before the agents touch anything. Replace names with participant IDs, strip employers, email addresses, and any health or financial details that are not the subject of the study. Keep the mapping between IDs and real identities in a separate file that never enters the agent's working folder.
Check your consent forms. If participants consented to recordings being analyzed by your team, decide explicitly whether automated analysis is within that consent and whether your tooling agreement keeps the data out of model training. This is a research-ethics decision a human owns, not a default to drift into.
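import { mkdir, readdir, readFile, writeFile } from "node:fs/promises"

// Run as an ES module (for example anonymize.mjs) so top-level await works.
// Reads transcripts from ./raw and writes anonymized copies to ./transcripts.
// The name-to-ID mapping is only read here, never written; adjust its path
// so the file lives OUT of the agent workspace.
const replacements = [
  { pattern: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi, token: "[EMAIL]" },
  // Lookbehind instead of \b so a leading "+" is replaced along with the digits.
  { pattern: /(?<!\w)\+?\d[\d\s().-]{7,}\d\b/g, token: "[PHONE]" },
]

const names = JSON.parse(await readFile("./participant-names.json", "utf8"))
// e.g. { "Sarah Chen": "P01", "Miguel Torres": "P02", ... }

// Match whole words only, so a first name like "Mark" does not rewrite "Marketing".
const wholeWord = (s) =>
  new RegExp("\\b" + s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + "\\b", "g")

await mkdir("./transcripts", { recursive: true })
const files = (await readdir("./raw")).filter((f) => f.endsWith(".txt"))
for (const file of files) {
  let text = await readFile("./raw/" + file, "utf8")
  for (const [name, id] of Object.entries(names)) {
    text = text.replace(wholeWord(name), id) // full name first
    text = text.replace(wholeWord(name.split(" ")[0]), id) // then the first name alone
  }
  for (const { pattern, token } of replacements) {
    text = text.replace(pattern, token)
  }
  await writeFile("./transcripts/" + file, text)
}
console.log("Anonymized " + files.length + " transcripts into ./transcripts")
Section 4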
The orchestration pattern: one agent per transcript
This runs as a Claude Code dynamic workflow: a JavaScript script that Claude writes and runs in the background, orchestrating subagents while intermediate results stay in script variables rather than in Claude's own context. That separation matters here, because fifteen transcripts would otherwise blow past any usable context window and the later transcripts would be read more carelessly than the first.
Each transcript gets its own coding agent working from the same shared codebook, so every participant gets the same quality of attention. Workflows can run up to 16 agents concurrently and up to 1,000 in a run, and a run is resumable, which matters when ticket mining involves hundreds of small coding jobs. You trigger the workflow by including the word workflow in the prompt or via /effort ultracode, and you can save the finished prompt to .claude/workflows/ as a reusable command like /synthesize.
After the per-transcript pass, a synthesis agent merges codes into themes, and a separate challenge agent goes looking for disconfirming evidence and over-claims before anything reaches the team.
Anonymize → Agree codebook → Code per transcript → Merge themes → Challenge pass → Human review → Opportunity map
Anonymized transcripts are coded in parallel, merged into themes, then challenged before humans review.
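The concurrency ceiling mentioned above is presumably enforced by the workflow runtime itself, but the idea is easy to sketch. The helper below is hypothetical (the name `mapWithLimit` and its signature are not part of any Claude Code API); it runs one async job per item while keeping at most `limit` jobs in flight, and returns results in input order.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at once. Results keep the order of `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index], index)
    }
  }

  const runnerCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  return results
}

// Usage sketch (codeTranscript is a stand-in for launching one coding agent):
// const coded = await mapWithLimit(transcripts, 16, (file) => codeTranscript(file))
```

A pool of runners, rather than fixed batches, means one slow transcript does not hold up the other fifteen slots.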
Section 5
Build the codebook before the run
The codebook is the contract between you and the coding agents. Start with the codes your research questions imply, leave room for emergent codes, and define each code in one sentence with an example quote. A vague codebook produces themes that sound like the codebook instead of the data.
Keep it short. Eight to fifteen codes is workable; thirty codes means every quote matches something and nothing is learned.
# Codebook: B2B churn interviews, June 2026

Apply codes to verbatim quotes only. A quote may carry multiple codes. If nothing fits, propose a new code prefixed NEW- and define it.

## ONB-CONFUSION
Participant describes not understanding what to do during setup or first use.
Example: "I honestly didn't know which workspace I was supposed to be in."

## VALUE-GAP
Participant says the product did not deliver the outcome they bought it for.
Example: "We never got the reporting our exec team was promised."

## CHAMPION-LEFT
The internal champion changed roles or left, and usage decayed afterwards.

## PRICE-TRIGGER
Price or renewal terms named as the proximate reason for leaving, not the root cause.

## WORKAROUND
Participant describes building spreadsheets or scripts to do what the product should do.

## SUPPORT-FRICTION
Slow, repetitive, or dead-end support experiences.
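A codebook in this shape can be checked mechanically before the run. The sketch below assumes each code is introduced by a `## CODE-NAME` heading and that example quotes start with `Example:`, as in the sample above; `parseCodebook`, `lintCodebook`, and the 15-code ceiling as a default are illustrative choices, not part of the workflow.

```javascript
// Parse a markdown codebook: each "## CODE-NAME" heading starts a code, and
// the non-empty lines up to the next heading become its definition.
function parseCodebook(markdown) {
  const codes = []
  let current = null
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^##\s+(\S+)\s*$/)
    if (heading) {
      current = { name: heading[1], definition: "" }
      codes.push(current)
    } else if (current && line.trim() !== "") {
      current.definition += (current.definition ? " " : "") + line.trim()
    }
  }
  return codes
}

// Warn about the two failure modes the section names: too many codes,
// and codes defined without an example quote.
function lintCodebook(codes, maxCodes = 15) {
  const warnings = []
  if (codes.length > maxCodes) {
    warnings.push(codes.length + " codes defined; more than " + maxCodes + " tends to match everything")
  }
  for (const code of codes) {
    if (!code.definition.includes("Example:")) {
      warnings.push(code.name + " has no example quote")
    }
  }
  return warnings
}
```

Run against the sample codebook, a check like this would flag the four codes that are defined without an example quote.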
Section 6
The workflow prompt
Point the workflow at the anonymized folder and the codebook, and be explicit about the two rules that protect the output: quotes must be verbatim with participant IDs, and the agent must never invent or paraphrase a quote and present it as one.
Run this as a workflow.
Input: ./transcripts contains 12 anonymized churn interview transcripts (P01-P12). ./codebook.md is the shared codebook.
Stage 1 - Coding: For each transcript, launch one agent that reads only that transcript and the codebook. It returns coded excerpts: { participant, code, verbatim_quote, line_reference }. Quotes must be copied exactly from the transcript. If a quote cannot be found verbatim in the file, it must not be reported. Propose NEW- codes where nothing fits.
Stage 2 - Synthesis: Merge the coded excerpts into themes. Each theme needs: a one-sentence claim, the codes it draws on, the participants who support it (by ID), 2-4 verbatim quotes, and a count of how many of the 12 participants it covers.
Stage 3 - Challenge: A separate agent reviews the themes against all coded excerpts and flags: themes supported by fewer than 3 participants, quotes that do not appear verbatim in the source transcripts, claims that generalize beyond the sample, and any disconfirming evidence the synthesis ignored.
Output: themes.md, evidence-table.md (every quote with participant ID and transcript line), challenges.md, and opportunities.md mapping themes to candidate product opportunities. Do not include any opportunity that is not linked to at least one theme.
Section 7
What the orchestration script roughly does
As with deep research, Claude writes the orchestration script itself. The sketch below shows the shape with an agent() pseudo-API so you know where the guardrails sit. It is illustrative, not the literal generated code.
import fs from "node:fs"

// agent() stands in for whatever the workflow runtime exposes to launch a
// subagent; it is not defined here. ES module syntax keeps top-level await valid.
const transcripts = fs.readdirSync("./transcripts").filter((f) => f.endsWith(".txt"))
const codebook = fs.readFileSync("./codebook.md", "utf8")

// Stage 1: one coding agent per transcript, all launched together.
const coded = await Promise.all(
  transcripts.map((file) =>
    agent(
      "You are coding one interview transcript against a codebook.\n" +
        codebook +
        "\nTranscript file: " + file + "\n" +
        "Return JSON: [{ participant, code, verbatim_quote, line }]. " +
        "Quotes must appear verbatim in the transcript. Never paraphrase into quotes.",
      { model: "sonnet" }
    )
  )
)

// Stage 2: synthesis agent merges codes into themes with evidence attached.
const themes = await agent(
  "Merge these coded excerpts into themes. Every theme must list supporting " +
    "participant IDs, verbatim quotes, and a coverage count out of " +
    transcripts.length + ".\n\n" + JSON.stringify(coded),
  { model: "opus" }
)

// Stage 3: challenge agent looks for over-claims and missing disconfirming evidence.
const challenges = await agent(
  "Act as a skeptical second researcher. Check each theme against the coded " +
    "excerpts. Flag thin evidence (<3 participants), quotes not found verbatim, " +
    "over-generalization, and ignored disconfirming evidence.\n\nThemes:\n" +
    themes + "\n\nExcerpts:\n" + JSON.stringify(coded),
  { model: "opus" }
)

// Create the output folder if needed, then write the two reports.
fs.mkdirSync("./output", { recursive: true })
fs.writeFileSync("./output/themes.md", themes)
fs.writeFileSync("./output/challenges.md", challenges)
Section 8
Step by step through one study
Anonymize the raw material and check consent. Read three transcripts yourself and draft the codebook. Run the workflow and let the coding, synthesis, and challenge stages finish; for 10 to 15 transcripts the run usually lands inside an hour, and your total time including review is one to two hours.
Review the challenge file before the themes. Then spot-check evidence: pick five quotes at random from the evidence table and confirm they exist verbatim in the named transcript. If even one is fabricated or altered, treat the whole evidence table as suspect and rerun the coding stage with a stricter prompt.
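The spot-check itself is simple enough to script. The sketch below assumes excerpts shaped like the Stage 1 output (`participant`, `verbatim_quote`) and an in-memory map from participant ID to transcript text; `spotCheck`, `sample`, and `isVerbatim` are hypothetical names. It draws a random sample and returns the excerpts whose quote cannot be found in the named transcript.

```javascript
// Collapse runs of whitespace so a quote wrapped across lines still matches.
// Wording and punctuation must otherwise be identical.
function squash(text) {
  return text.replace(/\s+/g, " ").trim()
}

function isVerbatim(quote, transcriptText) {
  const needle = squash(quote)
  return needle.length > 0 && squash(transcriptText).includes(needle)
}

// Fisher-Yates shuffle on a copy, then take the first n items.
function sample(items, n, random = Math.random) {
  const pool = items.slice()
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1))
    const tmp = pool[i]
    pool[i] = pool[j]
    pool[j] = tmp
  }
  return pool.slice(0, n)
}

// Returns the sampled excerpts whose quote is NOT found in the named transcript.
function spotCheck(excerpts, transcriptsById, n = 5) {
  return sample(excerpts, n).filter(
    (e) => !isVerbatim(e.verbatim_quote, transcriptsById[e.participant] || "")
  )
}
```

An empty result means the sampled quotes all matched. Anything returned deserves a manual look before rerunning, since a mismatch can also come from harmless differences such as curly versus straight quotation marks.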
Finish by doing the human work: decide which themes matter, write the opportunity map with the team, and record which themes were set aside as thin. The agent's coverage counts make that conversation concrete instead of rhetorical.
Code transcripts → Merge themes → Challenge pass → Spot-check quotes → Team review → Opportunity map (feeds next cycle)
The challenge pass and the human spot-check sit between coding and anything the team acts on.
Section 9
Case study: 12 churn interviews for a B2B product
A product team ran 12 exit interviews with churned accounts and used this workflow to synthesize them in an afternoon instead of the usual two weeks. The coding stage produced 214 coded excerpts; synthesis proposed 9 themes; the challenge agent demoted 3 of them.
The strongest theme was not price, even though sales believed it was. Eight of twelve participants described a value gap traced to onboarding: the reporting feature they bought the product for was never configured, and by renewal time nobody internal knew how. Price appeared in seven interviews but, in five of them, only after the value gap was described, which the coding made visible because PRICE-TRIGGER and VALUE-GAP quotes sat side by side with line references.
One demoted theme is just as instructive: synthesis initially claimed customers wanted an integration with a specific CRM, but the challenge pass showed only two participants mentioned it and one was speculating about a colleague's needs. The team kept it as a question for the next study rather than a roadmap item.
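The price-after-value-gap observation in this case study is the kind of check that line references make possible. As a rough sketch, assuming each coded excerpt carries a numeric `line` field (an assumption about the output shape) and using the hypothetical function name `priceAfterValueGap`, the ordering can be read straight off the evidence:

```javascript
// For each participant, find the earliest line at which each code appears,
// then report who mentioned price at all and who did so only after
// describing a value gap.
function priceAfterValueGap(excerpts) {
  const firstLine = {} // participant -> { code -> earliest line number }
  for (const { participant, code, line } of excerpts) {
    const seen = firstLine[participant] || (firstLine[participant] = {})
    if (seen[code] === undefined || line < seen[code]) {
      seen[code] = line
    }
  }

  const result = { priceMentioned: [], priceAfterGap: [] }
  for (const [participant, seen] of Object.entries(firstLine)) {
    const price = seen["PRICE-TRIGGER"]
    const gap = seen["VALUE-GAP"]
    if (price === undefined) continue
    result.priceMentioned.push(participant)
    if (gap !== undefined && gap < price) {
      result.priceAfterGap.push(participant)
    }
  }
  return result
}
```

Ordering within a transcript is only a hint about causality, so the output is a list of participants to reread, not a finding in itself.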
Section 10
Case study: 400 support tickets mined for onboarding friction
A second team pointed the same pattern at a Zendesk export of roughly 400 tickets tagged onboarding. Tickets are shorter than interviews, so the workflow batched them 20 per coding agent, which kept the run to about 70 minutes including the challenge pass.
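Batching is a small piece of arithmetic worth making explicit. The sketch below uses a hypothetical `chunk` helper and placeholder ticket IDs, and assumes exactly 400 tickets for illustration:

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer")
  }
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Placeholder IDs standing in for the real export.
const tickets = Array.from({ length: 400 }, (_, i) => "T" + (i + 1))
const batches = chunk(tickets, 20)
console.log(batches.length) // 20
```

Twenty batches is more than the 16 agents a workflow can run concurrently, so under these assumptions the coding stage would need at least two waves.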
The merged themes quantified what the team had only sensed: 31 percent of coded tickets involved confusion between inviting a teammate and transferring ownership, and 18 percent involved an empty dashboard after a completed import, where the data had landed in a different workspace. Both came with ticket IDs attached, so designers could read the original wording instead of trusting a summary.
The challenge pass flagged a real bias: tickets only represent people who bothered to write in, and the workspace-confusion numbers could not be read as a rate across all new users. The team paired the findings with product analytics before sizing the fix, and the redesign of the invite flow cut tickets in that category by roughly half over the following quarter.
Section 11
Case study: a diary study about a mobile banking app
A third team ran a two-week diary study with 9 participants logging every interaction with their banking app, producing about 340 short entries plus exit interviews. Diary entries are fragmentary, so the codebook leaned on moments (checking before a purchase, payday, an unexpected charge) rather than features.
Synthesis surfaced a pattern the team had not designed for: 6 of 9 participants opened the app most often in a checking-before-spending moment that lasted under ten seconds, and several described anxiety when the balance was slow to load or buried behind a promotion interstitial. The evidence table linked each claim to dated diary entries, which let the team see the pattern repeat across days rather than relying on what participants remembered in the exit interview.
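Seeing a pattern repeat across days comes down to grouping coded entries by participant and moment. A minimal sketch, assuming each coded diary entry has `participant`, `moment`, and `date` fields (hypothetical names, as is `dominantMoments`):

```javascript
// For each participant, find the moment code with the most entries and
// count on how many distinct days it was logged.
function dominantMoments(entries) {
  const byParticipant = {}
  for (const { participant, moment, date } of entries) {
    const moments = byParticipant[participant] || (byParticipant[participant] = {})
    const stat = moments[moment] || (moments[moment] = { count: 0, days: new Set() })
    stat.count += 1
    stat.days.add(date)
  }

  return Object.entries(byParticipant).map(([participant, moments]) => {
    // On a tie, the moment that was logged first wins (the sort is stable).
    const ranked = Object.entries(moments).sort((a, b) => b[1].count - a[1].count)
    const [moment, stat] = ranked[0]
    return { participant, moment, entries: stat.count, distinctDays: stat.days.size }
  })
}
```

A high entry count on a single day and a modest count spread over many days tell different stories, which is why the sketch reports both numbers.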
The opportunity map prioritized a glanceable balance state over the planned budgeting feature for that quarter. The challenge file noted, correctly, that 9 participants cannot establish how common the behavior is in the user base, so the team validated the prevalence with analytics before committing the roadmap.
Section 12
Good vs bad synthesis output
The single test that separates trustworthy synthesis from plausible fiction is traceability. Every insight should survive the question: which participants, which quotes, which lines? An agent under pressure to be helpful will otherwise produce themes that sound like your product strategy and quotes nobody ever said.
Bad: Users are frustrated with onboarding and want a simpler experience
Good: 8 of 12 participants (P01, P03-P07, P10, P12) described not configuring reporting during onboarding; 4 verbatim quotes attached with line references
Bad: One customer said: the onboarding was a complete nightmare from start to finish (no participant ID, not in any transcript)
Good: P04, line 211: "I honestly didn't know which workspace I was supposed to be in"
Bad: Customers churn primarily because of pricing
Good: Price was named in 7 of 12 interviews, but in 5 it followed a described value gap; evidence table lists both codes per participant
Bad: Users want a Salesforce integration
Good: 2 of 12 participants mentioned a CRM integration, one speculatively; flagged by the challenge pass as thin evidence
Traceable insights can be checked against transcripts; invented themes cannot.
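The traceability test can be partly automated as a structural check on each theme before anyone reads it. The sketch below assumes a theme object with `participants`, `coverage`, and `quotes` (each quote carrying `participant` and a numeric `line`); these field names and the function `traceabilityProblems` are illustrative, not the workflow's actual output format.

```javascript
// Return a list of reasons a theme fails the "which participants, which
// quotes, which lines?" test. An empty list means it is structurally traceable.
function traceabilityProblems(theme, totalParticipants) {
  const problems = []
  const ids = new Set(theme.participants || [])
  const quotes = theme.quotes || []

  if (ids.size === 0) problems.push("no participant IDs")
  if (ids.size > totalParticipants) problems.push("more supporters than participants in the study")
  if (theme.coverage !== ids.size) problems.push("coverage count does not match the participant list")
  if (quotes.length < 2) problems.push("fewer than 2 quotes attached")

  for (const quote of quotes) {
    if (!quote.participant || !Number.isInteger(quote.line)) {
      problems.push("quote missing participant ID or line reference")
    } else if (!ids.has(quote.participant)) {
      problems.push("quote from " + quote.participant + " who is not listed as support")
    }
  }
  return problems
}
```

Passing this check says nothing about whether the quotes are real or the claim is fair; it only confirms there is something concrete to verify.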
Section 13
Limits: what this workflow cannot prove
Qualitative synthesis describes patterns in a small sample; it does not produce statistics. Coverage counts like eight of twelve are honest descriptions of the data, not estimates of how common something is among all users, and the workflow should never be used to generate percentage claims for a board deck without quantitative backing.
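A quick calculation shows why a count like eight of twelve cannot stand in for a rate. The sketch below computes a 95% Wilson score interval, a standard formula for a proportion, under the generous assumption that the twelve participants were a random sample of users, which recruited interviewees are not. The function name `wilsonInterval` is illustrative.

```javascript
// Wilson score interval for a proportion; z = 1.96 corresponds to 95%.
function wilsonInterval(successes, n, z = 1.96) {
  const p = successes / n
  const z2 = z * z
  const denom = 1 + z2 / n
  const centre = (p + z2 / (2 * n)) / denom
  const margin = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom
  return { lower: centre - margin, upper: centre + margin }
}

const { lower, upper } = wilsonInterval(8, 12)
console.log(lower.toFixed(2), upper.toFixed(2)) // 0.39 0.86
```

Even in that best case, the data is compatible with anything from roughly four in ten users to nearly nine in ten, and real recruitment bias only widens the uncertainty.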
Agents inherit the biases in the sample and add some of their own: they over-merge similar codes, they prefer tidy narratives, and without the verbatim rule they will smooth quotes into something nobody said. The challenge pass reduces these failures; it does not eliminate them, which is why the human spot-check of random quotes is a fixed step, not an optional one.
Humans own the ethics and the meaning. Consent, anonymization, whether a theme justifies a roadmap change, and how findings are represented to stakeholders are decisions the researcher makes and signs.
- Cannot turn 12 interviews into statistical claims about the user base.
- Cannot detect sample bias on its own; it only analyzes who you talked to.
- Cannot decide consent and privacy questions; those are settled before the run.
- Cannot judge which opportunity is worth building; it can only show the evidence.
Section 14
The reusable synthesis workflow
Save the prompt, the codebook template, and the agent definitions in the repo so the next study starts from a known-good setup. Saved to .claude/workflows/, the whole thing becomes a /synthesize command the team can rerun on every study with a consistent evidence standard.
1. Anonymize transcripts and confirm consent covers automated analysis.
2. Read 3 transcripts yourself and draft the codebook.
3. Run the workflow: one coding agent per transcript against the shared codebook.
4. Merge codes into themes with participant IDs, verbatim quotes, and coverage counts.
5. Run the challenge pass for thin evidence, over-claims, and disconfirming data.
6. Spot-check 5 random quotes against the source transcripts.
7. Review themes with the team and build the opportunity map from supported themes only.
8. Archive the codebook, evidence table, and decisions for the next study.