Agentic Design School
Module 3 of 6
45–55 minutes

Agentic Design Research

Research Synthesis with Agents

Synthesis across interviews, support tickets, reviews, and open-text feedback: theming with traceable evidence, contradiction-hunting, and the human interpretation pass that decides what the themes mean.

Duration: 45–55 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Prepare qualitative data so an agent can theme it without flattening nuance.
  • Require every theme to trace back to specific quotes and sources.
  • Run the human interpretation pass: what the themes mean and what to do about them.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13

Research Synthesis with Agents

Agentic Design Research · Module 3 of 6

  • Synthesis is two jobs: sorting the material, and deciding what it means
  • Agents do the sorting well; the meaning stays with the researcher
  • Every theme must trace back to verbatim quotes and sources
  • Contradictions get hunted on purpose, not discovered by accident

This module is about delegating the coding without delegating the conclusion.

Slide notes

The framing sentence for the whole module is on the slide: synthesis is two jobs. The first job is sorting — reading every transcript, ticket, and review carefully, tagging what each person actually said, and keeping the link between a claim and its source. The second job is meaning — deciding which patterns matter, what they imply for the product, and what the team should do differently. Most synthesis sessions blur the two jobs together in one tired afternoon of sticky notes, which is why the loudest interview and the most recent one carry more weight than they should.

Agents change the economics of the first job dramatically and barely touch the second. A folder of fifteen transcripts that would take a week to code carefully can be coded in an hour, with every quote attributed and every theme counted. But the agent has no idea whether a theme justifies changing the roadmap, whether the sample can support the claim, or which finding the team is politically unable to hear yet. That is the researcher's work, and this module keeps it visibly separate.

If participants did Modules 1 and 2, connect back: the research packet structure from Module 1 is where the synthesis output lands, and the observation-versus-inference discipline from Module 2 reappears here as the line between coded evidence and interpreted meaning.

Narration for this slide

Welcome to Module 3. This one is about synthesis — the step where research most often goes to die. Twelve interviews, nine hours of recordings, a synthesis session that gets postponed twice and then squeezed into an afternoon of sticky notes. Here is the framing for the whole module: synthesis is two jobs. Sorting — coding every transcript and ticket and keeping the evidence attached. And meaning — deciding what the patterns imply and what to do about them. Agents are very good at the first job and have no business doing the second. By the end of this module you will know how to delegate the coding without delegating the conclusion.

Slide 2 of 13

Preparing the corpus: privacy comes first

Nothing reaches an agent until the researcher has settled consent and stripped identities. This is a research-ethics decision, not a default to drift into.

  • Anonymise: names become participant IDs; emails, employers, and incidental health or financial details are stripped
  • The mapping between IDs and real identities lives outside the agent's working folder
  • Check consent: does "analysed by the team" cover automated analysis? Decide explicitly
  • Confirm your tooling agreement keeps research data out of model training
  • Agents work from transcripts and notes, never from raw recordings or video

If consent does not cover automated analysis, this workflow is not available to you — and that is the correct outcome.

Slide notes

This slide comes before any technique on purpose. The temptation with agentic synthesis is to point the tool at the raw export and start, because the raw export is right there. The discipline is that nothing enters the agent's working folder until names have become participant IDs, contact details and employers are stripped, and any health or financial detail that is not the subject of the study is removed. The mapping between IDs and identities is kept in a separate file that the agent never sees — a small mechanical habit that prevents a large class of accidents.
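A minimal sketch of the mechanical part of this step, assuming Python and plain-text transcripts held in a dictionary. The function name `pseudonymise` and its inputs are illustrative, not part of any tool. It only replaces names the researcher lists and strips email addresses; employers, health or financial details, and first-name-only mentions still need a human read before anything enters the agent's folder.

```python
import re

# Rough email pattern: local part, @, then one or more dotted domain labels.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def pseudonymise(transcripts, known_names):
    """Swap listed real names for participant IDs and strip email addresses.

    transcripts: dict of source filename -> raw transcript text
    known_names: real names in the order IDs should be assigned (P01, P02, ...)
    Returns (cleaned, mapping). Store `mapping` OUTSIDE the agent's working folder.
    """
    mapping = {name: f"P{i:02d}" for i, name in enumerate(known_names, start=1)}
    cleaned = {}
    for source, text in transcripts.items():
        text = EMAIL.sub("[email removed]", text)
        # Longest names first, so "Ana Lopez" is replaced before a bare "Ana" could be.
        for name in sorted(mapping, key=len, reverse=True):
            text = re.sub(rf"\b{re.escape(name)}\b", mapping[name], text)
        cleaned[source] = text
    return cleaned, mapping
```

The returned `mapping` is the ID-to-identity file; returning it separately from the cleaned text is the point, because it gets written somewhere the agent never reads.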

Consent is the harder question because it is a judgment, not a script. Most consent forms say something like "recordings will be analysed by the research team." Whether that covers automated analysis by an agent is a decision the researcher makes explicitly, documents, and is prepared to defend, not something discovered after the run. The same goes for the tooling side: before the first transcript is uploaded, confirm that the agreement covering your agent platform keeps the data out of model training, and date-stamp that check, because terms change.

The last bullet is practical as much as ethical: agents work from text. Transcripts and notes are what they can read carefully and quote verbatim from; recordings are not. Keeping agents away from video also keeps the most identifying material out of the pipeline entirely.

Narration for this slide

Before any technique, the part the researcher owns outright: privacy. Nothing from a study reaches an agent until it is anonymised. Names become participant IDs. Emails, employers, and incidental personal details are stripped. The file that maps IDs back to real people stays outside the agent's folder, full stop. Then consent. If participants agreed to their recordings being analysed by your team, you decide — explicitly, on the record — whether automated analysis sits inside that consent, and whether your tooling keeps the data out of model training. If the answer is no, this workflow is simply not available for that study. That is not a limitation to work around. That is the system working.

Slide 3 of 13

The codebook is the contract

Coding agents are only as consistent as the codebook they share. Write it before the run, and keep it short.

  • Start from the codes your research questions imply; leave room for emergent NEW- codes
  • Define each code in one sentence, with one example quote
  • Eight to fifteen codes is workable; thirty means everything matches and nothing is learned
  • Read at least three sources yourself before drafting it — familiarity is not delegable
  • A vague codebook produces themes that sound like the codebook instead of the data

The codebook is where the researcher's framing enters the run. Skip it and the agent supplies its own framing, which is the average of everyone else's.

Slide notes

The codebook is the single highest-leverage artifact in this workflow, because it is the contract between the researcher and every coding agent. Each code gets a short name, a one-sentence definition, and an example quote, so that fifteen agents coding fifteen transcripts apply the same standard. Codes come first from the research questions — what would change a decision — and the codebook explicitly invites emergent codes, conventionally prefixed NEW-, so the framework does not blind the run to what the data actually contains.

Length matters more than people expect. Eight to fifteen codes is workable. Thirty codes means every quote matches something, the coding looks thorough, and the synthesis learns nothing, because the categories are too fine to accumulate weight. The other failure direction is vagueness: a code defined as user frustration will absorb half the corpus and produce a theme that says users are frustrated, which the team already believed walking in.
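The codebook rules above (one-sentence definition, one example quote, eight to fifteen codes, emergent codes prefixed NEW-) are mechanical enough to check before a run. A hedged sketch in Python; the `Code` record and `check_codebook` helper are hypothetical, and the one-sentence test is a crude heuristic that only looks for a full stop followed by a space. Passing it says nothing about whether the codes are well chosen; that remains the researcher's call.

```python
from dataclasses import dataclass

@dataclass
class Code:
    name: str           # short name; emergent codes proposed during the run start with "NEW-"
    definition: str     # meant to be one sentence
    example_quote: str  # one verbatim example from the sources

def check_codebook(codes, min_codes=8, max_codes=15):
    """Return a list of problems; an empty list means these basic checks pass."""
    problems = []
    # Only planned codes count toward the size limit; NEW- codes are emergent.
    planned = [c for c in codes if not c.name.startswith("NEW-")]
    if not min_codes <= len(planned) <= max_codes:
        problems.append(f"{len(planned)} planned codes; aim for {min_codes}-{max_codes}")
    for c in codes:
        definition = c.definition.strip()
        if not definition:
            problems.append(f"{c.name}: missing definition")
        elif ". " in definition:  # crude: a full stop followed by more text
            problems.append(f"{c.name}: definition should be one sentence")
        if not c.example_quote.strip():
            problems.append(f"{c.name}: missing example quote")
    return problems
```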

The non-delegable step is reading. The lead researcher reads at least three full transcripts — or a meaningful sample of tickets — before drafting the codebook, and spot-reads after the run. This is not ceremony. The codebook encodes what the researcher noticed, and a researcher who has not read the material has nothing to encode. The agent accelerates coding; it does not replace familiarity with the data.

Narration for this slide

The codebook is where your thinking enters the run. It is a short document: each code gets a name, a one-sentence definition, and an example quote. The codes come from your research questions, and you leave explicit room for new codes the agent proposes when nothing fits. Keep it to somewhere between eight and fifteen — thirty codes means everything matches something and you learn nothing. And here is the step you cannot delegate: read at least three transcripts yourself before you write it. The codebook captures what you noticed in the data. If you have not read the data, the agent will quietly supply its own framing — and its framing is the average of everyone else's research.

Slide 4 of 13

The synthesis funnel

Five stages. The researcher prepares and interprets; agents code, cluster, and challenge in between — with the evidence trail intact the whole way.

Pipeline diagram of agentic research synthesis in five stages. Stage one, prepare: the researcher anonymises transcripts, tickets, and reviews, confirms consent, reads three sources, and drafts the codebook. Stage two, code: one agent per source applies the shared codebook, keeping verbatim quotes with participant IDs and line references. Stage three, cluster: an agent merges codes into themes, each with a one-sentence claim, supporting participants, quotes, and a coverage count. Stage four, challenge: a sceptical agent flags thin evidence, non-verbatim quotes, contradictions, and over-generalisation. Stage five, interpret: the researcher spot-checks quotes, decides meaning, sets confidence, parks thin themes, and signs the packet. A green band underneath shows the evidence trail linking every theme back to verbatim quotes, participant IDs, and line references.
Prepare and interpret are researcher-led; code, cluster, and challenge are agent-run. The green band is the property that makes the whole funnel trustworthy: every theme stays linked to its quotes and sources from end to end.

The funnel narrows the material; the evidence trail is what stops it narrowing into fiction.

Slide notes

Walk the funnel left to right and name the owner of each stage. Prepare is the researcher: anonymisation, consent, reading a sample, and drafting the codebook — everything covered on the previous two slides. Code is agent work, fanned out one agent per transcript or per batch of tickets, so the fifteenth source gets the same quality of attention as the first; the rules are verbatim quotes only, participant ID and line reference attached, and never a paraphrase presented as a quote. Cluster is the merge: themes with a one-sentence claim, the codes they draw on, the participants who support them, and an honest coverage count. Challenge is a separate agent acting as a sceptical second researcher, flagging thin evidence, quotes that do not appear verbatim in the sources, over-generalisation, and disconfirming evidence the synthesis ignored. Interpret is the researcher again: spot-check quotes, decide meaning, set confidence, park what is thin.

Point at the green band and stay on it. The evidence trail is the property that distinguishes this from a very fast opinion generator. At every stage, the link from claim back to verbatim quote, participant ID, and line reference is preserved, which is what makes the output auditable by someone who was not in the run.

In practice this runs as an orchestrated workflow — in Claude Code, a dynamic workflow where intermediate codings stay in script variables rather than the agent's own context, which is what keeps later transcripts from being read more carelessly than early ones. The mechanics matter less than the shape: human bookends, agent middle, evidence trail throughout.
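The shape of the coding stage can be sketched without any agent platform. This is not Claude Code's workflow API; it assumes a hypothetical `code_source` function standing in for one coding agent (here a keyword matcher so the sketch runs offline) and a simplified codebook that maps each code to a single keyword. What it does show is the structural point from the notes: every source gets its own call, and the intermediate codings accumulate in ordinary script variables rather than in one agent's context.

```python
from concurrent.futures import ThreadPoolExecutor

def code_source(source_id, text, codebook):
    """Hypothetical stand-in for one coding agent.

    A real run would send `text` and the shared codebook to a fresh agent
    context; this stub tags lines by keyword so the sketch runs offline.
    """
    excerpts = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for code, keyword in codebook.items():
            if keyword in line.lower():
                excerpts.append({"code": code, "source": source_id,
                                 "line": line_no, "quote": line.strip()})
    return excerpts

def run_coding_stage(sources, codebook, max_workers=4):
    """Fan out one coder per source and collect results in a plain dict.

    The codings live in this script's variables, not in any single agent's
    context, so the last source is coded with the same fresh context as the first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {sid: pool.submit(code_source, sid, text, codebook)
                   for sid, text in sources.items()}
        return {sid: f.result() for sid, f in futures.items()}
```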

Narration for this slide

Here is the whole module in one picture. Five stages. You prepare: anonymise, settle consent, read a sample, write the codebook. Agents code: one per transcript or batch of tickets, same codebook, verbatim quotes only, every quote carrying a participant ID and a line reference. An agent clusters the codes into themes — each theme gets a claim, its supporting participants, its quotes, and an honest count. A separate challenge agent then plays sceptical second researcher: thin evidence, invented quotes, contradictions the synthesis ignored. And then it comes back to you for interpretation. Notice the green band underneath. The evidence trail runs the full length of the funnel. That trail is the difference between synthesis and a fast, confident opinion.

Slide 5 of 13

Theming with evidence: every claim points at quotes

A theme that cannot answer "which participants, which quotes, which lines" is not a finding. It is a guess wearing the formatting of one.

  • Each theme: a one-sentence claim, the codes it draws on, supporting participant IDs
  • Two to four verbatim quotes per theme, each with a line or timestamp reference
  • Quotes are copied exactly from the source — if it cannot be found verbatim, it is not reported
  • An evidence table lists every quote with its participant and location, separately from the themes
  • Spot-check: pick five quotes at random and confirm them in the sources before anyone acts on the themes

One fabricated quote means the whole evidence table is treated as suspect and the coding stage reruns. No exceptions.

Slide notes

Traceability is the single test that separates trustworthy synthesis from plausible fiction, so this slide sets the standard precisely. A theme is a one-sentence claim plus its supporting machinery: the codes it draws on, the participant IDs behind it, two to four verbatim quotes with line or timestamp references, and a coverage count. The evidence table sits alongside the themes and lists every quote with its source, so a reviewer can go from any claim to the exact words of the exact person in under a minute.

The verbatim rule deserves its own emphasis because it targets the specific way language models fail here. An agent under pressure to be helpful will smooth several similar comments into one tidy quote that nobody actually said, or will paraphrase a sentiment and present it inside quotation marks. The rule is mechanical: a quote must appear word for word in the named source, and if it cannot be found verbatim it is not reported. The instruction goes into the coding prompt, the challenge agent re-checks it, and the researcher checks it again by hand.

The spot-check is a fixed step, not an optional one: pick five quotes at random from the evidence table and confirm each exists verbatim in the named transcript or ticket. The sampling is small because the consequence is large — a single fabricated quote means the entire evidence table is treated as suspect, and the coding stage reruns with a stricter prompt. That sounds harsh and it is meant to; the credibility of every other quote rests on it.
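The verbatim rule is the one part of the spot-check that can be pre-screened mechanically. A minimal sketch, assuming Python, an evidence table held as a list of dicts with hypothetical `source`, `line`, and `quote` keys, and transcripts loaded as text. It checks single-line quotes only, and it supplements the by-hand check described above rather than replacing it.

```python
import random

def quote_is_verbatim(quote, source_text, line=None):
    """True if `quote` appears word for word in the source (on the cited line, if one is given)."""
    if line is None:
        return quote in source_text
    lines = source_text.splitlines()
    return 1 <= line <= len(lines) and quote in lines[line - 1]

def spot_check(evidence_table, sources, k=5, seed=None):
    """Sample k quotes at random and return the rows that fail the verbatim check.

    evidence_table: list of dicts with "source", "line", and "quote"
    sources: dict of source ID -> full transcript or ticket text
    A missing source counts as a failure, since the quote cannot be found.
    """
    rng = random.Random(seed)
    sample = rng.sample(evidence_table, min(k, len(evidence_table)))
    return [row for row in sample
            if not quote_is_verbatim(row["quote"], sources.get(row["source"], ""), row.get("line"))]
```

Any non-empty result triggers the rule on the slide: the whole evidence table is treated as suspect and coding reruns.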

Narration for this slide

Here is the standard that everything else hangs off. Every theme is a one-sentence claim with its evidence attached: the codes it draws on, the participant IDs that support it, and two to four quotes — verbatim, with a line reference. Verbatim means copied word for word from the source. If the agent cannot find the exact words, it does not report a quote, because the failure mode here is specific: an agent that wants to be helpful will smooth three similar comments into one tidy quote nobody ever said. So you check. Pick five quotes at random and find them in the sources. If even one is fabricated, the whole evidence table is suspect and the coding reruns. That sounds severe. It is the price of being able to trust the rest.

Slide 6 of 13

Counting honestly: prevalence vs loudness

Sticky-note synthesis weights whoever was most quotable. Coverage counts weight what the data actually contains — and stop there.

  • Every theme carries a coverage count: "8 of 12 participants", "31% of coded tickets"
  • A vivid quote from one person is colour, not a finding
  • Counts describe the sample; they are not estimates of how common something is among all users
  • Tickets and reviews only represent people who bothered to write in — name that bias in the output
  • Pair qualitative counts with analytics before sizing or prioritising a fix

"Eight of twelve" is an honest description of the data. "Most users" is a claim the data cannot make.

Slide notes

Loudness bias is the quiet failure of manual synthesis: the most articulate participant, the angriest ticket, and the interview that happened most recently all punch above their weight, because human memory rewards vividness. Coverage counts are the antidote, and agents make them cheap. Every theme states how many participants or what share of coded tickets support it, which turns the team conversation from competing anecdotes into a discussion of which patterns are actually load-bearing.

The second half of the slide is the discipline that stops counts becoming a different kind of dishonesty. Eight of twelve participants is a description of the sample, not an estimate of prevalence in the user base, and qualitative synthesis should never be laundered into percentage claims for a board deck without quantitative backing. The same caution applies to corpora that select themselves: support tickets and app reviews only represent people motivated enough to write, so a number like 31 percent of coded onboarding tickets is true of the tickets and unknowable about new users in general. The honest output names that bias rather than hoping nobody asks.
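Counting distinct sources per theme, rather than excerpts, is what keeps one loud participant from counting several times. A sketch under assumed data shapes (each coded excerpt tagged with hypothetical `theme` and `source` keys); the label deliberately says "of this sample" so the number cannot drift into a prevalence claim.

```python
from collections import defaultdict

def coverage(excerpts, total_sources):
    """Count distinct supporting sources per theme.

    excerpts: list of dicts with "theme" and "source" (participant or ticket ID)
    Returns theme -> (n_sources, label); the label describes the sample only.
    """
    sources_by_theme = defaultdict(set)
    for e in excerpts:
        # A set, so five quotes from the same participant still count once.
        sources_by_theme[e["theme"]].add(e["source"])
    result = {}
    for theme, ids in sources_by_theme.items():
        n = len(ids)
        result[theme] = (n, f"{n} of {total_sources} sources ({n / total_sources:.0%} of this sample)")
    return result
```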

The practical pattern from teams running this well is pairing: the synthesis produces a counted, evidenced hypothesis, and product analytics or a follow-up survey establishes the prevalence before anything is sized or prioritised. That pairing is also the bridge to Module 4, where the quantitative side gets its own treatment.

Narration for this slide

Manual synthesis has a loudness problem. The most quotable participant and the angriest ticket end up carrying the conclusion, because vivid things are what we remember in the synthesis room. Counts fix that — and agents make counting cheap. Every theme says how many participants support it, or what share of the coded tickets it covers. But counts come with their own discipline. Eight of twelve is a description of your sample, not a statement about your user base. Tickets only represent people who bothered to write in. So the honest move is to treat the counted theme as a strong, evidenced hypothesis — and pair it with analytics before you size the fix. We will spend all of Module 4 on that quantitative side.

Slide 7 of 13

Hunting contradictions on purpose

Left alone, synthesis converges on a tidy story. The challenge pass exists to make the untidy parts impossible to ignore.

  • A separate challenge agent reviews the themes against all coded excerpts — not the agent that wrote them
  • It flags thin evidence: themes resting on fewer than three sources
  • It hunts disconfirming evidence and negative cases the synthesis left out
  • It names over-generalisation: "users want" language that outruns the sample
  • Demoted themes are kept as open questions for the next study, not deleted

The challenge file is read before the findings file. If you read the findings first, the story has already set.

Slide notes

Agents share a bias with tired researchers: they prefer tidy narratives. Codes that are merely similar get over-merged, the theme that matches the team's existing strategy gets stated a little more confidently than the evidence supports, and the two participants who flatly contradict the pattern quietly fall out of the summary. The challenge pass is the structural answer — a separate agent, prompted to act as a sceptical second researcher, reviewing every theme against the full set of coded excerpts.

Its job is specific, not vaguely critical: flag themes supported by fewer than three sources, flag quotes that do not appear verbatim in the named transcript, flag claims whose language generalises beyond the sample, and surface disconfirming evidence the synthesis did not mention. In the traced runs behind the workflows this course draws on, the challenge pass routinely demotes around a third of the proposed themes — and the demotions are often the most useful output, because they are the claims the team was most ready to believe. One frequently cited example: a synthesis initially claimed customers wanted a specific CRM integration; the challenge pass showed only two participants mentioned it and one was speculating about a colleague's needs. That became a question for the next study rather than a roadmap item.
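Three of the challenge agent's checks are mechanical enough to sketch: the three-source threshold, the verbatim test, and a crude pattern for "users want" language. A hedged Python sketch with hypothetical names and data shapes; the word list in `GENERALISING` is an assumption and will miss many phrasings. Hunting disconfirming evidence needs a reader, human or model, and is not attempted here.

```python
import re

# Assumed, deliberately small list of over-generalising phrasings.
GENERALISING = re.compile(
    r"\b(users|customers|people|everyone)\s+(want|need|hate|love|prefer)\b", re.I)

def challenge(themes, sources, min_sources=3):
    """Flag themes for the researcher to review; flags do not demote anything on their own.

    themes: list of dicts with "claim", "sources" (IDs), and "quotes" ((source ID, quote) pairs)
    sources: dict of source ID -> full text
    Returns claim -> list of issues, for flagged themes only.
    """
    flags = {}
    for t in themes:
        issues = []
        n = len(set(t["sources"]))
        if n < min_sources:
            issues.append(f"thin evidence: {n} sources")
        for sid, quote in t["quotes"]:
            if quote not in sources.get(sid, ""):
                issues.append(f"non-verbatim quote attributed to {sid}")
        if GENERALISING.search(t["claim"]):
            issues.append("claim generalises beyond the sample")
        if issues:
            flags[t["claim"]] = issues
    return flags
```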

Two habits make the pass effective. First, the challenge agent is genuinely separate — a different prompt and a fresh context, not the synthesis agent marking its own homework. Second, the researcher reads the challenge file before the findings file. Reading order matters more than it should: once the tidy story is in your head, the caveats read as footnotes.

Narration for this slide

Synthesis wants to tell a tidy story — and so do agents. Similar codes get merged, the theme that matches what the team already believes gets stated a touch too confidently, and the two participants who contradict the pattern just disappear from the summary. So we hunt contradictions on purpose. A separate challenge agent — not the one that wrote the themes — reviews everything against the coded excerpts. It flags themes resting on fewer than three sources, quotes that are not actually verbatim, claims that say users want when only this sample said anything, and the disconfirming evidence the synthesis skipped. In practice the challenge pass demotes a meaningful share of themes, and the demotions are often the most valuable part. One habit makes it work: read the challenge file before the findings. Once the story sets, caveats become footnotes.

Slide 8 of 13

Good vs bad synthesis output

The single test is traceability: can the claim answer "which participants, which quotes, which lines"?

Bad: sounds right | Good: can be checked
Users are frustrated with onboarding and want a simpler experience | 8 of 12 participants (P01, P03–P07, P10, P12) described not configuring reporting during onboarding; 4 verbatim quotes with line references
One customer said: "the onboarding was a complete nightmare" (no ID, in no transcript) | P04, line 211: "I honestly didn't know which workspace I was supposed to be in"
Customers churn primarily because of pricing | Price named in 7 of 12 interviews, but in 5 it followed a described value gap; both codes listed per participant
Users want a Salesforce integration | 2 of 12 mentioned a CRM integration, one speculatively; flagged by the challenge pass as thin evidence

The bad column reads better in a slide deck. The good column survives the first hard question from a stakeholder.

Slide notes

These rows are drawn from the user-research-synthesis workflow's quality comparison and from its churn-study case, so they are grounded in traced runs rather than invented for contrast. Walk one or two rows rather than all four. The first row is the most common failure: a claim that is probably true, reads well, and cannot be checked, defended, or acted on with any precision. Its good counterpart names the participants, counts the coverage, and points at quotes — which means a designer can read the original wording, and a sceptical stakeholder can audit the claim instead of arguing with it.
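The good column's participant list, such as "P01, P03–P07, P10, P12", compresses consecutive IDs into ranges. A small sketch of that formatting, assuming IDs of the form P followed by digits; `format_participants` is a hypothetical helper, not part of the workflow the notes cite.

```python
import re

def format_participants(ids, total):
    """Render supporting IDs as "N of M participants (P01, P03–P07, ...)"."""
    # Parse "P07" -> 7, de-duplicate, and sort numerically.
    nums = sorted({int(re.fullmatch(r"P(\d+)", pid).group(1)) for pid in ids})
    parts, start = [], None
    for i, n in enumerate(nums):
        if start is None:
            start = n
        # Close the current run at the end, or when the next ID is not consecutive.
        if i == len(nums) - 1 or nums[i + 1] != n + 1:
            parts.append(f"P{start:02d}" if start == n else f"P{start:02d}–P{n:02d}")
            start = None
    return f"{len(nums)} of {total} participants ({', '.join(parts)})"
```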

The pricing row is worth dwelling on because it shows synthesis changing a conclusion rather than just decorating one. Sales believed churn was about price; the coding showed price was named in seven interviews but in five of them only after a value gap was described, and the side-by-side codes with line references made that sequence visible. That is the kind of finding that only survives because the evidence trail exists — stated without it, it is just the researcher's opinion against the sales team's.

The quote row is the integrity case: a vivid quote with no participant ID that appears in no transcript is not a strong finding badly cited, it is fiction. The honest version is less dramatic and more useful. Close on the highlight: the bad column genuinely does present better, which is exactly why the discipline has to be structural rather than relying on good intentions in the moment.

Narration for this slide

Let's make the standard concrete with a comparison. On the left, claims that sound right: users are frustrated with onboarding, customers churn because of pricing, users want a Salesforce integration. On the right, the same territory stated so it can be checked: eight of twelve participants, named by ID, with quotes and line references. Price named in seven interviews — but in five of them only after a value gap was described, which changes the conclusion entirely. The CRM integration? Two mentions, one of them speculative, flagged as thin. Notice something uncomfortable: the left column reads better in a deck. The right column is what survives the first hard question from a stakeholder. That is the trade, and it is worth it every time.

Slide 9 of 13

The interpretation pass: what stays human

The agents hand you sorted, counted, challenged material. What it means — and what to do about it — is not in the output, and should not be.

  • Decide which themes matter for the decision at hand, not which are most interesting
  • Set a confidence level per theme: strong, suggestive, or thin — and say why
  • Park thin themes as open questions for the next study; record that they were parked
  • Own the ethics of representation: how findings are framed to stakeholders is signed by the researcher
  • Familiarity is the input: spot-reading the sources is what makes the interpretation more than a summary of a summary

The agent can show you that a pattern exists. Whether it justifies changing what gets built is a judgment you sign your name to.

Slide notes

This is the second of the two jobs from the title slide, and it does not compress just because the first job got faster. The interpretation pass starts from the challenged, counted themes and asks the questions no agent can answer: which of these matter for the decision the team actually faces, what does the pattern mean given everything the researcher knows about the product, the market, and the history that never appears in the transcripts, and what should change as a result. A theme can be well-evidenced and still not matter; a thin theme can be the early signal of something important. Telling those apart is the job.

Confidence levels are the practical tool for keeping that judgment honest. Each theme gets an explicit rating — strong, suggestive, or thin — with a one-line justification that references the coverage count, the challenge flags, and the researcher's own read of the sources. Thin themes are parked as open questions rather than deleted, and the parking is recorded, because the next study's questions should come from this study's loose ends.

There is also an ethics-of-representation point that belongs to the researcher alone: how findings are framed to stakeholders, which caveats travel with the headline, and whether a count is presented in a way that invites over-reading. The agent produced the material; the researcher signs the interpretation. That signature is not a formality — it is the accountability that Module 1 of the Fundamentals course says never transfers, applied to research.

Narration for this slide

Now the second job — the one that stays yours. The agents hand you themes that are coded, counted, and challenged. They cannot tell you which themes matter for the decision in front of the team, what the pattern means given everything you know that never appears in a transcript, or what should change because of it. So the interpretation pass is explicit work, not a formality. Give every theme a confidence level — strong, suggestive, or thin — and say why. Park the thin ones as open questions for the next study, and write down that you parked them. And own how the findings are represented to stakeholders, including which caveats travel with the headline. The agent shows you that a pattern exists. You decide what it means. You sign it.

Slide 10 of 13

Reporting formats: packets, not decks of adjectives

The output of synthesis is the research packet from Module 1, with the synthesis files inside it. A deck can be generated from the packet; the packet cannot be reconstructed from a deck.

  • themes.md — each theme with its claim, codes, participants, quotes, and confidence level
  • evidence-table.md — every quote with participant ID and line or timestamp reference
  • challenges.md — the sceptical pass: thin evidence, contradictions, demoted themes
  • opportunities.md — candidate opportunities, each linked to at least one supported theme
  • Open questions and parked themes — the seed of the next study
  • The codebook and run prompt archived alongside, so the study can be re-coded or compared later

If a finding only exists as a bullet on slide 14, it has already started to decay.

Slide notes

The reporting format question sounds cosmetic and is not. Most research dies in the deck: a finding becomes a bullet, the bullet becomes an adjective, and six weeks later nobody can say which participants it came from or whether it was strong or thin. The packet structure from Module 1 is the alternative — a folder of plain files where the themes, the evidence table, the challenge file, the opportunity map, and the open questions live together, with the codebook and the run prompt archived alongside them.

Walk the files briefly. The themes file carries the claims with their full machinery, including the confidence levels from the interpretation pass. The evidence table is the audit surface — anyone can go from claim to quote to source. The challenges file stays in the packet rather than being tidied away, because the demoted themes and named caveats are part of the finding, not an embarrassment to hide. The opportunities file has one rule: nothing appears in it that is not linked to at least one supported theme, which is what stops the roadmap wishlist sneaking in dressed as research.
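The opportunities rule can be checked before the packet is shared. A minimal sketch, assuming each opportunity lists the theme IDs it cites and each theme carries the confidence level from the interpretation pass; treating "strong" and "suggestive" as supported, and "thin" as parked, is an assumption based on Slide 9, not a fixed rule. `unlinked_opportunities` is a hypothetical name.

```python
def unlinked_opportunities(opportunities, themes):
    """Return opportunities that do not cite at least one supported theme.

    opportunities: dict of opportunity text -> list of theme IDs it cites
    themes: dict of theme ID -> confidence ("strong", "suggestive", or "thin")
    Assumption: strong and suggestive count as supported; thin themes are parked.
    """
    supported = {tid for tid, conf in themes.items() if conf in ("strong", "suggestive")}
    return [opp for opp, cited in opportunities.items() if not supported & set(cited)]
```

Anything it returns is a roadmap wish without research behind it, and either gets linked to a supported theme or leaves opportunities.md.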

The deck still gets made — stakeholders need a readout — but it is generated from the packet and points back into it, rather than being the only place the findings exist. Keeping the packet in the project repository also pays forward: the next study can re-code against an updated codebook and compare across time, and Module 6 builds on exactly this structure when findings get converted into design briefs.

Narration for this slide

Where do the findings live? Not primarily in a deck. Decks are where research goes to decay — a finding becomes a bullet, the bullet becomes an adjective, and six weeks later nobody can trace it. The output of synthesis is a packet: the themes file with claims and confidence levels, the evidence table, the challenge file with its demoted themes left visible, an opportunity map where every entry links to a supported theme, and the open questions that seed the next study. The codebook and the prompt get archived with it, so the study can be re-run or compared later. You still make the deck for the readout — but the deck is generated from the packet and points back into it. The packet is the deliverable.

Slide 11 of 13

Worked example: forty support conversations

A traced run on a corpus of forty onboarding-tagged support conversations, batched ten per coding agent.

| Stage | What happened |
|---|---|
| Prepare | Conversations anonymised to ticket IDs; researcher read ten of them and drafted a nine-code codebook around onboarding moments |
| Code | Four agents, ten conversations each, against the shared codebook; 163 coded excerpts with verbatim quotes and ticket references |
| Cluster | Six themes proposed; the largest was confusion between inviting a teammate and transferring ownership, in 14 of 40 conversations |
| Challenge | Two themes demoted: one rested on only three conversations, and one generalised from people who wrote in to "users" in general |
| Interpret | Researcher spot-checked five quotes, set confidence levels, and paired the invite-confusion theme with analytics before the team sized a fix |

Total researcher time was about ninety minutes — and the expensive minutes were the reading and the interpretation, which is exactly where they should be.

Slide notes

This run follows the same shape as the ticket-mining case in the user-research-synthesis workflow, scaled to a corpus a single team can hold in its head: forty support conversations tagged onboarding, exported as text. Conversations are shorter than interviews, so they are batched — here ten per coding agent — which keeps the run fast without giving any single agent so much material that the later conversations get skimmed.
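The batching step itself is simple. This sketch assumes the export is a list of anonymised ticket IDs in a hypothetical `T001` format. Each batch would go to one coding agent together with the same shared codebook, which keeps the four agents' output comparable.

```python
def batch(items, size):
    """Split a corpus into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical ticket IDs standing in for the forty onboarding conversations.
ticket_ids = [f"T{n:03d}" for n in range(1, 41)]
batches = batch(ticket_ids, 10)

# One coding agent per batch; each would also receive the same shared codebook.
for agent_number, assigned in enumerate(batches, start=1):
    print(f"coding agent {agent_number}: {assigned[0]}..{assigned[-1]} ({len(assigned)} items)")
```

A larger batch size means fewer agents, but each agent then has more material to hold. That is the trade-off behind choosing ten per agent here.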

Walk the table by stage and call out where the human time went. Preparation was the researcher's reading and the codebook draft — about forty minutes. The coding and clustering were agent time, returning 163 coded excerpts and six proposed themes with coverage counts and quotes attached. The challenge pass demoted two themes, and the demotion reasons are the teaching point: one rested on three conversations, and one used users-in-general language for a pattern that, by construction, only describes people frustrated enough to contact support. The interpretation pass spot-checked quotes, set confidence levels, and — because ticket counts cannot establish prevalence — paired the leading theme with product analytics before the team committed to a fix.

Be explicit that the numbers are from a traced run of this pattern, not a benchmark, and that forty conversations is deliberately small. The same structure scales to hundreds of tickets or fifteen hour-long interviews; what changes is batch size and run time, not the shape, and not the location of the human gates.

Narration for this slide

Let's trace one run end to end. The corpus: forty support conversations tagged onboarding. The researcher read ten of them, drafted a nine-code codebook, and anonymised the export. Four coding agents took ten conversations each and returned a hundred and sixty-three coded excerpts, every quote verbatim with a ticket reference. Clustering proposed six themes — the biggest was confusion between inviting a teammate and transferring ownership, in fourteen of the forty conversations. The challenge pass demoted two themes: one was thin, and one quietly generalised from people who wrote in to users in general. The researcher spot-checked quotes, set confidence levels, and paired the lead theme with analytics before anyone sized a fix. Total researcher time, about ninety minutes — spent on reading and interpretation, which is exactly where it belongs.

Slide 12 of 13 · 16:9

Exercise: theme a small corpus and audit the trail

Use a corpus you already have — recent support tickets, app reviews, or a handful of old transcripts. Keep it small enough to check by hand.

  • Pick 15–30 short items or 3–5 transcripts; anonymise them and confirm consent covers automated analysis
  • Read a third of the corpus yourself, then write a codebook of 8–12 codes with one-line definitions
  • Run the synthesis: coding with verbatim quotes and source references, themes with coverage counts, then a challenge pass
  • Audit the evidence trail: pick five quotes at random and find them in the sources; note any that fail
  • Write the interpretation by hand: confidence level per theme, what each means, and one parked open question
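The audit in step four can be partly mechanised. This sketch assumes coded excerpts are stored as dicts with hypothetical `quote` and `source_id` keys, and that sources are a dict of anonymised texts. It samples five excerpts and reports any quote that does not appear word for word in its cited source. Runs of whitespace are treated as equal, so line wrapping does not cause false failures. A passing check only shows that the words exist. You still need to read each quote in its source to see whether it was taken out of context.

```python
import random


def _normalise(text):
    # Collapse line breaks and repeated spaces so wrapping does not matter;
    # wording, punctuation and case must still match exactly.
    return " ".join(text.split())


def audit_quotes(excerpts, sources, k=5, seed=None):
    """Sample up to k excerpts and return those whose quote is not verbatim in its source."""
    rng = random.Random(seed)
    sample = rng.sample(excerpts, min(k, len(excerpts)))
    failures = []
    for excerpt in sample:
        source_text = _normalise(sources.get(excerpt["source_id"], ""))
        if _normalise(excerpt["quote"]) not in source_text:
            failures.append(excerpt)
    return failures
```

A non-empty result is the signal the slide describes: treat the whole evidence table as suspect and rerun the coding.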

The audit step is the exercise. If all five quotes check out, you have a workflow you can trust; if one does not, you have learned exactly why the rules exist.

Slide notes

The exercise is sized so the human audit is genuinely possible: a corpus small enough that fabrication has nowhere to hide. Steer participants towards material they already have rights to and familiarity with — their own product's support tickets, app-store reviews, or transcripts from a past study where consent was broad enough to cover this use. The anonymisation and consent step is part of the exercise, not preamble; for some participants the honest outcome is discovering that their existing consent language does not cover automated analysis, which is itself the lesson.

The codebook step is where most of the design thinking happens, and where most participants will feel the pull to skip the reading. Encourage them to notice that pull, because it is the same pull that produces codebook-shaped themes in real studies. The run itself can use whatever agent setup they have available — the structure matters more than the tooling, and the workflow prompts in the school's user-research-synthesis workflow are a working starting point.

The audit and the hand-written interpretation are the two steps to insist on. The quote audit either builds justified trust or surfaces a fabrication early, in a low-stakes setting; both outcomes are valuable. The interpretation forces the participant to do the second job rather than presenting the agent's clustering as the finding. If running this live, compare confidence levels across the room — two people theming the same corpus with the same codebook will still rate themes differently, which is a useful demonstration that interpretation was never mechanical.

Narration for this slide

Time to run this on your own material. Pick a small corpus you already have — fifteen to thirty support tickets or reviews, or three to five old transcripts — and start by anonymising it and checking that consent actually covers automated analysis. Read a third of it yourself, then write a codebook of eight to twelve codes. Run the synthesis: coding with verbatim quotes, themes with coverage counts, a challenge pass. Then the part that matters most: audit the trail. Pick five quotes at random and go find them in the sources. Finally, write the interpretation by hand — a confidence level for each theme, what it means, and one open question you are parking. If all five quotes check out, you have earned some trust in the pipeline. If one fails, you have learned, cheaply, exactly why the rules exist.

Slide 13 of 13 · 16:9

Summary, and what comes next

  • Synthesis is two jobs: agents sort and count; the researcher decides what it means
  • Privacy and consent are settled before any agent sees the corpus — and can rule the workflow out
  • The codebook is the contract; the evidence trail from theme to verbatim quote is the trust
  • Coverage counts beat loudness, but they describe the sample — they are not prevalence
  • Contradictions are hunted by a separate challenge pass, and the challenge file is read first
  • The output is a packet with confidence levels and open questions, not a deck of adjectives

Module 4 moves to the quantitative side: surveys, experiments, and funnel diagnosis — where agents speed up the work but cannot supply the rigour.

Slide notes

Recap by walking the funnel rather than the bullets: the researcher prepares and interprets, agents code, cluster, and challenge in between, and the evidence trail runs the full length. The two-jobs framing from the title slide is the thing to leave in people's heads — the coding got cheap, the meaning did not, and a team that confuses the two will produce fast, confident, unaccountable findings.

The bridge to Module 4 is already planted in this module twice: the prevalence-versus-loudness slide ended by recommending that qualitative themes be paired with analytics, and the worked example did exactly that before the team sized a fix. Module 4 takes up that quantitative side properly — survey instruments that do not lead, experiment readouts that do not invent significance, and funnel diagnosis that produces ranked, testable hypotheses — with the same underlying posture: agents accelerate the work, and the rigour still has to come from somewhere human.

If participants did the exercise, suggest they keep the packet folder. Module 6 returns to it when findings get converted into design briefs, and the open questions they parked here are candidates for the studies they will scope there.

Narration for this slide

Let's close. Synthesis is two jobs, and this module split them deliberately. Agents do the sorting: coding against your codebook, clustering into themes, counting coverage, and challenging their own output for thin evidence and contradictions. You do the meaning: consent before the run, the codebook, the spot-check, the confidence levels, and the call about what changes because of it. The output is a packet with the evidence trail intact — not a deck of adjectives. And the counts, remember, describe your sample, not your user base. That is exactly where Module 4 picks up: surveys, experiments, and funnel diagnosis — the quantitative work where agents speed things up but the rigour still has to come from you. See you there.

Module transcript
Module 3, narrated slide by slide

Slide 1: Research Synthesis with Agents

Welcome to Module 3. This one is about synthesis — the step where research most often goes to die. Twelve interviews, nine hours of recordings, a synthesis session that gets postponed twice and then squeezed into an afternoon of sticky notes. Here is the framing for the whole module: synthesis is two jobs. Sorting — coding every transcript and ticket and keeping the evidence attached. And meaning — deciding what the patterns imply and what to do about them. Agents are very good at the first job and have no business doing the second. By the end of this module you will know how to delegate the coding without delegating the conclusion.

Slide 2: Preparing the corpus: privacy comes first

Before any technique, the part the researcher owns outright: privacy. Nothing from a study reaches an agent until it is anonymised. Names become participant IDs. Emails, employers, and incidental personal details are stripped. The file that maps IDs back to real people stays outside the agent's folder, full stop. Then consent. If participants agreed to their recordings being analysed by your team, you decide — explicitly, on the record — whether automated analysis sits inside that consent, and whether your tooling keeps the data out of model training. If the answer is no, this workflow is simply not available for that study. That is not a limitation to work around. That is the system working.
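This is a minimal sketch of the mechanical part of that step. It assumes transcripts arrive as a dict of real name to text. It swaps every participant's full name for a `P01`-style ID and replaces email addresses with a placeholder. It also replaces any extra terms you list, such as employers or colleagues. The ID-to-name key is returned separately so it can be saved outside the agents' folder. The function name and placeholder formats are illustrative assumptions. Plain string matching misses first names used alone, nicknames, and identifying details described in passing, so a human read of the output is still required.

```python
import re

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def _replace_term(text, term, replacement):
    # Whole-word, case-sensitive match; the lambda avoids backslash handling in the replacement.
    return re.sub(rf"\b{re.escape(term)}\b", lambda _match: replacement, text)


def pseudonymise(transcripts, extra_terms=None):
    """Return (anonymised, key).

    transcripts: {real participant name: transcript text}
    extra_terms: {identifying term: placeholder}, e.g. employers or colleagues
    anonymised:  {participant ID: text}, the only thing the agents may see
    key:         {participant ID: real name}, to be stored outside the agents' folder
    """
    extra_terms = extra_terms or {}
    names = sorted(transcripts)
    ids = {name: f"P{i:02d}" for i, name in enumerate(names, start=1)}
    anonymised, key = {}, {}
    for name in names:
        # Strip emails first so names inside addresses are not half-replaced.
        text = EMAIL_PATTERN.sub("[email]", transcripts[name])
        # Replace every participant's name, since people sometimes mention each other.
        for other_name, pid in ids.items():
            text = _replace_term(text, other_name, pid)
        for term, placeholder in extra_terms.items():
            text = _replace_term(text, term, placeholder)
        anonymised[ids[name]] = text
        key[ids[name]] = name
    return anonymised, key
```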

Slide 3: The codebook is the contract

The codebook is where your thinking enters the run. It is a short document: each code gets a name, a one-sentence definition, and an example quote. The codes come from your research questions, and you leave explicit room for new codes the agent proposes when nothing fits. Keep it to somewhere between eight and fifteen — thirty codes means everything matches something and you learn nothing. And here is the step you cannot delegate: read at least three transcripts yourself before you write it. The codebook captures what you noticed in the data. If you have not read the data, the agent will quietly supply its own framing — and its framing is the average of everyone else's research.
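The codebook's shape is easy to check before a run. This sketch assumes a hypothetical `Code` record with the three fields the slide names. The 8 to 15 range from the slide is passed as parameters, so the exercise's tighter range of 8 to 12 can be used instead. The check cannot tell you whether the codes reflect the data. That depends on your reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    name: str
    definition: str  # one sentence
    example_quote: str


def codebook_problems(codes, min_codes=8, max_codes=15):
    """Return structural problems with a codebook; an empty list means the shape is fine."""
    problems = []
    if not min_codes <= len(codes) <= max_codes:
        problems.append(f"{len(codes)} codes; aim for {min_codes}-{max_codes}")
    seen = set()
    for code in codes:
        if code.name in seen:
            problems.append(f"duplicate code name: {code.name}")
        seen.add(code.name)
        if not code.definition.strip():
            problems.append(f"{code.name}: missing definition")
        if not code.example_quote.strip():
            problems.append(f"{code.name}: missing example quote")
    return problems
```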

Slide 4: The synthesis funnel

Here is the whole module in one picture. Five stages. You prepare: anonymise, settle consent, read a sample, write the codebook. Agents code: one per transcript or batch of tickets, same codebook, verbatim quotes only, every quote carrying a participant ID and a line reference. An agent clusters the codes into themes — each theme gets a claim, its supporting participants, its quotes, and an honest count. A separate challenge agent then plays sceptical second researcher: thin evidence, invented quotes, contradictions the synthesis ignored. And then it comes back to you for interpretation. Notice the green band underneath. The evidence trail runs the full length of the funnel. That trail is the difference between synthesis and a fast, confident opinion.

Slide 5: Theming with evidence: every claim points at quotes

Here is the standard that everything else hangs off. Every theme is a one-sentence claim with its evidence attached: the codes it draws on, the participant IDs that support it, and two to four quotes — verbatim, with a line reference. Verbatim means copied word for word from the source. If the agent cannot find the exact words, it does not report a quote, because the failure mode here is specific: an agent that wants to be helpful will smooth three similar comments into one tidy quote nobody ever said. So you check. Pick five quotes at random and find them in the sources. If even one is fabricated, the whole evidence table is suspect and the coding reruns. That sounds severe. It is the price of being able to trust the rest.
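One way to enforce that standard is to give every theme a fixed record and check it before anyone reads the findings. This sketch uses hypothetical `Quote` and `Theme` classes and stores transcripts as lists of lines. It checks three things. The theme must have two to four quotes. Every quoted participant must be listed as supporting the theme. Each quote must appear word for word on the line it cites. A quote that genuinely spans two lines would be flagged by this simple version.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Quote:
    participant_id: str
    line: int  # 1-based line number in that participant's anonymised transcript
    text: str  # copied word for word from the transcript


@dataclass
class Theme:
    claim: str  # one sentence
    codes: list[str]
    participant_ids: set[str]
    quotes: list[Quote] = field(default_factory=list)


def theme_problems(theme, transcripts):
    """List the ways a theme falls short of the evidence standard.

    transcripts maps participant ID -> list of transcript lines.
    """
    problems = []
    if not 2 <= len(theme.quotes) <= 4:
        problems.append(f"{len(theme.quotes)} quotes; expected 2 to 4")
    for quote in theme.quotes:
        if quote.participant_id not in theme.participant_ids:
            problems.append(f"{quote.participant_id} is quoted but not listed as supporting")
        lines = transcripts.get(quote.participant_id, [])
        # Check the bounds first so a bad line reference cannot raise IndexError.
        if not 1 <= quote.line <= len(lines) or quote.text not in lines[quote.line - 1]:
            problems.append(f"{quote.participant_id} line {quote.line}: quote not found verbatim")
    return problems
```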

Slide 6: Counting honestly: prevalence vs loudness

Manual synthesis has a loudness problem. The most quotable participant and the angriest ticket end up carrying the conclusion, because vivid things are what we remember in the synthesis room. Counts fix that — and agents make counting cheap. Every theme says how many participants support it, or what share of the coded tickets it covers. But counts come with their own discipline. Eight of twelve is a description of your sample, not a statement about your user base. Tickets only represent people who bothered to write in. So the honest move is to treat the counted theme as a strong, evidenced hypothesis — and pair it with analytics before you size the fix. We will spend all of Module 4 on that quantitative side.
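Counting participants rather than excerpts stops one talkative participant from looking like a pattern. This sketch assumes coded excerpts carry hypothetical `theme` and `source_id` keys. It counts distinct sources per theme and phrases each count against the sample size, as a description of the sample rather than a prevalence estimate.

```python
from collections import defaultdict


def coverage(excerpts):
    """Return {theme: number of distinct sources} for a list of coded excerpts."""
    sources_by_theme = defaultdict(set)
    for excerpt in excerpts:
        sources_by_theme[excerpt["theme"]].add(excerpt["source_id"])
    return {theme: len(ids) for theme, ids in sources_by_theme.items()}


def describe(counts, sample_size, unit="participants"):
    # Phrase each count as a statement about this sample, largest first.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"{theme}: {n} of {sample_size} {unit} in this sample" for theme, n in ordered]
```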

Slide 7: Hunting contradictions on purpose

Synthesis wants to tell a tidy story — and so do agents. Similar codes get merged, the theme that matches what the team already believes gets stated a touch too confidently, and the two participants who contradict the pattern just disappear from the summary. So we hunt contradictions on purpose. A separate challenge agent — not the one that wrote the themes — reviews everything against the coded excerpts. It flags themes resting on fewer than three sources, quotes that are not actually verbatim, claims that say users want when only this sample said anything, and the disconfirming evidence the synthesis skipped. In practice the challenge pass demotes a meaningful share of themes, and the demotions are often the most valuable part. One habit makes it work: read the challenge file before the findings. Once the story sets, caveats become footnotes.
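Two of the challenge checks are mechanical enough to run before the challenge agent reads anything. This sketch assumes themes are dicts with hypothetical `claim` and `source_ids` keys. It flags themes that rest on fewer than three sources. It also flags claims phrased as what users or customers in general want. The word list is an illustrative assumption and will miss many phrasings. Spotting disconfirming evidence that the synthesis skipped still needs the separate challenge pass.

```python
import re

# Illustrative phrasings that generalise from a sample to everyone; extend for your own data.
GENERALISING = re.compile(
    r"\b(users|customers|people|everyone)\s+(want|need|hate|love|expect|prefer)\b",
    re.IGNORECASE,
)


def challenge_flags(themes, min_sources=3):
    """Return (claim, reason) pairs for themes the challenge pass should look at first."""
    flags = []
    for theme in themes:
        n_sources = len(set(theme["source_ids"]))
        if n_sources < min_sources:
            flags.append((theme["claim"], f"rests on {n_sources} source(s)"))
        if GENERALISING.search(theme["claim"]):
            flags.append((theme["claim"], "generalises beyond the sample"))
    return flags
```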

Slide 8: Good vs bad synthesis output

Let's make the standard concrete with a comparison. On the left, claims that sound right: users are frustrated with onboarding, customers churn because of pricing, users want a Salesforce integration. On the right, the same territory stated so it can be checked: eight of twelve participants, named by ID, with quotes and line references. Price named in seven interviews — but in five of them only after a value gap was described, which changes the conclusion entirely. The CRM integration? Two mentions, one of them speculative, flagged as thin. Notice something uncomfortable: the left column reads better in a deck. The right column is what survives the first hard question from a stakeholder. That is the trade, and it is worth it every time.

Slide 9: The interpretation pass: what stays human

Now the second job — the one that stays yours. The agents hand you themes that are coded, counted, and challenged. They cannot tell you which themes matter for the decision in front of the team, what the pattern means given everything you know that never appears in a transcript, or what should change because of it. So the interpretation pass is explicit work, not a formality. Give every theme a confidence level — strong, suggestive, or thin — and say why. Park the thin ones as open questions for the next study, and write down that you parked them. And own how the findings are represented to stakeholders, including which caveats travel with the headline. The agent shows you that a pattern exists. You decide what it means. You sign it.

Slide 10: Reporting formats: packets, not decks of adjectives

Where do the findings live? Not primarily in a deck. Decks are where research goes to decay — a finding becomes a bullet, the bullet becomes an adjective, and six weeks later nobody can trace it. The output of synthesis is a packet: the themes file with claims and confidence levels, the evidence table, the challenge file with its demoted themes left visible, an opportunity map where every entry links to a supported theme, and the open questions that seed the next study. The codebook and the prompt get archived with it, so the study can be re-run or compared later. You still make the deck for the readout — but the deck is generated from the packet and points back into it. The packet is the deliverable.

Slide 11: Worked example: forty support conversations

Let's trace one run end to end. The corpus: forty support conversations tagged onboarding. The researcher read ten of them, drafted a nine-code codebook, and anonymised the export. Four coding agents took ten conversations each and returned a hundred and sixty-three coded excerpts, every quote verbatim with a ticket reference. Clustering proposed six themes — the biggest was confusion between inviting a teammate and transferring ownership, in fourteen of the forty conversations. The challenge pass demoted two themes: one was thin, and one quietly generalised from people who wrote in to users in general. The researcher spot-checked quotes, set confidence levels, and paired the lead theme with analytics before anyone sized a fix. Total researcher time, about ninety minutes — spent on reading and interpretation, which is exactly where it belongs.

Slide 12: Exercise: theme a small corpus and audit the trail

Time to run this on your own material. Pick a small corpus you already have — fifteen to thirty support tickets or reviews, or three to five old transcripts — and start by anonymising it and checking that consent actually covers automated analysis. Read a third of it yourself, then write a codebook of eight to twelve codes. Run the synthesis: coding with verbatim quotes, themes with coverage counts, a challenge pass. Then the part that matters most: audit the trail. Pick five quotes at random and go find them in the sources. Finally, write the interpretation by hand — a confidence level for each theme, what it means, and one open question you are parking. If all five quotes check out, you have earned some trust in the pipeline. If one fails, you have learned, cheaply, exactly why the rules exist.

Slide 13: Summary, and what comes next

Let's close. Synthesis is two jobs, and this module split them deliberately. Agents do the sorting: coding against your codebook, clustering into themes, counting coverage, and challenging their own output for thin evidence and contradictions. You do the meaning: consent before the run, the codebook, the spot-check, the confidence levels, and the call about what changes because of it. The output is a packet with the evidence trail intact — not a deck of adjectives. And the counts, remember, describe your sample, not your user base. That is exactly where Module 4 picks up: surveys, experiments, and funnel diagnosis — the quantitative work where agents speed things up but the rigour still has to come from you. See you there.