Agentic Design School
Module 2 of 6
45–55 minutes

Agentic Design Research

Deep Design Research and Competitive Teardowns

Desk research and competitive analysis run at agent scale: structured teardowns of competitor flows, pattern surveys across a category, and the discipline that separates what was observed from what is being inferred.

Duration: 45–55 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Scope a deep research run so breadth does not become unfocused accumulation.
  • Run competitive teardowns that capture flows, patterns, and trade-offs with screenshots.
  • Keep observation, inference, and recommendation visibly separate in the output.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13 · 16:9

Deep Design Research and Competitive Teardowns

Agentic Design Research · Module 2 of 6

  • Coverage is cheap; judgment about it is not
  • Scoping the question around a decision
  • Teardowns: flows, pricing, onboarding, empty states — with evidence
  • Keeping observation, inference, and recommendation separate

Agents can read more of a category in an afternoon than a researcher reads in a month. This module is about keeping that breadth honest.

Slide notes

Module 1 established the research packet as the unit of agentic research: the question, the sources, the method, the findings, and the evidence trail in one auditable structure. This module fills two of the most common packets a design team builds — deep desk research and competitive teardowns — and the failure mode they share: breadth without judgment. An agent will happily fetch forty sources or walk eight competitor products. Whether that produces insight or a museum of screenshots depends on decisions the human makes before and after the run.

Be explicit about the two workflows that anchor the module, because they are different jobs that often get conflated. Deep research reads what has been written about a space — articles, documentation, reviews, standards — and synthesises it with citations. A teardown walks the products themselves: signs up, clicks through onboarding, screenshots the pricing page, counts the steps. Teams get into trouble when they substitute one for the other.

The through-line for the whole module is the title's claim: coverage is cheap, judgment about it is not. The agent supplies coverage. The researcher supplies the question, the rubric, the reading of the evidence, and the call about what it means. Every slide that follows is a technique for keeping those two contributions from blurring into each other.

Narration for this slide

Welcome to Module 2. In the last module we built the research packet — the question, the method, the findings, and the evidence trail, all in one auditable structure. Now we fill that structure with the two kinds of research agents do at a scale a single researcher cannot: deep desk research and competitive teardowns. The headline is on the slide. Coverage is cheap now — an agent can read forty sources or walk eight competitor products in an afternoon. Judgment about that coverage is not cheap, and it does not transfer to the agent. This module is about scoping the work, capturing real evidence, and keeping what you observed visibly separate from what you are inferring.

Slide 2 of 13 · 16:9

Scope the question: what would change a decision

The single biggest quality lever is the question. A vague question produces a survey of everything; a sharp one produces an answer the team can act on.

  • Name the decision the research informs — a redesign, a pricing page, an accessibility commitment
  • Name the products, the moment in the experience, and the user
  • Say what counts as evidence, and what does not (marketing copy alone does not)
  • Say what would change your mind — if nothing could, you are seeking confirmation
  • Choose five to eight angles yourself; do not let the agent guess them

A research question without a decision behind it produces a museum, not a packet.

Slide notes

This slide carries the scoping objective for the module. The deep research workflow's quality is set before the run starts, in the question. A good research question for agentic work has four parts: the question itself with named products and a named moment in the experience, the decision it informs, what counts as evidence, and what would change the team's mind. That last part is the honesty test — if no possible finding could change the decision, the run is confirmation-seeking, and the agent will happily oblige.

The angles are the second scoping decision. Deep research runs fan out across angles — typically one per named competitor or platform, one for the primary standard or guideline fetched from its source, one for published criticism and counter-examples, and one for recency. Five to eight angles is the practical range: fewer than four and the cross-checking has nothing to push against; more than ten and the review time goes to duplication. The point of naming them yourself is that the angles are where research breadth comes from, and breadth chosen by the agent reflects what is easy to find, not what the decision needs.

Connect this back to the packet from Module 1: the question, the decision, and the evidence standard are the first fields of the packet, written before any source is fetched. If they are vague, everything downstream inherits the vagueness — and no amount of cross-checking repairs a question that did not need answering.
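
The scoping fields above can be sketched as a small structure that refuses to call a question ready until every field is filled in. This is a minimal illustration, not the packet format from Module 1: the class name, field names, and the `scoping_problems` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchQuestion:
    """First fields of the packet, written before any source is fetched.

    All names here are illustrative assumptions, not a defined schema.
    """
    question: str            # names the products, the moment, and the user
    decision: str            # the decision the research informs
    evidence_standard: str   # what counts as evidence, and what does not
    would_change_mind: str   # the honesty test
    angles: list[str] = field(default_factory=list)

    def scoping_problems(self) -> list[str]:
        """Return scoping problems; an empty list means ready to run."""
        problems = []
        for name in ("question", "decision", "evidence_standard", "would_change_mind"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is empty")
        if not 5 <= len(self.angles) <= 8:
            problems.append(f"{len(self.angles)} angles; the practical range is 5 to 8")
        return problems


draft = ResearchQuestion(
    question="How do project tools get a new team to a first working project?",
    decision="Onboarding redesign",
    evidence_standard="Dated screenshots and primary docs; marketing copy alone does not count",
    would_change_mind="",
    angles=["Competitor A", "Competitor B", "published criticism"],
)
print(draft.scoping_problems())
```

Here the draft is flagged twice: nothing is written down that would change the team's mind, and three angles fall short of the range.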

Narration for this slide

Everything starts with the question, and the test for a good one is simple: what decision would change because of the answer? Name that decision. Name the products, the moment in the experience, and the user. Then set the evidence standard up front — for example, marketing copy alone does not count as evidence of what a product does. And write down what would change your mind, because if nothing could, you are not researching, you are collecting support for a decision already made. Finally, choose the research angles yourself — usually one per competitor, one for the standard, one for criticism, one for recency. Five to eight angles. The agent supplies the breadth. You supply the aim.

Slide 3 of 13 · 16:9

Two kinds of competitive research

Deep research reads what has been written. A teardown walks the products. Teams need both, and get into trouble substituting one for the other.

| | Deep desk research | Competitive teardown |
| --- | --- | --- |
| What it reads | Articles, docs, reviews, standards, changelogs | The products themselves: signup, flows, pricing pages |
| Evidence | Cited claims, cross-checked across sources | Screenshots, step counts, verbatim claims with URLs |
| Strong at | Standards, conventions, what the category says | What the product actually does on first run |
| Weak at | Anything not yet written about | Why a competitor built it, and whether it works for them |
| Typical run | 20–45 minutes per question | 2–4 hours per teardown |

A literature search tells you a competitor is loved for its onboarding. Only a teardown tells you it is a three-step flow that defers account creation.

Slide notes

This comparison sets up the rest of the module, so make the distinction concrete. Deep desk research is a literature search at agent scale: the question fans out into angles, each angle searches and fetches sources, claims are cross-checked against each other, and what survives becomes a cited report. It is the right tool for standards questions, pattern research, and anything where the answer already exists in public writing. A teardown is the other kind: it walks the live products, captures ordered onboarding screenshots, full pricing pages, verbatim positioning claims, and core flows, and treats marketing copy as a claim to verify rather than a fact to record.

The trouble comes from substitution. A team that runs only the literature search inherits the category's self-description — what gets written about products is shaped by marketing, launch coverage, and whoever bothered to publish a teardown. A team that runs only walkthroughs misses the standards, the published evidence, and the criticism that would have warned them off a pattern. The two pair well in practice: the desk research frames what to look for, the teardown checks what is actually there.

The time figures are honest as of June 2026 and worth keeping on the slide: a deep research question is twenty to forty-five minutes of mostly unattended run time; a teardown is two to four hours including the human review, because creating trial accounts and verifying evidence does not compress.

Narration for this slide

There are two ways to research competitors, and they produce different artifacts. Deep desk research reads what has been written — articles, documentation, reviews, standards — and synthesises it into a cited report. A teardown walks the products themselves: it signs up, clicks through onboarding, screenshots the pricing page, and counts the steps. Here is why the difference matters. A literature search will tell you a competitor is loved for its onboarding. Only a teardown tells you it is a three-step flow that defers account creation until after the first project exists. You need both, and the most common mistake is substituting one for the other — usually letting published opinion stand in for what the product actually does.

Slide 4 of 13 · 16:9

Decide what evidence counts before collecting any

A teardown drifts into screenshot hoarding unless the dimensions are fixed up front. Every cell in the eventual matrix should trace to a file.

  • Onboarding: ordered screenshots from signup to first meaningful action, step counts, what can be deferred
  • Pricing: the full page, tiers, billing toggle, what is gated where, trial terms
  • Positioning: headline and the three hardest-leaned claims, verbatim, with URLs
  • Core flows and empty states: first project, inviting a teammate, the main view before any data exists
  • For every claim: what the walkthrough showed — supports, contradicts, or could not verify

Anything that could not be collected is recorded with the reason, never guessed. "Not collected" is a finding.

Slide notes

The dimensions file is the rubric the whole teardown is normalised against, and writing it is design work, not admin. The dimensions come from the design decision: an onboarding redesign needs ordered screenshots and step counts; pricing work needs the page, the tiers, and what each tier gates; a positioning refresh needs the verbatim claims and whether the product backs them. The discipline is that every competitor folder must contain evidence for every dimension, or an explicit note explaining why it could not be collected — paywalled, sales-gated, region-locked. That note is itself a finding about the competitor's go-to-market, not a gap to fill by guessing.

Empty states deserve a specific mention because they are where category research pays off most directly and where memory is least reliable. What a product shows before any data exists — a guided checklist, sample data, a template picker, an illustration with one call to action — is exactly the kind of detail that nobody remembers accurately and that a dated screenshot settles in seconds.

The last bullet is the cross-check rule that does the most work downstream. Positioning claims get recorded verbatim with their URLs and labelled as claims; the walkthrough evidence then marks each one supported, contradicted, or unverified. That single habit is what stops marketing copy laundering itself into the comparison matrix as observed fact.
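
The "evidence or an explicit reason" rule lends itself to a mechanical check. The sketch below assumes each competitor folder is summarised as a dictionary keyed by dimension; the dimension names, the `files` and `not_collected_reason` keys, and the `coverage_gaps` helper are all hypothetical.

```python
# Hypothetical dimension names taken from the slide's bullets.
DIMENSIONS = ["onboarding", "pricing", "positioning", "core_flows", "empty_states"]


def coverage_gaps(folder: dict[str, dict]) -> list[str]:
    """List dimensions that have neither evidence files nor a recorded reason."""
    gaps = []
    for dim in DIMENSIONS:
        entry = folder.get(dim, {})
        has_files = bool(entry.get("files"))
        has_reason = bool(entry.get("not_collected_reason"))
        if not (has_files or has_reason):
            gaps.append(dim)
    return gaps


competitor_f = {
    "onboarding": {"files": ["onboarding-01.png", "onboarding-02.png"]},
    "pricing": {"not_collected_reason": "region-locked; trial requires a sales call"},
    "positioning": {"files": ["homepage.png"]},
    "core_flows": {"files": []},
}
print(coverage_gaps(competitor_f))  # ['core_flows', 'empty_states']
```

Pricing passes because "not collected" is recorded with its reason; the two flagged dimensions have neither evidence nor an explanation.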

Narration for this slide

Before anyone collects anything, fix what evidence counts. This is the dimensions file, and it is the rubric every competitor gets measured against. For onboarding: ordered screenshots from signup to the first meaningful action, with step counts. For pricing: the full page, the tiers, what is gated where. For positioning: the headline and the three claims the marketing leans on hardest, verbatim, with URLs. For core flows and empty states: what the product shows before any data exists. And for every marketing claim, the cross-check — did the walkthrough support it, contradict it, or fail to verify it? One more rule: anything that could not be collected gets recorded with the reason. Never guessed. Not collected is a finding.

Slide 5 of 13 · 16:9

The teardown pipeline

Five stages. The designer holds the first and last; agents run capture, normalisation, and synthesis in between.

Pipeline diagram of a competitive teardown in five stages. First, the designer defines the competitor set and the evidence dimensions, tied to a design decision. Second, one agent per competitor captures evidence: onboarding screenshots, pricing pages, verbatim positioning claims, and core flows, into a folder with a dated manifest. Third, findings are normalised against the fixed rubric, with marketing claims cross-checked against the walkthrough and labelled supported, contradicted, or unverified. Fourth, the evidence is synthesised into a comparison matrix, named patterns, and opportunity gaps. Fifth, the designer reviews: sampling cells against evidence, separating observation from inference, and deciding which gaps to pursue. A dashed line returns from review to the competitor set for the quarterly re-run.
Defining the set and the rubric, and the final review, are designer-led. Capture, normalisation against the rubric, and synthesis into the matrix and gaps are agent-run. The dashed return is the quarterly re-run, diffed against the archived evidence.

One agent per competitor is what makes the treatment systematic: the eighth competitor gets the same rubric as the first.

Slide notes

Walk the pipeline left to right and name the owner of each stage. Defining the set is human: the design decision, the five to eight competitors, the dimensions file, and the access and rights rules. Capture is agent-run, one collection agent per competitor, each filling its evidence folder against the same dimensions and writing a dated manifest of every file. Normalisation is where the rubric earns its keep: every finding cites an evidence file, marketing claims are checked against what the walkthrough showed, and cells the evidence does not cover stay empty rather than getting filled. Synthesis assembles the comparison matrix, the named patterns, and the opportunity gaps. The review is human again: sample the cells against their evidence, separate observation from inference, and decide which gaps are opportunities and which exist for good reasons.

The one-agent-per-competitor structure matters more than it looks. Run by hand, teardowns produce detailed notes on the first three competitors and a shrug for the rest, because attention decays. The fan-out gives the eighth competitor the same systematic treatment as the first, which is the entire point of doing this with agents.

Point at the dashed return line. Teardown evidence decays — pricing pages change without notice, onboarding gets rebuilt — so a teardown older than a quarter is a historical document. The archived evidence folders make the next pass a diff rather than a restart, which is also when the matrix becomes genuinely interesting: what changed tells you where the category is moving.
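
One way to picture the re-run as a diff: assume each archived manifest can be reduced to a mapping from evidence name to a content hash. That representation and the `diff_manifests` helper are assumptions for illustration; the source workflow does not specify how the diff is computed.

```python
def diff_manifests(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests that map evidence name -> content hash."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            name for name in previous.keys() & current.keys()
            if previous[name] != current[name]
        ),
    }


# Hypothetical hashes standing in for two quarterly captures.
last_quarter = {"pricing-page.png": "a1", "onboarding-01.png": "b2", "onboarding-02.png": "c3"}
this_quarter = {"pricing-page.png": "z9", "onboarding-01.png": "b2", "empty-state.png": "d4"}

print(diff_manifests(last_quarter, this_quarter))
```

The "changed" list is the part worth reading first, since it points at where a competitor has moved since the last pass.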

Narration for this slide

Here is the whole teardown as a pipeline. You define the set: the decision, five to eight competitors, and the evidence rubric. Then agents take over — one collection agent per competitor, each filling an evidence folder with screenshots, pricing pages, and verbatim claims, against the same rubric. The findings get normalised: every cell cites a file, marketing claims get checked against what the walkthrough actually showed. Then synthesis: the comparison matrix, the named patterns, the opportunity gaps. And then it comes back to you — sample the evidence, separate what was seen from what is being inferred, and decide which gaps are worth pursuing. The dashed line is the re-run: next quarter, the same pipeline becomes a diff against the archive.

Slide 6 of 13 · 16:9

Screenshot evidence and the rights questions

Screenshots are the evidence that settles arguments. They also raise questions a research packet has to answer deliberately, not by accident.

  • Use trial accounts and public pages; respect terms of service; never misrepresent who you are
  • Keep competitor screenshots as internal evidence — reference them, do not republish them
  • Date every capture in the manifest; undated evidence cannot be trusted or diffed later
  • If access requires a sales call, record that as a go-to-market finding, not a gap to fill another way
  • External materials — pitches, articles, decks — use your own concept work, not competitor UI

The walkthrough is legitimate research. Republishing what it captured is a separate decision with separate rules.

Slide notes

Screenshots are what make a teardown a teardown: a dated capture of what the product showed settles in seconds the arguments that memory and marketing copy keep alive for weeks. But evidence collection on competitor products has rules, and the time to set them is in the dimensions file, before any agent or person starts capturing. The baseline as practised in the source workflow: trial accounts and public pages only, stay within terms of service, and never misrepresent who you are to get access. If a competitor gates the product behind a sales call, that is a finding about their go-to-market — record it as such rather than treating it as a hole to fill by other means.

The second rule is about use, not collection. Competitor screenshots are internal evidence: they sit in the evidence folders, they back the matrix cells, and team members open them to verify findings. Republishing them — in pitch decks, public articles, or marketing — is a different act with different legal and reputational exposure. The agency case in the source workflow is the model: the pitch's central argument was built on a gap the teardown found, illustrated with the agency's own concept work, with the matrix kept as an appendix demonstrating rigour.

Dating matters more than it seems. Pricing pages and onboarding flows change without notice; an undated screenshot is unverifiable within a quarter. The manifest — every file, what it shows, where it came from, when it was collected — is what keeps the evidence auditable and makes the next teardown a diff.
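
A manifest entry with the four fields named above might look like the sketch below. The field names are hypothetical, the URLs are placeholders, and the 90-day window is an assumed stand-in for "a quarter".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ManifestEntry:
    file: str                      # path inside the evidence folder
    shows: str                     # what the capture shows
    source_url: str                # where it came from
    collected_on: Optional[date]   # None models an undated capture


def is_usable(entry: ManifestEntry, today: date, max_age_days: int = 90) -> bool:
    """Reject undated captures; dated ones expire after max_age_days (assumed 90)."""
    if entry.collected_on is None:
        return False
    return today - entry.collected_on <= timedelta(days=max_age_days)


today = date(2026, 6, 20)
fresh = ManifestEntry("a/pricing.png", "Full pricing page", "https://example.com/pricing", date(2026, 6, 2))
undated = ManifestEntry("b/pricing.png", "Full pricing page", "https://example.com/pricing", None)
old = ManifestEntry("c/pricing.png", "Full pricing page", "https://example.com/pricing", date(2026, 1, 15))

for entry in (fresh, undated, old):
    print(entry.file, is_usable(entry, today))
```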

Narration for this slide

Screenshots are the evidence that settles arguments — but capturing competitor products comes with rules worth setting up front. Use trial accounts and public pages, stay inside the terms of service, and never misrepresent who you are to get access. Keep what you capture as internal evidence: reference it, link matrix cells to it, but do not republish competitor UI in decks or articles — use your own concept work for anything external. Date every capture, because pricing pages and onboarding flows change without notice, and undated evidence cannot be diffed later. And if a product is hidden behind a sales call, write that down as a finding about how they sell. It is not a gap you should fill some other way.

Slide 7 of 13 · 16:9

Pattern surveys: table stakes vs differentiating

Once the evidence is in folders, the cross-competitor read is what turns a pile of teardowns into a map of the category.

  • Name the recurring patterns: setup checklists, sample data, template pickers, anchored pricing tiers
  • Count honestly: "4 of 5 products show a setup checklist on first run" beats "most products do this"
  • Table stakes: patterns nearly everyone ships — absence is felt, presence is not praised
  • Differentiators: patterns one or two products own, with evidence of how they execute them
  • Gaps: things no competitor does well — the matrix can find them, only judgment can rank them

A pattern shared by seven competitors might be best practice or a herd error. Structure is not performance.

Slide notes

The pattern survey is the synthesis layer of the teardown and the part most worth presenting to the team, because it answers the question stakeholders actually ask: where does the category set expectations, and where is there room to be different? Table stakes are the patterns that have become the category's grammar — in the project-tool example, guided setup checklists and template galleries; in the pricing example, three tiers with the middle anchored and an annual toggle. Users do not praise these patterns; they notice their absence. Differentiators are the patterns one or two products own, and they are only useful in the matrix when the evidence shows how the execution differs, not just that a feature exists.

Counting discipline is what separates a survey from an impression. The useful sentence has a numerator and a denominator with citations behind it: four of the five researched products show a setup checklist on first run, with the evidence files listed. The vague version — most products do this — cannot be checked, ages silently, and gets repeated long after it has stopped being true.
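
The numerator-and-denominator sentence can be generated straight from the evidence, which keeps the count and its citations from drifting apart. The data shape and the `prevalence` helper below are illustrative assumptions, and the file names are placeholders.

```python
def prevalence(pattern: str, observations: dict[str, dict[str, list[str]]]) -> str:
    """Return '<pattern>: N of M products (files...)' from per-product evidence."""
    total = len(observations)
    # A product counts only if at least one evidence file backs the pattern.
    hits = {name: obs[pattern] for name, obs in observations.items() if obs.get(pattern)}
    files = [f for name in sorted(hits) for f in hits[name]]
    return f"{pattern}: {len(hits)} of {total} products ({', '.join(files)})"


pattern = "setup checklist on first run"
observations = {
    "Competitor A": {pattern: ["a/first-run.png"]},
    "Competitor B": {pattern: ["b/first-run.png"]},
    "Competitor C": {pattern: ["c/first-run.png"]},
    "Competitor D": {pattern: ["d/first-run.png"]},
    "Competitor E": {pattern: []},
}
print(prevalence(pattern, observations))
```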

Keep the limit visible: prevalence is not proof of effectiveness. A pattern shared by the whole category may be a well-tested best practice or a herd error nobody has questioned; usage data, conversion, and retention are invisible from the outside. The matrix tells you what exists. Whether to follow the convention or break it is a design judgment, made with your own users in view — which is also where the gaps list comes in, since the most interesting output is often the thing nobody in the category does well.

Narration for this slide

With evidence in the folders, the cross-competitor read begins, and the useful frame is table stakes versus differentiators. Table stakes are the patterns nearly everyone ships — setup checklists, template galleries, three pricing tiers with the middle one anchored. Nobody praises them; their absence is what gets noticed. Differentiators are the patterns one or two products own, and the evidence needs to show how they execute, not just that the feature exists. Count honestly: four of five products show a checklist on first run, with citations — not, most products do this. And remember the limit. A pattern the whole category shares might be best practice, or it might be a herd error. Structure is not performance, and the matrix cannot tell you which is which.

Slide 8 of 13 · 16:9

Observation, inference, recommendation

Three different kinds of statement, three different levels of confidence. The output stays honest only when they are visibly separate.

| Layer | What it is | Example | Who owns it |
| --- | --- | --- | --- |
| Observation | What the evidence shows, with the file behind it | Competitor A: 4 steps from signup to a populated template (onboarding-01..04.png, dated) | Agent collects; anyone can verify |
| Inference | A reading of what the observations mean | Short paths to first value correlate with deferring team invites in this set | Shared, and labelled as inference |
| Recommendation | What we should do about it | Open into a populated template; defer invites until after the first project | Designer, who carries the accountability |

When the three layers blur, marketing claims become facts and hunches become findings. Keep the labels on.

Slide notes

This is the discipline named in the module's learning objectives, and it is worth treating as a formatting rule rather than an aspiration: the packet keeps observations, inferences, and recommendations in separate, labelled layers. An observation cites an evidence file and can be verified by anyone who opens it. An inference is a reading of what the observations mean — it can be reasonable, but it is a step away from the evidence and has to be labelled as that step. A recommendation is a design decision, and it belongs to the designer who will be accountable for it, informed by the evidence but not dictated by it.

The failure modes are all blurrings of these layers. Marketing claims get laundered into observed facts, which is exactly what the cross-check exists to catch. Inferences get written in the confident voice of observations: the report says 'users prefer' when the evidence only supports 'four of five products show'. And recommendations get smuggled inside findings, so the team experiences the packet as a decision already made rather than evidence to decide with. Agents are particularly prone to the second failure: synthesis models reach for fluent, confident phrasing, so the layer labels need to be demanded in the output format, not hoped for.

A practical convention: in the matrix and the findings document, observations carry file citations, inferences carry an explicit 'inference' tag and name the observations they rest on, and recommendations live in a separate one-page document the designer writes by hand. That last part is deliberate — writing the recommendation page yourself is what keeps the judgment human.
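
That convention can be enforced as a format check on the findings document. The sketch below is one possible shape, assuming each statement carries an id and a layer label; the `Statement` class and `layer_problems` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    id: str
    layer: str                                           # "observation" or "inference"
    text: str
    evidence_files: list[str] = field(default_factory=list)
    rests_on: list[str] = field(default_factory=list)    # ids of observations


def layer_problems(statements: list[Statement]) -> list[str]:
    """Flag statements that break the labelling convention."""
    observation_ids = {s.id for s in statements if s.layer == "observation"}
    problems = []
    for s in statements:
        if s.layer == "observation" and not s.evidence_files:
            problems.append(f"{s.id}: observation cites no evidence file")
        elif s.layer == "inference" and not (set(s.rests_on) and set(s.rests_on) <= observation_ids):
            problems.append(f"{s.id}: inference must name existing observations")
        elif s.layer not in ("observation", "inference"):
            # Recommendations live in the designer's separate one-page document.
            problems.append(f"{s.id}: {s.layer} does not belong in the findings document")
    return problems


findings = [
    Statement("O1", "observation", "Competitor A: 4 steps to a populated template",
              evidence_files=["onboarding-01.png"]),
    Statement("I1", "inference", "Short paths go with deferred invites", rests_on=["O1"]),
    Statement("I2", "inference", "Users prefer templates"),
    Statement("R1", "recommendation", "Open into a populated template"),
]
print(layer_problems(findings))
```

The unsupported inference and the stray recommendation are flagged; the cited observation and the inference that names it pass.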

Narration for this slide

Here is the discipline that keeps the whole module honest. Three kinds of statement, kept visibly separate. Observations are what the evidence shows — competitor A takes four steps from signup to a populated template, and here are the screenshots. Anyone can verify them. Inferences are readings of what the observations mean — short paths to first value seem to go with deferring team invites. Reasonable, but a step away from the evidence, so label it. Recommendations are what we should do about it — and they belong to you, the designer, because you carry the accountability. When these layers blur, marketing claims become facts and hunches become findings. Keep the labels on, and make the agent keep them on too.

Slide 9 of 13 · 16:9

Source quality: every source type has a bias

Cross-checking filters repeated misinformation, but it works on the sources you gave it. Know what each kind of source is for.

  • Marketing pages: claims to verify, never evidence — record verbatim, then check against the walkthrough
  • Official docs and changelogs: strong primary sources for what a product does, weak on how well it works
  • Reviews and community threads: real friction signals, but biased towards the angry and the recent
  • Published teardowns and pattern libraries: useful, and often citing each other in a circle — check the dates
  • Standards (WCAG, platform guidelines): fetch the primary text and cite criteria by number, not by paraphrase

Cross-checking cannot detect a claim that is wrong everywhere it appears, and the public web underrepresents anything paywalled, internal, or not in English.

Slide notes

The cross-checking step in the deep research workflow — a claim must be supported by two independent sources or one primary source, or it gets dropped and listed separately — is the mechanical defence against repeated misinformation. The pricing-page case from the source workflow is the cautionary tale: of fourteen widely repeated claims about pricing pages, only six survived cross-checking, and the famous precise-percentage conversion claim traced back through a circle of articles to a single vendor case study with no published data. Reading the dropped-claims appendix before the findings is a habit worth building; it is often the most useful page in the report.
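
The keep-or-drop rule can be sketched in a few lines. How independence is established is not specified in the source workflow; the version below assumes each source has been tagged with the origin it traces back to, so a circle of articles repeating one case study counts once. The field names and example claims are illustrative.

```python
def cross_check(claims: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split claims into (kept, dropped): two independent sources or one primary."""
    kept, dropped = [], []
    for claim in claims:
        sources = claim["sources"]
        has_primary = any(s["primary"] for s in sources)
        # Assumption: "origin" records where a source's claim traces back to,
        # so sources sharing an origin are not independent of each other.
        independent_origins = {s["origin"] for s in sources}
        if has_primary or len(independent_origins) >= 2:
            kept.append(claim)
        else:
            dropped.append(claim)
    return kept, dropped


claims = [
    {"text": "Three tiers is the common layout",
     "sources": [{"origin": "survey-1", "primary": False},
                 {"origin": "survey-2", "primary": False}]},
    {"text": "Highlighting a tier lifts conversion by a precise percentage",
     "sources": [{"origin": "vendor-case-study", "primary": False},
                 {"origin": "vendor-case-study", "primary": False},
                 {"origin": "vendor-case-study", "primary": False}]},
    {"text": "Criterion text quoted from the standard itself",
     "sources": [{"origin": "standard", "primary": True}]},
]
kept, dropped = cross_check(claims)
print([c["text"] for c in dropped])
```

The dropped list is the material for the dropped-claims appendix, so it is returned rather than discarded.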

But cross-checking has limits worth stating plainly. It cannot detect a claim that is wrong everywhere it appears — a myth the whole industry repeats survives any vote among sources. It inherits the biases of what gets published in English on the public web: paywalled research, internal data, recently changed product surfaces, and non-English sources are systematically underrepresented. And it says nothing about your users; web research describes the category, not your product.

The per-source guidance on the slide is the practical layer. Marketing copy is a claim to verify. Docs and changelogs are good primary sources for what exists, silent on quality. Reviews surface real friction but skew angry and recent. Published teardowns date quickly and cite each other. Standards should be fetched from the primary text and cited by criterion number — that precision is what later lets an accessibility lead verify the load-bearing claims in minutes rather than redoing the research.

Narration for this slide

Every source type has a bias, and the cross-checking only works on the sources you gave it. Marketing pages are claims to verify, never evidence — record them verbatim, then check them against the walkthrough. Docs and changelogs are solid for what a product does, silent on how well it works. Reviews surface real friction, but they skew towards the angry and the recent. Published teardowns are useful and often cite each other in a circle, so check the dates. And for standards, fetch the primary text and cite criteria by number. Two limits to remember: cross-checking cannot catch a claim that is wrong everywhere it appears, and the public web underrepresents anything paywalled, internal, or not in English.

Slide 10 of 13 · 16:9

Good and bad cells in the same matrix

A matrix cell is useful when someone can open the evidence behind it and see the same thing.

| Bad cell | Good cell |
| --- | --- |
| Competitor A has the best onboarding in the category | Competitor A: 4 steps from signup to a populated template; invite deferred until after first project (manifest dated June 2026) |
| Competitor B offers AI-powered project setup | Competitor B claims AI-powered setup (URL, verbatim); walkthrough shows a template picker with no generation step; claim marked contradicted |
| All competitors offer transparent pricing | 3 of 8 route teams above 25 seats to a sales form despite a published per-seat price; screenshots in evidence |
| Pricing for Competitor F: roughly $30 per seat (estimated) | Competitor F pricing: not collected (region-locked, trial requires a sales call); recorded as a go-to-market observation |

The two recurring failures: laundering marketing claims into observed facts, and filling cells the evidence does not cover because empty looks unfinished.

Slide notes

Each row of this table is a real failure pattern from the source workflow's quality comparison, and they are worth reading as a pair exercise: what makes the left cell unusable is not that it is wrong, it is that it cannot be checked. 'Best onboarding in the category' is a judgment dressed as a finding; the good version gives the step count, the deferral detail, and the dated manifest, and lets the reader form the judgment. The AI-setup row is the cross-check in action — the claim is recorded verbatim with its URL, the walkthrough evidence contradicts it, and both sit in the matrix so the team's planning discussion stays grounded in what users will actually meet.

The transparent-pricing row shows the value of walking flows rather than reading pages: the published per-seat price and the sales-form redirect above twenty-five seats are both true, and only the walkthrough exposes the gap between them. The final row is the one that makes reviewers uncomfortable and is the most important: an estimated price typed into a cell is fabricated evidence, however plausible. 'Not collected', with the reason, is more useful — it is accurate, and the reason is itself a finding about how that competitor sells.

When reviewing an agent-built matrix, sample cells against their evidence files rather than reading the matrix as prose. Three to five cells per competitor is usually enough to know whether the run can be trusted, and it is the same spot-checking habit as opening citations in a deep research report.
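
Picking the cells to check is worth doing at random rather than by eye, so the sample does not drift towards the cells that already look plausible. A minimal sketch, assuming the matrix maps each competitor to its dimensions and their evidence files; the `sample_cells` helper and the data are hypothetical.

```python
import random


def sample_cells(matrix: dict[str, dict[str, list[str]]],
                 per_competitor: int = 3, seed: int = 0) -> dict[str, list[str]]:
    """Pick up to per_competitor dimensions per competitor to verify by hand."""
    rng = random.Random(seed)  # fixed seed so the review sample is reproducible
    picks = {}
    for competitor, cells in matrix.items():
        dimensions = sorted(cells)
        k = min(per_competitor, len(dimensions))
        picks[competitor] = rng.sample(dimensions, k)
    return picks


matrix = {
    "Competitor A": {"onboarding": ["a/onb-01.png"], "pricing": ["a/pricing.png"],
                     "positioning": ["a/home.png"], "empty_states": ["a/empty.png"]},
    "Competitor B": {"onboarding": ["b/onb-01.png"], "pricing": ["b/pricing.png"]},
}
for competitor, dims in sample_cells(matrix).items():
    for dim in dims:
        print(competitor, dim, matrix[competitor][dim])
```

Each printed line names a cell and the files to open against it.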

Narration for this slide

Here is what quality looks like at the level of a single matrix cell. On the left, cells that read fine and cannot be checked: best onboarding in the category, AI-powered setup, transparent pricing, an estimated price. On the right, the same cells done properly: a step count with dated screenshots, a verbatim claim marked contradicted because the walkthrough showed a template picker, three of eight competitors routing larger teams to sales despite a published price, and — the uncomfortable one — not collected, with the reason recorded. Two failures cause almost all of the bad cells: marketing claims laundered into observed facts, and empty cells filled in because empty looks unfinished. An honest empty cell beats a confident guess every time.

Slide 11 of 13 · 16:9

Worked example: three competitors, one job-to-be-done

From a project-management onboarding teardown: the job is "get a new team from signup to its first working project". Three of the five torn-down products are shown here, on four dimensions.

| Dimension | Competitor A | Competitor B | Competitor C |
| --- | --- | --- | --- |
| Steps to first meaningful action | 4 (opens into a populated template) | 7 (empty board, invite prompt at step 2) | 11 (setup wizard, account details first) |
| Team invite | Deferred until after first project | Required during signup | Optional, buried in settings |
| Positioning claim vs walkthrough | "Set up in minutes": supported | "AI-powered setup": contradicted (relabelled template picker) | "Built for teams": unverified from a solo trial |
| What the second teammate sees | The populated project | A bare board, no orientation | A bare board, no orientation |

The gap nobody owned — the second teammate's first run — became the focus of the redesign. The matrix found it; the team ranked it.

Slide notes

This is a condensed view of the teardown traced in the source workflow: five project-management tools torn down in an afternoon ahead of an onboarding redesign, with one collection agent per competitor and the human time spent on verification and on arguing about what the matrix meant. The slide shows three of the competitors and four dimensions so the structure is readable in a 16:9 frame; the full matrix had every competitor as a column and every dimension from the rubric as a row, with each cell citing its evidence files.
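
One way to make "each cell citing its evidence files" mechanical is a small check run over the matrix before anyone reads it. The sketch below is an assumption about how such a check could look, not the source workflow's format: the `Cell` fields, the `cell_problems` helper, and the example paths are all hypothetical.

```python
from dataclasses import dataclass, field

NOT_COLLECTED = "not collected"


@dataclass
class Cell:
    value: str                                    # what the matrix shows
    evidence: list = field(default_factory=list)  # paths to dated evidence files
    reason: str = ""                              # why nothing could be captured


def cell_problems(cell):
    """Return the reasons this cell cannot be checked (empty list if it can)."""
    problems = []
    if cell.value == NOT_COLLECTED:
        if not cell.reason:
            problems.append("'not collected' needs a recorded reason")
    elif not cell.evidence:
        problems.append("value cites no evidence file")
    return problems


# Hypothetical cells; the paths are invented for illustration.
checked = Cell("7 steps to first meaningful action", ["b/onboarding/manifest.json"])
honest_gap = Cell(NOT_COLLECTED, reason="pricing only available via a sales call")
guess = Cell("estimated enterprise price")  # typed into a cell with no evidence

for name, cell in [("checked", checked), ("honest_gap", honest_gap), ("guess", guess)]:
    print(name, cell_problems(cell) or "ok")
```

A check like this confirms only that a reference exists. Whether the file supports the value is what the spot check is for.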

Walk the rows. Step counts ranged from four to eleven across the full set, and the two shortest paths both opened into a populated template rather than an empty state — an observation, dated and screenshotted. The cross-check earned its keep on positioning: two competitors claimed AI-powered setup, and the walkthroughs showed one generated a real project structure from a text prompt while the other had relabelled its template picker. Both the claims and the observations went into the matrix verbatim, which kept the planning discussion grounded.

The finding that shaped the redesign came from the gaps file, not from any single cell: every product designed onboarding for the account creator and dropped invited teammates into a bare board with no orientation. That is the layered discipline working as intended — the observation is the bare board across the set, the inference is that the category has not designed for the second teammate, and the recommendation to make that the focus of the redesign was the team's call, made knowing it could equally have been a graveyard nobody had bothered to fix. The archived evidence became the baseline for a re-run two quarters later.
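
The three layers of that finding can be written down so the labels cannot quietly fall off. This is a minimal sketch under assumptions: the `Statement` type, the `render` function, and the evidence paths are hypothetical, and the rule that an observation must cite a file is this example's reading of the layered discipline.

```python
from dataclasses import dataclass

LAYERS = ("observation", "inference", "recommendation")


@dataclass
class Statement:
    layer: str            # one of LAYERS
    text: str
    evidence: tuple = ()  # evidence files; required for observations


def render(statements):
    """Return labelled lines, rejecting unknown layers and unsupported observations."""
    lines = []
    for s in statements:
        if s.layer not in LAYERS:
            raise ValueError(f"unknown layer: {s.layer!r}")
        if s.layer == "observation" and not s.evidence:
            raise ValueError(f"observation without evidence: {s.text!r}")
        refs = f" [{', '.join(s.evidence)}]" if s.evidence else ""
        lines.append(f"{s.layer.upper()}: {s.text}{refs}")
    return lines


# The second-teammate finding from the worked example; file paths are invented.
finding = [
    Statement("observation", "Invited teammates land on a bare board with no orientation.",
              ("b/invite/01-landing.png", "c/invite/01-landing.png")),
    Statement("inference", "The category has not designed for the second teammate."),
    Statement("recommendation", "Make the second teammate's first run the focus of the redesign."),
]

print("\n".join(render(finding)))
```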

Narration for this slide

Let's trace a real run. The job-to-be-done: get a new team from signup to its first working project. Five project-management tools were torn down; here are three of them on four dimensions. Competitor A: four steps, opens into a populated template, invites deferred. Competitor B: seven steps and an AI-powered setup claim that the walkthrough contradicted — it was a relabelled template picker. Competitor C: eleven steps through a setup wizard. Then the row that mattered: what the second teammate sees. Across the whole set, invited teammates landed on a bare board with no orientation. That gap became the focus of the redesign. Notice the division of labour — the matrix found the gap, and the team decided it was an opportunity rather than a graveyard.

Slide 12 of 13 · 16:9

Exercise: a single-flow teardown of one competitor

One competitor, one flow, one hour. Small enough to finish, real enough to expose where your evidence discipline holds.

  • Pick one competitor and one flow tied to a decision you actually face — onboarding, pricing, or one core task
  • Write a five-line dimensions note: what evidence the flow requires, and what counts as not collected
  • Capture the evidence: ordered screenshots, step count, the positioning claim for that flow, verbatim with URL
  • Write the findings in three labelled layers: observations with file references, inferences, one recommendation
  • Mark the claim as supported, contradicted, or unverified — and note one thing you could not see from outside

Keep the folder and the dimensions note. Module 3 reuses this evidence-first habit on a different corpus: interviews and tickets.

Slide notes

The exercise deliberately shrinks the workflow to a size one person can finish in about an hour without any orchestration: one competitor, one flow, evidence captured by hand. The point is not the matrix — with a single competitor there is nothing to compare — it is the habits: fixing the dimensions before capturing anything, recording the positioning claim verbatim and then checking it against the walkthrough, and writing the output in three labelled layers. Most people discover that the observation layer is easy and the discipline breaks at the inference layer, where the temptation to write conclusions in the voice of facts is strong.

Steer participants towards a flow connected to a live decision, because the exercise is also a test of the scoping slide: if they cannot name the decision the teardown serves, the flow they pick will be whichever one is easiest to screenshot. The 'one thing you could not see from outside' prompt matters too — it seeds the limits conversation, since usage data, retention, and the reasons behind a competitor's choices are invisible from a walkthrough.

If running this with a team, have two people tear down the same flow independently and compare folders. The differences are instructive: where the evidence agrees, the rubric worked; where the inferences differ from the same evidence, that is the judgment layer showing itself — which is exactly the part that should not be delegated to an agent.
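
If both people summarise their folder in the same few fields, the comparison can be sketched as below. The field names, the `compare_teardowns` helper, and the two summaries are hypothetical, chosen only to show where agreement and disagreement would surface.

```python
def compare_teardowns(first, second):
    """Return (agreed, differing) field names from two teardowns of the same flow."""
    agreed, differing = [], []
    for name in sorted(set(first) | set(second)):
        if first.get(name) == second.get(name):
            agreed.append(name)
        else:
            differing.append(name)
    return agreed, differing


# Hypothetical summaries written by two people after tearing down the same flow.
reviewer_one = {
    "step_count": 7,
    "claim_verdict": "contradicted",
    "inference": "the invite prompt delays first value",
}
reviewer_two = {
    "step_count": 7,
    "claim_verdict": "contradicted",
    "inference": "the empty board is the main friction",
}

agreed, differing = compare_teardowns(reviewer_one, reviewer_two)
print("Rubric held for:", agreed)
print("Judgment differs on:", differing)
```

Here the observation fields match and only the inference differs, which is the pattern the paragraph above describes.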

Narration for this slide

Time to run this at small scale. Pick one competitor and one flow — onboarding, pricing, or a core task — tied to a decision you actually face. Write a five-line dimensions note first: what evidence this flow requires, and what counts as not collected. Then capture it: ordered screenshots, the step count, and the positioning claim for that flow, verbatim, with the URL. Write your findings in three labelled layers — observations that point at files, inferences marked as inferences, and one recommendation. Finally, mark the claim: supported, contradicted, or unverified, and note one thing you could not see from the outside. Keep the folder. The same evidence-first habit carries straight into the next module.

Slide 13 of 13 · 16:9

Summary, and the bridge to synthesis

  • Scope before you run: the decision, the evidence standard, what would change your mind, and the angles you chose yourself
  • Deep research reads what was written; teardowns walk the products — substitute one for the other and you inherit its blind spots
  • Fix the rubric first; every matrix cell traces to a dated evidence file, and "not collected" stays honest
  • Keep observation, inference, and recommendation in separate, labelled layers
  • The matrix maps what exists; whether a gap is an opportunity or a graveyard is a design judgment

Module 3 turns from the public web to your own data: synthesising interviews, tickets, and open-text feedback into themes that trace back to quotes.

Slide notes

Recap by connecting the bullets back to the module's three objectives. Scoping was the first: the question, the decision, the evidence standard, and self-chosen angles are what stop agent breadth becoming unfocused accumulation. The teardown pipeline and the rubric were the second: evidence captured per product against fixed dimensions, dated manifests, marketing claims cross-checked against walkthroughs. The layered output — observation, inference, recommendation — was the third, and it is the discipline that travels furthest beyond this module, because it applies to every research artifact an agent helps produce.

Restate the limits once more, because they are easy to lose in the enthusiasm of a good matrix: desk research and teardowns describe the category and the public web, not your users and not performance. They cannot say why a competitor built something or whether it works for them, and the evidence decays within a quarter. The packet informs the decision; it is not the decision.
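
Because the evidence decays, a dated manifest makes it easy to list what has gone stale before anyone cites it. The sketch below assumes a manifest that maps each evidence file to its capture date and approximates "a quarter" as 90 days; both choices, the `stale_captures` name, and the example paths are this example's assumptions.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # assumption: "a quarter" approximated as 90 days


def stale_captures(manifest, today):
    """Return evidence files captured more than MAX_AGE before `today`, oldest first."""
    stale = [(captured, path) for path, captured in manifest.items()
             if today - captured > MAX_AGE]
    return [path for captured, path in sorted(stale)]


# Hypothetical manifest; paths and dates are invented for illustration.
manifest = {
    "a/pricing/page.png": date(2024, 1, 15),
    "a/onboarding/01-signup.png": date(2024, 5, 2),
}

print(stale_captures(manifest, today=date(2024, 6, 1)))
```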

Preview Module 3 concretely. The next module keeps the same packet structure and the same evidence discipline but changes the corpus: interviews, support tickets, reviews, and open-text feedback — your own data rather than the public web. The new problems are different in character: consent and privacy in the corpus, themes that must trace back to specific quotes, counting prevalence rather than loudness, and the interpretation pass that stays human. The habit built here — every claim points at evidence someone can open — is the one that carries over directly.

Narration for this slide

Let's close the module. Scope before you run: name the decision, the evidence standard, what would change your mind, and choose the angles yourself. Know which kind of research you are doing — deep research reads what has been written, a teardown walks the products — and do not let one stand in for the other. Fix the rubric before collecting, make every cell trace to dated evidence, and let not collected stay empty. Keep observation, inference, and recommendation visibly separate. And remember what the matrix cannot do: it maps what exists, but deciding which gaps are opportunities is your judgment. In Module 3 we turn from the public web to your own data — synthesising interviews, tickets, and feedback into themes that trace back to quotes. See you there.

Module transcript
Module 2, narrated slide by slide

Slide 1 · Deep Design Research and Competitive Teardowns

Welcome to Module 2. In the last module we built the research packet — the question, the method, the findings, and the evidence trail, all in one auditable structure. Now we fill that structure with the two kinds of research agents do at a scale a single researcher cannot: deep desk research and competitive teardowns. The headline is on the slide. Coverage is cheap now — an agent can read forty sources or walk eight competitor products in an afternoon. Judgment about that coverage is not cheap, and it does not transfer to the agent. This module is about scoping the work, capturing real evidence, and keeping what you observed visibly separate from what you are inferring.

Slide 2 · Scope the question: what would change a decision

Everything starts with the question, and the test for a good one is simple: what decision would change because of the answer? Name that decision. Name the products, the moment in the experience, and the user. Then set the evidence standard up front — for example, marketing copy alone does not count as evidence of what a product does. And write down what would change your mind, because if nothing could, you are not researching, you are collecting support for a decision already made. Finally, choose the research angles yourself — usually one per competitor, one for the standard, one for criticism, one for recency. Five to eight angles. The agent supplies the breadth. You supply the aim.

Slide 3 · Two kinds of competitive research

There are two ways to research competitors, and they produce different artifacts. Deep desk research reads what has been written — articles, documentation, reviews, standards — and synthesises it into a cited report. A teardown walks the products themselves: it signs up, clicks through onboarding, screenshots the pricing page, and counts the steps. Here is why the difference matters. A literature search will tell you a competitor is loved for its onboarding. Only a teardown tells you it is a three-step flow that defers account creation until after the first project exists. You need both, and the most common mistake is substituting one for the other — usually letting published opinion stand in for what the product actually does.

Slide 4 · Decide what evidence counts before collecting any

Before anyone collects anything, fix what evidence counts. This is the dimensions file, and it is the rubric every competitor gets measured against. For onboarding: ordered screenshots from signup to the first meaningful action, with step counts. For pricing: the full page, the tiers, what is gated where. For positioning: the headline and the three claims the marketing leans on hardest, verbatim, with URLs. For core flows and empty states: what the product shows before any data exists. And for every marketing claim, the cross-check — did the walkthrough support it, contradict it, or fail to verify it? One more rule: anything that could not be collected gets recorded with the reason. Never guessed. Not collected is a finding.

Slide 5 · The teardown pipeline

Here is the whole teardown as a pipeline. You define the set: the decision, five to eight competitors, and the evidence rubric. Then agents take over — one collection agent per competitor, each filling an evidence folder with screenshots, pricing pages, and verbatim claims, against the same rubric. The findings get normalised: every cell cites a file, marketing claims get checked against what the walkthrough actually showed. Then synthesis: the comparison matrix, the named patterns, the opportunity gaps. And then it comes back to you — sample the evidence, separate what was seen from what is being inferred, and decide which gaps are worth pursuing. The dashed line is the re-run: next quarter, the same pipeline becomes a diff against the archive.

Slide 6 · Screenshot evidence and the rights questions

Screenshots are the evidence that settles arguments — but capturing competitor products comes with rules worth setting up front. Use trial accounts and public pages, stay inside the terms of service, and never misrepresent who you are to get access. Keep what you capture as internal evidence: reference it, link matrix cells to it, but do not republish competitor UI in decks or articles — use your own concept work for anything external. Date every capture, because pricing pages and onboarding flows change without notice, and undated evidence cannot be diffed later. And if a product is hidden behind a sales call, write that down as a finding about how they sell. It is not a gap you should fill some other way.

Slide 7 · Pattern surveys: table stakes vs differentiating

With evidence in the folders, the cross-competitor read begins, and the useful frame is table stakes versus differentiators. Table stakes are the patterns nearly everyone ships — setup checklists, template galleries, three pricing tiers with the middle one anchored. Nobody praises them; their absence is what gets noticed. Differentiators are the patterns one or two products own, and the evidence needs to show how they execute, not just that the feature exists. Count honestly: four of five products show a checklist on first run, with citations — not, most products do this. And remember the limit. A pattern the whole category shares might be best practice, or it might be a herd error. Structure is not performance, and the matrix cannot tell you which is which.

Slide 8 · Observation, inference, recommendation

Here is the discipline that keeps the whole module honest. Three kinds of statement, kept visibly separate. Observations are what the evidence shows — competitor A takes four steps from signup to a populated template, and here are the screenshots. Anyone can verify them. Inferences are readings of what the observations mean — short paths to first value seem to go with deferring team invites. Reasonable, but a step away from the evidence, so label it. Recommendations are what we should do about it — and they belong to you, the designer, because you carry the accountability. When these layers blur, marketing claims become facts and hunches become findings. Keep the labels on, and make the agent keep them on too.

Slide 9 · Source quality: every source type has a bias

Every source type has a bias, and the cross-checking only works on the sources you gave it. Marketing pages are claims to verify, never evidence — record them verbatim, then check them against the walkthrough. Docs and changelogs are solid for what a product does, silent on how well it works. Reviews surface real friction, but they skew towards the angry and the recent. Published teardowns are useful and often cite each other in a circle, so check the dates. And for standards, fetch the primary text and cite criteria by number. Two limits to remember: cross-checking cannot catch a claim that is wrong everywhere it appears, and the public web underrepresents anything paywalled, internal, or not in English.

Slide 10 · Good and bad cells in the same matrix

Here is what quality looks like at the level of a single matrix cell. On the left, cells that read fine and cannot be checked: best onboarding in the category, AI-powered setup, transparent pricing, an estimated price. On the right, the same cells done properly: a step count with dated screenshots, a verbatim claim marked contradicted because the walkthrough showed a template picker, three of eight competitors routing larger teams to sales despite a published price, and — the uncomfortable one — not collected, with the reason recorded. Two failures cause almost all of the bad cells: marketing claims laundered into observed facts, and empty cells filled in because empty looks unfinished. An honest empty cell beats a confident guess every time.

Slide 11 · Worked example: three competitors, one job-to-be-done

Let's trace a real run. The job-to-be-done: get a new team from signup to its first working project. Five project-management tools were torn down; here are three of them on four dimensions. Competitor A: four steps, opens into a populated template, invites deferred. Competitor B: seven steps and an AI-powered setup claim that the walkthrough contradicted — it was a relabelled template picker. Competitor C: eleven steps through a setup wizard. Then the row that mattered: what the second teammate sees. Across the whole set, invited teammates landed on a bare board with no orientation. That gap became the focus of the redesign. Notice the division of labour — the matrix found the gap, and the team decided it was an opportunity rather than a graveyard.

Slide 12 · Exercise: a single-flow teardown of one competitor

Time to run this at small scale. Pick one competitor and one flow — onboarding, pricing, or a core task — tied to a decision you actually face. Write a five-line dimensions note first: what evidence this flow requires, and what counts as not collected. Then capture it: ordered screenshots, the step count, and the positioning claim for that flow, verbatim, with the URL. Write your findings in three labelled layers — observations that point at files, inferences marked as inferences, and one recommendation. Finally, mark the claim: supported, contradicted, or unverified, and note one thing you could not see from the outside. Keep the folder. The same evidence-first habit carries straight into the next module.

Slide 13 · Summary, and the bridge to synthesis

Let's close the module. Scope before you run: name the decision, the evidence standard, what would change your mind, and choose the angles yourself. Know which kind of research you are doing — deep research reads what has been written, a teardown walks the products — and do not let one stand in for the other. Fix the rubric before collecting, make every cell trace to dated evidence, and let not collected stay empty. Keep observation, inference, and recommendation visibly separate. And remember what the matrix cannot do: it maps what exists, but deciding which gaps are opportunities is your judgment. In Module 3 we turn from the public web to your own data — synthesising interviews, tickets, and feedback into themes that trace back to quotes. See you there.