Agentic Design School
Module 3 of 6
40–50 minutes

Design Systems for Agents

Component Taste and Review Gates

Agents can produce components quickly; the system survives only if taste decisions stay deliberate. This module covers what to encode as rules, what to keep as human review, and how a contribution gate works when an agent is the contributor.

Duration: 40–50 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Separate mechanical checks from taste decisions in component review.
  • Design a review gate for agent-built components, including evidence requirements.
  • Encode recurring review feedback as rules so it stops recurring.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13 · 16:9

Component Taste and Review Gates

Design Systems for Agents · Module 3 of 6

  • Speed is easy now; taste is the constraint
  • What to encode as rules, what to keep as human review
  • Evidence requirements for agent-built components
  • A contribution gate that works when the contributor is an agent

The system does not survive on the quality of its best component. It survives on the quality of the worst one that got in.

Slide notes

Modules 1 and 2 made the system legible: tokens that carry intent, and a DESIGN.md the agent loads at the start of every session. The result is that producing a plausible component now takes minutes. This module is about the consequence of that speed: the bottleneck moves from production to judgment, and if the team does not decide deliberately where judgment sits, components enter the system on momentum rather than on merit.

Frame the module around one division of labour. Some review work is mechanical — token usage, type safety, contrast, state coverage — and an agent can check its own output against it before a human ever looks. Some review work is taste — density, hierarchy, restraint, voice — and that has to stay with people, but it gets dramatically cheaper when the mechanical layer is already clean and the evidence is laid out in front of the reviewer.

Set expectations on scope. This module covers the gate for individual component changes and the intake pipeline for new component proposals. It does not cover system-wide audits or drift detection — that is Module 4 — and it does not cover the fully scheduled maintenance loop, which is Module 6. The gate built here is the unit those later modules reuse.

Narration for this slide

Welcome back. In the first two modules we made the system legible — tokens that carry intent, and a DESIGN.md the agent reads every session. The payoff is speed: an agent can now produce a plausible component in minutes. This module is about the catch. Speed is easy; taste is the constraint. If every plausible component gets in, the system stops being a system. So we are going to separate the review work an agent can do on its own output from the judgment that has to stay human, and build a review gate that holds when the contributor is an agent. Let's start by pinning down what taste actually means.

Slide 2 of 13 · 16:9

What taste means operationally

Taste sounds like preference. In a design system it is a small set of judgments you can name, even when you cannot fully automate them.

  • Density — how much the component asks of the screen, and whether that fits the product
  • Hierarchy — whether the component reveals information in the order the task needs
  • Restraint — whether it adds the minimum new surface area: props, variants, visual ideas
  • Voice — whether labels, empty states, and errors sound like the product, not like a template

If you cannot name the dimension a taste objection sits on, the agent cannot learn from it and the next reviewer cannot apply it.

Slide notes

The word taste does a lot of work in design system conversations and most of it is unhelpful, because it gets used to mean both "I have a standard I have not written down" and "I prefer it the other way". Operationally, taste in component review reduces to a handful of nameable dimensions. Density: agents trained on the public web default to airy, marketing-grade spacing, and a work tool usually wants the opposite. Hierarchy: whether the component's internal order matches the user's task order, which no linter can verify. Restraint: the strong default of generative tools is to add (another prop, another variant, another visual flourish), and the system's job is mostly to refuse. Voice: copy inside components drifts towards generic template language unless someone holds it to the product's register.

The reason to name the dimensions is not philosophical tidiness. A finding that says "this feels off" cannot be acted on by an agent, cannot be checked by the next reviewer, and cannot ever become a rule. A finding that says "the summary row is denser than anything else in the system; reduce it to one line of metadata" can be acted on, and after the second occurrence it can be written into DESIGN.md.

Be honest that the dimensions are judged, not measured. Two competent reviewers can disagree on density for a specific component. That is fine — the gate later in this module routes those disagreements to a named decision-maker rather than pretending a rule can settle them.

Narration for this slide

Taste gets treated as something mystical, which makes it impossible to manage. Operationally, it is four judgments you can name. Density: does the component ask the right amount of the screen for this product? Hierarchy: does it reveal information in the order the task needs? Restraint: does it add the minimum new surface area — or does it bring three new props and a gradient nobody asked for? And voice: do its labels and empty states sound like your product or like a template? You cannot fully automate any of these. But naming them matters, because a taste objection you cannot name is one the agent can never learn from — and one the next reviewer cannot apply consistently.

Slide 3 of 13 · 16:9

Mechanical checks vs taste decisions

The split decides who does the work. Mechanical checks run the same way every time; taste decisions need a person and a point of view.

| | Mechanical checks | Taste decisions |
| --- | --- | --- |
| Examples | Token usage, type safety, contrast, state coverage, naming pattern | Density, hierarchy, restraint, voice, whether it should exist at all |
| Who runs it | The agent, on its own output, before review | A reviewer, against DESIGN.md and the product |
| Output | Pass or fail, with the violated rule cited | Findings with severity, evidence, and a recommended change |
| When it can change | Whenever a rule is added or tuned | Slowly: a taste shift is a system decision, not a review comment |
| Failure mode | Stale or missing rules let violations through quietly | Reviewer preference dressed up as a standard |

Anything a script can verify should never reach a human reviewer as a finding. Anything a script cannot verify should never be blocked by one.

Slide notes

This table is the module's first learning objective in one view. The left column is everything that can be expressed as a checkable rule: every colour resolves to a token, every prop is typed, contrast meets AA, the required interaction states exist, the name follows the system's pattern. These checks are cheap to run, they run identically on every proposal, and an agent can run them on its own output before a human ever sees the work. The right column is the judgment work from the previous slide, plus the largest judgment of all: whether the component should exist in the system at all.
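To make "expressed as a checkable rule" concrete, here is a minimal sketch of one rule from the left column, the AA contrast check. It follows the WCAG 2.x definitions of relative luminance and contrast ratio, with the 4.5:1 threshold for normal-size text. The function names and sample colours are illustrative assumptions, not part of this course's tooling.

```python
AA_NORMAL_TEXT = 4.5  # WCAG AA threshold for normal-size text


def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a six-digit hex colour such as '#1a2b3c'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str) -> bool:
    """Pass or fail only; the check offers no opinion on whether the colour is a good choice."""
    return contrast_ratio(foreground, background) >= AA_NORMAL_TEXT


print(round(contrast_ratio("#000000", "#ffffff"), 2))  # 21.0
print(meets_aa("#777777", "#ffffff"))                  # False (about 4.48:1)
```

The check returns the same answer on every run and can cite the rule it enforces, which is what qualifies it for the mechanical column.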

The two failure modes at the bottom deserve attention because they are the ones teams actually hit. On the mechanical side, the rules go stale — the token file moves, a new state becomes required — and the checks keep passing while the standard quietly drops. On the taste side, the failure is a reviewer enforcing personal preference as if it were a system standard, which erodes contributor trust faster than any backlog does. The cure for the first is owning the checks like code. The cure for the second is requiring that taste findings cite a dimension and, where one exists, the line of DESIGN.md they rest on.

The highlight line is the operating principle: humans should not spend review time catching hardcoded hex values, and scripts should not be allowed to block work on grounds they cannot actually evaluate.

Narration for this slide

Here is the split that decides who does the work. Mechanical checks are everything a script can verify: tokens, types, contrast, state coverage, naming. They run the same way every time, and the agent should run them on its own output before any person looks. Taste decisions are the judgments from the last slide — density, hierarchy, restraint, voice — plus the biggest one: whether this component should exist at all. Those need a person. The two failure modes are worth remembering: mechanical rules go stale and pass things they should not, and taste review degrades into personal preference wearing a standards badge. The principle: never make a human catch what a script can catch, and never let a script block what only a human can judge.

Slide 4 of 13 · 16:9

Mechanical checks the agent runs on itself

These run before the change is even proposed. A failure goes back to the agent, not onto a reviewer's plate.

  • Token audit — every colour, space, radius, and type size resolves to a semantic token
  • Type check, lint, and build — no invented props, no broken exports
  • State coverage — hover, focus, disabled, loading, empty, and error states exist and render
  • Accessibility pass — contrast, focus visibility, labels, and hit targets on the rendered states
  • Naming and file placement — follows the system's pattern, lands in the right package

The agent reporting which checks it ran, and what failed and was fixed, is itself part of the evidence packet.

Slide notes

This list is the concrete content of the left-hand column from the previous slide, and it doubles as a starting checklist participants can lift directly. The token audit is usually the highest-value single check for design systems: in the traced runs behind this course's articles, hardcoded values are the most common defect in unconstrained agent output, and they are also the defect a script catches with essentially no false positives. Type check, lint, and build catch the invented prop and the broken import — boring failures that waste reviewer goodwill when they reach a human.
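A token audit can start as a very small script. The sketch below flags hex colour literals and raw pixel lengths in a stylesheet fragment. It assumes tokens are referenced as CSS custom properties such as `var(--color-text-muted)`; that convention, the rule names, and the sample are hypothetical, and a production audit would likely parse the stylesheet rather than pattern-match lines.

```python
import re

# Assumption: anything that is not a var(--token) reference is a raw value.
HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{3,8}\b")
RAW_PX = re.compile(r"\b\d+(?:\.\d+)?px\b")


def audit_tokens(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule name, offending value) for each raw value found."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in HEX_COLOUR.finditer(line):
            findings.append((number, "raw-colour", match.group()))
        for match in RAW_PX.finditer(line):
            findings.append((number, "raw-length", match.group()))
    return findings


sample = """.metric-card {
  color: var(--color-text-muted);
  background: #f4f4f5;
  padding: 12px var(--space-4);
}"""

print(audit_tokens(sample))
# [(3, 'raw-colour', '#f4f4f5'), (4, 'raw-length', '12px')]
```

Each result carries the line and the rule it violated, so the agent can fix the value and re-run without a reviewer being involved.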

State coverage is the check teams most often forget to make explicit. An agent asked for a card builds the happy path; the empty, loading, and error states only appear when something demands them. Making their existence a mechanical requirement — they must exist and render, even though their quality is still judged at taste review — moves a whole class of back-and-forth out of human review. The accessibility pass at this stage is the automatable layer: contrast, focus visibility, labels, target sizes. It is a floor, not a substitute for the deeper accessibility judgment that belongs to the system's standards.
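Making state coverage a mechanical requirement can be as simple as comparing the states a component actually renders against a required list. In this sketch the required states come from the slide; how the rendered set is collected (for example from story or fixture names) is left as an assumption.

```python
REQUIRED_STATES = ("hover", "focus", "disabled", "loading", "empty", "error")


def missing_states(rendered: set[str], required: tuple[str, ...] = REQUIRED_STATES) -> list[str]:
    """Return the required states that were not rendered, in checklist order."""
    return [state for state in required if state not in rendered]


# Hypothetical first draft of a card: the happy path plus two interaction states.
print(missing_states({"default", "hover", "focus"}))
# ['disabled', 'loading', 'empty', 'error']
```

The check only confirms that each state exists and renders. Whether the empty state reads well is still a question for taste review.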

The closing point matters for trust: the agent should report what it ran and what it fixed. A proposal that arrives saying "token audit clean, axe clean, one contrast failure found and fixed on muted text" reads very differently from one that arrives silent, and that self-report becomes part of the evidence the next gate consumes.

Narration for this slide

Here is the checklist the agent runs on its own output, before anyone reviews anything. A token audit — every colour, space, and type size resolves to a semantic token, no raw values. Type check, lint, and build — no invented props, nothing broken. State coverage — hover, focus, disabled, loading, empty, and error states exist and render. An automated accessibility pass over those rendered states. And naming and file placement that follow the system's conventions. Failures go back to the agent, not onto a reviewer's plate. And the agent reports what it ran and what it fixed — that report is part of the evidence, because a clean self-check is the price of admission to human review.

Slide 5 of 13 · 16:9

Evidence before opinion: the entry requirement

Nothing reaches a human reviewer without an evidence packet. Review without evidence collapses into opinion, and opinion does not scale.

  • Screenshots at fixed widths — 390, 768, and 1440px, captured from the rendered component
  • Every interaction state pictured, not just the default
  • The tokens consumed and the props exposed, listed in the proposal
  • An overlap statement: the nearest existing components and why they fall short
  • The agent's self-check report: what ran, what failed, what was fixed

If the evidence is not attached, the review does not start. That rule protects reviewers more than it polices agents.

Slide notes

The evidence packet is the hinge of the whole gate, and it is also the part agents are genuinely good at producing. Capturing screenshots at fixed widths, enumerating states, listing the tokens consumed and the props exposed — this is mechanical assembly work that used to be the reason component reviews arrived half-documented. When the contributor is an agent, there is no excuse for a thin packet, because assembling it costs the agent minutes.

The overlap statement deserves emphasis because it is the cheapest component decision available: the one you do not build. Requiring every proposal to name the nearest existing components, and say specifically why they fall short, forces the comparison to happen before review rather than during it. In the contribution intake workflow this course draws on, overlap analysis is its own stage run by a dedicated agent, and the most common outcome across real backlogs is a redirect to an existing composition rather than a new component.

Fixed widths matter more than they look. Reviewing whatever viewport the contributor happened to capture makes findings impossible to compare across proposals; reviewing the same three widths every time makes density and hierarchy judgments consistent and makes regressions visible later. Make the entry requirement explicit and unapologetic: no packet, no review. Reviewers are the scarce resource the gate exists to protect.
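The "no packet, no review" rule is easy to enforce if the packet has a fixed shape. The following is one possible sketch: the field names and the `missing_evidence` helper are hypothetical, while the three widths and the five packet items come from the slide.

```python
from dataclasses import dataclass, field

REQUIRED_WIDTHS = (390, 768, 1440)  # fixed widths from the entry requirement


@dataclass
class EvidencePacket:
    # Hypothetical shape: (viewport width in px, state name) -> screenshot path.
    screenshots: dict[tuple[int, str], str] = field(default_factory=dict)
    tokens_consumed: list[str] = field(default_factory=list)
    # An empty props list can be legitimate, so it is not checked below.
    props_exposed: list[str] = field(default_factory=list)
    overlap_statement: str = ""
    self_check_report: str = ""


def missing_evidence(packet: EvidencePacket, states: list[str]) -> list[str]:
    """List what is absent from the packet; an empty list means review can start."""
    gaps = []
    widths_seen = {width for width, _ in packet.screenshots}
    states_seen = {state for _, state in packet.screenshots}
    for width in REQUIRED_WIDTHS:
        if width not in widths_seen:
            gaps.append(f"no screenshot at {width}px")
    for state in states:
        if state not in states_seen:
            gaps.append(f"state not pictured: {state}")
    if not packet.tokens_consumed:
        gaps.append("tokens consumed not listed")
    if not packet.overlap_statement.strip():
        gaps.append("overlap statement missing")
    if not packet.self_check_report.strip():
        gaps.append("self-check report missing")
    return gaps
```

Returning the list of gaps, rather than a bare pass or fail, gives the agent exactly what it needs to complete the packet before a reviewer is asked to look.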

Narration for this slide

The entry requirement for human review is evidence, not enthusiasm. Every proposal arrives with screenshots at three fixed widths, every interaction state pictured — not just the default — the tokens it consumes and the props it exposes, an overlap statement naming the nearest existing components and why they fall short, and the agent's own self-check report. When the contributor is an agent, assembling this packet costs minutes, so there is no excuse for a thin one. And the rule is blunt: if the evidence is not attached, the review does not start. That is not bureaucracy. It is what keeps review grounded in what the component actually does, rather than in opinions about what it probably does.

Slide 6 of 13 · 16:9

The review gate, end to end

The path of one agent-proposed component change, from branch to decision, with the work split between agent, automation, and people.

Pipeline diagram of contribution review gates. An agent proposes a component change on a branch and assembles an evidence packet of screenshots, states, and token usage. The change passes through automated checks — token audit, type check, lint, accessibility — whose failures return to the agent. It then reaches taste review, a human-led gate judging density, hierarchy, restraint, and voice against DESIGN.md, producing findings with severity and evidence. The system maintainer decides: accept, accept narrowed, redirect to composition, or decline, with reasoning recorded. Recurring findings are written into rules and checks, and a dashed line returns to the start: the next proposal meets a stricter gate.
Yellow steps are agent-run, the ink stripe is the automated gate, blue is human-led review, and green is the maintainer's decision. The dashed return is the point of the loop: recurring findings become rules, so the next proposal meets a stricter gate.

The gate is not there to slow agents down. It is there so the speed of production never outruns the deliberateness of what enters the system.

Slide notes

Walk the diagram in order and name the owner of each step. The agent proposes the change on a branch, scoped to one component, and states the need it answers — a proposal that is only a diff is not yet a proposal. The agent then assembles the evidence packet from the previous slide. The automated checks are the first gate, and their failures loop straight back to the agent without consuming any human attention; that loop can run several times and nobody needs to care. Taste review is the human-led gate: density, hierarchy, restraint, and voice, judged against DESIGN.md rather than against the reviewer's mood, returned as findings with severity and evidence rather than as silent fixes.

The maintainer decision is deliberately separated from taste review. The reviewer recommends; the person accountable for the system's surface area decides, choosing between accept, accept with a narrowed scope, redirect to an existing composition, or decline — and records the reasoning in the packet so the decision survives staff turnover. On small teams the reviewer and the maintainer are the same person wearing two hats; keep the steps distinct anyway, because the reasoning record is what makes a decline land as governance rather than gatekeeping.

The dashed return edge is the slide's real argument. A gate that only filters is a cost centre. A gate that converts its recurring findings into rules and checks gets cheaper every round, which is where the next two slides go.
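Two parts of the diagram translate directly into code: the automated loop that returns failures to the agent, and the maintainer decision with its recorded reasoning. The sketch below is illustrative only. The callables stand in for whatever checks and agent the team runs, and the `max_rounds` cap is an assumption added so the loop cannot run forever.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ACCEPT = "accept"
    ACCEPT_NARROWED = "accept narrowed"
    REDIRECT = "redirect to composition"
    DECLINE = "decline"


@dataclass(frozen=True)
class DecisionRecord:
    decision: Decision
    reasoning: str
    decided_by: str

    def __post_init__(self) -> None:
        # The reasoning is what lets the decision survive staff turnover.
        if not self.reasoning.strip():
            raise ValueError("a decision needs recorded reasoning")


def automated_gate(
    change: str,
    run_checks: Callable[[str], list[str]],
    agent_fix: Callable[[str, list[str]], str],
    max_rounds: int = 5,  # assumption: escalate rather than loop indefinitely
) -> tuple[str, int]:
    """Send failures back to the agent until the change is clean.

    Returns the clean change and the round on which it passed.
    """
    for round_number in range(1, max_rounds + 1):
        failures = run_checks(change)
        if not failures:
            return change, round_number
        change = agent_fix(change, failures)
    raise RuntimeError("checks still failing after max_rounds; escalate to a person")
```

No human appears anywhere in `automated_gate`; people enter only at taste review and at the `DecisionRecord`, which matches the division of labour in the diagram.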

Narration for this slide

Here is the whole gate in one picture. The agent proposes the change on a branch and assembles the evidence packet. Automated checks run first — and their failures go back to the agent, not to a person. What survives reaches taste review: density, hierarchy, restraint, and voice, judged against DESIGN.md, returned as findings with severity and evidence. Then the maintainer decides — accept, accept narrowed, redirect to an existing composition, or decline — and records the reasoning. Notice the dashed line back to the start. Recurring findings get written into rules and checks, so the next proposal meets a stricter gate. The gate is not there to slow agents down. It is there so production speed never outruns deliberateness.

Slide 7 of 13 · 16:9

Intake when the contributor is an agent

New component proposals get the same staged investigation whether a product team or an agent raised them — and agents can run most of the investigation.

  • Proposals describe the need, not the solution — a finished component is not an argument for itself
  • Overlap stage: is this covered by an existing component, a composition, or a small extension?
  • Impact stage: how many surfaces in the codebase show this need today, counted with file paths
  • Spec stage: draft API, tokens consumed, states, and accessibility requirements
  • The packet goes to a named decision-maker; the pipeline recommends, it never decides

The expensive part of a good contribution decision was always the research. Agents make the research cheap; the decision stays exactly as human as it was.

Slide notes

The previous slides covered changes to existing components. This one covers the harder case: a proposal for something new, which is where systems either grow by accretion or earn a reputation for saying no. The intake pipeline this course draws on runs in stages because each stage can end the process early. The overlap stage asks whether the need is already covered by an existing component, a documented composition, or a small extension — and a proposal that is fully covered exits with a worked example attached instead of a new component. The impact stage estimates adoption from the codebase rather than from the proposer's optimism: it counts the surfaces where the need already shows up as workarounds or hand-rolled near-duplicates, with file paths as evidence. The spec stage drafts what the thing would actually be — props, tokens, states, accessibility requirements — as something for the maintainers to argue with.

When the contributor is an agent, two things change. First, the proposal template matters more, because agents are happy to present a finished component as if its existence were the argument; requiring the need to be stated separately from the solution keeps the overlap check honest. Second, the volume goes up — an agent can raise five proposals in the time a product team raised one — which is exactly why the investigation has to be cheap and consistent rather than heroic.

Keep the boundary clear: the pipeline produces evidence and a labelled recommendation. Strategic scope, brand disputes, and appetite for new surface area belong to the people accountable for the system. A council that rubber-stamps the recommendation has outsourced governance to a script.
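The impact stage described above, counting surfaces with file paths as evidence, can be sketched as a simple search over source files. Everything specific here is hypothetical: the `stat-box` pattern stands in for whatever marks a hand-rolled near-duplicate in a given codebase, and the files are passed in as a mapping so the example stays self-contained.

```python
import re


def impact_evidence(files: dict[str, str], pattern: str) -> list[tuple[str, int]]:
    """Return (file path, occurrence count) for each file where the need shows up."""
    regex = re.compile(pattern)
    hits = []
    for path, source in sorted(files.items()):
        count = sum(1 for _ in regex.finditer(source))
        if count:
            hits.append((path, count))
    return hits


# Hypothetical codebase snapshot: path -> file contents.
codebase = {
    "apps/dashboard/Revenue.tsx": "<div className='stat-box'/><div className='stat-box'/>",
    "apps/billing/Summary.tsx": "<StatTile />",
    "apps/ops/Health.tsx": "<div className='stat-box'/>",
}

print(impact_evidence(codebase, r"stat-box"))
# [('apps/dashboard/Revenue.tsx', 2), ('apps/ops/Health.tsx', 1)]
```

The output is evidence for the packet, not a verdict: two surfaces with a workaround inform the decision-maker but do not decide anything.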

Narration for this slide

What about proposals for entirely new components? They get a staged investigation, and agents can run most of it. Proposals describe the need, not the solution — a finished component is not an argument for itself. An overlap stage checks whether the need is already covered by an existing component, a composition, or a small extension; the cheapest component is the one you never build. An impact stage counts how many surfaces in the codebase actually show this need today, with file paths, not optimism. A spec stage drafts the API, tokens, states, and accessibility requirements. Then the packet goes to a named decision-maker. The research used to be the expensive part — agents make it cheap. The decision stays exactly as human as it always was.

Slide 8 of 13 · 16:9

What a reviewable finding looks like

Taste findings borrow the discipline of critique: severity, evidence, impact, and the smallest useful change — never a silent rewrite.

A taste finding from component review
Important — Density breaks the table pattern

Dimension: density (DESIGN.md, "data surfaces stay compact")
Evidence: 1440px screenshot — row height 72px vs 48px in
  every existing table; metadata wraps to a second line.
User impact: 40% fewer rows visible on the standard
  dashboard viewport than the table it replaces.
Smallest change: single-line metadata, 48px rows,
  move the secondary actions into the overflow menu.
Decision needed: none — covered by an existing rule.

Severity, a named dimension, evidence, impact, and the smallest change. If a finding cannot fill those fields, it is probably a preference.

Slide notes

This format comes straight from the critique discipline covered in the school's article on critique loops, applied to component review. Severity does the triage: blocker, important, polish, and question are enough levels, and more taxonomy makes findings harder to act on, not easier. The named dimension and the DESIGN.md citation are what keep taste review from degrading into preference — if the reviewer cannot say which dimension the objection sits on, the finding goes into the question bucket and triggers a conversation rather than a demand.

The evidence field points at the packet from slide five: a specific screenshot, a specific width, a measurable difference against the existing system. The impact field forces the finding to be about users or about system coherence rather than about the reviewer's eye. The smallest-change field is what makes the finding actionable by an agent in the revision pass — and constrains that pass, because the standing rule is that revision addresses approved findings only and does not introduce a new direction.

The last line, decision needed, is the routing field. Most findings are covered by an existing rule or an established pattern and need no escalation. The ones that genuinely need a judgment call ("this component wants a density the system has never allowed; do we want that?") get routed to the maintainer explicitly instead of being settled by whoever reviewed the work that day.
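The finding format lends itself to a small record type. The sketch below is one way it might look: the four severity levels and four dimensions come from this module, the field names are assumptions, and `triage` applies the question-bucket rule for findings that cannot name a dimension.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Severity(Enum):
    BLOCKER = "blocker"
    IMPORTANT = "important"
    POLISH = "polish"
    QUESTION = "question"


DIMENSIONS = {"density", "hierarchy", "restraint", "voice"}


@dataclass(frozen=True)
class Finding:
    severity: Severity
    title: str
    dimension: Optional[str]
    evidence: str
    user_impact: str
    smallest_change: str
    decision_needed: str = "none"

    def unfilled_fields(self) -> list[str]:
        """Names of required fields left blank; a non-empty result suggests a preference."""
        required = {
            "evidence": self.evidence,
            "user_impact": self.user_impact,
            "smallest_change": self.smallest_change,
        }
        return [name for name, value in required.items() if not value.strip()]


def triage(finding: Finding) -> Finding:
    """Move a finding with no named dimension into the question bucket."""
    if finding.dimension not in DIMENSIONS:
        return replace(finding, severity=Severity.QUESTION)
    return finding


density = Finding(
    severity=Severity.IMPORTANT,
    title="Density breaks the table pattern",
    dimension="density",
    evidence="1440px screenshot: row height 72px vs 48px in every existing table",
    user_impact="40% fewer rows visible on the standard dashboard viewport",
    smallest_change="single-line metadata, 48px rows, secondary actions in overflow menu",
)
```

Because the record is immutable, `triage` returns a new finding rather than editing the reviewer's original, which keeps the review history intact.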

Narration for this slide

Here is what a taste finding looks like when it is written to be acted on. It carries a severity — this one is important, not a blocker. It names the dimension, density, and cites the line of DESIGN.md it rests on. It points at evidence: a specific screenshot at a specific width, with a measurable difference from every existing table. It states the user impact — forty per cent fewer rows on the standard viewport. And it proposes the smallest useful change, which is what the agent's revision pass will be scoped to. If a finding cannot fill those fields, it is probably a preference — and preferences are conversations, not gates.

Slide 9 of 13 · 16:9

Feedback that stops recurring

Every finding made twice is a candidate for promotion. The destination depends on what kind of rule it can become.

  • Checkable and universal → an automated check: token audit rules, required states, contrast
  • Judged but stable → a line in DESIGN.md or an anti-pattern entry the reviewer can cite
  • Procedural → a review or contribution skill the agent loads when the task matches
  • File-type specific → a path-scoped rule that only costs context where it applies
  • Still contested → not a rule yet; route it to the maintainer as a system decision

A review gate that never changes the rules is just a queue. The gate earns its cost by making itself less necessary.

Slide notes

This is the third learning objective and the part most teams skip, because closing the loop feels like extra work at exactly the moment the review is finished. The discipline is simple: when the same finding appears a second time, it stops being feedback and becomes a candidate for promotion, and the only question is where it goes. Findings that can be checked mechanically and apply everywhere become automated checks — that is how the token audit and the required-states check were born in most systems that have them. Findings that are judged but stable — the product prefers compact data surfaces, no decorative gradients, no new hues without approval — become lines in DESIGN.md or named anti-patterns, which gives future reviewers something to cite and future agents something to load.

Procedural feedback — the steps a good component proposal goes through, the order of a review — belongs in a skill, because skills load on demand and do not tax every session's context. Constraints that only matter for some file types belong in path-scoped rules. And feedback that is still contested is not a rule yet: writing a disputed preference into the harness just moves the argument into the files, where it is harder to see. Route those to the maintainer as an explicit system decision.

The payoff compounds. Each promotion removes a class of finding from future human review, which is what makes the gate sustainable as agent-driven contribution volume grows. This is also the seam into Module 6: a system that maintains itself is mostly a system that has been promoting its review feedback for long enough.
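The promotion routing from this slide can be written down as a small decision function. The five destinations and the "made twice" threshold come from the slide; the order in which overlapping cases are resolved (contested first, then checkable and universal, then path-scoped, then procedural) is an assumption of this sketch, as are the parameter names.

```python
def promotion_target(
    *,
    occurrences: int,
    contested: bool = False,
    checkable: bool = False,
    universal: bool = False,
    procedural: bool = False,
    path_scoped: bool = False,
) -> str:
    """Suggest where a recurring finding should be promoted."""
    if occurrences < 2:
        return "feedback only"  # not yet a candidate for promotion
    if contested:
        return "maintainer decision"  # not a rule yet
    if checkable and universal:
        return "automated check"
    if path_scoped:
        return "path-scoped rule"
    if procedural:
        return "skill"
    return "DESIGN.md line or anti-pattern"  # judged but stable


print(promotion_target(occurrences=2, checkable=True, universal=True))
# automated check
print(promotion_target(occurrences=3))
# DESIGN.md line or anti-pattern
```

The function only suggests a destination. Someone still has to write the rule, and a contested finding goes to a person by design.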

Narration for this slide

The gate earns its cost by making itself less necessary. Every finding made twice is a candidate for promotion — the only question is where it goes. If it is checkable and applies everywhere, it becomes an automated check. If it is judged but stable — compact data surfaces, no decorative gradients — it becomes a line in DESIGN.md that reviewers can cite and agents will load. If it is procedural, it becomes a skill. If it only applies to certain file types, it becomes a path-scoped rule. And if it is still contested, it is not a rule yet — that one goes to the maintainer as a system decision. A review gate that never changes the rules is just a queue.

Slide 10 of 13 · 16:9

Worked example: rejected, revised, accepted

One agent-proposed metric card, traced through the gate. The checks caught the mechanical defects; the people caught the rest.

| Stage | What happened | Outcome |
| --- | --- | --- |
| Proposal | Agent proposes a MetricCard for a dashboard team, on a branch with an evidence packet | Enters the gate |
| Automated checks | Two hardcoded hex values and a missing focus state; agent fixes both and re-runs | Pass on second run, no human involved |
| Taste review | Three findings: density above the table pattern, a redundant trend prop, template-voice empty state | Returned as critique, not silently fixed |
| Revision | Agent revises against the three approved findings only; new screenshots attached | Re-enters review |
| Maintainer decision | Accept narrowed: trend display deferred, reasoning recorded in the packet | Component enters the system |
| Rules update | Density rule for data surfaces added to DESIGN.md; empty-state voice added to the review skill | Neither finding can recur |

Total human attention: one taste review and one decision. Everything else was the agent working against the gate.

Slide notes

Walk the table as a timeline and keep drawing attention to who is spending time at each row. The proposal and the evidence packet cost the agent minutes and a human nothing. The automated checks caught exactly the defects they exist to catch — hardcoded values and a missing focus state — and the fix loop ran without a person in it. The interesting work starts at taste review: the density finding cited the table pattern, the restraint finding flagged a trend prop the impact evidence did not justify, and the voice finding pointed at an empty state that read like a template. All three went back as findings with severity and evidence, not as silent fixes, because silent fixes teach the agent nothing and leave no record.

The revision pass was scoped to the three approved findings only — that constraint is what keeps revision from becoming a second, unreviewed design pass. The maintainer accepted with a narrowed scope, deferring the trend display until more than one surface needs it, and recorded that reasoning in the packet, which is what makes the deferral revisitable rather than forgotten.

The last row is the one to linger on. The density rule was promoted to DESIGN.md and the empty-state voice guidance went into the review skill, so neither finding can recur as a surprise. This trace is a composite drawn from the contribution-intake and critique workflows behind this course rather than a single logged session — the shape and the division of labour are what to take from it, not the exact counts.

Narration for this slide

Let's trace one component through the gate. An agent proposes a metric card for a dashboard team — branch, evidence packet, the lot. The automated checks catch two hardcoded hex values and a missing focus state; the agent fixes both and re-runs, no human involved. Taste review finds three things: density above the existing table pattern, a trend prop nothing justified, and an empty state that sounded like a template. Those go back as findings, the agent revises against exactly those three, and the maintainer accepts with a narrowed scope — trend display deferred, reasoning recorded. Then the loop closes: the density rule goes into DESIGN.md, the voice guidance into the review skill. Total human attention: one review and one decision.

Slide 11 of 13 · 16:9

What the gate cannot decide

Encode everything you can. Then be precise about what remains a human call, so it is made deliberately rather than by default.

  • Whether the system should grow into a new area at all — scope is strategy, not review
  • Taste shifts: changing what the system considers good is a decision, not an accumulation of findings
  • Naming and brand disputes — the gate can document them, only people can settle them
  • Deprecations and breaking changes, which spend other teams' time
  • Whether a declined proposal lands as governance or gatekeeping — reasoning and an offered alternative do that work

Rules raise the floor. The ceiling — what the system refuses, what it aspires to, what it retires — stays a human responsibility.

Slide notes

After three slides about encoding feedback into rules, this slide is the counterweight, and it matters because the failure mode of an enthusiastic gate-builder is trying to automate judgments that should stay deliberate. Scope is the clearest case: whether the system should expand into data visualisation, or absorb a pattern one team loves and the brand team distrusts, is a strategic call about surface area and maintenance appetite. No accumulation of intake packets settles it; they only inform it.

Taste shifts are subtler. The rules and DESIGN.md lines built through this module encode the system's current taste, and that is their value — but it also means that changing taste requires changing the rules deliberately, not letting individual reviews drift away from them one exception at a time. Treat a taste change like a breaking change: proposed, discussed, decided, and then encoded. Deprecations and breaking changes belong on this list because their cost lands on other teams; an agent can find every usage and draft every migration note, but the decision to spend other people's time is accountability, not analysis.

The last bullet is about how declines land. The contribution-intake case studies behind this course are consistent on this: a no that arrives quickly, with the reasoning recorded and an alternative attached — usually a composition of existing components — reads as help. A no that arrives slowly and unexplained reads as gatekeeping, and contributors stop proposing, which is its own kind of system failure.
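
The habit of never issuing a bare no can be built into the decision record itself. The following is a sketch only: the four outcomes come from this module, but the `Decision` class, its field names, and the choice to make an alternative mandatory for redirects and declines are assumptions made for illustration. The case studies say an alternative is usually attached, not always.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACCEPT = "accept"
    ACCEPT_NARROWED = "accept narrowed"
    REDIRECT = "redirect"
    DECLINE = "decline"


@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    reasoning: str
    alternative: Optional[str] = None  # e.g. a composition of existing components

    def __post_init__(self) -> None:
        # Every decision records its reasoning, whatever the outcome.
        if not self.reasoning.strip():
            raise ValueError("a decision must record its reasoning")
        # Assumption: a redirect or decline is only valid with an alternative attached.
        needs_alternative = self.outcome in (Outcome.REDIRECT, Outcome.DECLINE)
        if needs_alternative and not (self.alternative or "").strip():
            raise ValueError("a redirect or decline must offer an alternative")


# Illustrative usage with made-up component names.
decline = Decision(
    Outcome.DECLINE,
    reasoning="The need is covered by existing components.",
    alternative="Compose Card with Stat",
)
```

Constructing a decline without an alternative raises `ValueError`, so the unexplained no cannot be recorded in the first place.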

Narration for this slide

One caution before the exercise. Encode everything you can — and then be precise about what stays a human call. The gate cannot decide whether the system should grow into a new area at all; that is strategy. It cannot shift the system's taste; changing what counts as good is a decision you make deliberately, not something that accumulates out of individual reviews. It cannot settle naming or brand disputes, only document them. It cannot decide deprecations or breaking changes, because those spend other teams' time. And it cannot make a declined proposal feel fair — recorded reasoning and an offered alternative do that. Rules raise the floor. The ceiling stays yours.

Slide 12 of 13 · 16:9

Exercise: write your review gate

One page, for your own system, written so that both an agent and a new reviewer could follow it tomorrow.

  • List your mechanical checks: which exist today as scripts, which are still performed by a person
  • Define the evidence packet: widths, states, token and prop listing, overlap statement
  • Name the taste dimensions a reviewer judges, and the DESIGN.md lines each one rests on
  • Name the decision-maker and the four outcomes they can choose between
  • Pick the last three findings your team repeated, and decide where each one gets promoted

The fifth step is the one that pays compound interest. Do it for three findings now, and put a recurring slot in the calendar to do it again.

Slide notes

The exercise produces the artifact this module has been circling: a one-page review gate for the participant's own system. Steer people towards honesty in the first step — most teams discover that several checks they think of as automated are actually a vigilant person, and that gap is the cheapest improvement available, because those checks already have an unwritten specification.
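
To make the first step concrete: a check that lives in a reviewer's head, such as spotting hardcoded hex colours, can often be written down in a few lines. A minimal sketch follows, assuming that colours should arrive through tokens and that any literal hex value in component source is a violation. The regular expression, the function name, and the sample stylesheet are illustrative; a real check would likely need exemptions for token definition files and would also flag ID selectors that happen to look like hex values.

```python
import re

# Matches #rgb, #rgba, #rrggbb and #rrggbbaa literals (assumed definition of a raw colour).
HEX_COLOUR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


def find_raw_hex(source: str) -> list:
    """Return (line_number, literal) for every hardcoded hex colour in the source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in HEX_COLOUR.finditer(line):
            hits.append((number, match.group()))
    return hits


# Illustrative input: one token reference and two raw values.
sample = """\
.metric-card {
  color: var(--color-text-primary);
  background: #ffffff;
  border-color: #e5e7eb;
}
"""

violations = find_raw_hex(sample)
```

For this sample, `violations` contains the two raw values on lines 3 and 4, which is the kind of result an agent can fix and re-run without a reviewer being involved.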

The evidence packet definition should be concrete enough to hand to an agent: exact widths, the named states, what the proposal text must contain. The taste dimensions step usually exposes gaps in DESIGN.md — a dimension the team judges constantly but has never written down — and that is a fine outcome; the missing line is homework from Module 2. Naming the decision-maker matters even on a team of three, because the difference between a recommendation and a decision is what makes declines defensible later.
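
An evidence packet definition that is concrete enough to hand to an agent can be expressed as data plus a completeness check. The following is a sketch only: the three widths, the exact state names, and the field names are placeholders to be replaced with your own system's values, and `missing_evidence` is a hypothetical helper.

```python
from dataclasses import dataclass, field

# Placeholder values: substitute the widths and states your own system defines.
REQUIRED_WIDTHS = (360, 768, 1280)
REQUIRED_STATES = ("default", "hover", "focus", "disabled", "loading", "empty", "error")


@dataclass
class EvidencePacket:
    screenshot_widths: set = field(default_factory=set)  # widths with a screenshot attached
    states_pictured: set = field(default_factory=set)    # states with a screenshot attached
    tokens: list = field(default_factory=list)           # tokens the component consumes
    props: list = field(default_factory=list)            # props the component exposes
    overlap_statement: str = ""
    self_check_report: str = ""


def missing_evidence(packet: EvidencePacket) -> list:
    """List what is absent; an empty list means review may start."""
    gaps = [f"screenshot at {w}px" for w in REQUIRED_WIDTHS
            if w not in packet.screenshot_widths]
    gaps += [f"state pictured: {s}" for s in REQUIRED_STATES
             if s not in packet.states_pictured]
    if not packet.tokens:
        gaps.append("token listing")
    if not packet.props:
        gaps.append("prop listing")
    if not packet.overlap_statement.strip():
        gaps.append("overlap statement")
    if not packet.self_check_report.strip():
        gaps.append("self-check report")
    return gaps


# A thin packet: one width, the default state only, and a token list.
thin = EvidencePacket(
    screenshot_widths={1280},
    states_pictured={"default"},
    tokens=["color.text.primary"],
)
gaps = missing_evidence(thin)
```

Returning the full list of gaps, rather than stopping at the first one, lets the agent complete the packet in a single pass.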

The fifth step is deliberately small: three recurring findings, each assigned a destination — check, DESIGN.md line, skill, path-scoped rule, or escalation to a system decision. If participants do nothing else from this module, that habit, repeated on a schedule, is the part that compounds. Suggest they keep the page; Module 4 reuses it as the standard an audit measures the existing codebase against, and Module 6 turns it into the approval step of the maintenance loop.
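
The destinations can be read as a short decision procedure. The sketch below encodes one possible ordering of the questions, following the module's slide on feedback that stops recurring. The `RecurringFinding` fields, their names, and the order in which the questions are asked are assumptions made for illustration; a team may reasonably order them differently.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    AUTOMATED_CHECK = "automated check"
    DESIGN_MD_LINE = "DESIGN.md line"
    SKILL = "skill"
    PATH_SCOPED_RULE = "path-scoped rule"
    SYSTEM_DECISION = "escalate to the maintainer as a system decision"


@dataclass(frozen=True)
class RecurringFinding:
    summary: str
    contested: bool = False          # the team still disagrees about it
    script_checkable: bool = False   # a script could verify it
    applies_everywhere: bool = True  # False if it only concerns certain file types
    procedural: bool = False         # it describes how to carry out a task


def promote(finding: RecurringFinding) -> Destination:
    """Pick where a recurring finding should be encoded."""
    if finding.contested:
        return Destination.SYSTEM_DECISION  # not a rule yet
    if not finding.applies_everywhere:
        return Destination.PATH_SCOPED_RULE
    if finding.script_checkable:
        return Destination.AUTOMATED_CHECK
    if finding.procedural:
        return Destination.SKILL
    return Destination.DESIGN_MD_LINE       # judged but stable


# The two promotions from the worked example, expressed in this shape.
density = RecurringFinding("Data surfaces stay compact")
voice = RecurringFinding("How to write empty-state copy", procedural=True)
routes = [promote(density), promote(voice)]
```

With these inputs the density finding routes to a DESIGN.md line and the voice guidance to a skill, matching where the worked example placed them.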

Narration for this slide

Time to build yours. One page, for your own system. List your mechanical checks — and be honest about which ones are actually a script and which are a vigilant person. Define the evidence packet: the widths, the states, the token and prop listing, the overlap statement. Name the taste dimensions a reviewer judges, and the lines of DESIGN.md each one rests on — if a line is missing, that is homework. Name the decision-maker and their four outcomes. Then take the last three findings your team has repeated and decide where each one gets promoted. Keep the page. Module 4 will use it as the standard your audit measures the codebase against.

Slide 13 of 13 · 16:9

Summary, and the bridge to audits

  • Taste is nameable: density, hierarchy, restraint, voice — and the decision of whether to build at all
  • Mechanical checks run agent-side, before review; humans should never have to catch what a script can catch
  • Evidence is the entry requirement: screenshots, states, tokens, overlap, and the self-check report
  • The maintainer decides — accept, narrow, redirect, decline — and records the reasoning
  • Recurring findings get promoted into checks, DESIGN.md, skills, and rules, so the gate gets cheaper

Module 4 turns the same standard outward: auditing the components already in your codebase for drift, with evidence attached to every finding.

Slide notes

Recap by walking the gate one more time, but frame it as a division of labour rather than a process diagram: agents produce and assemble evidence, automation enforces the floor, reviewers judge the named taste dimensions, the maintainer owns the decision, and the loop closes by promoting recurring findings into rules. The module's three objectives map onto that directly — the mechanical-versus-taste split, the gate design with its evidence requirements, and the promotion habit that stops feedback recurring.

The bridge to Module 4 is a change of direction, not of standard. This module pointed the gate at new work arriving into the system. Module 4 points the same standard backwards, at everything already in the codebase: the components built before the gate existed, the tokens that drifted, the patterns that were copied and mutated. The review gate checklist from the exercise becomes the audit's measuring stick, and the evidence discipline — no finding without a file, a line, and a screenshot — carries over unchanged.
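
That evidence discipline is simple enough to enforce at the moment a finding is recorded. As a minimal sketch, assuming findings are stored as records with a file path, a line number, and a screenshot path (the `AuditFinding` class and its field names are hypothetical), a finding that lacks any of the three can be refused before it reaches a reviewer:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditFinding:
    summary: str
    file: Optional[str] = None        # path of the offending source file
    line: Optional[int] = None        # 1-based line number within that file
    screenshot: Optional[str] = None  # path of the screenshot showing the problem


def evidence_gaps(finding: AuditFinding) -> list:
    """Name the pieces of evidence a finding is missing; empty means it can be filed."""
    gaps = []
    if not finding.file:
        gaps.append("file")
    if finding.line is None or finding.line < 1:
        gaps.append("line")
    if not finding.screenshot:
        gaps.append("screenshot")
    return gaps


# Illustrative findings with made-up paths.
grounded = AuditFinding(
    "Raw hex colour in card border",
    file="src/components/MetricCard.css",
    line=4,
    screenshot="evidence/metric-card-1280.png",
)
hunch = AuditFinding("Cards feel inconsistent")

results = (evidence_gaps(grounded), evidence_gaps(hunch))
```

The first finding has no gaps and can be filed; the second is missing all three and stays a hunch until someone attaches the evidence.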

If time allows, close on the cultural point: the gate's purpose is not to make contribution harder, it is to make the system's no fast, explained, and accompanied by an alternative — which is what keeps people and agents proposing.

Narration for this slide

Let's close. Taste is not mystical — it is density, hierarchy, restraint, voice, and the call about whether to build at all. Mechanical checks run on the agent's side, before review, so humans never spend attention on what a script can catch. Evidence is the entry requirement, the maintainer owns the decision and records the reasoning, and recurring findings get promoted into checks, DESIGN.md, skills, and rules — so the gate gets cheaper every round. In Module 4 we turn the same standard outward: instead of gating new work, we audit what is already in the codebase, and we attach evidence to every finding. Bring your review gate page with you. See you there.

Module transcript
Module 3, narrated slide by slide

Slide 1 · Component Taste and Review Gates

Welcome back. In the first two modules we made the system legible — tokens that carry intent, and a DESIGN.md the agent reads every session. The payoff is speed: an agent can now produce a plausible component in minutes. This module is about the catch. Speed is easy; taste is the constraint. If every plausible component gets in, the system stops being a system. So we are going to separate the review work an agent can do on its own output from the judgment that has to stay human, and build a review gate that holds when the contributor is an agent. Let's start by pinning down what taste actually means.

Slide 2 · What taste means operationally

Taste gets treated as something mystical, which makes it impossible to manage. Operationally, it is four judgments you can name. Density: does the component ask the right amount of the screen for this product? Hierarchy: does it reveal information in the order the task needs? Restraint: does it add the minimum new surface area — or does it bring three new props and a gradient nobody asked for? And voice: do its labels and empty states sound like your product or like a template? You cannot fully automate any of these. But naming them matters, because a taste objection you cannot name is one the agent can never learn from — and one the next reviewer cannot apply consistently.

Slide 3 · Mechanical checks vs taste decisions

Here is the split that decides who does the work. Mechanical checks are everything a script can verify: tokens, types, contrast, state coverage, naming. They run the same way every time, and the agent should run them on its own output before any person looks. Taste decisions are the judgments from the last slide — density, hierarchy, restraint, voice — plus the biggest one: whether this component should exist at all. Those need a person. The two failure modes are worth remembering: mechanical rules go stale and pass things they should not, and taste review degrades into personal preference wearing a standards badge. The principle: never make a human catch what a script can catch, and never let a script block what only a human can judge.

Slide 4 · Mechanical checks the agent runs on itself

Here is the checklist the agent runs on its own output, before anyone reviews anything. A token audit — every colour, space, and type size resolves to a semantic token, no raw values. Type check, lint, and build — no invented props, nothing broken. State coverage — hover, focus, disabled, loading, empty, and error states exist and render. An automated accessibility pass over those rendered states. And naming and file placement that follow the system's conventions. Failures go back to the agent, not onto a reviewer's plate. And the agent reports what it ran and what it fixed — that report is part of the evidence, because a clean self-check is the price of admission to human review.

Slide 5 · Evidence before opinion: the entry requirement

The entry requirement for human review is evidence, not enthusiasm. Every proposal arrives with screenshots at three fixed widths, every interaction state pictured — not just the default — the tokens it consumes and the props it exposes, an overlap statement naming the nearest existing components and why they fall short, and the agent's own self-check report. When the contributor is an agent, assembling this packet costs minutes, so there is no excuse for a thin one. And the rule is blunt: if the evidence is not attached, the review does not start. That is not bureaucracy. It is what keeps review grounded in what the component actually does, rather than in opinions about what it probably does.

Slide 6 · The review gate, end to end

Here is the whole gate in one picture. The agent proposes the change on a branch and assembles the evidence packet. Automated checks run first — and their failures go back to the agent, not to a person. What survives reaches taste review: density, hierarchy, restraint, and voice, judged against DESIGN.md, returned as findings with severity and evidence. Then the maintainer decides — accept, accept narrowed, redirect to an existing composition, or decline — and records the reasoning. Notice the dashed line back to the start. Recurring findings get written into rules and checks, so the next proposal meets a stricter gate. The gate is not there to slow agents down. It is there so production speed never outruns deliberateness.

Slide 7 · Intake when the contributor is an agent

What about proposals for entirely new components? They get a staged investigation, and agents can run most of it. Proposals describe the need, not the solution — a finished component is not an argument for itself. An overlap stage checks whether the need is already covered by an existing component, a composition, or a small extension; the cheapest component is the one you never build. An impact stage counts how many surfaces in the codebase actually show this need today, with file paths, not optimism. A spec stage drafts the API, tokens, states, and accessibility requirements. Then the packet goes to a named decision-maker. The research used to be the expensive part — agents make it cheap. The decision stays exactly as human as it always was.

Slide 8 · What a reviewable finding looks like

Here is what a taste finding looks like when it is written to be acted on. It carries a severity — this one is important, not a blocker. It names the dimension, density, and cites the line of DESIGN.md it rests on. It points at evidence: a specific screenshot at a specific width, with a measurable difference from every existing table. It states the user impact — forty per cent fewer rows on the standard viewport. And it proposes the smallest useful change, which is what the agent's revision pass will be scoped to. If a finding cannot fill those fields, it is probably a preference — and preferences are conversations, not gates.

Slide 9 · Feedback that stops recurring

The gate earns its cost by making itself less necessary. Every finding made twice is a candidate for promotion — the only question is where it goes. If it is checkable and applies everywhere, it becomes an automated check. If it is judged but stable — compact data surfaces, no decorative gradients — it becomes a line in DESIGN.md that reviewers can cite and agents will load. If it is procedural, it becomes a skill. If it only applies to certain file types, it becomes a path-scoped rule. And if it is still contested, it is not a rule yet — that one goes to the maintainer as a system decision. A review gate that never changes the rules is just a queue.

Slide 10 · Worked example: rejected, revised, accepted

Let's trace one component through the gate. An agent proposes a metric card for a dashboard team — branch, evidence packet, the lot. The automated checks catch two hardcoded hex values and a missing focus state; the agent fixes both and re-runs, no human involved. Taste review finds three things: density above the existing table pattern, a trend prop nothing justified, and an empty state that sounded like a template. Those go back as findings, the agent revises against exactly those three, and the maintainer accepts with a narrowed scope — trend display deferred, reasoning recorded. Then the loop closes: the density rule goes into DESIGN.md, the voice guidance into the review skill. Total human attention: one review and one decision.

Slide 11 · What the gate cannot decide

One caution before the exercise. Encode everything you can — and then be precise about what stays a human call. The gate cannot decide whether the system should grow into a new area at all; that is strategy. It cannot shift the system's taste; changing what counts as good is a decision you make deliberately, not something that accumulates out of individual reviews. It cannot settle naming or brand disputes, only document them. It cannot decide deprecations or breaking changes, because those spend other teams' time. And it cannot make a declined proposal feel fair — recorded reasoning and an offered alternative do that. Rules raise the floor. The ceiling stays yours.

Slide 12 · Exercise: write your review gate

Time to build yours. One page, for your own system. List your mechanical checks — and be honest about which ones are actually a script and which are a vigilant person. Define the evidence packet: the widths, the states, the token and prop listing, the overlap statement. Name the taste dimensions a reviewer judges, and the lines of DESIGN.md each one rests on — if a line is missing, that is homework. Name the decision-maker and their four outcomes. Then take the last three findings your team has repeated and decide where each one gets promoted. Keep the page. Module 4 will use it as the standard your audit measures the codebase against.

Slide 13 · Summary, and the bridge to audits

Let's close. Taste is not mystical — it is density, hierarchy, restraint, voice, and the call about whether to build at all. Mechanical checks run on the agent's side, before review, so humans never spend attention on what a script can catch. Evidence is the entry requirement, the maintainer owns the decision and records the reasoning, and recurring findings get promoted into checks, DESIGN.md, skills, and rules — so the gate gets cheaper every round. In Module 4 we turn the same standard outward: instead of gating new work, we audit what is already in the codebase, and we attach evidence to every finding. Bring your review gate page with you. See you there.