Slide 1 — Component Taste and Review Gates
Welcome back. In the first two modules we made the system legible — tokens that carry intent, and a DESIGN.md the agent reads every session. The payoff is speed: an agent can now produce a plausible component in minutes. This module is about the catch. Speed is easy; taste is the constraint. If every plausible component gets in, the system stops being a system. So we are going to separate the review work an agent can do on its own output from the judgment that has to stay human, and build a review gate that holds when the contributor is an agent. Let's start by pinning down what taste actually means.
Slide 2 — What taste means operationally
Taste gets treated as something mystical, which makes it impossible to manage. Operationally, it is four judgments you can name. Density: does the component ask for the right amount of the screen for this product? Hierarchy: does it reveal information in the order the task needs? Restraint: does it add the minimum new surface area, or does it bring three new props and a gradient nobody asked for? Voice: do its labels and empty states sound like your product or like a template? You cannot fully automate any of these. But naming them matters, because a taste objection you cannot name is one the agent can never learn from, and one the next reviewer cannot apply consistently.
Slide 3 — Mechanical checks vs taste decisions
Here is the split that decides who does the work. Mechanical checks are everything a script can verify: tokens, types, contrast, state coverage, naming. They run the same way every time, and the agent should run them on its own output before any person looks. Taste decisions are the judgments from the last slide — density, hierarchy, restraint, voice — plus the biggest one: whether this component should exist at all. Those need a person. The two failure modes are worth remembering: mechanical rules go stale and pass things they should not, and taste review degrades into personal preference wearing a standards badge. The principle: never make a human catch what a script can catch, and never let a script block what only a human can judge.
Slide 4 — Mechanical checks the agent runs on itself
Here is the checklist the agent runs on its own output, before anyone reviews anything. A token audit — every colour, space, and type size resolves to a semantic token, no raw values. Type check, lint, and build — no invented props, nothing broken. State coverage — hover, focus, disabled, loading, empty, and error states exist and render. An automated accessibility pass over those rendered states. And naming and file placement that follow the system's conventions. Failures go back to the agent, not onto a reviewer's plate. And the agent reports what it ran and what it fixed — that report is part of the evidence, because a clean self-check is the price of admission to human review.
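To make the token audit concrete, here is a minimal sketch of what that first check could look like. It assumes a hypothetical convention where every allowed value is written as a CSS custom property such as var(--color-text-primary), and it flags raw hex colours and raw pixel sizes. The patterns and the sample stylesheet are illustrative only; a real audit would need to know your token format and your allowed exceptions.

```python
import re

# Assumption: semantic tokens appear as var(--token-name).
# Anything matching these patterns is treated as a hardcoded value.
RAW_HEX = re.compile(r"#[0-9a-fA-F]{3,8}\b")
RAW_PX = re.compile(r"\b\d+(?:\.\d+)?px\b")


def audit_tokens(source: str) -> list[tuple[int, str]]:
    """Return (line_number, raw_value) for every hardcoded value found."""
    violations = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for pattern in (RAW_HEX, RAW_PX):
            for match in pattern.finditer(line):
                violations.append((line_number, match.group()))
    return violations


# Hypothetical stylesheet for a proposed component.
sample = """\
.metric-card {
  color: var(--color-text-primary);
  background: #1a73e8;
  padding: 12px;
}
"""

for line_number, value in audit_tokens(sample):
    print(f"line {line_number}: raw value {value}")
```

A check this crude will produce false positives, for example an id selector that happens to look like a hex colour. That is acceptable for a sketch: the point is that the failure goes back to the agent with a line number, not to a reviewer.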
Slide 5 — Evidence before opinion: the entry requirement
The entry requirement for human review is evidence, not enthusiasm. Every proposal arrives with screenshots at three fixed widths, every interaction state pictured — not just the default — the tokens it consumes and the props it exposes, an overlap statement naming the nearest existing components and why they fall short, and the agent's own self-check report. When the contributor is an agent, assembling this packet costs minutes, so there is no excuse for a thin one. And the rule is blunt: if the evidence is not attached, the review does not start. That is not bureaucracy. It is what keeps review grounded in what the component actually does, rather than in opinions about what it probably does.
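The "review does not start" rule is simple enough to express as a check. The sketch below is one possible shape, not a prescribed schema. The three widths are assumed values, since the talk fixes the count but not the numbers, and the state list combines the default with the states named in the self-check. All field names are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed widths for illustration; pick the three your product actually ships at.
REQUIRED_WIDTHS = (375, 768, 1280)
REQUIRED_STATES = ("default", "hover", "focus", "disabled", "loading", "empty", "error")


@dataclass
class EvidencePacket:
    # Maps (width, state) to a screenshot path.
    screenshots: dict = field(default_factory=dict)
    tokens_used: list = field(default_factory=list)
    # Not checked for emptiness: a component may legitimately expose no props.
    props_exposed: list = field(default_factory=list)
    overlap_statement: str = ""
    self_check_report: str = ""


def missing_evidence(packet: EvidencePacket) -> list:
    """List what is absent from the packet; an empty list means nothing is missing."""
    missing = [
        f"screenshot: {state} at {width}px"
        for width in REQUIRED_WIDTHS
        for state in REQUIRED_STATES
        if (width, state) not in packet.screenshots
    ]
    if not packet.tokens_used:
        missing.append("token listing")
    if not packet.overlap_statement.strip():
        missing.append("overlap statement")
    if not packet.self_check_report.strip():
        missing.append("self-check report")
    return missing


def review_may_start(packet: EvidencePacket) -> bool:
    """The blunt rule: no complete evidence, no review."""
    return not missing_evidence(packet)
```

Returning the list of gaps, rather than a bare yes or no, gives the agent something it can act on without a person in the loop.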
Slide 6 — The review gate, end to end
Here is the whole gate in one picture. The agent proposes the change on a branch and assembles the evidence packet. Automated checks run first — and their failures go back to the agent, not to a person. What survives reaches taste review: density, hierarchy, restraint, and voice, judged against DESIGN.md, returned as findings with severity and evidence. Then the maintainer decides — accept, accept narrowed, redirect to an existing composition, or decline — and records the reasoning. Notice the dashed line back to the start. Recurring findings get written into rules and checks, so the next proposal meets a stricter gate. The gate is not there to slow agents down. It is there so production speed never outruns deliberateness.
Slide 7 — Intake when the contributor is an agent
What about proposals for entirely new components? They get a staged investigation, and agents can run most of it. Proposals describe the need, not the solution — a finished component is not an argument for itself. An overlap stage checks whether the need is already covered by an existing component, a composition, or a small extension; the cheapest component is the one you never build. An impact stage counts how many surfaces in the codebase actually show this need today, with file paths, not optimism. A spec stage drafts the API, tokens, states, and accessibility requirements. Then the packet goes to a named decision-maker. The research used to be the expensive part — agents make it cheap. The decision stays exactly as human as it always was.
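The impact stage is the easiest one to hand to an agent, because it reduces to a search that returns file paths. Here is a rough sketch, assuming the need can be recognised by a text signature such as a hand-rolled class name. The file paths, the class name, and the in-memory file map are all hypothetical; in a real repository you would read the files from disk.

```python
import re


def find_surfaces(files: dict, pattern: str) -> list:
    """Return the sorted paths of files whose source matches the need's signature."""
    signature = re.compile(pattern)
    return sorted(path for path, source in files.items() if signature.search(source))


# Hypothetical snapshot of a codebase: path -> source text.
files = {
    "src/dashboard/Overview.tsx": '<div className="stat-box">Revenue</div>',
    "src/billing/Summary.tsx": '<div className="stat-box stat-box--wide">MRR</div>',
    "src/settings/Profile.tsx": "<Form onSubmit={save} />",
}

surfaces = find_surfaces(files, r'className="stat-box')
print(f"{len(surfaces)} surfaces show this need today:")
for path in surfaces:
    print(f"  {path}")
```

The output is a count backed by paths, which is what the decision-maker needs: every entry can be opened and checked.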
Slide 8 — What a reviewable finding looks like
Here is what a taste finding looks like when it is written to be acted on. It carries a severity — this one is important, not a blocker. It names the dimension, density, and cites the line of DESIGN.md it rests on. It points at evidence: a specific screenshot at a specific width, with a measurable difference from every existing table. It states the user impact — forty per cent fewer rows on the standard viewport. And it proposes the smallest useful change, which is what the agent's revision pass will be scoped to. If a finding cannot fill those fields, it is probably a preference — and preferences are conversations, not gates.
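Those fields translate directly into a record, and the "can it fill the fields" test into a function. This is a sketch under assumptions: the field names are mine, and the example values for the DESIGN.md reference, the screenshot, and the proposed change are placeholders, not content from a real system.

```python
from dataclasses import dataclass, fields


@dataclass
class Finding:
    severity: str         # for example "blocker" or "important"
    dimension: str        # density, hierarchy, restraint, or voice
    design_md_ref: str    # the line of DESIGN.md the finding rests on
    evidence: str         # a specific screenshot at a specific width
    user_impact: str
    smallest_change: str  # scopes the agent's revision pass


def unfilled_fields(finding: Finding) -> list:
    """Names of the fields left blank."""
    return [f.name for f in fields(finding) if not getattr(finding, f.name).strip()]


def is_gate_finding(finding: Finding) -> bool:
    """A finding with every field filled can gate; anything else is a conversation."""
    return not unfilled_fields(finding)


# Placeholder values throughout, apart from the impact quoted on the slide.
example = Finding(
    severity="important",
    dimension="density",
    design_md_ref="DESIGN.md, density guidance (placeholder reference)",
    evidence="screenshot at the standard width (placeholder path)",
    user_impact="forty per cent fewer rows on the standard viewport",
    smallest_change="match row spacing to the existing table pattern (placeholder)",
)
print(is_gate_finding(example))
```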
Slide 9 — Feedback that stops recurring
The gate earns its cost by making itself less necessary. Every finding made twice is a candidate for promotion — the only question is where it goes. If it is checkable and applies everywhere, it becomes an automated check. If it is judged but stable — compact data surfaces, no decorative gradients — it becomes a line in DESIGN.md that reviewers can cite and agents will load. If it is procedural, it becomes a skill. If it only applies to certain file types, it becomes a path-scoped rule. And if it is still contested, it is not a rule yet — that one goes to the maintainer as a system decision. A review gate that never changes the rules is just a queue.
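The promotion question can be written down as a small routing function. The sketch below follows the order of the cases as just described, with two assumptions of mine: contested findings are checked first, since a contested finding is not a rule yet whatever else is true of it, and anything that fits no category also goes to the maintainer. The flag names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecurringFinding:
    contested: bool = False
    checkable: bool = False
    applies_everywhere: bool = False
    stable_judgment: bool = False
    procedural: bool = False
    path_scoped: bool = False


def promote(finding: RecurringFinding) -> str:
    """Decide where a finding that has been made twice should live."""
    if finding.contested:
        return "maintainer decision"  # assumption: contested overrides everything else
    if finding.checkable and finding.applies_everywhere:
        return "automated check"
    if finding.stable_judgment:
        return "DESIGN.md line"
    if finding.procedural:
        return "skill"
    if finding.path_scoped:
        return "path-scoped rule"
    return "maintainer decision"  # assumption: unclassified findings need a human call


print(promote(RecurringFinding(checkable=True, applies_everywhere=True)))
print(promote(RecurringFinding(stable_judgment=True)))
```

Whether a finding that is checkable but limited to certain file types belongs in a check or a path-scoped rule is a judgment for your own system; here it falls through to the path-scoped rule.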
Slide 10 — Worked example: rejected, revised, accepted
Let's trace one component through the gate. An agent proposes a metric card for a dashboard team: branch, evidence packet, the lot. The automated checks catch two hardcoded hex values and a missing focus state; the agent fixes both and re-runs, no human involved. Taste review finds three things: density above the existing table pattern, a trend prop that nothing justifies, and an empty state that sounds like a template. Those go back as findings, the agent revises against exactly those three, and the maintainer accepts with a narrowed scope, with trend display deferred and the reasoning recorded. Then the loop closes: the density rule goes into DESIGN.md, the voice guidance into the review skill. Total human attention: one review and one decision.
Slide 11 — What the gate cannot decide
One caution before the exercise. Encode everything you can — and then be precise about what stays a human call. The gate cannot decide whether the system should grow into a new area at all; that is strategy. It cannot shift the system's taste; changing what counts as good is a decision you make deliberately, not something that accumulates out of individual reviews. It cannot settle naming or brand disputes, only document them. It cannot decide deprecations or breaking changes, because those spend other teams' time. And it cannot make a declined proposal feel fair — recorded reasoning and an offered alternative do that. Rules raise the floor. The ceiling stays yours.
Slide 12 — Exercise: write your review gate
Time to build yours. One page, for your own system. List your mechanical checks — and be honest about which ones are actually a script and which are a vigilant person. Define the evidence packet: the widths, the states, the token and prop listing, the overlap statement. Name the taste dimensions a reviewer judges, and the lines of DESIGN.md each one rests on — if a line is missing, that is homework. Name the decision-maker and their four outcomes. Then take the last three findings your team has repeated and decide where each one gets promoted. Keep the page. Module 4 will use it as the standard your audit measures the codebase against.
Slide 13 — Summary, and the bridge to audits
Let's close. Taste is not mystical — it is density, hierarchy, restraint, voice, and the call about whether to build at all. Mechanical checks run on the agent's side, before review, so humans never spend attention on what a script can catch. Evidence is the entry requirement, the maintainer owns the decision and records the reasoning, and recurring findings get promoted into checks, DESIGN.md, skills, and rules — so the gate gets cheaper every round. In Module 4 we turn the same standard outward: instead of gating new work, we audit what is already in the codebase, and we attach evidence to every finding. Bring your review gate page with you. See you there.