Slide 1 — Critique is a loop, not a meeting
Welcome to Design Review and Critique with Agents. This first module is about the critique loop itself. In most teams, critique is a meeting — it happens when someone has time, it covers whatever got presented, and the quality of the feedback depends on who showed up. We are going to replace that with a loop: named dimensions that say what good means, an agent that inspects every state and screen against them, findings with evidence and severity, and a human triage gate where you decide what matters. The agent does the inspection. You keep the judgment. Let's start with why critique under-delivers today.
Slide 2 — Why critique under-delivers today
Why does critique under-deliver? Not because teams do not care — because it is expensive in exactly the resources they are short of. A proper review of one flow takes hours. The reviewers whose judgment matters most are the busiest people in the building. Critique covers whatever got presented this week, so the work that ships quietly through small changes never gets reviewed at all. Edge states and small screens rarely get looked at. And none of it is written down, so every session starts from zero. The result is that critique quality depends on who had spare attention that week. That is a staffing problem pretending to be a quality process.
Slide 3 — Named dimensions: what good means, written down
Here is the foundation of the whole loop: write down what good means before you ask for critique. If you ask an agent for general feedback you get general taste language — clean, modern, a bit inconsistent. If you name the dimensions, you get findings. These six are a strong default: task clarity, hierarchy, interaction states, trust and risk, system consistency, and accessibility. But they are not a universal checklist. A checkout needs trust and error recovery. A dashboard needs density and scan path. Choosing the dimensions that matter for this artifact and this user job is design work — and it stays yours. The agent applies them; it does not pick them.
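One way to write the dimensions down is as plain data you can hand to the agent. The sketch below is illustrative only: the `Dimension` class, its field names, and every violation description are assumptions made for this example. Only the six dimension names come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One named critique dimension (hypothetical structure)."""
    name: str
    violation_looks_like: str  # a checkable fact, not a restatement of the name


# The six names are the default set from the slide. The violation texts are
# invented examples; replace them with ones that fit your artifact and user job.
DEFAULT_DIMENSIONS = [
    Dimension("task clarity", "the primary action cannot be identified on first view"),
    Dimension("hierarchy", "secondary content is visually heavier than the main task"),
    Dimension("interaction states", "a control has no visible focus or error state"),
    Dimension("trust and risk", "a cost or commitment is hidden until after input"),
    Dimension("system consistency", "a colour or spacing value is hardcoded"),
    Dimension("accessibility", "text contrast falls below the team's stated minimum"),
]


def dimension_names(dimensions):
    """Return the dimension names in order, for use in a prompt or checklist."""
    return [d.name for d in dimensions]
```

Swapping entries in this list is the design decision the slide describes: a dashboard might drop trust and risk and add density and scan path.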
Slide 4 — The agent as first reviewer
So what is the agent in this loop? A first reviewer, not a judge. Its strengths are exactly the things human review is short of: it is thorough — every screen, every state, every breakpoint, the same criteria every time. It is literal — it checks what the dimensions say. It is tireless — the third review this week is as careful as the first. And it is taste-free, which cuts both ways: it will never tell you which competent option is right for your brand, and you should not ask it to. That is why the critique pass is read-only — findings only, no edits. The agent owns inspection. You own judgment.
Slide 5 — The critique loop
Here is the loop the rest of this course builds on. You start with the artifact and its evidence packet — screenshots of states and breakpoints, the actual files, the copy, and the brief. The agent runs a read-only critique pass against your named dimensions and returns findings: severity, evidence, user impact, smallest fix. Then comes the part that stays human — the triage gate. You accept, reject, or defer each finding before anything gets fixed. The agent revises only what you approved, re-critiques with fresh screenshots, and you make the ship call. And notice the dashed line: findings that keep recurring get written into the harness, so the loop gets sharper every time you run it.
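The order of the steps can be sketched as a small function. This is a minimal sketch under stated assumptions, not a real agent integration: `run_critique_loop` and its three callables (`critique`, `triage`, `revise`) are hypothetical names standing in for the agent's read-only pass, your triage gate, and the agent's revision pass.

```python
def run_critique_loop(artifact, critique, triage, revise):
    """Run one pass of the loop and return what the human needs for the ship call.

    critique(artifact) -> list of findings   (read-only: must not change the artifact)
    triage(finding)    -> "accept", "reject", or "defer"   (the human gate)
    revise(artifact, approved) -> revised artifact          (only approved findings)
    """
    findings = critique(artifact)

    # Nothing is fixed until a human has accepted it.
    approved = [f for f in findings if triage(f) == "accept"]

    revised = revise(artifact, approved)

    # Re-critique the revised artifact with fresh evidence.
    remaining = critique(revised)

    # The function stops here on purpose: the ship call stays human.
    return revised, remaining
```

Keeping `critique` and `revise` as separate callables mirrors the separation between the critique pass and the revision pass. The triage call sits between them, so a rejected or deferred finding never reaches the revision step.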
Slide 6 — Findings with evidence, not opinions
Here is what a finding should look like. Severity first: this one is important, not a blocker. Then evidence: at 390 pixels wide, the payment form appears before the plan summary, and desktop does not have this problem. That is a fact anyone can check in thirty seconds. Then user impact: a buyer might enter card details without seeing the final price. Then the smallest recommended fix: a compact plan summary above the form, not a redesign of the checkout. And an owner, because recommending the change and approving it are different jobs. Compare that to "the payment step feels cramped." Same instinct, but one of these can be triaged and acted on. The other is a vibe.
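If you want findings in a fixed shape, a small record type is enough. The sketch below is one possible layout, not a required schema: the `Finding` class, its field names, and the `summarize` helper are assumptions, and the owner value is a placeholder because the slide does not name one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One critique finding. Field names are illustrative, not a fixed schema."""
    severity: str      # "blocker", "important", "polish", or "question"
    evidence: str      # a fact anyone can check
    user_impact: str
    smallest_fix: str
    owner: str         # who approves the change


def summarize(finding: Finding) -> str:
    """Render a finding as one line per field, severity first."""
    return "\n".join([
        f"Severity: {finding.severity}",
        f"Evidence: {finding.evidence}",
        f"User impact: {finding.user_impact}",
        f"Smallest fix: {finding.smallest_fix}",
        f"Owner: {finding.owner}",
    ])


# The example finding from the slide, restated as data.
example = Finding(
    severity="important",
    evidence="At 390 px wide, the payment form appears before the plan summary; "
             "desktop is unaffected.",
    user_impact="A buyer might enter card details without seeing the final price.",
    smallest_fix="Add a compact plan summary above the form.",
    owner="checkout design owner",  # hypothetical placeholder
)

print(summarize(example))
```

A finding that cannot fill the evidence field with a checkable fact is a useful signal in itself: it is probably an opinion, not a finding.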
Slide 7 — Severity keeps the findings list usable
Severity is what stops the findings list from becoming homework. Keep it to four levels. Blocker means the user cannot finish the job or might make a serious mistake; blockers get fixed before anything else. Important means the user gets through, but trust, comprehension, or speed takes a hit. Polish is real but can wait. And question is the level teams forget: it is where the agent says, "I found an ambiguity and I am not going to guess." You will adjust severities at the triage gate, and that is fine. The point is not a perfect taxonomy; it is a fast, defensible answer to what gets fixed first.
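The four levels map naturally onto an ordered enumeration. The sketch below is a minimal illustration: the `Severity` enum and the `fix_order` helper are hypothetical names, and sorting questions last is an assumption, since the slide only says that blockers come first.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Four levels; a lower value is handled earlier."""
    BLOCKER = 0    # user cannot finish the job or might make a serious mistake
    IMPORTANT = 1  # user gets through, but trust, comprehension, or speed suffers
    POLISH = 2     # real, but can wait
    QUESTION = 3   # an ambiguity the agent will not guess at (position is an assumption)


def fix_order(findings):
    """Sort (severity, title) pairs so blockers come first.

    sorted() is stable, so findings of equal severity keep their input order.
    """
    return sorted(findings, key=lambda pair: pair[0])
```

Because severities get adjusted at the triage gate, it makes sense to sort after triage, not before.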
Slide 8 — Feedback the agent can act on vs decisions that stay human
The triage gate is really a routing decision. Some findings the agent can act on: a criterion was violated, the evidence is right there, and the fix is mechanical — a missing focus state, a hardcoded colour, a label that truncates. Send those to the revision pass. Other findings are decisions: more than one competent answer exists, and choosing needs context that never appears in the files — brand, history, politics, strategy. Those stay with you. Mis-route in either direction and you pay for it. Humans doing mechanical fixes wastes the loop. Agents making judgment calls outsources taste, one plausible finding at a time. Tag every accepted finding as a fix or a decision, and route it accordingly.
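The routing rule is simple enough to write out. In the sketch below, the `route` function, the tag strings `"fix"` and `"decision"`, and the queue names are all assumptions chosen for illustration. The tag itself is still assigned by a human at the triage gate; the code only sorts what you have already decided.

```python
def route(accepted_findings):
    """Split accepted findings into the agent's revision queue and the human's queue.

    accepted_findings: iterable of (title, tag) pairs, where tag is "fix" or "decision".
    An unknown tag raises an error so that nothing is routed by accident.
    """
    queues = {"revision_pass": [], "human_decision": []}
    for title, tag in accepted_findings:
        if tag == "fix":
            queues["revision_pass"].append(title)
        elif tag == "decision":
            queues["human_decision"].append(title)
        else:
            raise ValueError(f"untagged or unknown tag for finding: {title!r}")
    return queues
```

Failing loudly on an untagged finding is a deliberate choice here: a finding that silently defaults to the revision pass is exactly the mis-route the slide warns about.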
Slide 9 — The critique contract
All of this gets packaged into what the article behind this module calls a critique contract. It tells the agent the user job, the evidence to inspect, the dimensions to check, and the exact format to return findings in. And it says what is off-limits: do not redesign the page, do not write production code, do not invent brand rules. That last part matters more than it looks — without it, agents drift into proposing new directions and burying the two findings you needed under polite commentary. Keep the contract in the repository, not in someone's chat history, and keep critique and revision as separate prompts. That separation is what makes your triage gate real.
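A contract like this can live in the repository as a small piece of structured data that renders to a prompt. The sketch below is one possible shape, not the article's own format: the `CritiqueContract` class, its field names, and the rendered wording are assumptions, and the example values you pass in would be your own.

```python
from dataclasses import dataclass, field


@dataclass
class CritiqueContract:
    """Inputs for a read-only critique pass (hypothetical structure)."""
    user_job: str
    evidence: list
    dimensions: list
    # Fields each finding must carry, as listed in this module.
    finding_format: list = field(default_factory=lambda: [
        "severity", "evidence", "user impact", "smallest fix",
    ])
    # The three off-limits rules named on the slide.
    off_limits: list = field(default_factory=lambda: [
        "do not redesign the page",
        "do not write production code",
        "do not invent brand rules",
    ])

    def render(self) -> str:
        """Render the contract as plain prompt text, one item per line."""
        lines = [
            "Mode: read-only critique. Return findings only.",
            f"User job: {self.user_job}",
            "Evidence to inspect: " + "; ".join(self.evidence),
            "Dimensions to check: " + "; ".join(self.dimensions),
            "Each finding must include: " + ", ".join(self.finding_format),
        ]
        lines += [f"Off-limits: {rule}" for rule in self.off_limits]
        return "\n".join(lines)
```

The revision pass would get its own, separate prompt listing only the findings you approved. This contract never asks for edits, which is what keeps the critique pass read-only.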
Slide 10 — Critique cadence: per artifact, per sprint, per release
Once the inspection is cheap, cadence becomes a real choice. Per artifact: every new screen or substantial revision gets a loop before it merges. Per sprint: a sweep across whatever changed, looking for drift — the inconsistencies that creep in even when each change was reviewed on its own. Per release: the high-stakes flows get the deep pass — checkout, onboarding, billing. Same contract every time; what changes is scope and depth. One warning. The bottleneck has moved. It is no longer the inspection, it is your triage attention. Pick cadences you can actually keep up with, because findings nobody reads are worse than no findings at all.
Slide 11 — Worked example: one screen through a full loop
Let's trace one screen through the whole loop — the checkout payment step from the article behind this module. The agent got the user job, screenshots of every state on desktop and mobile, the route files, and the design harness, and was asked for findings only. It came back with nine: one blocker — the error state had no recovery path — three important, four polish, and one genuine question about where the renewal date had to appear. At the triage gate the designer accepted three findings, deferred one, rejected two as taste, and answered the question. The agent fixed only what was approved, re-critiqued with fresh screenshots, and the designer shipped. Under an hour of human attention, and a written trail of what was checked and why.
Slide 12 — Exercise: write the critique dimensions for your product
Time to make this yours. Pick one real artifact — a screen, a flow, or a recent PR that touched the interface. Write the user job in a sentence or two. Then choose five or six dimensions that actually matter for this artifact, and for each one, write what a violation would look like — a checkable fact, not a restatement of the dimension. List the evidence the agent would need: which screenshots, which states, which files. And mark which likely findings would be mechanical fixes and which would be judgment calls. Don't run it yet. Keep the page — in Module 2, these dimensions become the basis of a heuristic evaluation across your whole product.
Slide 13 — Summary, and the bridge to evaluation at scale
Let's close the module. Critique under-delivers today because it is expensive in time, seniority, and coverage — and because nothing gets written down. The fix is structural: name the dimensions that define good for this artifact, let the agent run the inspection — thorough, literal, read-only — and have it return findings with severity, evidence, and the smallest fix. You hold the triage gate: fixes go to the agent, decisions stay with you, and recurring findings get written into the harness. Cadence is yours to choose, limited by your triage attention. In Module 2 we take this structure to scale — heuristic evaluations and cognitive walkthroughs across an entire product, and the triage discipline that keeps it workable. See you there.