Slide 1 — Critique, Quality Gates, and Limits
Welcome to the final module of Agentic Design Fundamentals. Everything so far has been about getting a better first draft out of the agent — the brief, the harness, the right artifacts. This module is about what happens between that draft and the decision to ship. We will cover visual QA with real screenshot evidence, critique against named dimensions instead of vibes, the checks you can make fully automatic, and the failure patterns worth automating against. Then we get honest about what agents are still bad at, build your ship checklist, and close the loop on the exercise you started back in Module 1. Critique is where the quality comes from. Let's do it properly.
Slide 2 — Visual QA: screenshots as evidence, not decoration
Here is the problem visual QA solves. A generated page can compile, pass the type check, and still fail the design — the hierarchy is wrong, the density is off, the layout collapses on a phone. None of that shows up in a diff. So we give the review evidence: screenshots at the same fixed widths every run, the important states and not just the happy path, an accessibility report, and one paragraph reminding everyone what the page was for. Then one rule does most of the work: every finding has to name its evidence — which file, which viewport, which check. And the agent reports findings; a human approves the fixes. That order is not negotiable.
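One way to make the "every finding names its evidence" rule mechanical is to refuse findings that arrive without a file, a viewport, and a check. Here is a minimal sketch under stated assumptions: the `Finding` shape, the `missing_evidence` helper, the file path, and the three widths are all hypothetical, since the module only says the widths are fixed from run to run.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical fixed widths; substitute whatever your team captures every run.
VIEWPORT_WIDTHS = (375, 768, 1280)


@dataclass(frozen=True)
class Finding:
    summary: str
    file: str       # screenshot or source file the finding points at
    viewport: int   # width in CSS pixels
    check: str      # which check or review step surfaced it


def missing_evidence(finding: Finding) -> list[str]:
    """Return the names of evidence fields that are empty or invalid."""
    problems = []
    if not finding.file.strip():
        problems.append("file")
    if finding.viewport not in VIEWPORT_WIDTHS:
        problems.append("viewport")
    if not finding.check.strip():
        problems.append("check")
    return problems


good = Finding("CTA falls below the fold", "shots/checkout-375-error.png", 375, "visual-qa")
vague = Finding("Layout feels off", "", 1024, "")

print(missing_evidence(good))   # []
print(missing_evidence(vague))  # ['file', 'viewport', 'check']
```

A finding that comes back with a non-empty list goes back to the agent for evidence before a human spends time on it.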
Slide 3 — Critique dimensions: named, not vibes
If you ask an agent whether a design is good, you get adjectives. If you ask it to inspect against named dimensions, you get findings. Here are the five we use as a baseline: hierarchy — is the information in the right order for the job; density — is it dashboard-tight or reading-loose, deliberately; tone — does it sound and look like this product, not the genre average; states — do empty, loading, error, and disabled actually exist; and accessibility — can keyboard and screen-reader users finish the task. Each one has evidence that settles it. Then two rules: severity goes by user impact, not by how easy the fix is, and judgment calls get routed to a human — the agent does not fix taste.
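The two rules at the end of this slide can be written down as a small triage step. This is a sketch, not a prescribed format: the `CritiqueFinding` fields, the 0-to-3 impact scale, and the example findings are assumptions made for illustration. The point it shows is that the sort key is user impact alone, and that anything flagged as a judgment call is split off for a person.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    HIERARCHY = "hierarchy"
    DENSITY = "density"
    TONE = "tone"
    STATES = "states"
    ACCESSIBILITY = "accessibility"


@dataclass(frozen=True)
class CritiqueFinding:
    dimension: Dimension
    summary: str
    user_impact: int     # assumed scale: 0 = blocks the task, 3 = cosmetic
    fix_minutes: int     # recorded, but deliberately ignored when ranking
    judgment_call: bool  # True when the question is taste, not correctness


def triage(findings):
    """Rank by user impact only, then split judgment calls off for a human."""
    ranked = sorted(findings, key=lambda f: f.user_impact)
    mechanical = [f for f in ranked if not f.judgment_call]
    needs_human = [f for f in ranked if f.judgment_call]
    return mechanical, needs_human


findings = [
    CritiqueFinding(Dimension.TONE, "Headline reads like a template", 2, 5, True),
    CritiqueFinding(Dimension.STATES, "No error state on the form", 0, 90, False),
    CritiqueFinding(Dimension.DENSITY, "Table padding off by one step", 3, 2, False),
]

mechanical, needs_human = triage(findings)
print([f.summary for f in mechanical])
# ['No error state on the form', 'Table padding off by one step']
print([f.summary for f in needs_human])
# ['Headline reads like a template']
```

Note that the 90-minute fix still ranks first, because it blocks the task; the two-minute fix ranks last.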
Slide 4 — Making checks executable
Here is the highest-leverage habit in this module: anything you have corrected twice should become a check that runs without you. Token audits catch raw colors and off-scale spacing. The type check catches invented component props before anyone reads the code. Verify scripts enforce naming and structure. An accessibility audit runs axe against the real rendered page. And critically, the agent runs all of these on its own output before you ever look. That does not replace your critique — a clean audit can still be a bad design. What it does is clear the mechanical failures out of the way, so your attention goes where only your attention works: judgment.
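To make the token audit concrete, here is a minimal sketch of one. It assumes component styles live in plain CSS, that colors should arrive through variables rather than hex literals, and that the spacing scale is the hypothetical set below. It is a naive line-by-line scan, so point it at component styles and not at the file that defines the tokens themselves.

```python
from __future__ import annotations

import re

# Hypothetical spacing scale in px; replace with your system's real steps.
SPACING_SCALE = {0, 4, 8, 12, 16, 24, 32, 48}

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")
SPACING_DECL = re.compile(r"\b(?:margin|padding|gap)(?:-[a-z]+)*\s*:\s*([^;]+)")
PX_VALUE = re.compile(r"(\d+)px")


def audit_css(css: str) -> list[str]:
    """Flag raw hex colors and px spacing values that are off the scale."""
    violations = []
    for lineno, line in enumerate(css.splitlines(), start=1):
        # Only scan text after the first colon, so id selectors such as
        # "#add {" on their own line are not mistaken for colors.
        _, _, value_part = line.partition(":")
        for match in HEX_COLOR.finditer(value_part):
            violations.append(f"line {lineno}: raw color {match.group(0)}")
        for decl in SPACING_DECL.finditer(line):
            for px in PX_VALUE.finditer(decl.group(1)):
                if int(px.group(1)) not in SPACING_SCALE:
                    violations.append(f"line {lineno}: off-scale spacing {px.group(0)}")
    return violations


sample = """\
.card { color: var(--color-text); padding: 16px; }
.hero { background: #7c3aed; margin-top: 18px; }
.badge { gap: 4px 10px; }
"""

for violation in audit_css(sample):
    print(violation)
# line 2: raw color #7c3aed
# line 2: off-scale spacing 18px
# line 3: off-scale spacing 10px
```

A script like this is what the agent runs on its own output; an empty list means the mechanical layer is clean, not that the design is good.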
Slide 5 — Anti-slop rules: the failure patterns worth automating against
Let's name the failure everyone has seen: the purple gradient hero, the generic SaaS layout, emoji as bullet points, a low-contrast badge, a marketing band with a call-to-action your product does not have. We call it slop, but it is really just the most common answer on the public web showing up where your product needed a specific one. Which means it responds to rules. Name the patterns your team keeps rejecting and write them into the harness as explicit prohibitions. Then push whatever you can into executable checks — token audits, contrast assertions, lint rules. Slop survives on vagueness. Named patterns and automated gates starve it.
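The low-contrast badge is a good example of a pattern that can become an assertion. The sketch below uses the WCAG 2.x relative-luminance and contrast-ratio formulas, with 4.5:1 as the default minimum (the AA threshold for normal-size text). The function names and the sample colors are illustrative, and the parser only handles six-digit hex values.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def assert_contrast(foreground: str, background: str, minimum: float = 4.5) -> None:
    """Raise if the pair falls below the minimum ratio."""
    ratio = contrast_ratio(foreground, background)
    if ratio < minimum:
        raise AssertionError(
            f"{foreground} on {background} is {ratio:.2f}:1, below {minimum}:1"
        )


assert_contrast("#000000", "#ffffff")  # passes: black on white

try:
    assert_contrast("#9ca3af", "#ffffff")  # light gray badge text on white
except AssertionError as err:
    print(err)
```

Once an assertion like this sits in the checks the agent runs, "low-contrast badge" stops being review feedback and becomes a failing gate.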
Slide 6 — Common failure modes, mapped to the gate that catches them
Here are the failure modes you will actually see, mapped to the gate that catches each one cheaply. Invented component props — caught by the type check the agent runs on itself. Generic patterns — caught by anti-slop rules and the tone critique. Quiet scope creep, where the agent improves things nobody asked about — caught at the plan review and by flagging any file outside the brief's scope. Token drift — caught by the audit script. Missing states — caught by the states row of your critique. Accessibility regressions — caught by axe plus a keyboard pass. Notice the pattern: mechanical failures go to automated gates, judgment failures go to critique, scope failures go to the plan review. Every recurring failure should have an address.
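Of these gates, the scope check is the easiest to sketch. Assume the brief lists the paths it covers as glob patterns, and that you can get the list of changed files from your version control system; both are assumptions, as are the paths in the example. Anything the patterns do not cover gets flagged for the plan review.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, allowed_patterns):
    """Return changed files that no pattern from the brief covers.

    In fnmatch patterns, "*" also matches "/", so "src/pages/checkout/*"
    covers nested files under that directory.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical scope taken from a brief.
allowed = ["src/pages/checkout/*", "src/components/OrderSummary.tsx"]

# Hypothetical list of files the agent touched.
changed = [
    "src/pages/checkout/index.tsx",
    "src/components/OrderSummary.tsx",
    "src/components/Button.tsx",
    "package.json",
]

print(out_of_scope(changed, allowed))
# ['src/components/Button.tsx', 'package.json']
```

A flagged file is not automatically wrong. It is a question for the human at the plan review: did the brief ask for this?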
Slide 7 — The quality-gate pipeline
Here is the whole module in one picture. A generated draft enters the pipeline. First the executable checks — tokens, types, lint, accessibility — fully automated, run by the agent on its own output. Then visual QA: screenshots at fixed widths compared against intent, with findings a human reads. Then structured critique against the named dimensions — that gate is yours. Then the step most teams skip: the harness update, where any feedback you have given twice becomes a rule, so the next draft starts better. And finally the ship decision, which is always human, because passing every check is not the same as being the right thing to ship. Automated gates catch what they encode. You catch the rest.
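If it helps to see the picture as data, here is one possible sketch of the pipeline as an ordered list of gates. The `Gate` type and the keys read from the draft record (`checks_failed`, `open_p0_p1`, and so on) are hypothetical names chosen for illustration. The code only reports where a draft is stuck; it never makes the ship decision, it just checks that a named person has.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Gate:
    name: str
    automated: bool                 # only the first gate is fully automated
    passed: Callable[[dict], bool]  # reads results recorded for the draft


PIPELINE = [
    Gate("executable checks", True, lambda d: d.get("checks_failed", 1) == 0),
    Gate("visual QA", False, lambda d: d.get("open_p0_p1", 1) == 0),
    Gate("structured critique", False, lambda d: d.get("critique_done", False)),
    Gate("harness update", False, lambda d: d.get("harness_updated", False)),
    Gate("ship decision", False, lambda d: bool(d.get("approved_by"))),
]


def first_blocking_gate(draft: dict) -> Optional[Gate]:
    """Return the earliest gate the draft has not cleared, or None."""
    for gate in PIPELINE:
        if not gate.passed(draft):
            return gate
    return None


draft = {
    "checks_failed": 0,
    "open_p0_p1": 0,
    "critique_done": True,
    "harness_updated": False,  # the step most teams skip
}

print(first_blocking_gate(draft).name)  # harness update
```

Missing keys default to "not passed", so a draft with no recorded results is blocked at the first gate instead of sliding through.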
Slide 8 — What agents are still bad at
Now the slide this course owes you: what agents are still bad at. They handle ambiguity badly — a vague brief gets filled with the most common answer, delivered confidently. They cannot read organisational politics, because none of it is in the files. They do not produce novel taste; they converge on the centre of the distribution, and a new direction has to come from you. They drift over long horizons — across a long session and between sessions — which is why the harness exists. And they do not know when to stop; they will keep producing plausible output long after it stopped improving. None of this is a bug that next year's model fixes. These limits are why the gates exist.
Slide 9 — The ship checklist
Here is the ship checklist — what gets verified before agent-produced work leaves the loop. The executable checks pass: tokens, types, lint, accessibility. Visual QA is done at the standard widths, states included, with no P0 or P1 still open. The critique ran against named dimensions, and every finding has evidence and a severity. Judgment calls are logged with a human owner, not quietly resolved by the agent. The diff stayed inside the brief's scope. Recurring feedback went into the harness. And a person decided to ship, knowing their name is on it. Seven items, deliberately short. If something on your version takes an hour to verify, it belongs in an earlier gate, not at the end.
Slide 10 — Keeping the harness alive
One habit separates teams whose agent output keeps getting better from teams stuck giving the same feedback forever: the harness update. The rule is simple — feedback you have given twice becomes a harness rule, or better, an executable check. Prefer checks over prose, because a rule the audit enforces never needs repeating in review. Write down approved exceptions too, or the next audit will try to fix something you did on purpose. And prune — stale rules erode trust in the file. Remember: a decision that lives only in a conversation is one the next session never heard. The harness is where critique goes to stop recurring.
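The "given twice" rule is simple enough to automate a reminder for. The sketch below assumes you keep a plain log of review notes. It matches notes on normalised text only, which is cruder than real feedback deserves; in practice you would probably tag notes by topic. The function name and the sample log are made up for illustration.

```python
from collections import Counter


def promotion_candidates(feedback_log, threshold=2):
    """Return notes seen at least `threshold` times, in first-seen order."""
    counts = Counter(note.strip().lower() for note in feedback_log)
    return [note for note, seen in counts.items() if seen >= threshold]


# Hypothetical review notes collected across sessions.
log = [
    "Use spacing tokens, not raw px",
    "Missing empty state",
    "use spacing tokens, not raw px ",
    "Badge contrast too low",
]

print(promotion_candidates(log))
# ['use spacing tokens, not raw px']
```

Each candidate then gets a decision from a person: write it as a harness rule, or better, turn it into an executable check.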
Slide 11 — Course retrospective: the Module 1 task, run through the full pipeline
Time to close the loop you opened in Module 1. Take that original task, the one you sketched on paper and then ran in Module 3, and push it through the full pipeline. Run the executable checks: tokens, types, lint, accessibility. Capture screenshots at the three standard widths, including at least one state that is not the happy path. Critique it against the five dimensions, with evidence and severity for every finding. Edit this module's ship checklist into your own, changing no more than three items. Add at least one recurring piece of feedback to your harness. Then put today's result next to the page you sketched in Module 1. That gap is the course, visible in your own work.
Slide 12 — Where to go next in the curriculum
So where do you go from here? Three courses continue the curriculum, and each goes deep on one part of what you have just learned. Claude Code for Designers is the platform-depth course — sessions, skills, MCP connections, the daily mechanics. Design Systems for Agents is about making your design system the machine-readable contract agents execute against — tokens, inventories, audits. And Agentic Prototyping is the speed course: from brief to testable artifact in a session, without touching production. Pick by your nearest bottleneck — tool fluency, system legibility, or speed to something testable. And whichever you choose, the fundamentals you built here come with you.
Slide 13 — Summary, and the close of the course
Let's close the module, and the course. This module's claim was simple: critique is where the quality comes from. Give the review evidence, name the dimensions, assign severity by user impact, and make every mechanical check executable so the agent runs it on itself. Name the failure patterns, give each one a gate, and when feedback recurs, write it into the harness so it stops recurring. Stay honest about the limits (ambiguity, politics, novel taste, long horizons, and not knowing when to stop), because those are exactly why the gates exist and the final decision stays human. Six modules ago this was vocabulary. Now it is a practice you can run on Monday. Thank you for taking the course, and pick your next one by your nearest bottleneck.