Slide 1 — The Design Harness
Welcome to Module 4. In Module 3 you learned to brief an agent properly — and if you have been practising, you have probably noticed the problem: your briefs keep growing, because you keep re-typing the same facts. The spacing scale. The component library. Do not invent colours. That repetition is the signal that something belongs in the harness — the durable layer of files that loads with every session or on demand, so the brief only carries what is new. In this module we build that layer: the instruction hierarchy, the project instruction file, path-scoped rules, skills, tokens, and DESIGN.md. And we walk through a real one, file by file.
Slide 2 — The harness is the design system for agent behaviour
Here is the simplest way to think about the harness: it is the design system for agent behaviour. A design system stops people from re-deciding the button radius on every screen. The harness stops the agent from re-deciding it in every session. The brief carries what is new about this task; the harness carries what is true of every task. And the test for what belongs there is wonderfully boring: have you typed this correction more than once? If yes, encode it. If no, leave it in the brief. The harness will not give the agent taste — nothing will. What it does is remove the preventable failures, so your review time goes to the judgment calls that actually need you.
Slide 3 — The instruction hierarchy: what loads when
This diagram is the mental model for the whole module. Five layers, ordered by when they enter the agent's context. Global and user instructions load every session — keep team rules out of them. The project instruction file, CLAUDE.md or AGENTS.md, also loads in full every session, which is exactly why it has to stay short. Path-scoped rules load only when the agent touches matching files. Skills load their body only when a task needs them. And the brief at the bottom is the only thing you write per task. The whole design problem is putting each rule at the right layer — and the most common mistake is stuffing everything into the always-loaded file, where the important rules drown.
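To make the "what loads when" idea concrete, here is a minimal Python sketch of the hierarchy as a cost model. It is not how any particular agent decides what to load; the `Layer` class, the `session_context` function, the trigger words, and every line count are assumptions made up for illustration. It also uses Python's `fnmatch`, whose glob rules are simpler than those of real tools.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Layer:
    name: str
    lines: int                  # illustrative size of the file body, in lines
    always_loaded: bool = False
    glob: str = ""              # set for path-scoped rules
    triggers: tuple = ()        # set for skills: phrases that make the body load


def session_context(layers, touched_files, brief):
    """Return the layers that would enter context for one task (hypothetical model)."""
    loaded = []
    words = brief.lower()
    for layer in layers:
        if layer.always_loaded:
            loaded.append(layer)
        elif layer.glob and any(fnmatch(f, layer.glob) for f in touched_files):
            loaded.append(layer)
        elif layer.triggers and any(t in words for t in layer.triggers):
            loaded.append(layer)
    return loaded


# All names and sizes below are invented for the example.
HARNESS = [
    Layer("user instructions", 10, always_loaded=True),
    Layer("AGENTS.md", 38, always_loaded=True),
    Layer("styling rules", 20, glob="*.css"),
    Layer("design-review skill", 120, triggers=("design review", "audit")),
]

docs_task = session_context(HARNESS, ["README.md"], "Fix the typo in the intro")
css_task = session_context(
    HARNESS, ["src/button.css"], "Run a design review of the button"
)

print([layer.name for layer in docs_task], sum(layer.lines for layer in docs_task))
print([layer.name for layer in css_task], sum(layer.lines for layer in css_task))
```

In this toy model the documentation task pays only for the two always-loaded files, while the styling task also pulls in the scoped rule and the skill body. Anything moved into an always-loaded file is paid for on every task, including the ones it has nothing to do with.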
Slide 4 — CLAUDE.md and AGENTS.md: facts, not procedures
Here is what a good project instruction file looks like — and notice how boring it is. Stack. Where the design source of truth lives. The spacing scale, with the actual numbers. A type floor. Named tokens. The prohibitions — no gradients, no nested cards, no invented colours. And the commands to run before claiming the work is done. Every line is either a value the agent can copy or a behaviour someone can check. There is not a single adjective asking for a feeling. Keep it to about a page, because every line loads in every session. And if you maintain both a CLAUDE.md and an AGENTS.md, make one of them point to the other — duplicated files drift, and the agent will follow the stale one confidently.
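The "copyable value or checkable behaviour" test can be partly automated. The sketch below is one possible lint for an instruction file, under stated assumptions: the sixty-line budget is a stand-in for "about a page", the vague-word list is an invented starting point, and the sample file contents (stack, spacing numbers) are hypothetical. It only catches the crude cases; a human still has to read each line.

```python
import re

# Illustrative list of adjectives that ask for a feeling; extend it from your own files.
VAGUE_WORDS = ("premium", "clean", "modern", "delightful", "polished")
MAX_LINES = 60  # assumed stand-in for "about a page"


def lint_instruction_file(text, max_lines=MAX_LINES):
    """Return a list of human-readable findings for an instruction file."""
    findings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        findings.append(f"file has {len(lines)} lines, budget is {max_lines}")
    for number, line in enumerate(lines, start=1):
        for word in VAGUE_WORDS:
            if re.search(rf"\b{re.escape(word)}\b", line, flags=re.IGNORECASE):
                findings.append(
                    f"line {number}: '{word}' asks for a feeling, not a value or a check"
                )
    return findings


# Hypothetical instruction file, not the one shown on the slide.
SAMPLE = """\
Stack: React, TypeScript
Design source of truth: DESIGN.md
Spacing scale: 4, 8, 12, 16, 24, 32
Make the dashboard feel premium
Never use gradients or nested cards
"""

for finding in lint_instruction_file(SAMPLE):
    print(finding)
```

Run against the sample, the lint flags only the fourth line, the one that asks for a feeling. The prohibition on the last line passes because a reviewer can check it.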
Slide 5 — Path-scoped rules: constraints that follow the file type
Not every rule deserves to load in every session. Rules about CSS custom properties are noise when the agent is editing a Markdown file. Path-scoped rules fix that: a small instruction file with a glob pattern in its frontmatter, loaded only when the agent touches matching files. So the no-raw-hex rule, the spacing-scale rule, the focus-state rule — they show up exactly when styling files are being edited, and cost nothing the rest of the time. They also make review sharper, because you can point at the specific rule that was violated. One warning: this is a Claude Code feature. If your team mixes tools, keep the constraint in the shared AGENTS.md too, and treat the scoped rule as an optimisation.
Slide 6 — SKILL.md: procedures the agent loads on demand
The next layer is skills. A skill is just a folder with a SKILL.md file in it: a name and description that are always visible, and a body, the actual procedure, that only loads when a task matches. That makes skills the right home for everything long: review rubrics, prototype workflows, audit checklists. This design-review skill is a real pattern: capture the implementation; check hierarchy, spacing, tokens, states, and accessibility; report findings by priority; and, importantly, do not redesign. Two practical notes. Write the description with the words people actually type, or the skill will never fire. And do not paste the design system into the skill; point it at DESIGN.md and the tokens, so it reads current values instead of stale copies.
Slide 7 — Design tokens are agent instructions
Design tokens are usually discussed as a build pipeline. For agent work, think of them as instructions. A primitive palette tells the agent what values exist — but it still has to guess when each one belongs, and its guesses are generic. Semantic tokens carry the decision: this colour means escalated, this spacing means page padding. Component tokens carry the local rule: status chips stay compact. The generated CSS variables are what the app consumes, and an audit can verify that components only ever reference them — which turns your colour rules into something falsifiable. Keep the layers separate: token JSON is the data, CSS variables are the runtime, and DESIGN.md — next slide — is the explanation.
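The three token layers and the generated runtime variables can be sketched in a few lines of Python. Everything here is an assumption for illustration: the token names, the values, and the `{layer.name}` alias syntax, which is only loosely modelled on common token formats. It is not the pipeline behind this course's site.

```python
# Hypothetical token data: primitives hold raw values, the other layers hold decisions.
TOKENS = {
    "primitive": {
        "red-600": "oklch(0.58 0.22 27)",
        "space-2": "8px",
        "space-6": "24px",
    },
    "semantic": {
        "status-escalated": "{primitive.red-600}",
        "page-padding": "{primitive.space-6}",
    },
    "component": {
        "chip-bg-escalated": "{semantic.status-escalated}",
        "chip-padding": "{primitive.space-2}",
    },
}


def resolve(value, tokens):
    """Follow {layer.name} aliases until a literal value is reached."""
    seen = set()
    while value.startswith("{") and value.endswith("}"):
        if value in seen:
            raise ValueError(f"alias cycle at {value}")
        seen.add(value)
        layer, name = value[1:-1].split(".", 1)
        value = tokens[layer][name]
    return value


def to_css_variables(tokens):
    """Emit CSS custom properties for the semantic and component layers only."""
    lines = [":root {"]
    for layer in ("semantic", "component"):
        for name, value in tokens[layer].items():
            lines.append(f"  --{name}: {resolve(value, tokens)};")
    lines.append("}")
    return "\n".join(lines)


print(to_css_variables(TOKENS))
```

Leaving the primitive layer out of the generated variables is a design choice in this sketch, not a rule. It means a component can only reach a colour through a name that carries a decision, which is the property the audit depends on.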
Slide 8 — DESIGN.md: the source of truth both audiences can read
DESIGN.md is the explanation layer — the file that tells both a new teammate and an agent how this product should look. This excerpt is from the real one behind this site. Notice the pattern: a one-line mood, then immediately the concrete decisions that express it. Exact OKLCH values mapped to the runtime variables. A radius ceiling. And the anti-slop rules — no gradients, no decorative blobs, no nested cards, no hardcoded colours. Those negatives do more work per line than anything else in the file, because agents inherit the web's most common habits and explicit prohibitions are what keep them out. The instruction file names DESIGN.md as the source of truth; the agent reads it when the task starts.
Slide 9 — Worked example: the harness behind this site
Let's look at a real harness — the one behind the site you are using right now. Five layers, all ordinary files. A thirty-eight line AGENTS.md. A DESIGN.md with exact tokens and anti-slop rules. Thirty-one skills that load on demand. A folder of page snapshots as taste references. And two commands that act as gates: a structural check and a token audit. The same brief run without this harness took four correction rounds and about fifty minutes, with a gradient hero and ten hardcoded colours. With the harness: two rounds, fifteen minutes, audit passed first try. And here is the honest part — both runs missed the same awkward title wrapping. The gates only catch what they encode. Human review stays in the loop.
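A token audit gate does not need to be elaborate. The following is a minimal sketch of what such a check might look like, not the actual command used on this site: the function name, the allowed-variable set, and the sample stylesheet are all hypothetical. It works line by line on simple `property: value` declarations and would need a real CSS parser for anything more complex.

```python
import re

# Matches #rgb, #rgba, #rrggbb, and #rrggbbaa colour literals.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")
# Captures the variable name inside var(--name ...).
VAR_REF = re.compile(r"var\(\s*(--[\w-]+)")


def audit_css(css, allowed_variables):
    """Return (line number, message) pairs for hardcoded colours and unknown tokens."""
    findings = []
    for number, line in enumerate(css.splitlines(), start=1):
        _, separator, value = line.partition(":")
        if not separator:
            continue  # no colon, so not a declaration in this simplified model
        for match in HEX_COLOR.finditer(value):
            findings.append((number, f"hardcoded colour {match.group(0)}"))
        for name in VAR_REF.findall(value):
            if name not in allowed_variables:
                findings.append((number, f"unknown token {name}"))
    return findings


# Hypothetical inputs for the example.
ALLOWED = {"--status-escalated", "--chip-padding"}
SAMPLE_CSS = """\
.chip {
  background: var(--status-escalated);
  color: #ff0000;
  padding: var(--chip-gap);
}
"""

for line_number, message in audit_css(SAMPLE_CSS, ALLOWED):
    print(f"line {line_number}: {message}")
```

The sample produces two findings: the raw hex value on line 3 and the undefined variable on line 4. Like the gates described above, this check only catches what it encodes; it knows nothing about awkward title wrapping.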
Slide 10 — Auto memory, and what should never live only in a conversation
Some agents keep their own notes between sessions — auto memory. The agent records what it discovers about your project and reloads it next time. Useful, with one caveat: those notes are observations, not decisions. A workaround the agent noticed in March will be repeated confidently in June unless someone reviews the notes, promotes the real standards into the harness, and prunes the rest. The broader point matters on every platform: chat history is not a harness. Decisions made in conversation evaporate when the session ends, and the next person re-litigates them from scratch. If a correction matters beyond today, write it into a file someone can review and version. The conversation is where decisions get made. The harness is where they survive.
Slide 11 — Harness anti-patterns
Now the ways harnesses go wrong, and every one of these feels like diligence while it happens. Pasting the design system into the instruction file, so two copies drift apart. The five-hundred-line file that buries the rules that matter. Vibes instead of criteria: "feel premium" checks nothing; padding values and token names check themselves. Stale rules nobody owns, which are worse than no rules, because the agent follows them confidently. Encoding every single miss until the harness is too heavy to maintain: encode what recurs, leave the rest to review. And duplicated files per tool that disagree within a month. The thread through all of it: the harness is a maintained product, not a dumping ground. Give it an owner and a cadence.
Slide 12 — Exercise: draft the harness file for your own project
Time to build yours. Start with your correction history, not your imagination: list everything you have fixed in agent output more than once. Then write the one-page instruction file — stack, design source of truth, scales, components, prohibitions, and the commands to run before claiming done. Read it back and mark every line: can the agent copy it, or can a reviewer check it? If neither, rewrite it or cut it. Sketch one skill — a design review checklist is the highest-value first one. Then re-run the task you have been carrying since Module 1, with the file in place, and count the corrections you no longer have to type. Keep that before-and-after. It is how you will sell this to your team.
Slide 13 — Summary, and what comes next
Let's close. The harness is the durable layer: it carries everything that is true of every task, so your brief only carries what is new. The hierarchy is really a cost model — always-loaded files stay short and factual, path-scoped rules and skills defer their cost until they are needed. Instruction files hold facts and prohibitions that can be checked, never vibes. Tokens and DESIGN.md make the design system itself readable to the agent, with data, runtime, and explanation kept as separate layers. And the whole thing needs an owner, because a stale harness is worse than none. It raises the floor; it does not supply taste. In Module 5 we look at the materials underneath all of this — design-as-code formats, the platforms, the canvases, and how to choose a stack. See you there.