Agentic Design School
Module 4 of 6
45–55 minutes

Agentic Design Fundamentals

The Design Harness

Briefs carry task-specific intent; the harness carries everything durable. This module covers the instruction hierarchy, project instruction files, path-scoped rules, skills, and design tokens as agent instructions — the layer that makes output consistent across sessions and people.

Duration: 45–55 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Explain the instruction hierarchy and what loads when in a typical agent session.
  • Write a CLAUDE.md or AGENTS.md that encodes design rules an agent reliably follows.
  • Use path-scoped rules and SKILL.md files for procedures that only some tasks need.
  • Treat design tokens and DESIGN.md as machine-readable design instructions.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13

The Design Harness

Agentic Design Fundamentals · Module 4 of 6

  • The instruction hierarchy: what loads when, and what it costs
  • CLAUDE.md and AGENTS.md: the project instruction file
  • Path-scoped rules and skills: constraints and procedures on demand
  • Tokens and DESIGN.md as machine-readable design instructions
  • A real harness walked through, plus the anti-patterns that kill them

Module 3 made the brief carry intent for one task. This module builds the layer that carries everything durable, so the brief can stay short.

Slide notes

Position this module against the previous one before going anywhere else. Module 3 was about the brief: the per-task statement of intent, constraints, and review criteria. The recurring frustration people hit after Module 3 is that their briefs keep getting longer, because they keep restating the same facts — the spacing scale, the component library, the colours, the prohibitions. The harness is the answer to that frustration: it is where everything durable lives, so the brief only has to carry what is genuinely new about this task.

Name the deliverables of the module up front, because they are concrete files: a project instruction file (CLAUDE.md or AGENTS.md), optionally a few path-scoped rules, one or two skills, and a design-system source the agent can read — DESIGN.md and the token files. None of it is exotic infrastructure. It is a handful of Markdown files, some JSON, and a couple of commands.

Manage expectations honestly. The harness raises the floor of agent output and removes the preventable failures; it does not supply taste, and it costs a working session to set up plus an owner to keep it current. Both of those caveats get their own slides later in the module.

Narration for this slide

Welcome to Module 4. In Module 3 you learned to brief an agent properly — and if you have been practising, you have probably noticed the problem: your briefs keep growing, because you keep re-typing the same facts. The spacing scale. The component library. Do not invent colours. That repetition is the signal that something belongs in the harness — the durable layer of files that loads with every session or on demand, so the brief only carries what is new. In this module we build that layer: the instruction hierarchy, the project instruction file, path-scoped rules, skills, tokens, and DESIGN.md. And we walk through a real one, file by file.

Slide 2 of 13

The harness is the design system for agent behaviour

A design system tells people how the product should look. A harness tells agents how to work on it — and the two overlap more than they differ.

  • The brief carries what is new about this task; the harness carries what is true of every task
  • Anything you have corrected twice in agent output is a candidate for the harness
  • A harness is files in the repository: instructions, rules, skills, tokens, checks
  • It encodes decisions already made, so they stop being remade badly each session
  • It does not supply judgment — it removes the preventable failures so review can focus on judgment

The test for what goes in the harness: have I typed this correction more than once? If yes, encode it. If no, leave it in the brief.

Slide notes

The framing that lands best with designers is the parallel to a design system. A design system exists because you do not want every designer re-deciding the button radius on every screen; the harness exists because you do not want every agent session re-deciding it either. The difference is the audience: a design system is written for people and tolerates ambiguity, while a harness is written for an agent and has to be checkable. 'Feel premium' works in a brand deck; it does nothing in a CLAUDE.md.

The second bullet is the practical heuristic and worth repeating until it sticks: audit your own corrections. The facts you have re-typed across sessions — token locations, spacing scales, component names, the no-gradients rule — are the harness writing itself. This is also why the harness should be built incrementally rather than as a big upfront project: grow it from real failures, not from imagined ones.
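The "typed it more than once" test can be run mechanically over a log of your own corrections. The sketch below is only an illustration: the log entries are hypothetical, and the light normalisation (case and trailing full stop) is an assumption about how similar two corrections must be to count as the same one.

```python
from collections import Counter


def harness_candidates(corrections: list[str], threshold: int = 2) -> list[str]:
    """Return corrections typed at least `threshold` times, most frequent first."""
    # Normalise lightly so "No gradients." and "no gradients" count as one correction.
    counts = Counter(c.strip().rstrip(".").lower() for c in corrections)
    return [text for text, n in counts.most_common() if n >= threshold]


# Hypothetical correction log collected across a few sessions.
log = [
    "Use the 4px spacing scale",
    "No gradients.",
    "use the 4px spacing scale",
    "Move the badge to the left",
    "no gradients",
    "Use the 4px spacing scale.",
]

print(harness_candidates(log))
# ['use the 4px spacing scale', 'no gradients']
```

The one-off correction about the badge stays out of the result, which matches the rule of thumb: it belongs in the brief, not in the harness.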

Close with the honest boundary. The harness encodes decisions that have already been made. It does not decide what a screen is for, whether the hierarchy serves the user, or whether the work is good enough to ship. Those stay with the designer, and the point of automating everything below them is precisely to give review time back to them.

Narration for this slide

Here is the simplest way to think about the harness: it is the design system for agent behaviour. A design system stops people from re-deciding the button radius on every screen. The harness stops the agent from re-deciding it in every session. The brief carries what is new about this task; the harness carries what is true of every task. And the test for what belongs there is wonderfully boring: have you typed this correction more than once? If yes, encode it. If no, leave it in the brief. The harness will not give the agent taste — nothing will. What it does is remove the preventable failures, so your review time goes to the judgment calls that actually need you.

Slide 3 of 13

The instruction hierarchy: what loads when

Five layers, ordered by when each one enters the agent's context. The cost model is the whole design problem.

Stacked layer diagram of the agent instruction hierarchy. From top to bottom: global and user-level instructions such as a home-directory CLAUDE.md, loaded in full every session and reserved for personal preferences; the project instruction file, CLAUDE.md or AGENTS.md at the repository root, loaded in full every session and holding facts like stack, token locations, scales, and prohibitions; path-scoped rules in .claude/rules that load only when matching files are touched; skills in SKILL.md folders whose names are always visible but whose bodies load on demand for procedures and rubrics; and the task brief at the bottom, typed each session and carrying only what is new about the task. A left-hand arrow runs from broad and durable at the top to specific to this task at the bottom, and a footer notes that always-loaded layers should stay short while procedures move into skills.
Layers 1–2 load in full every session, so they hold short, durable facts. Layers 3–4 cost context only when triggered, so file-specific constraints and procedures live there. The brief at the bottom is the only layer written per task — the harness exists so it can stay short.

Designers over-invest in the always-loaded layer because it is the most visible. The leverage is in knowing which layer each rule belongs to.

Slide notes

Walk the diagram top to bottom and keep returning to the one dimension that matters: when the content enters the model's context window. Global and user-level instructions load every session and should hold only personal preferences — the moment team design rules end up in someone's home-directory file, the team has rules that only one machine knows about. The project instruction file also loads every session, in full, which is exactly why it has to stay short and factual. Path-scoped rules load only when the agent touches matching files, so a CSS rule costs nothing while the agent edits Markdown. Skills expose a name and description at all times but load their body only when the task matches. The brief is typed fresh every session.

Add the nuance about imports, because it surprises people: Claude Code lets a CLAUDE.md import other files with an @path reference, which is good for keeping one source of truth, but the imported content still loads in full every session. Splitting a long file into imports tidies the repository; it does not save a single token. The only mechanisms that genuinely defer cost are path-scoped rules and skills.

The most common mistake is the 500-line instruction file. It does not fail loudly — it just dilutes attention until the rules that matter get skimmed past. The fix is demotion: procedures move down into skills, file-specific constraints move into rules, and the always-loaded file shrinks back to a page of facts.
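The cost model can be made concrete with a toy calculation. This is a sketch under stated assumptions, not a measurement: line counts stand in for tokens, the imported file names are hypothetical, and the small always-visible cost of skill names and descriptions is ignored.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    lines: int                # rough size; lines stand in for tokens here
    always_loaded: bool       # True for layers that enter context every session
    triggered: bool = False   # on-demand layers only: did this task trigger it?


def context_cost(layers: list[Layer]) -> int:
    """Sum the size of every layer that actually enters context for this task."""
    return sum(layer.lines for layer in layers if layer.always_loaded or layer.triggered)


# Before: everything lives in one 500-line instruction file.
bloated = [Layer("CLAUDE.md", 500, always_loaded=True)]

# Splitting into @path imports tidies the repository but still loads in full.
# (The imported file names are hypothetical.)
split_into_imports = [
    Layer("CLAUDE.md", 40, always_loaded=True),
    Layer("@docs/styling.md", 160, always_loaded=True),
    Layer("@docs/review.md", 300, always_loaded=True),
]


def demoted(editing_css: bool, reviewing: bool) -> list[Layer]:
    """After demotion: a scoped rule and a skill body that load only when needed."""
    return [
        Layer("CLAUDE.md", 40, always_loaded=True),
        Layer(".claude/rules/styles.md", 20, always_loaded=False, triggered=editing_css),
        Layer("design-review skill body", 120, always_loaded=False, triggered=reviewing),
    ]


print(context_cost(bloated))                                        # 500
print(context_cost(split_into_imports))                             # 500
print(context_cost(demoted(editing_css=True, reviewing=False)))     # 60
print(context_cost(demoted(editing_css=False, reviewing=False)))    # 40
```

The first two numbers are identical, which is the point about imports; only the demoted layout changes what a Markdown-editing session pays.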

Narration for this slide

This diagram is the mental model for the whole module. Five layers, ordered by when they enter the agent's context. Global and user instructions load every session — keep team rules out of them. The project instruction file, CLAUDE.md or AGENTS.md, also loads in full every session, which is exactly why it has to stay short. Path-scoped rules load only when the agent touches matching files. Skills load their body only when a task needs them. And the brief at the bottom is the only thing you write per task. The whole design problem is putting each rule at the right layer — and the most common mistake is stuffing everything into the always-loaded file, where the important rules drown.

Slide 4 of 13

CLAUDE.md and AGENTS.md: facts, not procedures

The project instruction file is a page of facts the agent should never have to ask about. Every line competes with the actual task for attention.

  • Stack, component library, and where the design source of truth lives
  • Scales and floors: spacing base, type scale, accessibility minimums
  • Hard prohibitions — the negatives that define your taste more than any mood board
  • Verification commands the agent must run before claiming the work is done
  • If you keep both files, one imports or points to the other — duplicates drift within weeks
A design-focused AGENTS.md that stays under a page
# AGENTS.md

## Project
Next.js 15, Tailwind CSS v4, shadcn/ui components in components/ui.
DESIGN.md is the design source of truth. Design tokens live in styles/tokens.css.
Never hard-code colors or font sizes.

## Visual standards
- Spacing: 4px base scale only (4, 8, 12, 16, 24, 32, 48, 64).
- Type: body 16px/1.5; small text never below 14px; max two weights per screen.
- Color: use semantic tokens (--surface, --ink, --accent). No new hues without approval.
- Density: this is a work tool. Prefer compact layouts over marketing-style hero sections.

## Accessibility floor
- WCAG 2.2 AA contrast. Visible focus states. Hit targets 44px minimum on touch.

## Prohibitions
- No decorative gradients, no nested cards, no emoji in UI copy, no invented colors.

## Checks
Run before declaring any UI task complete:
- npm run lint
- npm run verify

Each line is either a value the agent can copy or a behaviour a reviewer can check. Nothing in the file asks for a feeling.

Slide notes

Read a few lines of the example out loud and point at what makes them work: every one is checkable. Spacing resolves to a named scale, small text has a numeric floor, colours come from named tokens, and the prohibitions are concrete enough that a reviewer can point at the violated line. Compare that with the sentences these files usually contain — keep it clean and modern, follow our brand — which cannot be verified against an artifact and therefore change nothing.

Spend a moment on length and audience. The file is loaded into every session, so the working discipline is roughly a page. The real AGENTS.md behind this school's site is thirty-eight lines; the example here is in the same spirit. The names differ by tool — Claude Code reads CLAUDE.md, Codex CLI and OpenCode read AGENTS.md, Gemini CLI reads GEMINI.md — but the job is identical, and a one-line import or a symlink lets one file serve all of them. What you must not do is maintain two diverging copies; they drift apart within weeks, and an agent reading the stale one will follow abandoned rules with full confidence.

The Checks section is the line people skip and should not. Listing the verification commands in the instruction file is what lets the agent run them itself before reporting completion, which means obvious violations never reach your review. That theme — executable gates — comes back in Module 6; here it just needs to be present in the file.
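The "checkable, and about a page" discipline can itself be checked. The following is a minimal sketch, not a tool from the course: the phrase list and the 60-line budget are assumptions chosen for illustration, and a real team would tune both.

```python
import re

# Hypothetical starter list of phrases that ask for a feeling instead of a checkable value.
VAGUE_PHRASES = ("clean", "modern", "premium", "follow our brand")


def lint_instructions(text: str, max_lines: int = 60) -> list[str]:
    """Flag an instruction file that is too long or contains uncheckable wording."""
    findings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        findings.append(f"{len(lines)} lines exceeds the {max_lines}-line budget")
    for number, line in enumerate(lines, start=1):
        for phrase in VAGUE_PHRASES:
            # Whole-word match so "clean" does not fire on "cleanup".
            if re.search(rf"\b{re.escape(phrase)}\b", line, flags=re.IGNORECASE):
                findings.append(f"line {number}: '{phrase}' cannot be checked against an artifact")
    return findings


sample = """## Visual standards
- Spacing: 4px base scale only (4, 8, 12, 16, 24, 32, 48, 64).
- Keep it clean and modern.
- Small text never below 14px.
"""

for finding in lint_instructions(sample):
    print(finding)
```

On the sample, only the third line is flagged; the spacing and type lines pass because they carry values a reviewer can verify.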

Narration for this slide

Here is what a good project instruction file looks like — and notice how boring it is. Stack. Where the design source of truth lives. The spacing scale, with the actual numbers. A type floor. Named tokens. The prohibitions — no gradients, no nested cards, no invented colours. And the commands to run before claiming the work is done. Every line is either a value the agent can copy or a behaviour someone can check. There is not a single adjective asking for a feeling. Keep it to about a page, because every line loads in every session. And if you maintain both a CLAUDE.md and an AGENTS.md, make one of them point to the other — duplicated files drift, and the agent will follow the stale one confidently.

Slide 5 of 13

Path-scoped rules: constraints that follow the file type

Some constraints only matter for some files. Path-scoped rules attach an instruction to a glob pattern, so it loads only when the agent touches matching files.

  • Live in .claude/rules/ as Markdown files with a paths field in the frontmatter
  • CSS token rules load when CSS is edited — and cost nothing the rest of the time
  • Stricter exactly where they apply, while keeping the main file short
  • A reviewer can point at the specific rule that was violated, not a vague preference
  • Claude Code only — on mixed-tool teams, mirror the constraint in AGENTS.md or a skill
.claude/rules/styles.md
---
paths:
  - "**/*.css"
  - "app/**/*.tsx"
---

# Styling rules

- Use CSS custom properties from styles/tokens.css. Never write raw hex values.
- Spacing utilities must resolve to the 4px scale. No arbitrary values like p-[13px].
- Every interactive element needs a visible :focus-visible state.
- Dark mode is handled by tokens; do not add manual dark: overrides for color.

Path scoping is the first genuine context saving in the hierarchy: the rule is strict where it applies and absent everywhere else.

Slide notes

The motivation is easy to make concrete: rules about CSS custom properties are pure noise when the agent is editing a Markdown article, and rules about alt text are noise when it is writing a build script. Path-scoped rules resolve that by tying a small instruction file to a glob pattern. The rule about hex values only enters context when the agent is actually touching styling files — which is also when it is most likely to be violated.

There is a quality-of-review benefit beyond the context saving. When a constraint lives in a named rule file, a reviewer rejecting agent output can cite the exact rule that was broken rather than appealing to a vague preference. That tightens both the agent's behaviour and the team's conversations about what the rules actually are.

Do not skip the portability warning, because it bites mixed-tool teams. Path scoping is a Claude Code feature; Codex CLI, OpenCode, and Gemini CLI will not read .claude/rules/. Constraints that must hold for everyone belong in AGENTS.md or in a skill, with path-scoped rules used as a Claude Code-specific optimisation layered on top. The rule of thumb: the portable core lives in the shared file, the per-tool refinements live in per-tool mechanisms.
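To see what "loads only when matching files are touched" means in practice, here is a small model of the matching step. It is an illustration of glob semantics only: the parser and the glob translation are hand-rolled assumptions, not Claude Code's actual implementation, and edge cases of real glob matching are not covered.

```python
import re

STYLES_RULE = """---
paths:
  - "**/*.css"
  - "app/**/*.tsx"
---

# Styling rules
- Use CSS custom properties from styles/tokens.css. Never write raw hex values.
"""


def parse_paths(rule_text: str) -> list[str]:
    """Read the `paths:` list from a rule file's frontmatter (tiny hand-rolled parser)."""
    lines = rule_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    paths, in_paths = [], False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break
        if stripped == "paths:":
            in_paths = True
        elif in_paths and stripped.startswith("- "):
            paths.append(stripped[2:].strip().strip("\"'"))
        else:
            in_paths = False
    return paths


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob: `**/` spans directories, `*` stays inside one segment."""
    out, i = "", 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out += "(?:.*/)?"
            i += 3
        elif pattern[i] == "*":
            out += "[^/]*"
            i += 1
        else:
            out += re.escape(pattern[i])
            i += 1
    return re.compile(out + r"\Z")


def rule_applies(rule_text: str, touched_path: str) -> bool:
    """True if any glob in the rule's frontmatter matches the file being edited."""
    return any(glob_to_regex(p).match(touched_path) for p in parse_paths(rule_text))


print(rule_applies(STYLES_RULE, "styles/tokens.css"))             # True
print(rule_applies(STYLES_RULE, "content/articles/harness.md"))   # False
```

Note that `components/ui/button.tsx` does not match either pattern in this rule, so a constraint that must also cover components needs its own glob or a place in the shared file.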

Narration for this slide

Not every rule deserves to load in every session. Rules about CSS custom properties are noise when the agent is editing a Markdown file. Path-scoped rules fix that: a small instruction file with a glob pattern in its frontmatter, loaded only when the agent touches matching files. So the no-raw-hex rule, the spacing-scale rule, the focus-state rule — they show up exactly when styling files are being edited, and cost nothing the rest of the time. They also make review sharper, because you can point at the specific rule that was violated. One warning: this is a Claude Code feature. If your team mixes tools, keep the constraint in the shared AGENTS.md too, and treat the scoped rule as an optimisation.

Slide 6 of 13

SKILL.md: procedures the agent loads on demand

A skill is a folder with a SKILL.md inside. The name and description are always visible; the procedure only enters context when a task matches.

  • The right home for anything long: review rubrics, prototype workflows, audit checklists
  • Near-zero context cost until the skill is actually used
  • An open format read by Claude Code, Codex CLI, OpenCode, and Gemini CLI
  • A vaguely described skill never fires — write descriptions with the trigger words people use
  • A skill should not duplicate the design system; it points at DESIGN.md and the tokens it needs
.claude/skills/design-review/SKILL.md
---
name: design-review
description: Review a screen or component against the project design standards and produce
  a prioritized findings list. Use when asked to review, critique, or QA UI work.
---

# Design review

## Procedure
1. Capture the implementation (screenshot or running route) and the reference if one exists.
2. Check, in order: layout and hierarchy, spacing scale, type scale, color token usage,
   interaction states, accessibility (contrast, focus, labels, hit targets), responsive behavior.
3. Report findings as P0 (blocks ship), P1 (fix before review), P2 (polish), P3 (note).
4. For every finding, name the rule it violates and the smallest change that fixes it.
5. Do not redesign. Reviews produce findings, not new layouts.

Configuration is long-term memory, skills are the reference library. Procedures belong in the library, not in the file that loads every session.

Slide notes

Skills are where most of the genuinely long material in a harness should live, because they are close to free until used. The frontmatter name and description are always visible to the agent — that is how it knows the skill exists — but the body, which can be a full procedure with rubrics, supporting files, and scripts, only loads when the task matches. A design review checklist, a prototype workflow, a token audit procedure: all of these are skills, not paragraphs in CLAUDE.md.

The example on the slide is worth reading closely because of its last line: do not redesign, reviews produce findings. Skills are good at encoding not just steps but boundaries — what this procedure must not do — which is exactly the kind of instruction that gets lost when procedures live in chat history. The other detail to point out is the description: it includes the words people actually type — review, critique, QA — because a skill that is vaguely described simply never fires. That is the most common skill failure and it is silent.

On portability: the SKILL.md format has become an open standard read by all four major CLIs, with .agents/skills/ as the shared discovery folder for Codex, OpenCode, and Gemini CLI, and .claude/skills/ for Claude Code. Teams that mix tools usually keep one folder and symlink the other. Keep skills lean — workflow and rubric in the entry file, long references beside it — and let them point at DESIGN.md and the token files rather than pasting copies that will go stale.
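The split between an always-visible description and an on-demand body can be sketched as follows. Two assumptions are worth stating loudly: the frontmatter parser is a minimal stand-in for a YAML library, and the keyword-overlap check is only a crude proxy for how an agent decides a skill is relevant. Real agents make that call with the model, not with word matching.

```python
import re

SKILL_MD = """---
name: design-review
description: Review a screen or component against the project design standards and produce
  a prioritized findings list. Use when asked to review, critique, or QA UI work.
---

# Design review

## Procedure
1. Capture the implementation and the reference if one exists.
"""

# Hypothetical stopword list for the toy trigger check below.
STOPWORDS = {"a", "an", "and", "or", "the", "to", "of", "use", "when", "with", "for", "please"}


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Separate the frontmatter (always visible) from the body (loaded on demand)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    key = None
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[index + 1:]).strip()
        if line[:1].isspace() and key:
            meta[key] += " " + line.strip()   # indented continuation of the previous key
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            meta[key] = value.strip()
    return meta, ""


def would_trigger(description: str, request: str) -> bool:
    """Toy proxy: does the request share a meaningful word with the description?"""
    described = set(re.findall(r"[a-z]+", description.lower())) - STOPWORDS
    asked = set(re.findall(r"[a-z]+", request.lower())) - STOPWORDS
    return bool(described & asked)


meta, body = split_skill(SKILL_MD)
print(meta["name"])                                                   # design-review
print(would_trigger(meta["description"], "Please QA the login page"))        # True
print(would_trigger("Helps with design things.", "Please QA the login page"))  # False
```

The vague description fails on the same request that the specific one catches, which is the silent failure mode described above.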

Narration for this slide

The fourth layer is skills. A skill is just a folder with a SKILL.md file in it: a name and description that are always visible, and a body (the actual procedure) that only loads when a task matches. That makes skills the right home for everything long: review rubrics, prototype workflows, audit checklists. This design-review skill is a real pattern: capture the implementation, check hierarchy, spacing, tokens, states, accessibility, report findings by priority, and, importantly, do not redesign. Two practical notes. Write the description with the words people actually type, or the skill will never fire. And do not paste the design system into the skill; point it at DESIGN.md and the tokens, so it reads current values instead of stale copies.

Slide 7 of 13

Design tokens are agent instructions

A token is a named design decision. The value matters; the name, type, and description are what the agent actually learns from.

| Layer | What it answers | Where it lives |
| --- | --- | --- |
| Primitive tokens | What raw values exist? blue-600, spacing-4, radius-2 | tokens/primitives.tokens.json |
| Semantic tokens | What does this value mean here? color.status.escalated, space.page | tokens/semantic.tokens.json |
| Component tokens | What does this component use in this state? statusChip.radius | tokens/component.tokens.json |
| Runtime variables | What does the application actually consume? --color-action, --space-page | styles/tokens.css (generated) |
| Explanation layer | Why, and what is forbidden: rules and anti-patterns in prose | DESIGN.md |

Primitives alone make the agent guess. Semantic and component tokens carry the decision, which is the part the agent cannot invent correctly.

Slide notes

The shift to make here is from tokens as a build artifact to tokens as instructions. A primitive palette — blue-500, gray-900, spacing-4 — tells the agent what values exist but not when each one belongs, so it still guesses, and its guesses regress to the most common patterns on the web. Semantic tokens carry the product meaning: color.status.escalated.foreground tells the agent this colour is for escalated work, not for decoration. Component tokens go one level further and encode local rules — a status chip radius that stays compact in dense tables. The DTCG format adds a description field on every token, and those descriptions are read by the agent; they are some of the cheapest design instruction you can write.
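As a rough illustration of how a semantic token's name and description travel from the source file to the runtime layer, the sketch below flattens a DTCG-style tree into CSS custom properties. The token values are made up for the example, values are simplified to plain strings, and alias references between tokens are not resolved; a real pipeline would use a dedicated build tool.

```python
import json

# Hypothetical excerpt of a semantic token file; the values are placeholders.
TOKENS_JSON = """
{
  "color": {
    "$type": "color",
    "status": {
      "escalated": {
        "foreground": {
          "$value": "#b42318",
          "$description": "Text colour for escalated work. Not for decoration."
        }
      }
    }
  },
  "space": {
    "page": {
      "$value": "24px",
      "$description": "Outer page padding."
    }
  }
}
"""


def flatten_tokens(node: dict, prefix: tuple[str, ...] = ()) -> list[tuple[str, str, str]]:
    """Walk the token tree and return (css_name, value, description) triples."""
    if "$value" in node:
        name = "--" + "-".join(prefix)
        return [(name, str(node["$value"]), node.get("$description", ""))]
    found = []
    for key, child in node.items():
        if key.startswith("$") or not isinstance(child, dict):
            continue  # skip group-level metadata such as $type
        found.extend(flatten_tokens(child, prefix + (key,)))
    return found


def to_css(tokens: list[tuple[str, str, str]]) -> str:
    """Emit a :root block, keeping each description as a comment the agent can read."""
    lines = [":root {"]
    for name, value, description in tokens:
        if description:
            lines.append(f"  /* {description} */")
        lines.append(f"  {name}: {value};")
    lines.append("}")
    return "\n".join(lines)


tokens = flatten_tokens(json.loads(TOKENS_JSON))
print(to_css(tokens))
```

Carrying the description into the generated file is a design choice in this sketch: it means the instruction survives even when the agent only opens the runtime CSS.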

The layering also matters for verification. Source tokens are the decision; generated CSS variables are what the application consumes; and an audit can check that components only reference the runtime variables, never raw hex values. That is what makes the colour rules in the instruction file falsifiable — either the output uses semantic tokens or the audit fails — and it is the same audit you saw catch ten hardcoded colours in the Module 1 worked example.
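A raw-hex audit of this kind can be very small. The sketch below is not the audit script used in the course repository; it is a minimal stand-in that assumes the generated token file is the only place raw values are allowed, and its regular expression can misfire on id selectors that happen to spell a hex value.

```python
import re

# Matches #rgb up to #rrggbbaa; also accepts 5 and 7 digits, which is fine for a coarse audit.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")


def audit_raw_hex(
    files: dict[str, str],
    allowed: tuple[str, ...] = ("styles/tokens.css",),
) -> list[tuple[str, int, str]]:
    """Return (path, line number, value) for every raw hex colour outside the allowed files."""
    findings = []
    for path, source in files.items():
        if path in allowed:
            continue  # the generated token file is where raw values are supposed to live
        for line_no, line in enumerate(source.splitlines(), start=1):
            for match in HEX_COLOR.finditer(line):
                findings.append((path, line_no, match.group()))
    return findings


# Hypothetical file contents keyed by path.
files = {
    "styles/tokens.css": ":root { --color-action: #1d4ed8; }",
    "components/ui/badge.tsx": 'const bad = { color: "#b42318" };\nconst ok = "var(--color-action)";',
}

print(audit_raw_hex(files))
# [('components/ui/badge.tsx', 1, '#b42318')]
```

Wired into a verification command, a non-empty result would fail the check, which is what makes the colour rule falsifiable rather than advisory.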

Be clear about division of labour with the next slide: token JSON is the data layer, CSS variables are the runtime layer, and DESIGN.md is the explanation layer. Forcing one file to do all three jobs is how token systems become unreadable to both people and agents.

Narration for this slide

Design tokens are usually discussed as a build pipeline. For agent work, think of them as instructions. A primitive palette tells the agent what values exist — but it still has to guess when each one belongs, and its guesses are generic. Semantic tokens carry the decision: this colour means escalated, this spacing means page padding. Component tokens carry the local rule: status chips stay compact. The generated CSS variables are what the app consumes, and an audit can verify that components only ever reference them — which turns your colour rules into something falsifiable. Keep the layers separate: token JSON is the data, CSS variables are the runtime, and DESIGN.md — next slide — is the explanation.

Slide 8 of 13

DESIGN.md: the source of truth both audiences can read

One Markdown file that explains the visual system in concrete terms — values an agent can copy, behaviours a reviewer can check, and the anti-patterns that define taste.

  • Identity in one or two lines, immediately followed by the concrete decisions that express it
  • Colour, type, spacing, radius, and motion as exact values mapped to the runtime tokens
  • A component inventory that maps every reusable surface to a real file in the repository
  • Explicit anti-pattern rules — the negatives prevent the failures the positives cannot
  • Referenced from the instruction file, read at task start — not pasted into every session
DESIGN.md (excerpt from this site's repository)
# Design System: Agentic Design School

## 1. Identity
- **Mood**: clear, instructional, bookish, field-tested

## 2. Color
Colors map to shadcn semantic variables in `app/globals.css`.
- **Primary**: `oklch(0.43 0.17 263)` deep school blue for CTAs and active navigation
- **Background**: `oklch(0.97 0.018 86)` warm paper base
- **Foreground**: `oklch(0.16 0.015 260)` high-contrast ink

## 5. Border Radius
- **Interactive**: 8px
- **Cards and Panels**: 8px maximum unless shadcn source component requires less

## 9. Layout
- **Anti-Slop Rules**: no purple gradients, no decorative blobs, no nested cards,
  no uniform card soup, no hardcoded brand colors in components

The mood line exists, but every line after it is a value to copy or a behaviour to check. That is what separates a DESIGN.md from a brand deck.

Slide notes

This excerpt is from the real DESIGN.md that produces the site the course lives on, and the thing to point at is the structure of each section: a short identity statement, immediately translated into exact values. The mood line — clear, instructional, bookish — is allowed to exist because it is followed by OKLCH values, a radius ceiling, and named components. A DESIGN.md that stops at the adjectives is just a brand deck with a different file extension, and it changes nothing about agent output.

The anti-slop section deserves special attention because it does the most work per line. Agents inherit the most common UI habits from the public web — gradients, card soup, oversized heroes — and explicit negatives are the only reliable way to keep those habits out of your product. Vague positives produce vague output; explicit negatives prevent specific failures. Every team's anti-pattern list is different, and writing it is a genuinely useful design exercise even before any agent reads it.

On placement in the hierarchy: DESIGN.md is not pasted into the always-loaded instruction file. The instruction file names it as the source of truth, and the agent reads it at task start or when a skill requires specific sections. There is also an emerging formal spec for the format — Google Labs published one, currently in alpha, and community collections gather examples — but the working pattern does not depend on the spec: concrete values, real component paths, explicit prohibitions, kept current.
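"Kept current" is the part that quietly fails, and one piece of it is easy to check: every file path the DESIGN.md mentions should still exist. The sketch below assumes paths are written in backticks and contain a directory and a file extension; the inventory lines in the sample are hypothetical, not taken from the real file.

```python
import os
import re

# A backticked token that looks like a relative file path: at least one "/" and an extension.
PATH_IN_BACKTICKS = re.compile(r"`([\w./-]+/[\w.-]+\.\w+)`")


def missing_paths(design_md: str, exists=os.path.exists) -> list[str]:
    """Return every referenced path that does not exist (per the supplied `exists` check)."""
    return [path for path in PATH_IN_BACKTICKS.findall(design_md) if not exists(path)]


# Hypothetical DESIGN.md excerpt; the component inventory lines are invented for the example.
sample = """## 2. Color
Colors map to shadcn semantic variables in `app/globals.css`.
- **Primary**: `oklch(0.43 0.17 263)` deep school blue

## Component inventory
- **Button**: `components/ui/button.tsx`
- **Status chip**: `components/ui/status-chip.tsx`
"""

# Stand-in for the repository: only these two files exist.
repo_files = {"app/globals.css", "components/ui/button.tsx"}

print(missing_paths(sample, exists=repo_files.__contains__))
# ['components/ui/status-chip.tsx']
```

Passing the existence check as a parameter keeps the sketch testable without a real repository; the default falls back to the filesystem. Colour values such as the OKLCH string are ignored because they contain no path separator.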

Narration for this slide

DESIGN.md is the explanation layer — the file that tells both a new teammate and an agent how this product should look. This excerpt is from the real one behind this site. Notice the pattern: a one-line mood, then immediately the concrete decisions that express it. Exact OKLCH values mapped to the runtime variables. A radius ceiling. And the anti-slop rules — no gradients, no decorative blobs, no nested cards, no hardcoded colours. Those negatives do more work per line than anything else in the file, because agents inherit the web's most common habits and explicit prohibitions are what keep them out. The instruction file names DESIGN.md as the source of truth; the agent reads it when the task starts.

Slide 9 of 13

Worked example: the harness behind this site

The repository that builds this course runs the full pattern. None of it is special infrastructure — Markdown, one JSON file, and two commands.

| Layer | File in this repository | When it loads |
| --- | --- | --- |
| Project rules | AGENTS.md (38 lines): stack, shadcn rules, verification commands | Every session, in full |
| Design system | DESIGN.md + .agentic-designer.json: tokens, radius ceiling, anti-slop rules, component inventory | Read at task start |
| Task skills | .codex/skills/<skill>/SKILL.md: 31 skills indexed by catalog.json, including a 5-dimension critique | Loaded on demand |
| Examples | designs/ snapshots and craft notes: what good looks like here | Inspected when referenced |
| Review gates | npm run verify · npm run agentic:audit (route checks plus token and slop audit) | Run before handoff |

Same brief, run with and without this harness: four iterations and ~50 minutes without it, two iterations and ~15 minutes with it — and both runs missed the same wrapping bug.

Slide notes

This is the most honest case study available, because it is the repository that builds the page the course is published on, and it is documented in full in the school's harness article. Walk the rows: a thirty-eight line AGENTS.md that names the stack and the verification commands; a DESIGN.md that carries the tokens, the radius ceiling, the anti-slop rules, and a component inventory mapped to real files; thirty-one skills indexed by a catalog so only the relevant one loads; a folder of owned page snapshots as taste references; and two executable gates wired into npm scripts.

Then give the comparison numbers and their caveats. The same small brief — a field-notes section, three cards — was run in a bare scratch project and in this repository. Without the harness: a gradient hero, hardcoded hex values, an invented call-to-action, ten token-audit violations, four correction rounds, roughly fifty minutes. With the harness: existing components reused, semantic tokens throughout, the audit passing first try, two rounds, roughly fifteen minutes. These are traced runs, not a benchmark, and the counts depend on the model and on what you accept as done.

Do not skip the shared miss, because it is the most instructive part. Both runs let long titles wrap awkwardly against the badge. The token audit passed it, the slop rules passed it, the skill checklist did not mention it; a human caught it in review. An automated gate only catches what it encodes — which is why the harness keeps a critique skill and a human review step above it, and why the right response to a miss is a deliberate decision about whether it recurs often enough to encode.
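The arithmetic behind the comparison is simple enough to write down, along with a break-even estimate. The run figures come from the traced runs above; the setup cost is an assumption (the module only says "a working session"), so treat the break-even count as illustrative rather than as a result.

```python
import math

# Figures from the two traced runs of the same brief.
without_harness = {"rounds": 4, "minutes": 50}
with_harness = {"rounds": 2, "minutes": 15}

minutes_saved = without_harness["minutes"] - with_harness["minutes"]
time_reduction = minutes_saved / without_harness["minutes"]
rounds_saved = without_harness["rounds"] - with_harness["rounds"]

# ASSUMPTION: one working session of setup is taken to be 240 minutes.
setup_minutes = 240
break_even_tasks = math.ceil(setup_minutes / minutes_saved)

print(minutes_saved)             # 35
print(f"{time_reduction:.0%}")   # 70%
print(rounds_saved)              # 2
print(break_even_tasks)          # 7
```

Because these are two traced runs and not a benchmark, the per-task saving will vary with the model, the brief, and what counts as done; the break-even count moves with it.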

Narration for this slide

Let's look at a real harness — the one behind the site you are using right now. Five layers, all ordinary files. A thirty-eight line AGENTS.md. A DESIGN.md with exact tokens and anti-slop rules. Thirty-one skills that load on demand. A folder of page snapshots as taste references. And two commands that act as gates: a structural check and a token audit. The same brief run without this harness took four correction rounds and about fifty minutes, with a gradient hero and ten hardcoded colours. With the harness: two rounds, fifteen minutes, audit passed first try. And here is the honest part — both runs missed the same awkward title wrapping. The gates only catch what they encode. Human review stays in the loop.

Slide 10 of 13

Auto memory, and what should never live only in a conversation

Some agents keep their own notes across sessions. Useful — but the agent's observations are not your team's standards.

  • Auto memory: the agent records project facts it discovers and reloads an index next session
  • It captures what the agent observed, not what the team decided — observations can fossilise mistakes
  • Review what accumulates: promote real standards into the harness, prune stale workarounds
  • Chat history is not a harness: decisions made in conversation are invisible to the next session and the next person
  • If a correction matters beyond today, it belongs in a file someone can review and version

The conversation is where decisions get made. The harness is where they go to survive. Anything that lives only in chat will be re-litigated next week.

Slide notes

Two related ideas share this slide. The first is auto memory as a mechanism: Claude Code, when the setting is enabled, writes its own notes about the project — where the token file lives, which component is canonical, what broke last time — and loads an index of them at session start, with detail files read on demand. For designers this is mostly free value, but it comes with a governance caveat: memory records what the agent observed, and observations fossilise. A workaround that was true in March will be repeated confidently in June unless someone reviews the notes, promotes the ones that deserve to be standards into the instruction file or a skill, and prunes the rest.
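The review step can be as light as a dated list. The sketch below assumes a person has recorded a last-reviewed date for each memory note. The note names, the dates, and the 90-day threshold are hypothetical, and nothing here reflects how Claude Code actually stores its notes.

```python
from datetime import date

def notes_due_for_review(last_reviewed: dict[str, date], today: date,
                         max_age_days: int = 90) -> list[str]:
    """Return note names whose last review is older than max_age_days, oldest first."""
    overdue = [
        (reviewed, name)
        for name, reviewed in last_reviewed.items()
        if (today - reviewed).days > max_age_days
    ]
    return [name for _, name in sorted(overdue)]

# Hypothetical review log kept by the harness owner.
notes = {
    "token-file-location": date(2025, 5, 20),
    "table-sort-workaround": date(2025, 3, 3),
    "canonical-button": date(2025, 1, 15),
}

print(notes_due_for_review(notes, today=date(2025, 6, 10)))
# ['canonical-button', 'table-sort-workaround']
```

The function only surfaces candidates. Whether each one is promoted into the instruction file or a skill, or pruned, remains a human decision.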

The second idea is broader and matters even on platforms with no memory feature at all: chat history is not a harness. Teams make real decisions inside agent conversations — we agreed the table stays dense, we agreed not to use that component — and then the session ends and the decision evaporates. The next session, or the next person, re-litigates it from scratch. The discipline is simple: if a correction or a decision matters beyond today, it gets written into a file that is versioned, reviewable, and visible to the whole team.

This is also the maintenance loop for the harness itself, and it connects back to the dashed feedback line in the Module 1 loop diagram: recurring critique is the signal that a rule is missing, and the end of a session is the cheapest moment to add it.

Narration for this slide

Some agents keep their own notes between sessions — auto memory. The agent records what it discovers about your project and reloads it next time. Useful, with one caveat: those notes are observations, not decisions. A workaround the agent noticed in March will be repeated confidently in June unless someone reviews the notes, promotes the real standards into the harness, and prunes the rest. The broader point matters on every platform: chat history is not a harness. Decisions made in conversation evaporate when the session ends, and the next person re-litigates them from scratch. If a correction matters beyond today, write it into a file someone can review and version. The conversation is where decisions get made. The harness is where they survive.

Slide 11 of 13

Harness anti-patterns

Each of these feels like diligence while it quietly makes the harness worse.

  • Restating the design system in the instruction file — point at DESIGN.md and the tokens instead of pasting them
  • The 500-line always-loaded file — bloat does not fail loudly, it just buries the rules that matter
  • Vibes instead of acceptance criteria — "feel premium" cannot be checked; "section padding 64px+, two accent uses per viewport" can
  • Stale rules nobody owns — a harness that lags the product is worse than none, because the agent follows it confidently
  • Encoding every miss — over-encoding after one failure produces a harness nobody maintains; encode what recurs
  • Duplicated instruction files per tool — keep one source of truth and thin adapters, or they drift apart

The common thread: treating the harness as a dumping ground instead of a maintained product. It needs an owner and a review cadence, like the design system it serves.
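One way to tell a vibe from a criterion is to ask whether a predicate can be written for it. A minimal sketch, assuming section padding has already been measured in pixels: the section names and numbers are made up, and the 64px floor is the one quoted in the list above.

```python
MIN_SECTION_PADDING_PX = 64  # from the criterion "section padding 64px+"

def sections_below_padding_floor(padding_px: dict[str, int],
                                 floor: int = MIN_SECTION_PADDING_PX) -> list[str]:
    """Return the sections whose measured padding is under the floor, sorted by name."""
    return sorted(name for name, value in padding_px.items() if value < floor)

# Hypothetical measurements taken from a rendered page.
measured = {"hero": 96, "pricing": 48, "faq": 64}

print(sections_below_padding_floor(measured))  # ['pricing']
```

No equivalent function can be written for "feel premium", which is the practical test for whether a line belongs in the harness.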

Slide notes

Take these one at a time, because each maps to a failure you will see in real teams. Restating the design system in the instruction file feels thorough and doubles the maintenance burden: the moment a token changes, two files disagree, and the agent reads whichever one it loaded. The 500-line file is the most common one — it accumulates because adding a line always feels safer than deciding where the line belongs, and it degrades quietly as attention dilutes. Vibes instead of criteria is the writing failure: if two agents would interpret an instruction in opposite ways, it is not an instruction, it is a hope.
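The first two failures can be caught with a rough lint on the always-loaded file. This is a sketch under stated assumptions: the 60-line budget is an arbitrary stand-in for "about a page", and treating any six-digit hex colour as a pasted token value is a heuristic, not a rule from this course.

```python
import re

HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}\b")

def check_instruction_file(text: str, max_lines: int = 60) -> list[str]:
    """Return warnings for an always-loaded instruction file."""
    warnings = []
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) > max_lines:
        warnings.append(f"{len(lines)} non-blank lines exceeds budget of {max_lines}")
    # Raw colour values suggest the design system was pasted in rather than pointed at.
    pasted = sorted(set(HEX_COLOUR.findall(text)))
    if pasted:
        warnings.append("raw colour values pasted in: " + ", ".join(pasted))
    return warnings

# Hypothetical draft with one pasted value.
draft = "# Project rules\nDesign source of truth: DESIGN.md\nAccent colour is #0a66c2\n"

print(check_instruction_file(draft))
# ['raw colour values pasted in: #0a66c2']
```

A warning here is a prompt to demote the line to DESIGN.md, a rule, or a skill. It is not proof that the line is wrong.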

Stale rules deserve the strongest warning. Tokens move, components get renamed, the example screenshots stop matching the shipped product — and the agent keeps following the old truth with full confidence, because nothing in the file tells it the file is wrong. The fix is ownership and cadence: review the harness when the design system changes, not when someone notices drift.
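Part of that review can be mechanical. The sketch below assumes tokens are exposed as CSS custom properties and that the instruction file mentions them by their `--name`. The function name and both sample strings are hypothetical. It lists names the instructions still mention but the stylesheet no longer defines.

```python
import re

TOKEN_NAME = re.compile(r"--[a-z][a-z0-9-]*")
TOKEN_DEFINITION = re.compile(r"(--[a-z][a-z0-9-]*)\s*:")

def find_stale_token_refs(instructions: str, css: str) -> list[str]:
    """Return token names the instructions mention that the CSS no longer defines."""
    defined = set(TOKEN_DEFINITION.findall(css))
    mentioned = set(TOKEN_NAME.findall(instructions))
    # Limitation: command-line flags such as --fix in the instructions would also be reported.
    return sorted(mentioned - defined)

# Hypothetical runtime variables and a rule that has fallen behind them.
css = ":root { --color-accent: #0a66c2; --space-page: 32px; }"
rules = "Use --color-accent for links and --space-gutter for page padding."

print(find_stale_token_refs(rules, css))  # ['--space-gutter']
```

Running a check like this whenever the token file changes ties the review to the design system rather than to someone noticing drift.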

The last two are about restraint and portability. Over-encoding after a single miss produces a harness so heavy nobody maintains it; the better response to a miss is a deliberate decision about whether it recurs. And duplicated per-tool instruction files — a CLAUDE.md here, an AGENTS.md there, each edited by different people — drift within weeks. One source of truth, with imports, symlinks, or pointers as the adapters.
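Whether an adapter has stayed thin can also be checked. A minimal sketch, assuming AGENTS.md is the shared source of truth and the other file should only point at it: the pointer wording, the five-line limit, and the function name are illustrative assumptions, not any tool's required syntax.

```python
def is_thin_adapter(adapter_text: str, source_name: str, max_lines: int = 5) -> bool:
    """True if the adapter is short and points at the shared source of truth."""
    lines = [line.strip() for line in adapter_text.splitlines() if line.strip()]
    points_at_source = any(source_name in line for line in lines)
    return points_at_source and len(lines) <= max_lines

# Hypothetical adapter that only points, and one that has become a forked copy.
thin = "# Project instructions\nSee AGENTS.md for all project rules.\n"
forked = "\n".join(f"Rule {n}: copied from AGENTS.md" for n in range(1, 41))

print(is_thin_adapter(thin, "AGENTS.md"))    # True
print(is_thin_adapter(forked, "AGENTS.md"))  # False
```

The second case mentions the source file on every line and still fails, because length is what signals that the adapter has turned into a second copy.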

Narration for this slide

Now the ways harnesses go wrong, and every one of these feels like diligence while it happens. Pasting the design system into the instruction file, so two copies drift apart. The five-hundred-line file that buries the rules that matter. Vibes instead of criteria: "feel premium" checks nothing, while padding values and token names can be checked mechanically. Stale rules nobody owns, which are worse than no rules, because the agent follows them confidently. Encoding every single miss until the harness is too heavy to maintain: encode what recurs, leave the rest to review. And duplicated files per tool that disagree within a month. The thread through all of it: the harness is a maintained product, not a dumping ground. Give it an owner and a cadence.

Slide 12 of 13

Exercise: draft the harness file for your own project

One page, drawn from your own correction history. Do not aim for complete — aim for the rules you have already paid for.

  • Audit your corrections: list everything you have fixed in agent output more than once over recent sessions
  • Write a one-page CLAUDE.md or AGENTS.md: stack, design source of truth, scales, component library, prohibitions, verification commands
  • Mark each line: is it a fact the agent can copy or a behaviour a reviewer can check? Rewrite or cut anything that is neither
  • Pick one recurring procedure — most teams start with a design review checklist — and sketch it as a SKILL.md
  • Note one rule that only applies to some files, and decide whether it becomes a path-scoped rule or stays in the shared file
  • Run the Module 1 task against the new file and count the corrections you no longer have to type

The before-and-after pair from the last step is the most persuasive artifact you will have when you propose this setup to your team.

Slide notes

The exercise is deliberately grounded in correction history rather than imagination. Asking people to write a harness from scratch produces speculative rules that never get triggered; asking them to audit what they have already corrected twice produces the rules that pay for themselves immediately. Most participants find the audit step uncomfortable in a useful way — it shows how much of their session time has gone into re-supplying the same facts.
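The audit step needs nothing more than a tally. The sketch below assumes corrections have been copied out of past sessions as short strings. The log entries are invented, and matching on lowercased text is a simplification, since real corrections are rarely worded identically.

```python
from collections import Counter

def recurring_corrections(corrections: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return corrections seen at least min_count times, most frequent first."""
    counts = Counter(text.strip().lower() for text in corrections)
    return [(text, n) for text, n in counts.most_common() if n >= min_count]

# Hypothetical correction history collected from recent sessions.
log = [
    "Use the spacing scale",
    "No gradients",
    "use the spacing scale",
    "Make the title shorter",
    "No gradients",
    "Use the spacing scale",
]

print(recurring_corrections(log))
# [('use the spacing scale', 3), ('no gradients', 2)]
```

The one-off correction drops out of the result. That matches the rule of thumb in this module: encode what recurs and leave the rest in the brief.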

The checkable-or-cut step is where the writing discipline gets built. Have participants read their own draft and mark each line: a value the agent can copy, a behaviour a reviewer can check, or neither. The neithers are usually brand adjectives, and the fix is the same translation exercise from Module 3 — turn the adjective into the observable behaviour that expresses it, or delete it.
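A rough first pass at finding the "neithers" can be scripted, although the judgment stays with the reader. This sketch is a heuristic under stated assumptions: the adjective list is a hypothetical starter set, the sample draft is invented, and a line counts as concrete only if it contains a digit or a custom-property name.

```python
import re

# Hypothetical starter list; extend it with the adjectives your own drafts overuse.
VAGUE_WORDS = {"premium", "clean", "modern", "delightful", "polished"}
# A digit or a custom-property name suggests a value the agent can copy.
CONCRETE = re.compile(r"\d|--[a-z]")

def lines_to_rewrite_or_cut(draft: str) -> list[str]:
    """Return lines that use a vague adjective and contain no concrete value."""
    flagged = []
    for line in draft.splitlines():
        words = set(re.findall(r"[a-z]+", line.lower()))
        if words & VAGUE_WORDS and not CONCRETE.search(line):
            flagged.append(line.strip())
    return flagged

# Invented three-line draft of an instruction file.
draft = """\
Spacing scale: 4, 8, 16, 24, 32, 64
The page should feel premium and modern
Links use --color-accent
"""

print(lines_to_rewrite_or_cut(draft))
# ['The page should feel premium and modern']
```

The script only catches adjectives on its list. Lines that describe a behaviour a reviewer can check still have to be judged by reading them.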

The last step closes the loop with the running thread of the course: re-run the Module 1 task — the one participants sketched on paper and ran in Module 3 — with the new harness file in place, and count what changed. Set expectations honestly: a one-page file will not eliminate corrections, and the residual ones are usually judgment calls no file could carry. That residue is exactly what Module 6's critique and quality-gate material is for. Setup time for the whole exercise is roughly thirty to sixty minutes for a small project, and it is paid once.

Narration for this slide

Time to build yours. Start with your correction history, not your imagination: list everything you have fixed in agent output more than once. Then write the one-page instruction file — stack, design source of truth, scales, components, prohibitions, and the commands to run before claiming done. Read it back and mark every line: can the agent copy it, or can a reviewer check it? If neither, rewrite it or cut it. Sketch one skill — a design review checklist is the highest-value first one. Then re-run the task you have been carrying since Module 1, with the file in place, and count the corrections you no longer have to type. Keep that before-and-after. It is how you will sell this to your team.

Slide 13 of 13

Summary, and what comes next

  • The harness carries everything durable, so the brief only carries what is new
  • The hierarchy is a cost model: always-loaded files stay short and factual; rules and skills defer cost until needed
  • Instruction files hold facts and prohibitions an agent can copy and a reviewer can check — never vibes
  • Tokens and DESIGN.md make the design system itself machine-readable: data, runtime, and explanation as separate layers
  • A harness needs an owner, a review cadence, and human review above its gates — it raises the floor, it does not supply taste

Module 5 turns to the materials and tools the harness operates on: design-as-code formats, the agent platforms, connected canvases, and how to choose a starting stack.

Slide notes

Recap by walking the layers one more time, but emphasise the connective logic rather than the file names: the hierarchy exists because context has a cost, the instruction file stays short because it pays that cost every session, rules and skills exist because most guidance is only sometimes relevant, and tokens plus DESIGN.md exist because the design system itself has to be readable by the agent for any of the rules to be checkable. If participants remember only one heuristic, it should be the demotion habit: when the always-loaded file grows, something in it wants to become a rule, a skill, or a pointer.

Restate the limits so nobody leaves overpromised. The harness removes preventable failures and cuts correction rounds — the worked example showed roughly half the iterations on a traced run — but it does not supply judgment about audience, hierarchy, or whether the work is good enough, and its gates only catch what they encode. It also costs a setup session and ongoing ownership; a stale harness is actively harmful.

Preview Module 5 concretely: it steps back from the harness to the materials it operates on — why diffable design files are the precondition for all of this, what the four agent platforms and the connected canvases actually offer, where Figma still fits, and how to pick a starting stack without betting the team on a moving target. Participants who did the exercise should bring their draft harness file; Module 5's stack decisions will determine which tool-specific adapters it needs.

Narration for this slide

Let's close. The harness is the durable layer: it carries everything that is true of every task, so your brief only carries what is new. The hierarchy is really a cost model — always-loaded files stay short and factual, path-scoped rules and skills defer their cost until they are needed. Instruction files hold facts and prohibitions that can be checked, never vibes. Tokens and DESIGN.md make the design system itself readable to the agent, with data, runtime, and explanation kept as separate layers. And the whole thing needs an owner, because a stale harness is worse than none. It raises the floor; it does not supply taste. In Module 5 we look at the materials underneath all of this — design-as-code formats, the platforms, the canvases, and how to choose a stack. See you there.

Module transcript
Module 4, narrated slide by slide

Slide 1: The Design Harness

Welcome to Module 4. In Module 3 you learned to brief an agent properly — and if you have been practising, you have probably noticed the problem: your briefs keep growing, because you keep re-typing the same facts. The spacing scale. The component library. Do not invent colours. That repetition is the signal that something belongs in the harness — the durable layer of files that loads with every session or on demand, so the brief only carries what is new. In this module we build that layer: the instruction hierarchy, the project instruction file, path-scoped rules, skills, tokens, and DESIGN.md. And we walk through a real one, file by file.

Slide 2: The harness is the design system for agent behaviour

Here is the simplest way to think about the harness: it is the design system for agent behaviour. A design system stops people from re-deciding the button radius on every screen. The harness stops the agent from re-deciding it in every session. The brief carries what is new about this task; the harness carries what is true of every task. And the test for what belongs there is wonderfully boring: have you typed this correction more than once? If yes, encode it. If no, leave it in the brief. The harness will not give the agent taste — nothing will. What it does is remove the preventable failures, so your review time goes to the judgment calls that actually need you.

Slide 3: The instruction hierarchy: what loads when

This diagram is the mental model for the whole module. Five layers, ordered by when they enter the agent's context. Global and user instructions load every session — keep team rules out of them. The project instruction file, CLAUDE.md or AGENTS.md, also loads in full every session, which is exactly why it has to stay short. Path-scoped rules load only when the agent touches matching files. Skills load their body only when a task needs them. And the brief at the bottom is the only thing you write per task. The whole design problem is putting each rule at the right layer — and the most common mistake is stuffing everything into the always-loaded file, where the important rules drown.

Slide 4: CLAUDE.md and AGENTS.md: facts, not procedures

Here is what a good project instruction file looks like — and notice how boring it is. Stack. Where the design source of truth lives. The spacing scale, with the actual numbers. A type floor. Named tokens. The prohibitions — no gradients, no nested cards, no invented colours. And the commands to run before claiming the work is done. Every line is either a value the agent can copy or a behaviour someone can check. There is not a single adjective asking for a feeling. Keep it to about a page, because every line loads in every session. And if you maintain both a CLAUDE.md and an AGENTS.md, make one of them point to the other — duplicated files drift, and the agent will follow the stale one confidently.

Slide 5: Path-scoped rules: constraints that follow the file type

Not every rule deserves to load in every session. Rules about CSS custom properties are noise when the agent is editing a Markdown file. Path-scoped rules fix that: a small instruction file with a glob pattern in its frontmatter, loaded only when the agent touches matching files. So the no-raw-hex rule, the spacing-scale rule, the focus-state rule — they show up exactly when styling files are being edited, and cost nothing the rest of the time. They also make review sharper, because you can point at the specific rule that was violated. One warning: this is a Claude Code feature. If your team mixes tools, keep the constraint in the shared AGENTS.md too, and treat the scoped rule as an optimisation.

Slide 6: SKILL.md: procedures the agent loads on demand

The third layer is skills. A skill is just a folder with a SKILL.md file in it: a name and description that are always visible, and a body — the actual procedure — that only loads when a task matches. That makes skills the right home for everything long: review rubrics, prototype workflows, audit checklists. This design-review skill is a real pattern: capture the implementation, check hierarchy, spacing, tokens, states, accessibility, report findings by priority, and — importantly — do not redesign. Two practical notes. Write the description with the words people actually type, or the skill will never fire. And do not paste the design system into the skill; point it at DESIGN.md and the tokens, so it reads current values instead of stale copies.

Slide 7: Design tokens are agent instructions

Design tokens are usually discussed as a build pipeline. For agent work, think of them as instructions. A primitive palette tells the agent what values exist — but it still has to guess when each one belongs, and its guesses are generic. Semantic tokens carry the decision: this colour means escalated, this spacing means page padding. Component tokens carry the local rule: status chips stay compact. The generated CSS variables are what the app consumes, and an audit can verify that components only ever reference them — which turns your colour rules into something falsifiable. Keep the layers separate: token JSON is the data, CSS variables are the runtime, and DESIGN.md — next slide — is the explanation.

Slide 8: DESIGN.md: the source of truth both audiences can read

DESIGN.md is the explanation layer — the file that tells both a new teammate and an agent how this product should look. This excerpt is from the real one behind this site. Notice the pattern: a one-line mood, then immediately the concrete decisions that express it. Exact OKLCH values mapped to the runtime variables. A radius ceiling. And the anti-slop rules — no gradients, no decorative blobs, no nested cards, no hardcoded colours. Those negatives do more work per line than anything else in the file, because agents inherit the web's most common habits and explicit prohibitions are what keep them out. The instruction file names DESIGN.md as the source of truth; the agent reads it when the task starts.

Slide 9: Worked example: the harness behind this site

Let's look at a real harness — the one behind the site you are using right now. Five layers, all ordinary files. A thirty-eight line AGENTS.md. A DESIGN.md with exact tokens and anti-slop rules. Thirty-one skills that load on demand. A folder of page snapshots as taste references. And two commands that act as gates: a structural check and a token audit. The same brief run without this harness took four correction rounds and about fifty minutes, with a gradient hero and ten hardcoded colours. With the harness: two rounds, fifteen minutes, audit passed first try. And here is the honest part — both runs missed the same awkward title wrapping. The gates only catch what they encode. Human review stays in the loop.

Slide 10: Auto memory, and what should never live only in a conversation

Some agents keep their own notes between sessions — auto memory. The agent records what it discovers about your project and reloads it next time. Useful, with one caveat: those notes are observations, not decisions. A workaround the agent noticed in March will be repeated confidently in June unless someone reviews the notes, promotes the real standards into the harness, and prunes the rest. The broader point matters on every platform: chat history is not a harness. Decisions made in conversation evaporate when the session ends, and the next person re-litigates them from scratch. If a correction matters beyond today, write it into a file someone can review and version. The conversation is where decisions get made. The harness is where they survive.

Slide 11: Harness anti-patterns

Now the ways harnesses go wrong, and every one of these feels like diligence while it happens. Pasting the design system into the instruction file, so two copies drift apart. The five-hundred-line file that buries the rules that matter. Vibes instead of criteria: "feel premium" checks nothing, while padding values and token names can be checked mechanically. Stale rules nobody owns, which are worse than no rules, because the agent follows them confidently. Encoding every single miss until the harness is too heavy to maintain: encode what recurs, leave the rest to review. And duplicated files per tool that disagree within a month. The thread through all of it: the harness is a maintained product, not a dumping ground. Give it an owner and a cadence.

Slide 12: Exercise: draft the harness file for your own project

Time to build yours. Start with your correction history, not your imagination: list everything you have fixed in agent output more than once. Then write the one-page instruction file — stack, design source of truth, scales, components, prohibitions, and the commands to run before claiming done. Read it back and mark every line: can the agent copy it, or can a reviewer check it? If neither, rewrite it or cut it. Sketch one skill — a design review checklist is the highest-value first one. Then re-run the task you have been carrying since Module 1, with the file in place, and count the corrections you no longer have to type. Keep that before-and-after. It is how you will sell this to your team.

Slide 13: Summary, and what comes next

Let's close. The harness is the durable layer: it carries everything that is true of every task, so your brief only carries what is new. The hierarchy is really a cost model — always-loaded files stay short and factual, path-scoped rules and skills defer their cost until they are needed. Instruction files hold facts and prohibitions that can be checked, never vibes. Tokens and DESIGN.md make the design system itself readable to the agent, with data, runtime, and explanation kept as separate layers. And the whole thing needs an owner, because a stale harness is worse than none. It raises the floor; it does not supply taste. In Module 5 we look at the materials underneath all of this — design-as-code formats, the platforms, the canvases, and how to choose a stack. See you there.