Agentic Design School
Module 6 of 7
40–50 minutes

Claude Code for Designers

Automation and Maintenance

Turning one-off wins into repeatable runs: skills for recurring procedures, scheduled audits, asset production, and the maintenance work that keeps a design codebase healthy.

Duration: 40–50 minutes

Slides: 13 slides with notes and narration

Learning objectives

  • Identify recurring design chores worth automating and the ones that are not.
  • Write a SKILL.md that encodes a procedure the agent can repeat reliably.
  • Set up recurring audits and reports that surface drift before it compounds.
  • Keep automated runs reviewable instead of silently trusted.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 13

Automation and Maintenance

Claude Code for Designers · Module 6 of 7

  • The second time you do it, automate it
  • Skills: a procedure the agent loads on demand
  • Recurring audits, asset runs, and maintenance chores
  • Keeping every automated run reviewable

Module 5 made the agent useful on the design system once. This module makes that usefulness repeat without you re-typing the prompt.

Slide notes

Position this module as the shift from sessions to standing arrangements. Modules 3 to 5 were about doing a piece of design work well once: a brief, a plan, a build, a critique. Most of the value teams report after a few months comes not from those one-off wins but from the chores that now run on their own: the token audit on every pull request, the documentation that regenerates itself, the migration that finally got done. That is what this module covers.

Be explicit about the risk that comes with the territory. Automation removes the human from the moment of execution, which is exactly when problems used to get noticed. Every mechanism in this module is therefore paired with a review point: hooks block rather than fix, scheduled runs report rather than merge, and batch runs work on branches with a reviewer in the loop. The phrase to plant early is "reviewable, not trusted."

If the audience has not done Module 5, the examples will still make sense, but the design-system audit and component work from that module are the chores most of this module automates, so a quick recap of what a token audit is may be worth thirty seconds.

Narration for this slide

Welcome back. So far in this course, every win has been a session: you sat down, briefed the agent, reviewed a plan, shipped a change. This module is about the work that should not need a session at all. The token check that runs on every pull request. The documentation that updates itself when a component changes. The audit that runs monthly whether anyone remembers or not. The rule of thumb is simple: the second time you find yourself doing the same chore with the agent, stop and automate it. The catch is keeping all of that reviewable rather than silently trusted — and that is most of what we will talk about.

Slide 2 of 13

What is worth automating

Three questions decide it: how often does it recur, how boring is it, and what does an error cost.

  • Frequency — runs weekly or on every change, not twice a year
  • Boredom — rule-based, low-judgment work people defer or rush
  • Error cost — cheap to catch and reverse when the run gets it wrong
  • Verifiability — there is a check that can tell pass from fail
  • Not worth it: one-offs, judgment calls, and anything you cannot review

Automate the chores where being wrong is cheap and being late is expensive. Keep the judgment calls as sessions.

Slide notes

The frequency test is the obvious one, but it filters out a surprising amount: the one-off migration to a new framework feels like an automation candidate because it is large, but it runs once, so it is better treated as a supervised batch run than as standing automation. The chores that pay back are the ones that recur on a rhythm — every edit, every pull request, every release, every month.

Boredom is a real signal, not a joke. Design-system maintenance work is mostly rule-based: regenerating prop tables, finding hardcoded values, exporting variants, keeping docs in sync. People defer this work precisely because it is tedious, and deferred maintenance is where drift comes from. Anthropic's own framing for Claude Code is to automate the work you keep putting off, and that maps almost exactly onto this category.

Error cost and verifiability are the safety half of the decision. A documentation regeneration that goes slightly wrong is caught at review and costs minutes. An automated change to the token source of truth that goes wrong propagates everywhere. The question to ask before automating anything: is there a check — a type check, an audit, a visual diff, a human reading a report — that can tell a good run from a bad one? If the answer is no, the chore is not ready to automate, however boring it is.
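The four questions work better as a sequence of gates than as a score. A minimal TypeScript sketch, assuming a hypothetical `Chore` record the team fills in by hand; the monthly threshold is illustrative, not a rule from this course:

```typescript
// Hypothetical record a team fills in for each candidate chore.
interface Chore {
  name: string;
  runsPerYear: number;     // frequency: how often it recurs
  ruleBased: boolean;      // boredom: low-judgment, rule-following work
  cheapToReverse: boolean; // error cost: a wrong run is cheap to catch and undo
  hasCheck: boolean;       // verifiability: something can tell pass from fail
}

type Verdict = "automate" | "supervised session" | "not yet: add a check first";

// Illustrative threshold: at least monthly counts as recurring "on a rhythm".
const MIN_RUNS_PER_YEAR = 12;

function decide(chore: Chore): Verdict {
  // One-offs and judgment calls stay as sessions, however large they are.
  if (chore.runsPerYear < MIN_RUNS_PER_YEAR || !chore.ruleBased) {
    return "supervised session";
  }
  // No check that tells a good run from a bad one: not ready, however boring.
  if (!chore.hasCheck) return "not yet: add a check first";
  // Automate only where being wrong is cheap.
  if (!chore.cheapToReverse) return "supervised session";
  return "automate";
}
```

The order encodes the slide's priorities: frequency and judgment first, then verifiability, then error cost. A change to the token source of truth fails the last gate even though it is frequent and rule-based.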

Narration for this slide

Not everything deserves automation, so start with three questions. How often does this recur? If it happens on every pull request or every month, automation pays back; if it happens once, just run it as a supervised session. How boring is it? Rule-based, low-judgment work — regenerating docs, finding hardcoded values, exporting variants — is exactly what people defer, and deferred maintenance is where drift comes from. And what does an error cost? Automate where a wrong run is cheap to catch and reverse. There is a fourth question hiding underneath: can a check tell a good run from a bad one? If nothing can verify the output, do not automate it yet.

Slide 3 of 13

Skills: a procedure the agent loads on demand

A skill is a written procedure — a SKILL.md file in the repository — that turns a prompt you keep retyping into a command anyone can run.

  • Lives in .claude/skills/<name>/SKILL.md, versioned with the code
  • Frontmatter names it, describes it, and limits which tools it may use
  • The body is the procedure: numbered steps, exact paths, the report to produce
  • Invoked as a slash command — /document-component Button
  • Shared through git, so the whole team runs the same procedure

A good prompt helps you once. A skill is that prompt written down, scoped, and versioned — the difference between a trick and a tool.

Slide notes

The mental model to give designers: a skill is a recipe card the agent pulls out when asked. It is not a new capability and not a separate product — it is the prompt you refined over three sessions, written into a file with a name, an argument, and tool limits, so the fourth time costs nothing and produces the same shape of output as the third.

Walk the anatomy briefly. The frontmatter carries the name, a description (which is how the agent knows when the skill applies), an argument hint, and optionally an allowed-tools list that restricts what the skill can touch — a documentation skill has no business running arbitrary shell commands. The body is the procedure itself: read these files, extract this, follow this template, write the result here, report what you did. Exact paths beat vague instructions; the procedure should read like something a careful new colleague could follow.

The versioning point matters more for teams than the syntax does. Because skills live in the repository, they are reviewed like code, improved like code, and shared automatically with everyone who clones the project. When the documentation template changes, the skill changes in the same pull request. That is what turns one designer's clever prompt into a team capability — which is also the bridge into Module 7.

As of June 2026 the same idea appears under slightly different names across platforms — skills, saved prompts, slash commands — and the details of frontmatter fields shift between releases. Teach the pattern, and check the current Claude Code docs for the exact syntax before relying on a specific field.

Narration for this slide

Here is the core mechanism of this module: the skill. A skill is a procedure written into a file — SKILL.md — that lives in your repository under .claude/skills. The frontmatter gives it a name, a description, and limits on which tools it can use. The body is the procedure itself: read these files, follow this template, write the output here, report what you did. From then on it is a slash command — /document-component Button — and anyone on the team can run it. Think of it as the prompt you refined over three sessions, written down so the fourth run is free and the tenth run is identical. That is the difference between a trick and a tool.

Slide 4 of 13

Anatomy of a skill: documenting a component

A real shape for a documentation skill. Exact paths, a template to follow, and a defined output — nothing left to improvise.

.claude/skills/document-component/SKILL.md
---
name: document-component
description: Generate or update docs for a design system component
argument-hint: [component-name]
allowed-tools: Read, Write, Glob, Grep
---

Generate documentation for the $ARGUMENTS component.

1. Read src/components/$ARGUMENTS/$ARGUMENTS.tsx
2. Read its stories and tests in the same folder
3. Extract the props interface: name, type, default, description
4. Follow docs/templates/component-doc-template.md exactly
5. Write to docs/components/$ARGUMENTS.md
6. Report which props lack descriptions — do not invent them

Every line removes a decision the agent would otherwise make differently each run. Consistency is the whole point.

Slide notes

Read the file top to bottom with the group, because each part answers a question designers actually ask. The argument is what makes it reusable across the whole library: argument-hint tells whoever runs the skill what to pass, and $ARGUMENTS drops that name into every path, so the same procedure documents Button today and DatePicker tomorrow. The allowed-tools line is the quiet safety feature: this skill can read and write files but cannot run shell commands or touch the network, which bounds what a bad run can do.
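The reuse comes from substitution: whatever follows the slash command replaces `$ARGUMENTS` in the body before the agent reads it. A minimal sketch of that expansion, assuming plain textual replacement of every occurrence (check the current Claude Code docs for how multiple or positional arguments are handled); `expandSkill` is a hypothetical name:

```typescript
// Replace every occurrence of the $ARGUMENTS placeholder with the argument text.
// split/join replaces all occurrences, not just the first one.
function expandSkill(body: string, args: string): string {
  return body.split("$ARGUMENTS").join(args.trim());
}

// Step 1 of the document-component skill, before expansion.
const step1 = "Read src/components/$ARGUMENTS/$ARGUMENTS.tsx";
```

`expandSkill(step1, "Button")` yields `Read src/components/Button/Button.tsx`, and the same line with `"DatePicker"` points at the DatePicker folder: one procedure, any component.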

The numbered steps are deliberately rigid. Exact paths mean the skill fails loudly when the project structure changes instead of guessing. The template reference is what keeps fifty component docs looking like one set rather than fifty improvisations. And step six is the honesty rule: where the source lacks a prop description, the skill reports the gap rather than fabricating one, because plausible invented documentation is worse than a visible hole.
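Steps three and six can be pictured as one small transformation: props in, a table plus a list of gaps out. A sketch assuming a hypothetical `PropInfo` shape already extracted from the component's props interface; the point is that a missing description becomes a reported gap, never filler text:

```typescript
interface PropInfo {
  name: string;
  type: string;
  defaultValue?: string;
  description?: string; // absent when the source has no doc comment
}

interface DocResult {
  rows: string[]; // one table row per prop
  gaps: string[]; // props the run must report, never invent text for
}

function buildPropsTable(props: PropInfo[]): DocResult {
  const rows: string[] = [];
  const gaps: string[] = [];
  for (const p of props) {
    const desc = (p.description || "").trim();
    // Honesty rule: an empty description cell is recorded as a gap.
    if (desc === "") gaps.push(p.name);
    rows.push(`| ${p.name} | ${p.type} | ${p.defaultValue ?? "-"} | ${desc} |`);
  }
  return { rows, gaps };
}
```

The empty cell in the table and the entry in `gaps` are the visible hole the notes prefer over plausible invented documentation.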

Point out what is not in the file: no judgment about whether the component is well designed, no licence to refactor it while passing through, no decision about what the docs should say beyond what the source says. Skills should encode procedures, not opinions. The opinions stay in the design documentation the skill reads — and in the human who reviews the output.

Narration for this slide

Let's read one. This is a documentation skill: pass it a component name and it reads the source, the stories, and the tests, extracts the props, follows the team's template, and writes the doc to a known location. Notice how rigid it is. Exact paths, an exact template, a defined output file. The allowed-tools line means it can read and write but cannot run commands. And the last step is the honesty rule: where a prop has no description in the source, report the gap — do not invent one. Every line here removes a decision the agent would otherwise make slightly differently on every run. Consistency is the entire point of writing it down.

Slide 5 of 13

The maintenance automation map

Each recurring chore gets matched to the cheapest trigger that catches it — and every mechanism reports into a human review point.

Map of recurring design chores — hardcoded values, design QA on pull requests, drift audits, stale documentation, asset exports, token migrations, dependency bumps — routed to four automation mechanisms: hooks that run on every agent edit, CI checks that run on every pull request, scheduled runs on a weekly or monthly cadence, and background agents for batch work. All four mechanisms feed a single human review column where reports, diffs, and gate output are read before anything merges, and where deprecations, breaking changes, and taste decisions stay human.
Chores on the left are matched to the cheapest trigger that catches them: hooks for every edit, CI checks for every pull request, schedules for slow drift, background agents for batch work. Every path ends at human review — nothing merges on a green check alone.

The mechanism follows the rhythm of the chore. The review point does not move, whichever mechanism you pick.

Slide notes

Walk the map left to right. The left column is the human contribution: naming what actually recurs in this project. That list is different for every team, and writing it down is the first exercise of the module. The middle is the four mechanisms, organised by trigger rather than by technology: hooks fire on every agent edit and are deterministic — a hook that blocks hardcoded colours is a rule, not advice; CI checks fire on every pull request and cannot be skipped or forgotten, which is what makes them gates; scheduled runs fire on a calendar and exist for the slow drift no single change introduces; background agents and fan-out runs fire on demand for batch work — a migration across three hundred files, documentation for every component at once.

The matching principle is the cheapest trigger that catches the problem. Hardcoded values are best caught at the moment of writing, so they belong in a hook or a PR check, not a monthly audit. Meaning drift — a component that is technically token-compliant but no longer matches the documented intent — cannot be caught by any per-change check, so it belongs to the schedule. Asset production has no trigger at all until someone needs the assets, so it is on-demand batch work.
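A hook that blocks hardcoded colours is a rule rather than advice because its core is a deterministic pattern check. A sketch of that core, assuming it is wired as a Claude Code PreToolUse hook on file writes and edits; at the time of writing, such a hook receives the tool call as JSON on stdin and blocks it by exiting with code 2, with stderr shown to the agent. That stdin/exit wrapper is omitted here (check the current hooks documentation for the exact input fields), and the patterns and the `src/tokens/` exemption are illustrative assumptions:

```typescript
// Illustrative patterns for raw colour values; a real project tunes these.
const RAW_COLOR_PATTERNS: RegExp[] = [
  /#[0-9a-fA-F]{3,8}\b/g,             // hex: #fff, #ffffff, #ffffffaa
  /\b(?:rgb|rgba|hsl|hsla|oklch)\(/g, // colour functions
];

interface HookDecision {
  block: boolean;
  message: string; // a real hook would write this to stderr before exiting 2
}

function checkForRawColors(filePath: string, newContent: string): HookDecision {
  // Assumed layout: the token source files are the one place raw values belong.
  if (filePath.indexOf("src/tokens/") !== -1) return { block: false, message: "" };
  const hits: string[] = [];
  for (const pattern of RAW_COLOR_PATTERNS) {
    const found = newContent.match(pattern);
    if (found) hits.push(...found);
  }
  if (hits.length === 0) return { block: false, message: "" };
  return {
    block: true,
    message: `${filePath}: raw colour values ${hits.join(", ")}; use a design token instead.`,
  };
}
```

Because the check is a pattern, it blocks the same way on every run; what it misses is exactly what the PR gate and the scheduled audit are layered in to catch.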

The right column is the part teams skip and regret. Every mechanism ends in evidence read by a person: a report of what changed and what was checked, diffs, gate output. And below the line are the calls that never get automated regardless of mechanism — deprecations, breaking changes, and the judgment about whether a variation is drift or evolution. The map is the module in one picture; the rest of the slides walk its columns.

Narration for this slide

Here is the whole module in one map. On the left, the chores that recur in a design codebase: hardcoded values creeping in, design QA on pull requests, drift, stale docs, asset exports, migrations. In the middle, four mechanisms organised by trigger. Hooks run on every edit and block problems deterministically. CI checks run on every pull request and cannot be forgotten. Scheduled runs catch the slow drift nothing else sees. Background agents handle batch work — a migration across hundreds of files, docs for every component. The matching rule: pick the cheapest trigger that catches the problem. And on the right, the part that does not move — every run ends in a report a human reads, and deprecations, breaking changes, and taste stay human.

Slide 6 of 13

Choosing the mechanism: trigger, scope, and review

Same agent, five very different standing arrangements. The trigger and the review point are what you are actually choosing.

Mechanism | Trigger | Best at | Review point
Hook | Every agent edit | Blocking hardcoded values before they land | The block itself, plus the diff at PR
Skill / saved command | When someone runs it | Repeating a procedure identically | The run's report and output
CI check / PR gate | Every pull request | Token audit, type check, design QA on changed screens | Findings comment read before merge
Scheduled run | Weekly or monthly | Drift audits, doc regeneration, parity reports | Findings routed to a named owner
Background / fan-out run | On demand, batch | Migrations and asset runs across many files | Reviewer agent per diff, then the PR

If you cannot say where a human reads the result, you have not chosen a mechanism — you have chosen to stop looking.

Slide notes

The table looks like a technology choice but it is really two choices: when does this run, and where does a person look at what it did. Make the group answer the second question for each row, because that is the one that gets skipped. A hook's review point is partly the block itself — the bad change never lands — and partly the normal pull-request diff. A skill's review point is the report it produces at the end of each run, which is why every skill in this module ends with a report step. A CI gate's review point is the findings comment on the pull request, read by the author and whoever approves the merge. A scheduled run's review point is wherever its findings land — and findings that land in a channel nobody owns are the same as findings that were never generated, so route them to a named person. A batch run's review point is doubled: a reviewer agent on every diff during the run, and a human on the pull request at the end.
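Routing to a named person can be made mechanical rather than aspirational. A sketch assuming a hypothetical ownership table keyed by path prefix, with made-up owner names; findings that match no owner are returned separately so the run can surface them loudly instead of posting them to a channel:

```typescript
interface ScheduledFinding {
  path: string;
  summary: string;
}

// Hypothetical ownership table: path prefix -> a named person, not a channel.
const OWNERS: { prefix: string; owner: string }[] = [
  { prefix: "src/components/", owner: "Priya" },
  { prefix: "docs/", owner: "Sam" },
];

interface Routed {
  byOwner: Map<string, ScheduledFinding[]>;
  unowned: ScheduledFinding[];
}

function routeFindings(findings: ScheduledFinding[]): Routed {
  const byOwner = new Map<string, ScheduledFinding[]>();
  const unowned: ScheduledFinding[] = [];
  for (const f of findings) {
    const match = OWNERS.filter((o) => f.path.indexOf(o.prefix) === 0)[0];
    if (!match) {
      // Surface these explicitly; an unowned finding is as good as none.
      unowned.push(f);
      continue;
    }
    const list = byOwner.get(match.owner) || [];
    list.push(f);
    byOwner.set(match.owner, list);
  }
  return { byOwner, unowned };
}
```

A non-empty `unowned` list is itself a finding: it means the ownership table, not the codebase, needs attention.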

Note the overlap deliberately. The token discipline problem appears three times — hook, PR gate, scheduled audit — because the layers catch different things at different costs. A hook catches the value at the moment of writing. The PR gate catches what the hook's pattern missed. The scheduled audit catches what no per-change check can see. Layered cheap checks beat one heroic deep one.

As of June 2026, the exact names and surfaces for these mechanisms vary between Claude Code releases and between platforms — saved workflows, routines, scheduled tasks, GitHub Actions. The taxonomy by trigger is stable even where the feature names are not, so teach the column headings rather than the product nouns.

Narration for this slide

Here are the five standing arrangements side by side, and the two columns that matter most are trigger and review point. A hook runs on every edit and its review is partly the block itself. A skill runs when someone invokes it, and its review is the report it produces. A CI gate runs on every pull request and its review is the findings comment read before merge. A scheduled run fires weekly or monthly, and its findings must land with a named owner — findings nobody owns might as well not exist. And a batch run gets reviewed twice: a reviewer agent on every diff, then a human on the pull request. If you cannot point at where a person reads the result, you have not chosen a mechanism. You have chosen to stop looking.

Slide 7 of 13

Recurring audits: tokens, accessibility, content, visuals

Two distinct defences against drift: per-change gates that catch violations the moment they appear, and a scheduled whole-surface audit for everything the gates cannot see.

  • Per-change gates: token audit, type check, axe-core and a keyboard pass, visual diff of changed screens
  • Scheduled audit: whole-surface sweep for meaning and behaviour drift, stale content, parity gaps
  • Gates are cheap and unambiguous; their limit is they only check what someone encoded as a rule
  • Audit findings get evidence, severity, and a named owner — not a backlog nobody reads
  • Know the blind spots: a passing token audit says nothing about contrast or legibility

Gates keep the baseline from eroding on every change. The scheduled audit catches what accumulates between changes. You need both.

Slide notes

Keep the two defences clearly separated, because conflating them weakens both. Per-change gates run on every pull request: the token audit that fails on raw colour values, the type check, automated accessibility checks plus a keyboard pass on changed routes, and a visual diff of the screens the branch touches. The school's design-QA-on-every-PR workflow is the worked version of this: scoped to the diff, ten to twenty minutes per pull request, findings posted as a comment with severity and evidence. Gates are cheap, unambiguous, and constant — and limited to what someone thought to encode as a rule.
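A findings comment is quicker to act on before merge when it is grouped by severity and every line carries its evidence. A sketch with a hypothetical `Finding` shape and three-level severity scale; posting the comment to the pull request is left out:

```typescript
type Severity = "blocker" | "major" | "minor";

interface Finding {
  severity: Severity;
  file: string;
  rule: string;     // which check produced it, e.g. "token-audit" or "axe"
  evidence: string; // the offending value, rule id, or screenshot path
}

const SEVERITY_ORDER: Severity[] = ["blocker", "major", "minor"];

function formatFindingsComment(findings: Finding[]): string {
  if (findings.length === 0) {
    // Say what a clean result means, and what it does not.
    return "Design QA: no findings from the checks that ran. This is not a design review.";
  }
  const lines: string[] = [`Design QA: ${findings.length} finding(s)`];
  for (const severity of SEVERITY_ORDER) {
    const group = findings.filter((f) => f.severity === severity);
    if (group.length === 0) continue;
    lines.push("", `${severity.toUpperCase()} (${group.length})`);
    for (const f of group) {
      lines.push(`- [${f.rule}] ${f.file}: ${f.evidence}`);
    }
  }
  return lines.join("\n");
}
```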

The scheduled audit is the other defence and a genuinely different activity: a monthly or quarterly sweep that asks whether the system as built still matches the system as intended. It catches the drift no single change introduced — a component that is technically token-compliant but no longer matches the documented intent, content that quietly went stale, a web component whose mobile counterpart never got the new variants. Findings need evidence, a severity, and a named decision owner; an audit that feeds an unowned backlog is theatre.

Close with the blind-spot honesty from the school's maintenance article: in the executed case study there, the token audit caught a hardcoded hex value but missed the same mistake written as an arbitrary oklch class, and nothing in a token audit checks contrast at all. A passing gate means the checks you wrote found nothing. It does not mean nothing is wrong — which is one more reason the human review point never goes away.
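The blind spot is easy to reproduce, because a pattern-based audit finds exactly what its pattern encodes. A sketch with a hypothetical hex-only rule and a broader one, run against the same colour written two ways; the Tailwind-style arbitrary classes are assumed examples, not taken from the case study:

```typescript
// A token audit rule that only knows about hex literals.
const HEX_ONLY = /#[0-9a-fA-F]{3,8}\b/g;

// A broader rule that also catches colour functions such as oklch(...).
const HEX_OR_FUNCTION = /#[0-9a-fA-F]{3,8}\b|\b(?:rgb|rgba|hsl|hsla|oklch)\([^)]*\)/g;

function audit(source: string, rule: RegExp): string[] {
  return source.match(rule) || [];
}

const hexVersion = `<div className="bg-[#e11d48]">`;
const oklchVersion = `<div className="bg-[oklch(0.59_0.22_17)]">`;
```

`audit(hexVersion, HEX_ONLY)` finds `#e11d48`, while `audit(oklchVersion, HEX_ONLY)` finds nothing; only the broader rule catches the oklch class. Neither rule says anything about contrast.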

Narration for this slide

Drift has two sources, so it needs two defences. The first is the per-change gate: on every pull request, run the token audit, the type check, automated accessibility checks, and a visual diff of the screens the branch actually changed. Cheap, constant, unambiguous — and limited to the rules someone wrote down. The second is the scheduled audit: monthly or quarterly, a whole-surface sweep asking whether the system as built still matches the system as intended. That is where you catch meaning drift, stale content, and platform gaps no single change caused. Give findings evidence, severity, and a named owner. And know your blind spots — a passing token audit says nothing about contrast. Green checks are not a design review.

Slide 8 of 13

Asset production runs: exports, variants, documentation

Batch work where the procedure is identical per item and the item count is what made it painful. Fan the work out, keep the conventions in.

  • Documentation for every component, regenerated from source on a schedule or after changes
  • Component variants generated against the existing pattern — type, styles, story, test in one pass
  • Export sets: icons, illustrations, social and marketing crops at every required size
  • Fan-out: one agent per item, the same instructions for all, results collected into one report
  • The pattern reference is the quality control — point every run at the existing convention
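An export set shows the "one decision, applied many times" shape most plainly: the sizes and naming convention are decided once, and the run only enumerates them. A sketch with hypothetical sizes, formats, and a `name-size.format` naming convention; the actual rasterising step is out of scope:

```typescript
interface ExportJob {
  source: string;
  size: number;
  format: "svg" | "png";
  output: string;
}

// Decided once by the team; these values are illustrative.
const ICON_SIZES = [16, 24, 32, 48];
const FORMATS: ("svg" | "png")[] = ["svg", "png"];

// Every icon gets every size in every format, named by one convention.
function exportJobs(iconNames: string[]): ExportJob[] {
  const jobs: ExportJob[] = [];
  for (const name of iconNames) {
    for (const size of ICON_SIZES) {
      for (const format of FORMATS) {
        jobs.push({
          source: `icons/src/${name}.svg`,
          size,
          format,
          output: `icons/dist/${name}-${size}.${format}`,
        });
      }
    }
  }
  return jobs;
}
```

Two icons produce sixteen jobs; the judgment lives entirely in the two constants, which is why a mistake there shows up in every output at once.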

The agent is not designing fifty things. It is applying one decision fifty times — which is exactly the work that never got staffed.

Slide notes

Asset production is the least glamorous and most reliably valuable category, because the procedure is identical per item and only the volume made it painful. The documentation case from the research is the cleanest example: extract the props interface, follow the template, write the file — repeated for every component in the library, either as a fan-out batch run or as a scheduled run that regenerates docs for whatever changed that week and opens the result as a pull request.

Variant generation works the same way provided the brief points hard at the existing pattern: read the current Button, its stories and tests, and add the destructive and outline variants following exactly the same structure (type union, style mapping, story, test). The phrase that does the quality control is "following the exact same patterns"; without it, each generated item is a small improvisation and the batch produces fifty slightly different answers. Export sets (icon sizes, image crops, favicon and social variants) are the same shape with even less judgment involved.
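Following the existing structure also gives the batch something to check against. In a hypothetical Button whose style mapping is typed as `Record<ButtonVariant, string>`, the compiler rejects any variant added to the union without a matching style, so a half-finished variant fails the type check instead of shipping. The variant and class names below are illustrative:

```typescript
// Existing pattern: the union lists every variant...
type ButtonVariant = "primary" | "secondary" | "destructive" | "outline";

// ...and the mapping must cover every member of the union. Adding a new
// variant to the union without an entry here is a compile error.
const buttonVariantStyles: Record<ButtonVariant, string> = {
  primary: "bg-primary text-on-primary",
  secondary: "bg-surface text-on-surface border border-subtle",
  destructive: "bg-danger text-on-danger",
  outline: "bg-transparent text-primary border border-primary",
};

function buttonClass(variant: ButtonVariant): string {
  return `btn ${buttonVariantStyles[variant]}`;
}
```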

Two cautions. First, generated documentation still needs an owner who reads it; plausible documentation that is subtly wrong is worse than a gap, which is why these runs end in a pull request rather than a direct write to main. Second, fan-out is a multiplier on whatever instructions you give it — a flaw in the template or the pattern reference appears in every output at once. Review the first two or three items before letting the batch finish, the same spot-check habit the migration workflow uses.
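The spot-check habit can be built into the batch loop itself: run the first few items, stop for a review, and only then process the rest. A sequential sketch for clarity (a real fan-out would run items in parallel); `worker` stands in for one agent run per item and `review` for the human reading the early outputs, both hypothetical:

```typescript
interface ItemResult {
  item: string;
  ok: boolean;
  note: string;
}

interface BatchReport {
  results: ItemResult[];
  haltedAfterSpotCheck: boolean;
}

function runFanOut(
  items: string[],
  worker: (item: string) => ItemResult,     // same instructions for every item
  review: (early: ItemResult[]) => boolean, // human spot-check of the first few
  spotCheckCount = 3,
): BatchReport {
  const early = items.slice(0, spotCheckCount).map(worker);
  if (!review(early)) {
    // A flaw in the template would repeat in every output: stop here.
    return { results: early, haltedAfterSpotCheck: true };
  }
  const rest = items.slice(spotCheckCount).map(worker);
  return { results: early.concat(rest), haltedAfterSpotCheck: false };
}
```

A rejected spot-check costs three runs instead of fifty, which is the whole argument for putting the review before the fan-out rather than after it.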

Narration for this slide

The next category is asset production: work where the procedure is identical for every item and only the count made it painful. Documentation for every component, regenerated from source. Variants added to a component following the exact structure of the ones that exist — type, styles, story, test. Export sets at every required size. The pattern is fan-out: one agent per item, the same instructions for all of them, results collected into a single report. The quality control is the pattern reference — point every run at the existing convention so fifty outputs look like one set, not fifty improvisations. And spot-check the first few before the batch finishes, because fan-out multiplies mistakes exactly as efficiently as it multiplies good work.

Slide 9 of 13

Maintenance chores: migrations, dead code, stale docs

The backlog items everyone agrees matter and nobody schedules. Batch runs with reviewer gates are how they finally get done.

  • Token migrations: a frozen mapping file, one agent per file, a reviewer agent on every diff
  • Dead code and unused exports: components shipped but never imported, flagged with evidence
  • Stale documentation: prose that still describes last quarter's values and components
  • Dependency bumps reviewed with a design QA pass on the screens they change
  • Always on a branch, always resumable, never merged by the run itself
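Flagging dead code with evidence means reporting what was searched, not just what looks unused. A sketch over a hypothetical pre-built index of each file's exports and resolved imports; building that index (for example with the TypeScript compiler API) is out of scope, and re-exports or dynamic imports would need handling before any of this drives a deletion:

```typescript
interface ModuleIndex {
  file: string;
  exports: string[];                            // names this file exports
  imports: { from: string; names: string[] }[]; // resolved imports
}

interface UnusedExport {
  file: string;
  name: string;
  evidence: string;
}

function findUnusedExports(modules: ModuleIndex[]): UnusedExport[] {
  // Collect every "file#name" pair that some module imports.
  const used = new Set<string>();
  for (const m of modules) {
    for (const imp of m.imports) {
      for (const name of imp.names) used.add(`${imp.from}#${name}`);
    }
  }
  const unused: UnusedExport[] = [];
  for (const m of modules) {
    for (const name of m.exports) {
      if (!used.has(`${m.file}#${name}`)) {
        unused.push({
          file: m.file,
          name,
          evidence: `no import of ${name} from ${m.file} across ${modules.length} indexed files`,
        });
      }
    }
  }
  return unused;
}
```

Entry points such as pages show up as "unused" too, which is why the output is a flag list for a person, not a delete list for the run.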

These chores stalled because they were big, boring, and risky. The agent removes big and boring; the reviewer gate and the branch handle risky.

Slide notes

This is the long tail Module 5 mentioned and never had time for: the migration to semantic tokens that has been in the backlog for a year, the components that are exported but never imported, the DESIGN.md sections that describe values which no longer exist, the dependency bump nobody wants to be the one to merge. They stall for the same three reasons — big, boring, and risky — and the automation pattern addresses each one separately.

The token migration is the worked case worth describing because the school publishes it as a full workflow. The judgment happens before the run: a mapping file, frozen by humans, that records what every legacy value becomes and marks the ambiguous ones as "ask". The run itself is a script that loops one migration agent per file with a reviewer agent on every diff (the migration agent is optimised to make changes, the reviewer to refuse them) and ends with a visual recapture comparing key screens against captures taken before the run. Three hundred files becomes a controllable half day, and the flags file becomes the design team's worklist rather than a pile of silent guesses.
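The frozen mapping and the two roles can be sketched as two functions: the migration step replaces only values the mapping covers and turns every "ask" into a flag, and the reviewer step refuses any changed line that the mapping does not explain. A simplified sketch assuming a hypothetical mapping shape and plain substring matching (a real run would match parsed values, since `#fff` is a substring of `#ffffff`):

```typescript
// Frozen by humans before the run. "ask" marks an ambiguous legacy value.
type Mapping = Record<string, string>;

interface MigrationResult {
  output: string;
  flags: string[]; // becomes the design team's worklist
}

function migrateFile(file: string, content: string, mapping: Mapping): MigrationResult {
  let output = content;
  const flags: string[] = [];
  for (const legacy of Object.keys(mapping)) {
    if (output.indexOf(legacy) === -1) continue;
    const target = mapping[legacy];
    if (target === "ask") {
      flags.push(`${file}: ${legacy} is ambiguous; left unchanged`);
      continue;
    }
    output = output.split(legacy).join(target);
  }
  return { output, flags };
}

// Reviewer: every changed line must equal the mapping applied to the original.
function reviewDiff(before: string, after: string, mapping: Mapping): boolean {
  const b = before.split("\n");
  const a = after.split("\n");
  if (a.length !== b.length) return false; // lines added or removed: refuse
  for (let i = 0; i < b.length; i++) {
    if (a[i] === b[i]) continue;
    if (migrateFile("", b[i], mapping).output !== a[i]) return false;
  }
  return true;
}
```

The reviewer rejects both a wrong token and a guessed replacement for an "ask" value, which is what keeps the flags file honest.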

The guardrails generalise to every chore on this slide: work on a branch, never main; track progress so the run is resumable; flag instead of guessing; and let the pull request, not the run, be the thing that merges. Dependency bumps get the same treatment with a twist — the design QA gate from the recurring-audits slide runs against the bump branch, so the team sees what the new component library version does to their screens before it ships rather than after.
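Resumability needs nothing more than a record of what has finished. A sketch that keeps completed files in a plain list, which a real script would persist to a file on the branch after each item; the `Progress` shape is hypothetical:

```typescript
interface Progress {
  done: string[]; // files already migrated and past review
}

// The files still to process, in their original order.
function remaining(allFiles: string[], progress: Progress): string[] {
  const done = new Set(progress.done);
  return allFiles.filter((f) => !done.has(f));
}

// Mark a file complete only after its diff has passed review.
function markDone(progress: Progress, file: string): Progress {
  if (progress.done.indexOf(file) !== -1) return progress;
  return { done: progress.done.concat([file]) };
}
```

If the run stops halfway, restarting it with the saved record picks up at the first unfinished file instead of redoing, or silently skipping, earlier work.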

Narration for this slide

Now the backlog: the chores everyone agrees matter and nobody schedules. The token migration that touches three hundred files. The components exported but never used. The documentation describing values that no longer exist. The dependency bump nobody wants to own. These stall because they are big, boring, and risky — and the pattern handles each part. Humans freeze the decisions first, like a token mapping file with the ambiguous cases marked ask. A script loops one agent per file, with a reviewer agent refusing any diff that goes beyond the mapping. A visual recapture at the end proves the screens did not change where they were not supposed to. Always on a branch, always resumable, and the run never merges itself.

Slide 10 of 13

Keeping automation reviewable

The line between good and bad delegation is not how much the agent does. It is whether every run produces evidence a person actually reads.

  • Every run ends in a report: what changed, what was checked, what was left alone
  • Gates and audits attach their output — a claim of passing is not the same as the output
  • Findings and scheduled reports route to a named owner, not a channel
  • The run never merges its own work; a person reads the diff and the render
  • Approving on green checks without reading the diff is the human version of a gate blind spot
  • Deprecations, renames, breaking changes, and taste are decisions, not chores — they stay out of the automation entirely

Automation should make review cheap, not optional. The day nobody reads the reports is the day the system starts getting quietly worse.

Slide notes

This slide is the module's centre of gravity, so slow down. The failure mode of automated maintenance is rarely a dramatic breakage — hooks and gates catch most of those. It is the slow accumulation of technically compliant changes that nobody with taste has looked at in months. The system stays consistent and gets quietly worse. The defence is structural, not motivational: design every mechanism so that reading its output is cheap, and assign that reading to a person by name.

Walk the bullets as a checklist against any automation the team is about to set up. The report requirement is non-negotiable and cheap: every skill, scheduled run, and batch script in this module ends with a report step covering what it did, what it checked, and what it left alone, because the absence of changes is evidence too. Attaching gate output matters because a summary that says checks passed is a claim, not evidence. Named owners matter because findings posted to a channel decay into wallpaper within a month. And the no-self-merge rule is the bright line: the run can open the pull request, but it cannot approve it.
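The checklist can be enforced on the report itself before anyone reads its summary. A sketch with a hypothetical `RunReport` shape: a check reported without its attached output is treated as a claim and listed as a problem, as are a missing owner and a run that checked nothing:

```typescript
interface CheckResult {
  name: string;
  passed: boolean;
  output?: string; // the gate's raw output, attached rather than summarised
}

interface RunReport {
  changed: string[];     // what changed
  checks: CheckResult[]; // what was checked
  leftAlone: string[];   // what was deliberately not touched
  owner: string;         // the named person who reads this
}

function reportProblems(r: RunReport): string[] {
  const problems: string[] = [];
  if (r.owner.trim() === "") problems.push("no named owner");
  if (r.checks.length === 0) problems.push("nothing was checked");
  for (const c of r.checks) {
    if (!c.output || c.output.trim() === "") {
      problems.push(`${c.name}: result reported without its output`);
    }
  }
  return problems;
}
```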

The last bullet draws the boundary the school's maintenance article calls "the almost": deprecations, renames, breaking changes, intentional variation, and taste are decision rights, not capability gaps. The agent can produce the impact analysis (every consumer of a token, every surface a breaking change touches), and that genuinely changes how confidently humans decide. It does not change who decides. Automation that quietly absorbs those calls has not saved anyone time; it has moved the design decisions to the entity with the least accountability for them.

Narration for this slide

Here is the part that decides whether any of this survives contact with reality. The line between good and bad delegation is not how much the agent does — it is whether every run produces evidence a person actually reads. So: every run ends in a report covering what changed, what was checked, and what was left alone. Gate output gets attached, because a claim of passing is not the output. Findings route to a named owner, not a channel. The run never merges its own work. And approving on green checks without reading the diff is just the human version of a gate blind spot. One boundary above all: deprecations, breaking changes, and taste are decisions, not chores. They stay out of the automation entirely.

Slide 11 of 13

Worked example: one chore, turned into a skill, then scheduled

Component documentation drift, taken from a manual chore to a standing arrangement in three steps over three weeks.

Stage | What runs | Human time | What the human reviews
Week 1: manual session | A briefed run documents two changed components; prompt refined twice | About an hour | Both docs read line by line; template gaps fixed
Week 2: saved as a skill | /document-component runs the frozen procedure per component | Minutes per component | The output doc and the skill's gap report
Week 3: scheduled run | Monday morning run finds components changed that week, regenerates their docs, opens a PR | One PR review a week | The diff, the gap report, and spot-checks against source
Steady state | Docs stop drifting between releases; gaps surface as flags, not surprises | Roughly 15 minutes a week | The same PR, plus a quarterly read of the skill itself

Nothing got smarter between week one and week three. The procedure got written down, then given a trigger — and the review point never disappeared.

Slide notes

This trace is a representative composite of the documentation pattern in the book research and this school's workflows, run on a small component library — present it as the shape of the journey, not as a benchmark. The point of showing it as a table is the progression: the same procedure moves from a session to a skill to a schedule, and at every stage the question "what does the human review?" has a concrete answer.

Week one is deliberately manual and deliberately slow. The first run is where the template gets fixed, the prop-extraction rules get sharpened, and the honesty rule — report missing descriptions, do not invent them — gets added after the agent helpfully fabricated one. Resist the urge to automate before this shakedown; a skill written from an unrefined prompt just repeats its flaws faster.
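The honesty rule is small enough to show directly. A minimal sketch, assuming the props have already been extracted from source into a dict of prop name to description (`None` or blank when the source has none); the `document_props` helper and the row format are hypothetical stand-ins for your team's template.

```python
from typing import Optional


def document_props(component: str, props: dict[str, Optional[str]]) -> tuple[list[str], list[str]]:
    """Build prop-table rows for one component and a gap report for what the source lacks."""
    rows: list[str] = []
    gaps: list[str] = []
    for name, description in sorted(props.items()):
        if description and description.strip():
            rows.append(f"| {name} | {description.strip()} |")
        else:
            # Honesty rule: leave the gap visible and report it; never write a plausible guess.
            rows.append(f"| {name} | (no description in source) |")
            gaps.append(f"{component}.{name}: missing description")
    return rows, gaps
```

The gap list is what becomes the worklist in the steady-state row: each entry points at a description someone still has to write in the source.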

Week two freezes the procedure into a SKILL.md, and the cost per component drops to minutes, most of which is the human reading the output. Week three adds the trigger: a scheduled run that checks which component files changed in the past week, regenerates only those docs, and opens a pull request. The deliberate choice is that the schedule produces a PR, not a merge — fifteen minutes of review a week is the standing price, and it is the cheapest line item in the whole arrangement.

The steady-state row is the actual payoff: documentation drift stops being a quarterly cleanup and becomes a weekly non-event, and the gaps the skill reports become a worklist instead of a surprise. The quarterly read of the skill itself is the maintenance of the maintenance — procedures drift too.
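The week-three trigger only needs to know which components changed. One way to get that list is `git log --since="1 week ago" --name-only --pretty=format:`, which prints the paths each commit touched, one per line, with blank lines between commits. The sketch below parses that output; it assumes a hypothetical layout where each component lives in its own folder under `src/components/`, and the `changed_components` helper is illustrative.

```python
from pathlib import PurePosixPath

# Hypothetical layout: each component has its own folder, e.g. src/components/Button/.
COMPONENTS_ROOT = PurePosixPath("src/components")


def changed_components(git_log_output: str) -> list[str]:
    """Names of component folders touched in `git log --name-only` output, sorted and deduplicated."""
    names: set[str] = set()
    for raw in git_log_output.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate commits
        try:
            relative = PurePosixPath(line).relative_to(COMPONENTS_ROOT)
        except ValueError:
            continue  # outside src/components, so not a component file
        # Require a file inside a component folder; skip e.g. src/components/index.ts.
        if len(relative.parts) >= 2:
            names.add(relative.parts[0])
    return sorted(names)
```

A scheduled job would pass the command's output to this function, run the documentation skill once per returned name (how you invoke a skill non-interactively depends on your Claude Code setup), and open a pull request with the results. It stops there: merging stays with the reviewer.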

Narration for this slide

Let's trace one chore end to end: component documentation that always drifts. Week one, run it manually — a briefed session documents two components, and the prompt gets refined twice, including adding the rule to report missing prop descriptions instead of inventing them. Week two, freeze that procedure into a skill, and the cost per component drops to minutes. Week three, add the trigger: a Monday run finds the components that changed that week, regenerates their docs, and opens a pull request. Steady state is about fifteen minutes of review a week, and docs that no longer drift between releases. Notice that nothing got smarter along the way. The procedure got written down, then given a trigger — and a human still reads every pull request.

Slide 12 of 13

Exercise: draft a skill for your most repetitive design task

Pick the chore you have done at least twice with the agent and write the SKILL.md on paper. Do not worry about exact syntax — worry about the procedure.

  • Name the chore and check it against frequency, boredom, error cost, and verifiability
  • Write the frontmatter: name, one-sentence description, the argument it takes, the tools it needs
  • Write the procedure as numbered steps with exact paths and the template or pattern it must follow
  • End with the report: what it changed, what it checked, what it flagged instead of guessing
  • Decide its trigger — on demand, every PR, or scheduled — and name the person who reviews its output

If you cannot write the steps precisely enough for a careful new colleague to follow, the chore is not ready to be a skill yet — and that is a useful finding.
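The four tests in the first bullet can be written as a short checklist. A minimal sketch, assuming each test reduces to a yes/no answer; the `Chore` fields and the `automation_verdict` helper are hypothetical, and the sketch simply stops at the first test a chore fails.

```python
from dataclasses import dataclass


@dataclass
class Chore:
    name: str
    recurs: bool            # happens on every PR, every month, and so on
    rule_based: bool        # boring, low-judgment work rather than a judgment call
    cheap_to_reverse: bool  # a wrong run is cheap to catch and undo
    verifiable: bool        # some check can tell a good run from a bad one


def automation_verdict(chore: Chore) -> str:
    """Apply the four tests in turn and stop at the first one the chore fails."""
    if not chore.recurs:
        return "one-off: run it as a supervised session"
    if not chore.rule_based:
        return "judgment call: keep it as a session"
    if not chore.cheap_to_reverse:
        return "not a candidate: errors are costly to catch or reverse"
    if not chore.verifiable:
        return "not yet: nothing can verify the output"
    return "candidate for a skill"
```

Usage: `automation_verdict(Chore("pick components to deprecate", True, False, False, False))` lands on the judgment-call branch, which matches the second most common mistake described in the notes.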

Slide notes

Like the exercises in earlier modules, this one is paper-first: the goal is the procedure, not the syntax, and the syntax changes between releases anyway. Steer people towards chores they have genuinely done at least twice with the agent — documentation regeneration, a token check, an export set, an audit of one flow. The most common mistake is picking something aspirational and large; the second most common is picking a judgment call, like deciding which components to deprecate, and discovering at step three that there is no procedure to write.

The frontmatter forces the scoping decisions: what argument does this take, and what is the smallest set of tools it needs. The numbered steps force the precision: exact paths, an explicit template or pattern reference, and the flag-do-not-guess rule wherever the source might be ambiguous. The report step and the named reviewer connect the exercise back to the reviewability slide — a skill without a report is a skill nobody can supervise.

If running this live, have two or three people read their steps aloud and ask the room a single question: could a careful new colleague follow this without asking anything? Where the answer is no, the gap is usually a missing path, an unstated convention, or a hidden judgment call — exactly the things that would have made the automated runs inconsistent. Participants who finish early can write the second half: the trigger, the review point, and what would convince them after a month that the skill is working.

Narration for this slide

Your turn. Pick the chore you have done at least twice with the agent — documentation, a token check, an export set — and draft its skill on paper. Run it past the four tests first: frequency, boredom, error cost, and whether anything can verify the output. Then write the frontmatter — name, description, argument, tools — and the procedure as numbered steps with exact paths and the template it must follow. End with the report: what it changed, what it checked, what it flagged rather than guessed. Finally, choose its trigger and name the person who reviews its output. If you cannot write steps a careful new colleague could follow, the chore is not ready to be a skill — and knowing that is the point of the exercise.

Slide 13 of 13

Summary, and what comes next

  • Automate what recurs, bores, and can be verified — keep one-offs and judgment calls as sessions
  • Skills turn a refined prompt into a versioned, scoped procedure the whole team can run
  • Match each chore to the cheapest trigger: hooks, PR gates, schedules, or batch background runs
  • Layer the drift defences: per-change gates plus a scheduled whole-surface audit with named owners
  • Every run ends in evidence a person reads — automation makes review cheap, never optional
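A per-change gate of the kind the fourth bullet describes can start very small. A minimal sketch, assuming the gate is handed each line a pull request adds as a (path, line number, text) tuple and that only hex colours count as hardcoded values; the `audit_added_lines` helper, the `Finding` fields, and the default severity are illustrative, and a real token audit would also cover spacing, type, and radii, usually through your existing linter.

```python
import re
from dataclasses import dataclass

# Hex colours such as #fff, #1a73e8 or #1a73e8cc (illustrative scope only).
HEX_COLOUR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


@dataclass
class Finding:
    path: str
    line: int
    value: str
    severity: str = "warn"  # placeholder; set severities to match your team's triage


def audit_added_lines(added: list[tuple[str, int, str]]) -> list[Finding]:
    """Flag hardcoded hex colours in the lines a pull request adds, with file and line as evidence."""
    findings: list[Finding] = []
    for path, line_no, text in added:
        for match in HEX_COLOUR.finditer(text):
            findings.append(Finding(path, line_no, match.group(0)))
    return findings
```

Each finding carries its evidence (file, line, value) so it can be routed to an owner. A clean result only means no hex literals were added: it says nothing about contrast or whether the right token was chosen.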

Module 7 connects Claude Code to the rest of the toolchain over MCP — canvases, browsers, tickets — and turns one designer's setup into a team workflow.

Slide notes

Recap by walking the bullets, but emphasise the through-line: nothing in this module made the agent more capable. Every gain came from writing a procedure down, choosing when it runs, and deciding who reads the result. That is process design, which is to say it is design — and it is why this module sits with the designer rather than with whoever owns the build pipeline.

Remind the group of the two artefacts they now have on paper: the chore inventory implied by the automation map and the draft skill from the exercise. The practical homework is to take the single strongest candidate and walk it through the three-week progression from the worked example — manual session, saved skill, scheduled run — rather than trying to automate the whole inventory at once. One standing arrangement that the team trusts is worth more than five that nobody reviews.

Preview Module 7 concretely: everything so far has treated Claude Code and the repository as the whole world. The next module connects it outward over MCP — design canvases, browsers and screenshots, tickets, analytics — and then deals with the harder half of scale: shared harness files, team conventions, cost, and a rollout that starts with a pilot rather than a mandate. The skills and scheduled runs built in this module are exactly the things a team shares, which is why this module comes first.

Narration for this slide

Let's close. Automate the chores that recur, bore, and can be verified — and keep the one-offs and the judgment calls as sessions. Skills turn your refined prompt into a versioned procedure anyone on the team can run. Match each chore to the cheapest trigger that catches it: hooks on every edit, gates on every pull request, schedules for slow drift, batch runs for the backlog. Layer the defences, and make sure every run ends in evidence a named person reads — automation makes review cheap, never optional. In Module 7 we connect Claude Code to the rest of your toolchain over MCP and make this whole approach work for a team, not just for you. See you there.

Module transcript
Module 6, narrated slide by slide

Slide 1 · Automation and Maintenance

Welcome back. So far in this course, every win has been a session: you sat down, briefed the agent, reviewed a plan, shipped a change. This module is about the work that should not need a session at all. The token check that runs on every pull request. The documentation that updates itself when a component changes. The audit that runs monthly whether anyone remembers or not. The rule of thumb is simple: the second time you find yourself doing the same chore with the agent, stop and automate it. The catch is keeping all of that reviewable rather than silently trusted — and that is most of what we will talk about.

Slide 2 · What is worth automating

Not everything deserves automation, so start with three questions. How often does this recur? If it happens on every pull request or every month, automation pays back; if it happens once, just run it as a supervised session. How boring is it? Rule-based, low-judgment work — regenerating docs, finding hardcoded values, exporting variants — is exactly what people defer, and deferred maintenance is where drift comes from. And what does an error cost? Automate where a wrong run is cheap to catch and reverse. There is a fourth question hiding underneath: can a check tell a good run from a bad one? If nothing can verify the output, do not automate it yet.

Slide 3 · Skills: a procedure the agent loads on demand

Here is the core mechanism of this module: the skill. A skill is a procedure written into a file — SKILL.md — that lives in your repository under .claude/skills. The frontmatter gives it a name, a description, and limits on which tools it can use. The body is the procedure itself: read these files, follow this template, write the output here, report what you did. From then on it is a slash command — /document-component Button — and anyone on the team can run it. Think of it as the prompt you refined over three sessions, written down so the fourth run is free and the tenth run is identical. That is the difference between a trick and a tool.

Slide 4 · Anatomy of a skill: documenting a component

Let's read one. This is a documentation skill: pass it a component name and it reads the source, the stories, and the tests, extracts the props, follows the team's template, and writes the doc to a known location. Notice how rigid it is. Exact paths, an exact template, a defined output file. The allowed-tools line means it can read and write but cannot run commands. And the last step is the honesty rule: where a prop has no description in the source, report the gap — do not invent one. Every line here removes a decision the agent would otherwise make slightly differently on every run. Consistency is the entire point of writing it down.

Slide 5 · The maintenance automation map

Here is the whole module in one map. On the left, the chores that recur in a design codebase: hardcoded values creeping in, design QA on pull requests, drift, stale docs, asset exports, migrations. In the middle, four mechanisms organised by trigger. Hooks run on every edit and block problems deterministically. CI checks run on every pull request and cannot be forgotten. Scheduled runs catch the slow drift nothing else sees. Background agents handle batch work — a migration across hundreds of files, docs for every component. The matching rule: pick the cheapest trigger that catches the problem. And on the right, the part that does not move — every run ends in a report a human reads, and deprecations, breaking changes, and taste stay human.

Slide 6 · Choosing the mechanism: trigger, scope, and review

Here are the five standing arrangements side by side, and the two columns that matter most are trigger and review point. A hook runs on every edit and its review is partly the block itself. A skill runs when someone invokes it, and its review is the report it produces. A CI gate runs on every pull request and its review is the findings comment read before merge. A scheduled run fires weekly or monthly, and its findings must land with a named owner — findings nobody owns might as well not exist. And a batch run gets reviewed twice: a reviewer agent on every diff, then a human on the pull request. If you cannot point at where a person reads the result, you have not chosen a mechanism. You have chosen to stop looking.

Slide 7 · Recurring audits: tokens, accessibility, content, visuals

Drift has two sources, so it needs two defences. The first is the per-change gate: on every pull request, run the token audit, the type check, automated accessibility checks, and a visual diff of the screens the branch actually changed. Cheap, constant, unambiguous — and limited to the rules someone wrote down. The second is the scheduled audit: monthly or quarterly, a whole-surface sweep asking whether the system as built still matches the system as intended. That is where you catch meaning drift, stale content, and platform gaps no single change caused. Give findings evidence, severity, and a named owner. And know your blind spots — a passing token audit says nothing about contrast. Green checks are not a design review.

Slide 8 · Asset production runs: exports, variants, documentation

The next category is asset production: work where the procedure is identical for every item and only the count made it painful. Documentation for every component, regenerated from source. Variants added to a component following the exact structure of the ones that exist — type, styles, story, test. Export sets at every required size. The pattern is fan-out: one agent per item, the same instructions for all of them, results collected into a single report. The quality control is the pattern reference — point every run at the existing convention so fifty outputs look like one set, not fifty improvisations. And spot-check the first few before the batch finishes, because fan-out multiplies mistakes exactly as efficiently as it multiplies good work.

Slide 9 · Maintenance chores: migrations, dead code, stale docs

Now the backlog: the chores everyone agrees matter and nobody schedules. The token migration that touches three hundred files. The components exported but never used. The documentation describing values that no longer exist. The dependency bump nobody wants to own. These stall because they are big, boring, and risky — and the pattern handles each part. Humans freeze the decisions first, like a token mapping file with the ambiguous cases marked ask. A script loops one agent per file, with a reviewer agent refusing any diff that goes beyond the mapping. A visual recapture at the end proves the screens did not change where they were not supposed to. Always on a branch, always resumable, and the run never merges itself.

Slide 10 · Keeping automation reviewable

Here is the part that decides whether any of this survives contact with reality. The line between good and bad delegation is not how much the agent does — it is whether every run produces evidence a person actually reads. So: every run ends in a report covering what changed, what was checked, and what was left alone. Gate output gets attached, because a claim of passing is not the output. Findings route to a named owner, not a channel. The run never merges its own work. And approving on green checks without reading the diff is just the human version of a gate blind spot. One boundary above all: deprecations, breaking changes, and taste are decisions, not chores. They stay out of the automation entirely.

Slide 11 · Worked example: one chore, turned into a skill, then scheduled

Let's trace one chore end to end: component documentation that always drifts. Week one, run it manually — a briefed session documents two components, and the prompt gets refined twice, including adding the rule to report missing prop descriptions instead of inventing them. Week two, freeze that procedure into a skill, and the cost per component drops to minutes. Week three, add the trigger: a Monday run finds the components that changed that week, regenerates their docs, and opens a pull request. Steady state is about fifteen minutes of review a week, and docs that no longer drift between releases. Notice that nothing got smarter along the way. The procedure got written down, then given a trigger — and a human still reads every pull request.

Slide 12 · Exercise: draft a skill for your most repetitive design task

Your turn. Pick the chore you have done at least twice with the agent — documentation, a token check, an export set — and draft its skill on paper. Run it past the four tests first: frequency, boredom, error cost, and whether anything can verify the output. Then write the frontmatter — name, description, argument, tools — and the procedure as numbered steps with exact paths and the template it must follow. End with the report: what it changed, what it checked, what it flagged rather than guessed. Finally, choose its trigger and name the person who reviews its output. If you cannot write steps a careful new colleague could follow, the chore is not ready to be a skill — and knowing that is the point of the exercise.

Slide 13 · Summary, and what comes next

Let's close. Automate the chores that recur, bore, and can be verified — and keep the one-offs and the judgment calls as sessions. Skills turn your refined prompt into a versioned procedure anyone on the team can run. Match each chore to the cheapest trigger that catches it: hooks on every edit, gates on every pull request, schedules for slow drift, batch runs for the backlog. Layer the defences, and make sure every run ends in evidence a named person reads — automation makes review cheap, never optional. In Module 7 we connect Claude Code to the rest of your toolchain over MCP and make this whole approach work for a team, not just for you. See you there.