Slide 1 — Automation and Maintenance
Welcome back. So far in this course, every win has been a session: you sat down, briefed the agent, reviewed a plan, shipped a change. This module is about the work that should not need a session at all. The token check that runs on every pull request. The documentation that updates itself when a component changes. The audit that runs monthly whether anyone remembers or not. The rule of thumb is simple: the second time you find yourself doing the same chore with the agent, stop and automate it. The catch is keeping all of that reviewable rather than silently trusted — and that is most of what we will talk about.
Slide 2 — What is worth automating
Not everything deserves automation, so start with three questions. How often does this recur? If it happens on every pull request or every month, automation pays for itself; if it happens once, just run it as a supervised session. How boring is it? Rule-based, low-judgment work (regenerating docs, finding hardcoded values, exporting variants) is exactly what people defer, and deferred maintenance is where drift comes from. And what does an error cost? Automate where a wrong run is cheap to catch and reverse. There is a fourth question hiding underneath: can a check tell a good run from a bad one? If nothing can verify the output, do not automate it yet.
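The four tests can be sketched as a small decision function. This is an illustration, not a tool from the course: the `Chore` fields and the threshold of four runs a year are assumptions chosen for the example. Verifiability is checked first, because a run nobody can check should not be automated however often it recurs.

```python
from dataclasses import dataclass


@dataclass
class Chore:
    """A recurring task, described by the four tests on this slide (fields are illustrative)."""
    name: str
    runs_per_year: int       # how often does it recur?
    needs_judgment: bool     # True if each run needs taste or a decision
    cheap_to_reverse: bool   # can a wrong run be caught and undone cheaply?
    verifiable: bool         # can a check tell a good run from a bad one?


def should_automate(chore: Chore, min_runs_per_year: int = 4) -> tuple[bool, str]:
    """Return (decision, reason). The threshold of four runs a year is an assumption."""
    if not chore.verifiable:
        return False, "nothing can verify the output yet"
    if chore.runs_per_year < min_runs_per_year:
        return False, "too rare: run it as a supervised session"
    if chore.needs_judgment:
        return False, "needs judgment: keep it a session"
    if not chore.cheap_to_reverse:
        return False, "a wrong run is expensive: keep it supervised"
    return True, "recurring, boring, cheap to reverse, and verifiable"


print(should_automate(Chore("regenerate component docs", 52, False, True, True)))
print(should_automate(Chore("one-off rebrand", 1, True, False, False)))
```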
Slide 3 — Skills: a procedure the agent loads on demand
Here is the core mechanism of this module: the skill. A skill is a procedure written into a file, SKILL.md, that lives in its own folder under .claude/skills in your repository. The frontmatter gives it a name, a description, and limits on which tools it can use. The body is the procedure itself: read these files, follow this template, write the output here, report what you did. From then on it is a slash command, /document-component Button, and anyone on the team can run it. Think of it as the prompt you refined over three sessions, written down so the fourth run is free and the tenth run is identical. That is the difference between a trick and a tool.
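To make the layout concrete, here is a minimal Python sketch that lists the skills in a repository and the slash commands they would answer to. It assumes one folder per skill, .claude/skills/<name>/SKILL.md, and assumes each folder name matches the skill's name field; `list_skills` is a hypothetical helper, not part of Claude Code.

```python
from pathlib import Path


def list_skills(repo_root: Path) -> dict[str, Path]:
    """Map each slash-command name to its SKILL.md file.

    Assumes the layout .claude/skills/<name>/SKILL.md and that <name>
    matches the skill's name field (an assumption of this sketch).
    """
    skills_dir = repo_root / ".claude" / "skills"
    found: dict[str, Path] = {}
    if not skills_dir.is_dir():
        return found  # no skills checked into this repository yet
    # Only folders that actually contain a SKILL.md count as skills.
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        found["/" + skill_file.parent.name] = skill_file
    return found
```

Because the skills live in the repository, they are versioned and reviewed like any other file, which is what lets "anyone on the team" run the same procedure.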
Slide 4 — Anatomy of a skill: documenting a component
Let's read one. This is a documentation skill: pass it a component name and it reads the source, the stories, and the tests, extracts the props, follows the team's template, and writes the doc to a known location. Notice how rigid it is. Exact paths, an exact template, a defined output file. The allowed-tools line means it can read and write but cannot run commands. And the last step is the honesty rule: where a prop has no description in the source, report the gap — do not invent one. Every line here removes a decision the agent would otherwise make slightly differently on every run. Consistency is the entire point of writing it down.
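Here is a sketch of what such a SKILL.md might contain, held in a Python string so a tiny parser can check it. The paths under src/components and docs/, the template file, and the exact tool list are hypothetical stand-ins for your team's conventions. `parse_frontmatter` handles only flat `key: value` lines and is not a general YAML parser; a real check would use a proper YAML library.

```python
# Hypothetical SKILL.md for the documentation skill described on this slide.
SKILL_MD = """\
---
name: document-component
description: Write or refresh the docs page for one component from its source, stories, and tests.
allowed-tools: Read, Write, Glob, Grep
---
1. Let <Name> be the component named in the argument.
2. Read src/components/<Name>/<Name>.tsx, <Name>.stories.tsx and <Name>.test.tsx.
3. Extract every prop with its type, default, and description.
4. Fill in docs/templates/component.md exactly; do not add sections.
5. Write the result to docs/components/<Name>.md.
6. Report what you wrote. Where a prop has no description in the source,
   list it as a gap; never invent one.
"""


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, body). Flat `key: value` lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- line")
    try:
        end = lines.index("---", 1)  # first closing delimiter after line 0
    except ValueError:
        raise ValueError("missing closing --- line") from None
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return fields, body


def allowed_tools(fields: dict[str, str]) -> set[str]:
    """Read the comma-separated allowed-tools line into a set of tool names."""
    return {t.strip() for t in fields.get("allowed-tools", "").split(",") if t.strip()}


fields, body = parse_frontmatter(SKILL_MD)
print(fields["name"], sorted(allowed_tools(fields)))
```

Leaving Bash off the tool list is how this sketch expresses "can read and write but cannot run commands"; the numbered steps and the closing honesty rule are the part that keeps every run the same.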
Slide 5 — The maintenance automation map
Here is the whole module in one map. On the left, the chores that recur in a design codebase: hardcoded values creeping in, design QA on pull requests, drift, stale docs, asset exports, migrations. In the middle, four mechanisms organised by trigger. Hooks run on every edit and block problems deterministically. CI checks run on every pull request and cannot be forgotten. Scheduled runs catch the slow drift nothing else sees. Background agents handle batch work — a migration across hundreds of files, docs for every component. The matching rule: pick the cheapest trigger that catches the problem. And on the right, the part that does not move — every run ends in a report a human reads, and deprecations, breaking changes, and taste stay human.
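The matching rule can be written down as a lookup. In this sketch the mechanisms are ranked by how early they catch a problem, hooks first and batch runs last, which is an assumption about what "cheapest" means. The `CATCHES` table of which mechanism can catch which chore is illustrative and should be filled in from your own codebase.

```python
from enum import IntEnum


class Trigger(IntEnum):
    """Mechanisms from the map, ranked by how early they catch a problem (ranking is an assumption)."""
    HOOK = 1        # every edit, blocks deterministically
    CI_GATE = 2     # every pull request, cannot be forgotten
    SCHEDULED = 3   # weekly or monthly, catches slow drift
    BATCH = 4       # background agents over many files


# Which mechanisms could catch each chore: an illustrative table, not a rule.
CATCHES: dict[str, set[Trigger]] = {
    "hardcoded values": {Trigger.HOOK, Trigger.CI_GATE, Trigger.SCHEDULED},
    "design QA on pull requests": {Trigger.CI_GATE},
    "drift": {Trigger.SCHEDULED},
    "stale docs": {Trigger.CI_GATE, Trigger.SCHEDULED},
    "asset exports": {Trigger.BATCH},
    "migrations": {Trigger.BATCH},
}


def cheapest_trigger(catches: set[Trigger]) -> Trigger:
    """Pick the earliest mechanism that actually catches the problem."""
    if not catches:
        raise ValueError("nothing catches this yet: keep it a supervised session")
    return min(catches)  # IntEnum members compare by their rank


plan = {chore: cheapest_trigger(options).name for chore, options in CATCHES.items()}
print(plan)
```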
Slide 6 — Choosing the mechanism: trigger, scope, and review
Here are the five standing arrangements side by side, and the two columns that matter most are trigger and review point. A hook runs on every edit and its review is partly the block itself. A skill runs when someone invokes it, and its review is the report it produces. A CI gate runs on every pull request and its review is the findings comment read before merge. A scheduled run fires weekly or monthly, and its findings must land with a named owner — findings nobody owns might as well not exist. And a batch run gets reviewed twice: a reviewer agent on every diff, then a human on the pull request. If you cannot point at where a person reads the result, you have not chosen a mechanism. You have chosen to stop looking.
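One way to enforce "point at where a person reads the result" is to describe each automation as data and lint it. The rules encoded here come from this slide (batch runs need two review points, scheduled findings need a named owner); the `Automation` record, the mechanism strings, and the helper name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Automation:
    name: str
    mechanism: str                   # "hook", "skill", "ci-gate", "scheduled", or "batch"
    review_points: tuple[str, ...]   # where a person (or reviewer agent) reads the result
    owner: Optional[str] = None      # named person who acts on findings


# Minimum review points per mechanism, following the slide: batch runs are reviewed twice.
REQUIRED_REVIEWS = {"hook": 1, "skill": 1, "ci-gate": 1, "scheduled": 1, "batch": 2}


def problems(a: Automation) -> list[str]:
    """Return reasons this automation has 'chosen to stop looking'."""
    if a.mechanism not in REQUIRED_REVIEWS:
        return [f"{a.name}: unknown mechanism {a.mechanism!r}"]
    issues = []
    needed = REQUIRED_REVIEWS[a.mechanism]
    if len(a.review_points) < needed:
        issues.append(f"{a.name}: needs {needed} review point(s)")
    if a.mechanism == "scheduled" and not a.owner:
        issues.append(f"{a.name}: scheduled findings have no named owner")
    return issues
```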
Slide 7 — Recurring audits: tokens, accessibility, content, visuals
Drift has two sources, so it needs two defences. The first is the per-change gate: on every pull request, run the token audit, the type check, automated accessibility checks, and a visual diff of the screens the branch actually changed. Cheap, constant, unambiguous — and limited to the rules someone wrote down. The second is the scheduled audit: monthly or quarterly, a whole-surface sweep asking whether the system as built still matches the system as intended. That is where you catch meaning drift, stale content, and platform gaps no single change caused. Give findings evidence, severity, and a named owner. And know your blind spots — a passing token audit says nothing about contrast. Green checks are not a design review.
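A per-change token audit can be as plain as a regular expression over changed files. This sketch assumes token definitions live under a tokens/ folder and flags any hex colour elsewhere; the folder name, the fixed "medium" severity, and the owner are placeholders. Note what it cannot see: it says nothing about contrast, which is exactly the blind spot described above.

```python
import re
from dataclasses import dataclass

# 3-, 4-, 6-, or 8-digit hex colours, e.g. #fff or #1a73e8.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")


@dataclass
class Finding:
    path: str
    line: int
    evidence: str   # the offending source line, so the reader can judge it
    severity: str
    owner: str      # a named person, not a channel


def token_audit(files: dict[str, str], owner: str,
                token_dirs: tuple[str, ...] = ("tokens/",)) -> list[Finding]:
    """Flag raw hex colours outside the token definitions (folder names are assumptions)."""
    findings = []
    for path, text in files.items():
        if path.startswith(token_dirs):
            continue  # token definitions are the one place raw values belong
        for number, line in enumerate(text.splitlines(), start=1):
            if HEX_COLOR.search(line):
                findings.append(Finding(path, number, line.strip(), "medium", owner))
    return findings
```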
Slide 8 — Asset production runs: exports, variants, documentation
The next category is asset production: work where the procedure is identical for every item and only the count made it painful. Documentation for every component, regenerated from source. Variants added to a component following the exact structure of the ones that exist — type, styles, story, test. Export sets at every required size. The pattern is fan-out: one agent per item, the same instructions for all of them, results collected into a single report. The quality control is the pattern reference — point every run at the existing convention so fifty outputs look like one set, not fifty improvisations. And spot-check the first few before the batch finishes, because fan-out multiplies mistakes exactly as efficiently as it multiplies good work.
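The fan-out pattern with a spot-check gate might look like this. `run_agent` and `spot_check` are hypothetical callables standing in for however you launch one agent and however you review one output. Checking the first three items before launching the rest is one conservative reading of "spot-check the first few", not the only one.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BatchReport:
    outputs: dict[str, str] = field(default_factory=dict)
    failed_spot_check: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)


def fan_out(items: list[str], instructions: str, pattern_ref: str,
            run_agent: Callable[[str, str], str],
            spot_check: Callable[[str, str], bool],
            first_n: int = 3, workers: int = 4) -> BatchReport:
    """Run identical instructions once per item and collect one report.

    run_agent(item, prompt) and spot_check(item, output) are hypothetical
    stand-ins for launching one agent and reviewing one output.
    """
    # Every run gets the same prompt and the same pattern reference.
    prompt = f"{instructions}\nMatch the existing convention in {pattern_ref} exactly."
    report = BatchReport()

    # Run the first few on their own and check them before launching the rest.
    head, rest = items[:first_n], items[first_n:]
    for item in head:
        report.outputs[item] = run_agent(item, prompt)
        if not spot_check(item, report.outputs[item]):
            report.failed_spot_check.append(item)
    if report.failed_spot_check:
        report.skipped = rest  # fix the instructions first; fan-out would multiply the mistake
        return report

    # Fan out the remainder in parallel; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for item, output in zip(rest, pool.map(lambda i: run_agent(i, prompt), rest)):
            report.outputs[item] = output
    return report
```

The single shared prompt is the point: fifty outputs built from one instruction and one pattern reference are far more likely to look like one set.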
Slide 9 — Maintenance chores: migrations, dead code, stale docs
Now the backlog: the chores everyone agrees matter and nobody schedules. The token migration that touches three hundred files. The components exported but never used. The documentation describing values that no longer exist. The dependency bump nobody wants to own. These stall because they are big, boring, and risky, and the pattern addresses each of those. Humans freeze the decisions first, for example in a token mapping file with the ambiguous cases marked "ask". A script loops one agent per file, with a reviewer agent refusing any diff that goes beyond the mapping. A visual recapture at the end proves the screens did not change where they were not supposed to. Always on a branch, always resumable, and the run never merges itself.
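Here is a minimal sketch of two pieces of that pattern: applying a frozen mapping, and a reviewer check that refuses any diff the mapping does not explain. The token names are invented for illustration, and the check works line by line, assuming a token migration only renames tokens and never adds or removes lines.

```python
import re

# The frozen decisions. "ask" marks a case a human must settle; the names are invented.
MAPPING = {
    "--color-blue-500": "--color-action-primary",
    "--color-gray-100": "--color-surface-subtle",
    "--color-red-500": "ask",  # error text or destructive action? not the agent's call
}


def migrate_line(line: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Apply the decided renames to one line; return it plus any tokens left for a human."""
    asks = []
    for old, new in mapping.items():
        pattern = re.escape(old) + r"(?![\w-])"  # whole token only, not --color-blue-5000
        if not re.search(pattern, line):
            continue
        if new == "ask":
            asks.append(old)  # leave the line alone and surface it in the report
        else:
            line = re.sub(pattern, lambda _m: new, line)
    return line, asks


def review_diff(before: list[str], after: list[str], mapping: dict[str, str]) -> list[str]:
    """Reviewer check: refuse any line the mapping does not explain."""
    if len(before) != len(after):
        return ["line count changed: a token rename never adds or removes lines"]
    refusals = []
    for number, (old, new) in enumerate(zip(before, after), start=1):
        expected, _ = migrate_line(old, mapping)
        if new != expected:
            refusals.append(f"line {number}: does not match the mapping")
    return refusals
```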
Slide 10 — Keeping automation reviewable
Here is the part that decides whether any of this survives contact with reality. The line between good and bad delegation is not how much the agent does — it is whether every run produces evidence a person actually reads. So: every run ends in a report covering what changed, what was checked, and what was left alone. Gate output gets attached, because a claim of passing is not the output. Findings route to a named owner, not a channel. The run never merges its own work. And approving on green checks without reading the diff is just the human version of a gate blind spot. One boundary above all: deprecations, breaking changes, and taste are decisions, not chores. They stay out of the automation entirely.
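The report requirements translate directly into a checklist. In this sketch, `RunReport` and `reviewable` are hypothetical names, and treating an owner that starts with "#" as a chat channel rather than a person is an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class RunReport:
    changed: list[str]        # what the run changed
    checked: list[str]        # which gates and checks ran
    left_alone: list[str]     # what it deliberately did not touch
    gate_output: str          # the gate's actual output, not a claim that it passed
    owner: str                # the named person who reads this
    merged_by_run: bool = False
    decisions: list[str] = field(default_factory=list)  # deprecations, breaking changes, taste


def reviewable(report: RunReport) -> list[str]:
    """Return the reasons this run's evidence falls short (an empty list means it is reviewable)."""
    problems = []
    if not report.gate_output.strip():
        problems.append("gate output missing")
    if not report.checked:
        problems.append("report does not say what was checked")
    # Assumed convention: names starting with "#" are chat channels, not people.
    if not report.owner or report.owner.startswith("#"):
        problems.append("findings need a named owner, not a channel")
    if report.merged_by_run:
        problems.append("the run merged its own work")
    if report.decisions:
        problems.append("decisions belong outside automation: " + ", ".join(report.decisions))
    return problems
```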
Slide 11 — Worked example: one chore, turned into a skill, then scheduled
Let's trace one chore end to end: component documentation that always drifts. Week one, run it manually — a briefed session documents two components, and the prompt gets refined twice, including adding the rule to report missing prop descriptions instead of inventing them. Week two, freeze that procedure into a skill, and the cost per component drops to minutes. Week three, add the trigger: a Monday run finds the components that changed that week, regenerates their docs, and opens a pull request. Steady state is about fifteen minutes of review a week, and docs that no longer drift between releases. Notice that nothing got smarter along the way. The procedure got written down, then given a trigger — and a human still reads every pull request.
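The week-three trigger needs a list of components that changed. Here is a sketch, assuming one folder per component under src/components: feed it the paths from something like `git log --since="1 week ago" --name-only --pretty=format:` and hand each returned name to the documentation skill. The folder layout and the function name are assumptions.

```python
from pathlib import PurePosixPath


def components_to_redocument(changed_paths: list[str],
                             components_root: str = "src/components") -> list[str]:
    """Return the sorted, de-duplicated component names touched this week.

    Assumes one folder per component, e.g. src/components/Button/Button.tsx
    (a hypothetical layout).
    """
    root = PurePosixPath(components_root)
    names = set()
    for raw in changed_paths:
        path = PurePosixPath(raw.strip())  # git log output includes blank lines
        try:
            relative = path.relative_to(root)
        except ValueError:
            continue  # not under the components folder
        if len(relative.parts) >= 2:  # skip files sitting directly in the root, like index.ts
            names.add(relative.parts[0])
    return sorted(names)
```

Several commits touching the same component collapse into one name, so the Monday pull request regenerates each page once.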
Slide 12 — Exercise: draft a skill for your most repetitive design task
Your turn. Pick the chore you have done at least twice with the agent — documentation, a token check, an export set — and draft its skill on paper. Run it past the four tests first: frequency, boredom, error cost, and whether anything can verify the output. Then write the frontmatter — name, description, argument, tools — and the procedure as numbered steps with exact paths and the template it must follow. End with the report: what it changed, what it checked, what it flagged rather than guessed. Finally, choose its trigger and name the person who reviews its output. If you cannot write steps a careful new colleague could follow, the chore is not ready to be a skill — and knowing that is the point of the exercise.
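If you want to check your draft mechanically before showing it to anyone, a rough linter like this one covers most of the list. It checks only the name, description, and allowed-tools fields named on slides 3 and 4 (the argument field is left out because this course does not name it), and its path detection is a loose heuristic; treat a clean result as "worth reading", not "ready". `lint_skill_draft` is a hypothetical helper.

```python
import re


def lint_skill_draft(fields: dict[str, str], body: str, trigger: str, reviewer: str) -> list[str]:
    """Return the gaps in a draft skill, following the exercise's checklist."""
    gaps = []
    for key in ("name", "description", "allowed-tools"):
        if not fields.get(key):
            gaps.append(f"frontmatter is missing {key}")
    # Numbered steps: lines that start with "1. ", "2. ", and so on.
    steps = re.findall(r"^\s*\d+\.\s", body, flags=re.MULTILINE)
    if len(steps) < 2:
        gaps.append("procedure is not written as numbered steps")
    # Loose heuristic for an exact path: something/with/slashes.
    if not re.search(r"\b[\w.-]+/[\w./<>-]+", body):
        gaps.append("no exact paths: a new colleague would have to guess where things live")
    if "report" not in body.lower():
        gaps.append("no closing report step")
    if not trigger:
        gaps.append("no trigger chosen")
    if not reviewer:
        gaps.append("no named reviewer")
    return gaps
```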
Slide 13 — Summary, and what comes next
Let's close. Automate the chores that recur, bore, and can be verified — and keep the one-offs and the judgment calls as sessions. Skills turn your refined prompt into a versioned procedure anyone on the team can run. Match each chore to the cheapest trigger that catches it: hooks on every edit, gates on every pull request, schedules for slow drift, batch runs for the backlog. Layer the defences, and make sure every run ends in evidence a named person reads — automation makes review cheap, never optional. In Module 7 we connect Claude Code to the rest of your toolchain over MCP and make this whole approach work for a team, not just for you. See you there.