Slide 1 — Systems That Maintain Themselves
Welcome to the final module of Design Systems for Agents. Everything we have built so far (tokens that carry intent, a DESIGN.md the agent actually reads, review gates, audits, token sync) comes together here into one arrangement: a system that maintains itself, almost. The agent runs scheduled audits, checks every pull request, proposes fixes as reviewable changes, and keeps the documentation generated from source. The humans on the system shift from producing the maintenance work to approving it. The "almost" is important, and we will be honest about it: deprecations, breaking changes, and taste stay with people. Let's look at the loop.
Slide 2 — Maintenance as a standing loop, not a quarterly project
Here is the problem this module solves. Design systems rarely fail at launch — they fail later, quietly, as drift accumulates. A colour gets adjusted in the code but not the docs. A component grows a one-off variant during a deadline week. Each gap is small and defensible, and together they are why nobody fully trusts the system after eighteen months. The cause is structural: every design decision lives on more surfaces than anyone remembers, and humans are bad at finding all of them. That is search and bookkeeping, which is exactly what agents are good at. So the fix is not a heroic quarterly cleanup. It is a standing loop that runs on every change and on a schedule in between.
Slide 3 — The maintenance loop: detect, propose, review, apply, log
Here is the whole module in one picture. On the left, the source of truth: tokens, DESIGN.md, the audit schedule, and a written list of decisions that stay human. The agent detects — scheduled drift audits, design QA on every pull request, token and accessibility checks. It proposes — fixes, migrations, and regenerated docs, always as pull requests with a report of what was touched, never direct pushes. A human reads the diff and the gate output at the review gate. Approved changes get applied and logged. And everything that needs judgment — deprecations, breaking changes, taste — goes to people, whose decisions return to the source of truth and start the loop again. The agent runs the work. You hold the gate and the decisions.
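The routing in that picture can be sketched in a few lines. This is a minimal illustration, not part of the course material: the `Change` type, the `kind` labels, and the step names returned by `route` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the decisions the narration says stay human.
HUMAN_ONLY = {"deprecation", "breaking-change", "taste"}


@dataclass
class Change:
    title: str
    kind: str               # e.g. "token-fix", "docs-regen", "deprecation"
    approved: bool = False  # set by a human reviewer, never by the agent


def route(change: Change) -> str:
    """Return the next step in the loop for a detected change."""
    if change.kind in HUMAN_ONLY:
        # Judgment calls go to their owner instead of through the fix pipeline.
        return "escalate-to-owner"
    if not change.approved:
        # The agent may propose, but only as a reviewable pull request.
        return "open-pull-request"
    # Only human-approved changes are applied, and every apply is logged.
    return "apply-and-log"
```

Note that nothing in `route` ever sets `approved`; in this sketch that field can only change at the review gate.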
Slide 4 — Scheduled audits and report cadence
Detection has two rhythms. The first is per change: every pull request that touches UI runs the standing gates — the token audit, the type check, accessibility and visual checks on the screens the branch changed. That keeps the baseline from eroding one merge at a time. The second is the calendar: a recurring audit, monthly or quarterly depending on how fast your product moves, that sweeps the whole surface for what the per-change rules cannot see — meaning drift, behaviour drift, components that pass the checks but no longer match their documented intent. Keep the audit brief in the repository, put the next run date in it, and make sure every finding arrives with evidence, a severity, and a named owner. A report routed to everyone is routed to no one.
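One way to enforce "evidence, a severity, and a named owner" is to reject findings that lack any of the three before the report goes out. The sketch below assumes a P1 to P3 severity scale and a short list of non-owners such as "everyone"; both are illustrative choices, and the `Finding` fields are hypothetical.

```python
from dataclasses import dataclass

SEVERITIES = ("P1", "P2", "P3")                  # assumed scale, P1 most urgent
NON_OWNERS = {"", "everyone", "team", "all"}     # assumed placeholders to reject


@dataclass(frozen=True)
class Finding:
    summary: str
    evidence: str   # e.g. a file path and line, or a screenshot reference
    severity: str
    owner: str      # a named person or role, not a group alias


def validate(finding: Finding) -> list[str]:
    """Return the reasons a finding is not ready to be reported."""
    problems = []
    if not finding.evidence.strip():
        problems.append("missing evidence")
    if finding.severity not in SEVERITIES:
        problems.append("unknown severity")
    if finding.owner.strip().lower() in NON_OWNERS:
        problems.append("no named owner")
    return problems
```

A finding such as `Finding("Button radius drift", "src/Button.css:14", "P2", "everyone")` would come back with `["no named owner"]`.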
Slide 5 — Documentation generated from source, not written after the fact
Now the surface nothing protects: documentation. No script fails when a sentence in DESIGN.md describes a colour that no longer exists, and authoritative-looking prose gets believed long after it stops being true. Two habits fix this. Generate what can be generated — prop tables, token values, usage examples — straight from the source, so they cannot drift. And for everything else, update the documentation in the same change that makes it stale, drafted by the agent from its own diff and reviewed together with the code. Remember what raises the stakes here: your agent reads DESIGN.md as instructions. Stale documentation does not just mislead people — it instructs every future run to be wrong. Treat docs the agent reads as configuration.
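Both habits can be sketched briefly. The example assumes tokens are available as a flat name-to-value mapping and that the docs refer to tokens as backticked dotted names like `color.accent`; neither convention comes from the course, so adjust the pattern to your own files.

```python
import re

# Assumed convention: docs mention tokens as backticked dotted names.
TOKEN_REF = re.compile(r"`([a-z]+(?:\.[a-z0-9-]+)+)`")


def render_token_table(tokens: dict[str, str]) -> str:
    """Generate the token table from source so it cannot drift."""
    lines = ["Token | Value", "--- | ---"]
    for name in sorted(tokens):
        lines.append(f"{name} | {tokens[name]}")
    return "\n".join(lines)


def stale_token_refs(doc_text: str, tokens: dict[str, str]) -> list[str]:
    """Return token names the docs mention that no longer exist in source."""
    mentioned = set(TOKEN_REF.findall(doc_text))
    return sorted(mentioned - set(tokens))
```

Running `stale_token_refs` in the per-change gate gives the hand-written part of the docs a check that can actually fail, which is the protection this slide says is missing.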
Slide 6 — Proposed fixes as pull requests, never direct pushes
Here is the rule that makes the loop safe to run: the agent proposes, humans merge. Every fix arrives as a pull request with its evidence attached — what was searched, what was changed, what was checked and left alone, plus the gate output from re-running the audits on the proposal itself. Small fixes are individual PRs: a hardcoded colour swapped for its token, a regenerated prop table. Big work, like a token migration across three hundred files, runs on a branch with a reviewer agent on every diff and a visual recapture at the end. Either way, the agent is allowed to do almost everything except decide that its own work is done. That decision is yours.
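As an illustration of a small fix with its evidence attached, the sketch below swaps hardcoded six-digit hex colours for their tokens and records what was searched, changed, and left alone. It assumes tokens are exposed as CSS custom properties and works on in-memory text only; the function name and report shape are hypothetical, and opening the pull request itself is left out.

```python
import re

HEX = re.compile(r"#[0-9a-fA-F]{6}\b")  # six-digit hex colours only


def propose_token_fixes(files: dict[str, str], tokens: dict[str, str]):
    """files: path -> CSS text; tokens: token name -> hex value.

    Returns (new_files, report). Nothing is written to disk: the result
    is a proposal for a human to review, not an applied change.
    """
    by_value = {value.lower(): name for name, value in tokens.items()}
    report = {"searched": sorted(files), "changed": [], "left_alone": []}
    new_files = {}
    for path, text in files.items():
        def swap(match, path=path):
            value = match.group(0).lower()
            name = by_value.get(value)
            if name is None:
                # No token has this value: record it and leave it untouched.
                report["left_alone"].append((path, value))
                return match.group(0)
            report["changed"].append((path, value, name))
            return f"var(--{name})"
        new_files[path] = HEX.sub(swap, text)
    return new_files, report
```

The `left_alone` list matters as much as `changed`: it tells the reviewer which colours the agent saw and chose not to touch.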
Slide 7 — The human rotation: what approval actually involves
The loop only works if the human side is staffed. Set up a small rotation — one named approver per week or sprint — and define what approval means. It means reading the diff, not the summary. Looking at the render on at least one affected surface. Checking the change stayed inside the scope it claimed. Reading the gate output rather than trusting the green tick. And escalating anything that touches a deprecation, a breaking change, or a taste call to whoever owns those decisions, instead of absorbing it. Budget it honestly: on an active system this is thirty to sixty minutes a day. Far less than doing the maintenance by hand — but not zero, and pretending it is zero is how the rotation collapses.
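Of the approval steps above, the scope check is the easiest to mechanise. A minimal sketch, assuming the proposal declares its scope as glob patterns and the list of changed paths is already known; both inputs are hypothetical, and the check supports the approver rather than replacing the read of the diff.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_paths, claimed_scope):
    """Return changed paths that match none of the claimed scope patterns."""
    return sorted(
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in claimed_scope)
    )
```

For a proposal that claims `["src/components/button/*"]` but also edits `src/theme/tokens.json`, the function returns `["src/theme/tokens.json"]`, which is the cue to send it back.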
Slide 8 — Good delegation, bad delegation
Delegation fails in two directions. Too little, and the agent is just a faster find-and-replace while you still do the bookkeeping. Too much, and changes flow into the system without judgment: it stays consistent and gets quietly worse. The table is the line between them. Good: you pick the new value, the agent finds every surface, edits, runs the gates, and reports; a person reads the diff before merge; docs regenerate in the same change; the prompt names its scope. Bad: the agent merges because the checks passed, the reviewer approves without opening the diff, the instruction was "fix everything". The test for any delegation is simple: is it checkable? Compliance is checkable. Whether the system should allow a second accent colour is not, and the agent will answer it confidently anyway.
Slide 9 — Failure modes: rubber-stamping, alert fatigue, silent scope growth
Self-maintaining systems fail quietly, so know the shapes in advance. Rubber-stamping: approvals on green checks — counter it by spot-auditing changes you recently approved. Alert fatigue: too many low-severity findings — tune the rules, batch the advisories, and never interrupt anyone for a P3. Silent scope growth: the agent starts fixing things nobody asked it to watch — restate the scope in every brief and check it was respected. Gate blind spots: the loop optimises for what its checks can see, so review the checks themselves quarterly. And the big one: months of smooth running erode attention exactly when the volume makes attention matter most. The dangerous failure is not a broken loop. It is a loop still running after the humans have stopped looking.
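The alert-fatigue counter can be sketched as a simple split between findings that interrupt someone and findings that wait for a batched digest. The P1 to P3 ordering and the default threshold are assumptions; the narration only fixes that a P3 never interrupts.

```python
# Assumed severity scale: P1 is most urgent, P3 is advisory.
ORDER = {"P1": 1, "P2": 2, "P3": 3}


def triage(findings, interrupt_at="P1"):
    """Split (severity, summary) pairs into immediate alerts and a digest.

    Findings at or above the `interrupt_at` severity are alerts; the rest
    are batched, most severe first.
    """
    limit = ORDER[interrupt_at]
    alerts = [f for f in findings if ORDER[f[0]] <= limit]
    digest = sorted(
        (f for f in findings if ORDER[f[0]] > limit),
        key=lambda f: ORDER[f[0]],
    )
    return alerts, digest
```

Keeping the threshold as a parameter makes the tuning decision explicit and reviewable, rather than buried in notification settings.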
Slide 10 — What still needs people
So what still needs people? Mostly things that are decision rights, not capability gaps. Deprecations and breaking changes are about which teams you are willing to break and when — that has an owner, and it is not the agent. Whether a variation is drift or evolution is a question about intent. Taste is the thing the system exists to encode; do not let the loop flatten it. Accessibility needs its own checks and its own judgment — a passing token audit says nothing about contrast. What the agent does change is how well-informed those decisions are: it can hand you every consumer of a token and every surface a breaking change touches before you decide. Take the analysis. Keep the decision.
Slide 11 — Worked example: a month of self-maintenance, reviewed
Let's make the loop concrete with a month on one mid-size system, drawn from the executed case studies behind this course. The PR gate reviewed twenty-three UI branches and sent thirty-five findings back to their authors. The scheduled audit produced forty-seven findings; the owner confirmed the serious ones and demoted three as intentional — a call only a human can make. Nineteen small fix PRs went through the approval rotation; two got sent back for scope creep. One accent token change propagated everywhere in a three-line diff. And one deprecation question got a full impact analysis and a deliberate decision not to act this quarter, logged with the reasoning. Total human cost: about a day, almost all of it judgment. That is the trade.
Slide 12 — Exercise: draft your maintenance loop on one page
Your closing exercise: one page, your real system. List every surface a design decision lives on — tokens, theme files, components, docs, canvases, even the onboarding deck. Name your standing gates and your audit cadence, and write down the date of the next audit. Decide what the agent proposes in month one — start with mechanical fixes only, and add the rest once the rotation finds its rhythm. Staff the approval rotation with names and an honest time budget. And write the human-only decision list: deprecations, breaking changes, new tokens, intentional variation, taste. If any section is hard to fill in, that points you at the module to revisit before you switch the loop on.
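If you keep the one-page plan as structured data, a small check can flag the sections still left empty. The section names below are hypothetical keys chosen to mirror the exercise; they are not a format the course prescribes.

```python
# Hypothetical section keys mirroring the one-page exercise.
REQUIRED_SECTIONS = (
    "surfaces",
    "standing_gates",
    "audit_cadence",
    "next_audit_date",
    "month_one_proposals",
    "approval_rotation",
    "human_only_decisions",
)


def missing_sections(plan: dict) -> list[str]:
    """Return the required sections that are absent or empty in the plan."""
    return [name for name in REQUIRED_SECTIONS if not plan.get(name)]
```

An empty result only means every section has something in it; whether the content is honest, especially the time budget, is still a human read.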
Slide 13 — Summary, and where to go next
Let's close the course. Drift is the default state of every design system, and the counterweight is a standing loop: detect, propose, review, apply, log. Detection runs on every change and on a schedule. Everything the agent produces arrives as a reviewable change with evidence — proposals, never pushes. Documentation regenerates from source in the same change that makes it stale, because docs the agent reads are configuration. And the human work that remains is real work: a staffed approval rotation, and the decisions that stay yours — deprecations, breaking changes, and taste. From here, the natural next step is orchestration — running teams of agents across bigger programmes — or going deeper on review and critique. Thanks for taking the course, and good luck with the loop.