Section 1
Why documentation should be generated from the code
Design system documentation goes stale the moment the system moves on and the docs do not. A component gains a prop, loses a variant, or picks up a new accessibility behavior, and the Storybook page describing it quietly becomes a description of the component as it was eighteen months ago. Nobody notices until a product designer builds against the docs and gets a different component than the one they read about.
The honest source of truth is not the docs site. It is the component source, the token files, the type definitions, and — most underused of all — the hundreds of real usages scattered across the product. The code says what the props actually are. The usages say how teams actually combine them, which defaults they override, and where they fight the component instead of using it as intended.
This workflow treats documentation as a build artifact derived from those sources, refreshed on demand, and reviewed by humans before it ships. The agents do the careful reading at scale; the design system team keeps the judgment about what the guidance should say.
Section 2
When to reach for this workflow
Reach for it when the gap between the system and its docs has become a known tax: new joiners ask questions the docs should answer, support channels repeat the same explanations, or an audit shows half the documented props no longer exist.
It is an Intermediate workflow because the generation is straightforward but the review is real work. Generated docs that nobody reads before publishing are just stale docs with a newer date on them.
- Storybook or docs-site pages that have not been touched since the components last changed.
- A system handover — to a client, a new team, or an open source release — where the docs are the deliverable.
- Recurring questions in the design system support channel that the docs should already answer.
- A token or API migration where the docs need to describe the new world, not the old one.
Section 3
What good component documentation actually contains
A useful component page answers four questions: what the component is for, how to configure it, how teams actually use it, and what it requires to stay accessible. Most stale docs answer the first two from memory and skip the last two entirely, because the usage and accessibility content is exactly the part that takes hours of reading to get right.
That reading is what the per-component agents do. Each one reads the component source, its types, its tests, its stories if they exist, and every usage of the component it can find in the product code. The do and don't guidance comes out of that evidence — the patterns teams repeat, the props they consistently misuse, the workarounds that signal a missing variant.
- Purpose: component source, design intent notes, and the Figma component description via Figma MCP.
- Props: type definitions and prop defaults, read directly from the source.
- Tokens: token references in the component styles, and which semantic tokens it consumes.
- Usage guidance: real usages mined from the product codebase, grouped into patterns and anti-patterns.
- Accessibility: ARIA usage, keyboard handling, and focus behavior in the source, plus gaps worth flagging.
- Known drift: differences between the documented or designed component and what the code does today.
Each section of the doc page maps to a source the agent can actually read.
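The grouping of usages into patterns can be sketched in a few lines. This is a minimal illustration, assuming the usages have already been extracted as JSX snippets. The pattern names, the regular expressions, the sample file paths, and the `classifyUsage` and `tallyPatterns` helpers are all hypothetical; a real agent reads the surrounding code rather than matching strings.

```javascript
// Hypothetical pattern rules. The first matching rule wins, so order matters.
const PATTERN_RULES = [
  { name: "raw-hex-color", test: (jsx) => /#[0-9a-fA-F]{3,8}\b/.test(jsx) },
  { name: "local-style-override", test: (jsx) => /\bstyle=\{\{/.test(jsx) },
  { name: "as-intended", test: () => true }, // fallback when nothing else matches
];

// Returns the name of the first rule that matches the snippet.
function classifyUsage(jsx) {
  return PATTERN_RULES.find((rule) => rule.test(jsx)).name;
}

// Groups usages by pattern, keeping a count and the files as evidence.
function tallyPatterns(usages) {
  const counts = {};
  for (const usage of usages) {
    const name = classifyUsage(usage.jsx);
    if (!counts[name]) counts[name] = { count: 0, files: [] };
    counts[name].count += 1;
    counts[name].files.push(usage.file);
  }
  return counts;
}

// Hypothetical sample data, for illustration only.
const sample = [
  { file: "apps/web/Checkout.tsx", jsx: '<Button variant="primary">Pay</Button>' },
  { file: "apps/web/Table.tsx", jsx: "<Button style={{ padding: 2 }}>Edit</Button>" },
  { file: "apps/admin/Banner.tsx", jsx: '<Button style={{ color: "#ff0000" }}>Go</Button>' },
];
const tally = tallyPatterns(sample);
```

Keeping the file list next to each count is what lets a do or don't entry cite its evidence instead of asserting it.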
Section 4
The orchestration pattern: one agent per component
The workflow runs as a Claude Code dynamic workflow. Include the word workflow in the prompt, or run with /effort ultracode, and Claude writes a JavaScript orchestration script that runs in the background. The script builds the component inventory, dispatches one documentation agent per component, and collects the drafts in its own variables — intermediate results never enter Claude's context, which is what lets a 50-component system fit in a single run.
Up to sixteen agents run concurrently and a run can dispatch up to a thousand, so the per-component fan-out finishes in hours, not days. The run is resumable if it is interrupted partway through the inventory, and once the orchestration works you save it to .claude/workflows/ in the project — or ~/.claude/workflows/ for a personal copy — as a reusable command for the next refresh.
After the fan-out, a single consistency agent reads every draft and enforces the shared template: same section order, same voice, same way of writing prop tables and do/don't pairs. Without that pass you get fifty individually reasonable pages that read like fifty different authors, which is its own kind of drift.
Figure: Build component inventory → Fan out per-component agents → Mine real usages per component → Draft doc pages to template → Consistency agent pass → Human review and edit → Publish to docs site
Inventory the system, fan out one agent per component, mine usages, pass a consistency check, and gate on human review before publishing.
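One part of the consistency pass, the section-order check, is mechanical enough to sketch. The example below assumes drafts use `## ` headings named as in the doc template in Step 1; `checkSectionOrder` is a hypothetical helper, and voice and table format still need an agent or a human.

```javascript
// Section headings assumed from the doc template; adjust if the template changes.
const REQUIRED_SECTIONS = [
  "Anatomy", "Props", "Variants and states", "Usage guidance",
  "Tokens", "Accessibility", "Known drift", "Sources",
];

// Reports which required sections are missing and whether the
// sections that are present appear in template order.
function checkSectionOrder(markdown) {
  const found = markdown
    .split("\n")
    .filter((line) => line.startsWith("## ")) // second-level headings only
    .map((line) => line.slice(3).trim());
  const missing = REQUIRED_SECTIONS.filter((s) => !found.includes(s));
  const present = found.filter((s) => REQUIRED_SECTIONS.includes(s));
  const expected = REQUIRED_SECTIONS.filter((s) => found.includes(s));
  const inOrder = present.every((s, i) => s === expected[i]);
  return { missing, inOrder };
}

// A draft with Tokens and Usage guidance swapped and four sections absent.
const draft = ["# Button", "## Anatomy", "## Props", "## Tokens", "## Usage guidance"].join("\n");
const report = checkSectionOrder(draft);
```

A report like this gives the consistency agent a concrete list to fix rather than a general instruction to "match the template".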
Section 5
Step 1: build the inventory and the template
Start by listing what counts as a component. In most systems that is the exported components in the package's source directory, minus internal helpers. The inventory needs the component name, its source path, and the directories where product usages live, because the usage mining is scoped per component.
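As a rough sketch of what one inventory entry might hold, the example below builds entries from a list of export names. The filtering rule (PascalCase names are components, everything else is an internal helper), the one-directory-per-component path layout, and the `slug` field are assumptions made for illustration, not something the workflow prescribes.

```javascript
// Builds inventory entries from exported names.
// Assumption: PascalCase exports are components; hooks and underscore-prefixed
// names are internal helpers and are dropped.
function buildInventory(exportNames, { sourceDir, usageDirs }) {
  return exportNames
    .filter((name) => /^[A-Z][A-Za-z0-9]*$/.test(name))
    .map((name) => ({
      name,
      // kebab-case slug, used for the draft file name
      slug: name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase(),
      // assumed layout: one directory per component
      path: `${sourceDir}/${name}`,
      usageDirs,
    }));
}

const inventory = buildInventory(
  ["Button", "DatePicker", "useFocusTrap", "_internalPortal"],
  { sourceDir: "packages/ui/src/components", usageDirs: ["apps", "packages"] }
);
```

Carrying `usageDirs` on every entry keeps the usage mining scoped per component, so a later run can narrow the search for a single component without touching the rest.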
Write the doc template before generating anything. The template is a contract with the agents: every page has the same sections in the same order, and a section the agent cannot fill from evidence is marked as a gap rather than padded with plausible text. Templates written after generation tend to describe what the agents happened to produce instead of what the team needs.
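The "gap rather than padding" rule is easy to state in code. The sketch below is a hypothetical `renderSection` helper, assuming each piece of evidence is a `{ text, source }` pair; it only illustrates the contract, since in the workflow the agents write the pages themselves.

```javascript
// The marker agents write when a section has no supporting evidence.
const GAP_MARKER = "Gap: needs design system team input";

// Renders one template section. An empty evidence list yields the gap
// marker instead of invented text.
function renderSection(title, evidence) {
  const body = evidence.length === 0
    ? GAP_MARKER
    : evidence.map((e) => `- ${e.text} (${e.source})`).join("\n");
  return `## ${title}\n${body}`;
}

const tokens = renderSection("Tokens", []);
const a11y = renderSection("Accessibility", [
  { text: "Traps focus and restores it on close", source: "Dialog.tsx" },
]);
```

Because every gap uses the same marker string, reviewers can search the generated directory for it and see the full list of sections that need their input.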
# {ComponentName}
One-sentence purpose. What problem this component solves and when not to use it.
## Anatomy
Named parts of the component and which props control them.
## Props
Table: prop, type, default, description. Generated from the type definitions, not from memory.
## Variants and states
Each variant with the design intent it serves. Interaction states: hover, focus, active, disabled, loading, error.
## Usage guidance
### Do
Patterns observed in real usages that match the component's intent. Cite the count.
### Don't
Anti-patterns observed in real usages, with what to use instead.
## Tokens
Semantic tokens this component consumes. Note any hard-coded values that bypass tokens.
## Accessibility
Keyboard behavior, ARIA usage, focus management, and required props (labels, descriptions). Gaps flagged explicitly.
## Known drift
Differences between this doc's source evidence and the previous documentation or the design file.
## Sources
Files read to produce this page, with the date generated.
Section 6
Step 2: the workflow prompt
The prompt names the inventory, the template, the usage directories, and the output location, and it forbids publishing without review. The per-component agents are read-only against the product code; they write only their own draft page.
Run this as a workflow. Generate documentation for every component in packages/ui/src/components against docs/component-doc-template.md.

For each component, dispatch one agent that:
- Reads the component source, types, styles, tests, and Storybook stories if present
- Finds real usages under apps/ and packages/ (excluding packages/ui itself) and groups them into patterns
- Drafts a doc page following the template exactly, citing the files it read
- Marks any template section it cannot support with evidence as 'Gap: needs design system team input'
- Records known drift: props or behavior that differ from the existing docs in docs/components/

Then run one consistency agent that checks every draft for template compliance, consistent voice, and consistent prop-table format, and writes a summary of inconsistencies it fixed.

Output drafts to docs/generated/<component>.mdx. Do not overwrite docs/components/ and do not publish; the design system team reviews the drafts first.
Section 7
Step 3: a sketch of the orchestration script
You do not write this script by hand — Claude writes it when the workflow runs — but seeing the shape helps you reason about what stays out of context. The component list, every draft, and the consistency report live in script variables; only the final summary comes back to the conversation.
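// Sketch of the orchestration Claude generates for this workflow.
// agent(prompt, options) dispatches a subagent and resolves with its output.
// Assumed: the runtime provides agent() and runs this body inside an async function.
// Split a list into batches of at most `size` items.
const chunk = (items, size) =>
  Array.from({ length: Math.ceil(items.length / size) }, (_, i) => items.slice(i * size, (i + 1) * size))
// Inventory pass: ask for slug explicitly, because the draft path below depends on it.
const components = await agent(
  "List exported components in packages/ui/src/components with their source paths. " +
    "Return a JSON array of objects with name, path, and slug (kebab-case name).",
  { model: "haiku" }
)
const drafts = []
// Fan out in batches of 16 so no more than sixteen agents run at once.
for (const batch of chunk(JSON.parse(components), 16)) {
  const results = await Promise.all(
    batch.map((c) =>
      agent(
        "Document the " + c.name + " component at " + c.path + " using docs/component-doc-template.md. " +
          "Mine usages under apps/ and packages/, cite files, mark gaps. Write the draft to docs/generated/" + c.slug + ".mdx and return a one-line status.",
        { model: "sonnet" }
      )
    )
  )
  drafts.push(...results)
}
// Consistency pass runs once, after every draft exists on disk.
const consistency = await agent(
  "Read every file in docs/generated/. Enforce the template: section order, voice, prop-table format. Fix inconsistencies in place and return a summary of changes.",
  { model: "sonnet" }
)
// Only this small summary returns to the conversation.
return { documented: drafts.length, consistency }
Section 8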
Step 4: a reusable component documenter subagent
Defining the documenter once in .claude/agents/ keeps every component held to the same evidence standard, in this run and in the next refresh. The key constraint is that it documents what it can cite and labels what it cannot.
---
name: component-documenter
description: Documents one design system component from its source, types, tokens, tests, and real product usages, following the shared doc template. Read-only against product code; writes only its own draft page.
tools: Read, Grep, Glob, Write
---
You document one component. Follow docs/component-doc-template.md exactly: same sections, same order.

Evidence rules:
- Props and defaults come from the type definitions and source, never from existing docs or memory.
- Usage guidance comes from real usages you found; state how many usages support each do or don't.
- Accessibility notes describe what the code does today, plus gaps flagged as gaps.
- Anything you cannot support with a file you read is written as 'Gap: needs design system team input'.

List every file you read in the Sources section with today's date. Do not modify any file outside docs/generated/. Do not invent variants, props, or guidance.
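The "writes only its own draft page" rule is an instruction to the agent, so a team may want to verify it after a run. The sketch below is one hypothetical way to do that, assuming you already have the list of repo-relative paths the run touched (for example from version control). `resolveInsideRepo` and `isAllowedWrite` are illustrative helpers, not part of Claude Code.

```javascript
// Normalizes a repo-relative path. Returns null for absolute paths or
// paths that climb above the repository root.
function resolveInsideRepo(target) {
  if (target.startsWith("/")) return null;
  const parts = [];
  for (const segment of target.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // escapes the repository root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return parts.join("/");
}

// True only when the path resolves to a file inside the allowed directory.
function isAllowedWrite(target, allowedDir = "docs/generated") {
  const resolved = resolveInsideRepo(target);
  return resolved !== null && resolved.startsWith(allowedDir + "/");
}

// Hypothetical list of paths changed by a run.
const touched = ["docs/generated/button.mdx", "docs/generated/../components/button.mdx"];
const violations = touched.filter((p) => !isAllowedWrite(p));
```

Normalizing before comparing matters: the second path starts with `docs/generated/` as a string but resolves into the published docs directory.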
Section 9
Step 5: human review before anything publishes
The drafts land in a generated directory, not on the docs site. The design system team reviews them the way they would review a contribution from a new team member: check the prop tables against a couple of components they know well, read the do/don't guidance for anything that codifies a workaround as a recommendation, and decide what to do with each known-drift entry.
Drift entries are the most valuable output and the easiest to mishandle. Some drift means the docs were wrong; some means the code has wandered from the design intent and the code should change, not the docs. That call belongs to the team, and the workflow's job is only to surface it with evidence attached.
- Spot-check prop tables for two or three components the reviewer knows by heart.
- Read every don't entry: is it an anti-pattern, or a missing feature the system should add?
- Triage each drift entry: fix the docs, fix the code, or record an intentional exception.
- Only then move drafts into the published docs directory and open the pull request.
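The triage step in the checklist above can be tracked with a very small data structure. This sketch assumes each drift entry carries a `decision` field set by a reviewer; the field name, the three decision labels, and `triageSummary` are hypothetical.

```javascript
// The three outcomes a reviewer can assign to a drift entry.
const DECISIONS = ["fix-docs", "fix-code", "intentional-exception"];

// Counts decisions and blocks publishing while any entry is undecided.
function triageSummary(driftEntries) {
  const summary = { "fix-docs": 0, "fix-code": 0, "intentional-exception": 0, untriaged: 0 };
  for (const entry of driftEntries) {
    const key = DECISIONS.includes(entry.decision) ? entry.decision : "untriaged";
    summary[key] += 1;
  }
  return { ...summary, readyToPublish: summary.untriaged === 0 };
}

// Hypothetical entries: one still awaits a decision.
const status = triageSummary([
  { component: "Button", decision: "fix-docs" },
  { component: "Dialog", decision: "fix-code" },
  { component: "Tooltip", decision: undefined },
]);
```

The `fix-code` count doubles as the number of tickets to open, which keeps code-side drift from being silently documented as intended behavior.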
Figure: Drafts land in generated → Spot-check prop tables → Challenge don't entries → Triage drift entries → Fix docs or code → Publish reviewed pages
Drafts stay in the generated directory until the design system team has spot-checked the evidence and triaged every drift entry.
Section 10
Case study: a 50-component React system with two-year-old Storybook docs
A B2B product team had a 50-component React system whose Storybook docs were last meaningfully updated two years earlier. Twelve documented props no longer existed, nine new components had no pages at all, and the accessibility sections were copy-pasted boilerplate.
The workflow ran in just under three hours: an inventory pass, fifty per-component agents in batches of sixteen, and a consistency pass. It produced 50 draft pages, flagged 31 known-drift entries, and marked 14 sections as gaps needing team input — mostly design rationale that no code could supply.
Review took the two design system maintainers a day and a half. They rejected four pieces of generated guidance that had codified workarounds as recommendations, turned six drift entries into code fixes rather than doc fixes, and published the rest. The next refresh, run a quarter later from the saved workflow, took 40 minutes of review because only eleven components had changed.
Section 11
Case study: mining 300 Button usages for usage rules
A platform team suspected their Button component was being misused but had no evidence beyond anecdotes. The usage-mining step of this workflow, scoped to just that component, found 312 usages across the monorepo.
The agent grouped them: 71 percent used the primary or secondary variants as intended, 14 percent overrode the button's padding or font size with local styles, 9 percent used a Button styled as a link where a real anchor belonged, and 6 percent nested icons in ways the component was never designed for. Eighteen usages passed raw hex colors into a style prop, bypassing tokens entirely.
The resulting do/don't section was written from those numbers rather than from taste, which made it persuasive: the don't entries each linked to the count and three example files. The team also got a backlog item they had not asked for — a proper LinkButton variant — because the 9 percent misuse pattern was clearly a missing component, not careless engineers.
Section 12
Case study: documenting a delivered system before client handover
An agency had built a 34-component design system for a client over eight months and needed handover documentation as a contractual deliverable. The system was solid; the documentation budget had been spent on building it.
The workflow generated the full doc set in one afternoon, including token tables and accessibility notes per component. Because the agency would not be around to answer questions later, the review pass focused on the gaps: every section the agents marked as needing input was either filled by the original designers or explicitly marked as a decision the client's incoming team would own.
The handover package shipped with 34 doc pages, a token reference, and an honest list of nine open decisions. The client's team later reported that the open-decisions list was the most useful page in the package, which is a fair summary of what generated documentation is for: it does the reading so humans can spend their time on the judgment.
Section 13
Good vs bad generated docs
The failure mode of generated documentation is plausible filler: pages that read smoothly and say nothing the code did not already say more precisely, or worse, guidance invented to fill a template section. Spot-check for it before review and tighten the documenter's evidence rules if it appears.
- Bad: Use the Button for actions. Avoid overusing it.
- Good: Don't override Button padding with local styles (found in 44 of 312 usages, usually to fit dense tables); use the size="compact" variant added in v3.2 instead.
- Bad: This component is fully accessible.
- Good: Dialog traps focus and restores it on close (useFocusTrap in Dialog.tsx); it does not set aria-describedby from the description prop, which is flagged as a gap for the next minor release.
- Bad: Props: see the source code for details.
- Good: Props table generated from ButtonProps in Button.types.ts on 2026-06-02, including the deprecated `kind` prop and its replacement.
A useful doc page cites its evidence and admits its gaps.
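A crude pre-review filter for plausible filler is to flag guidance lines that cite nothing. The sketch below is a heuristic only, assuming evidence shows up as either a usage count ("44 of 312") or a source file name; `citesEvidence` and both patterns are hypothetical, and a line that passes can still be wrong.

```javascript
// Heuristic only: a line "cites evidence" if it mentions a usage count
// or names a source file.
const COUNT = /\b\d+ of \d+\b/;
const SOURCE_FILE = /\b[\w.-]+\.(tsx?|jsx?|css|scss|mdx)\b/;

function citesEvidence(line) {
  return COUNT.test(line) || SOURCE_FILE.test(line);
}

// Guidance lines taken from the examples in this section.
const lines = [
  "Use the Button for actions. Avoid overusing it.",
  "Don't override Button padding with local styles, found in 44 of 312 usages",
  "This component is fully accessible.",
  "Dialog traps focus and restores it on close (useFocusTrap in Dialog.tsx)",
  "Props: see the source code for details.",
  "Props table generated from ButtonProps in Button.types.ts on 2026-06-02",
];
const flagged = lines.filter((line) => !citesEvidence(line));
```

On these six lines the filter flags exactly the three filler examples, which makes it a reasonable first pass before a human reads the drafts.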
Section 14
What this workflow cannot prove
Generated docs describe what the system is, not what it should be. The agents can report that a component supports six variants; they cannot decide whether six is too many. They can mine how teams use a component; they cannot know which of those uses the design system team wants to endorse.
Design rationale, naming decisions, deprecation policy, and the voice of the system remain human work. The workflow earns its keep by making sure that work starts from accurate, current evidence instead of from archaeology.
- It cannot supply design rationale or intent the code does not contain.
- It cannot decide whether observed usage patterns should become endorsed guidance.
- It cannot judge information architecture of the docs site itself — what to group, split, or retire.
- It cannot replace the editorial review; unreviewed generated docs are stale docs with a newer date.
Section 15
The reusable documentation workflow
Save the working orchestration to .claude/workflows/ and rerun it after each significant release, or quarterly. Because the template, the documenter subagent, and the output format stay stable, each refresh is a diff against the last one — and the review gets cheaper every time the system holds still.
1. Build the component inventory: name, source path, and the directories where usages live.
2. Write or confirm the doc template; it is the contract every agent follows.
3. Fan out one documenter agent per component: source, types, tokens, tests, and real usages.
4. Require evidence per section; unsupported sections are marked as gaps, not filled.
5. Run the consistency agent across all drafts: template compliance, voice, table format.
6. Human review: spot-check props, challenge the don't entries, triage every drift item.
7. Publish reviewed pages to the docs site; turn code-side drift into tickets.
8. Save the workflow and rerun it each release or quarter; review only what changed.
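"Review only what changed" needs some way to tell which components changed between runs. One possible approach, sketched below, is to store a fingerprint of each component's source after a run and compare on the next one. The `fingerprint` function here is a simple FNV-1a hash used as a stand-in, and `componentsToReview` is a hypothetical helper; a version-control diff or a cryptographic hash would do the same job.

```javascript
// Stand-in fingerprint: 32-bit FNV-1a over UTF-16 code units.
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// previous: { [componentName]: fingerprint } saved after the last run.
// current:  { [componentName]: sourceText } read in this run.
// Returns components that are new or whose source changed, sorted by name.
function componentsToReview(previous, current) {
  return Object.keys(current)
    .filter((name) => previous[name] !== fingerprint(current[name]))
    .sort();
}

// Hypothetical data: Dialog changed and Tooltip is new since the last run.
const lastRun = { Button: fingerprint("export const Button = 1"), Dialog: fingerprint("v1") };
const thisRun = { Button: "export const Button = 1", Dialog: "v2", Tooltip: "new" };
const toReview = componentsToReview(lastRun, thisRun);
```

Usage changes in product code can also shift the guidance for a component whose source is untouched, so a fuller version would fingerprint the mined usages as well.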