Section 1
Why design ops reporting is a workflow, not a dashboard
Most design ops leads already produce some version of this report. They count component usage by hand or with a half-maintained script, screenshot the accessibility backlog, export pull requests into a spreadsheet, and spend a day each month turning it into something a leadership group will read. The work is real and the evidence is useful; the problem is the cost of assembling it and the difficulty of trusting numbers nobody can trace back to a source.
A dashboard does not solve this, because the most useful design ops evidence does not live in one system. Token usage lives in product repositories, QA debt lives in previous workflow runs and issue trackers, review throughput lives in pull request history, and contribution requests live in whichever channel the team uses to ask for new components. The report is a synthesis job, and synthesis jobs are exactly what an agent workflow on a cadence does well.
This workflow runs as a saved command — /design-ops-report — on whatever cadence the organization reports on. Per-source agents compute their slice with scripts, a synthesis agent writes the narrative, and every number in the output links to the script, query, or file that produced it. The output is an artifact a design ops lead reviews and signs, not something the agent publishes on its own.
Section 2
When to reach for this workflow
Reach for it when the same evidence gets assembled by hand more than twice. Quarterly design system adoption updates, monthly QA debt summaries for a leadership review, retainer reports for an agency client, and pre-planning evidence for a roadmap conversation are all the same shape: gather, count, compare with last time, explain.
It is rated Intermediate because the orchestration is simple but there are several inputs to prepare: you need at least one counting script, access to repository history, and the output of earlier workflow runs if you want trends rather than snapshots.
- A design system team that needs adoption numbers leadership will believe.
- A design ops lead spending a day per cycle on manual screenshot-and-spreadsheet assembly.
- An agency reporting design QA work to a client on a retainer.
- A platform team deciding which contribution requests to fund next quarter.
Section 3
What the report measures, and what it refuses to measure
The report covers four evidence sources. Design system adoption: token and component usage counted by scripts across the product repositories, expressed as coverage of styled surfaces rather than raw counts. QA debt: open findings from previous accessibility sweeps and visual QA runs, trended against earlier cycles. Review throughput: how long design-flagged pull requests and review issues sit open, from PR and issue exports. Contribution backlog: outstanding requests for new components, patterns, or tokens, with their age.
One boundary should be written into the saved command itself: this report informs prioritization conversations about systems, debt, and process. It is not a designer-productivity surveillance tool. It does not rank individuals, it does not count how many screens a designer shipped, and it does not report who was slow. The moment per-person numbers appear in the output, the team will start optimizing for the report instead of the work, and the people whose cooperation the design system depends on will stop trusting it. Say so plainly in the report header, and enforce it in the synthesis agent's instructions.
- Token and component coverage per repo, counted by script, compared with the previous cycle
- Open and resolved findings from prior accessibility and visual QA workflow runs, trended
- Age and cycle time of design-review PRs and issues from gh exports, reported as distributions, not per person
- Open contribution requests with age, requesting team, and current status
- The narrative: what changed, what it means for prioritization, and what needs a decision
Each source gets its own agent, its own script, and its own appendix entry explaining the computation.
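The "compared with the previous cycle" step can be sketched as a small pure function. This is an illustration only: the name compareCoverage and the { repo, coverage } input shape are assumptions, chosen to mirror a per-repo coverage summary, and the "no prior figure" note is borrowed from the rule that a figure without a match in the previous report should say so.

```javascript
// Hypothetical helper: compare this cycle's per-repo coverage with the last one.
// Both inputs are assumed to be arrays of { repo, coverage } with coverage in 0..1.
function compareCoverage(current, previous) {
  const prior = new Map(previous.map((entry) => [entry.repo, entry.coverage]))
  return current.map(({ repo, coverage }) => {
    if (!prior.has(repo)) {
      // A repo that is new this cycle has nothing to trend against.
      return { repo, coverage, change: null, note: "no prior figure" }
    }
    // Express the change in percentage points, rounded to one decimal place.
    const change = Math.round((coverage - prior.get(repo)) * 1000) / 10
    return { repo, coverage, change, note: "vs previous cycle" }
  })
}
```

Reporting the change in percentage points rather than as a relative percentage keeps the figure easy to check by hand against the two cycle reports.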
Section 4
The orchestration pattern: a saved command on a cadence
The report runs as a Claude Code dynamic workflow. When the prompt includes the word workflow, or you run with /effort ultracode, Claude writes a JavaScript orchestration script that runs in the background. The script holds the repository list, the script outputs, and the previous report's numbers in its own variables; it dispatches one subagent per evidence source and a final synthesis agent, and only the merged result comes back into the conversation. Intermediate output — thousands of grep hits, raw PR JSON — stays in the script, which is what keeps a multi-repo count from flooding Claude's context.
Up to sixteen subagents run concurrently and a single run can dispatch up to a thousand, which is far more headroom than four source agents need; the headroom matters when the adoption agent itself fans out one counter per repository. Runs are resumable, so a failed GitHub export does not restart the counting. Once the run produces a report the team trusts, save it to .claude/workflows/ in the design system repo (or ~/.claude/workflows/ for a personal version) as /design-ops-report, and the next cycle is one command plus a review.
Subagent definitions live in .claude/agents/ as markdown files with frontmatter, so the adoption counter and the synthesis writer are versioned alongside the scripts they call. That versioning is what makes quarter-over-quarter numbers comparable: the same definition computed both quarters.
1. Run counting scripts per repo
2. Adoption agent computes coverage
3. QA debt and throughput agents trend their sources
4. Contribution agent ages the backlog
5. Synthesis agent writes the narrative
6. Design ops lead reviews and signs
Per-source agents compute their slice with scripts; a synthesis agent writes the narrative; a human signs the report.
Section 5
What the orchestration script looks like
You do not write the orchestration script by hand — Claude writes it when the run starts — but it helps to know roughly what it does, because that is what you are reviewing when you decide whether to save it as a command. The sketch below uses a simplified agent() pseudo-API to show the shape: source agents run concurrently, their outputs stay in script variables, and only the synthesis result returns to the conversation.
    // Sketch of the orchestration Claude generates for /design-ops-report.
    // agent() is the simplified pseudo-API described above, not a real import.
    import { writeFile } from "node:fs/promises"

    const sources = ["adoption", "qa-debt", "throughput", "contributions"]

    // Source agents run concurrently; each returns the figures for one slice.
    const slices = await Promise.all(
      sources.map((source) =>
        agent(
          "Compute the " + source + " slice from its named inputs in design-ops/output/. " +
            "Compare with design-ops/reports/previous.md. Return JSON figures with a computation note each.",
          { model: "sonnet" }
        )
      )
    )

    // Raw slice JSON stays here, in script variables, not in Claude's context.
    const draft = await agent(
      "Write the cycle report from these slices using design-ops/report-template.md. " +
        "Every figure gets an appendix footnote. No per-person metrics. Mark as DRAFT.\n\n" +
        JSON.stringify(slices),
      { model: "opus" }
    )

    // Only the synthesized draft is written out for the human review step.
    await writeFile("design-ops/reports/2026-Q2.md", draft)
Section 6
Step 1: make the counts scriptable
The adoption number is only as good as the counting script, so write the script first and let agents call it rather than estimate. A token coverage script walks each product repository, counts style declarations that resolve to design tokens versus hard-coded values, and writes a JSON summary per repo. Component coverage does the same for imports from the design system package versus local one-off components.
Keep the script deliberately simple and document its blind spots in the report appendix: a regex counter does not understand computed styles, and coverage of files is not coverage of user-facing surface area. A traceable, slightly crude number beats an untraceable precise-looking one.
    import { readFile, writeFile } from "node:fs/promises"
    import { join } from "node:path"
    import { globby } from "globby"

    // repos.json is expected to hold an array of { name, path } entries.
    const repos = JSON.parse(await readFile("design-ops/repos.json", "utf8"))
    const summary = []

    for (const repo of repos) {
      // Style-bearing files only; skip installed dependencies.
      const files = await globby(["**/*.{css,scss,tsx,ts}", "!**/node_modules/**"], { cwd: repo.path })
      let tokenRefs = 0
      let hardCoded = 0

      for (const file of files) {
        const text = await readFile(join(repo.path, file), "utf8")
        // Token references: CSS custom properties in the design system namespaces.
        tokenRefs += (text.match(/var\(--(color|space|font|radius)-[a-z0-9-]+\)/g) || []).length
        // Hard-coded values: hex colors and pixel lengths. Crude by design; see the blind spots above.
        hardCoded += (text.match(/#[0-9a-fA-F]{3,8}\b|\b\d+px\b/g) || []).length
      }

      summary.push({
        repo: repo.name,
        files: files.length,
        tokenRefs,
        hardCoded,
        // Math.max guards against dividing by zero in a repo with no matches.
        coverage: tokenRefs / Math.max(1, tokenRefs + hardCoded),
      })
    }

    await writeFile("design-ops/output/token-usage.json", JSON.stringify(summary, null, 2))
Section 7
Step 2: export the slower sources once per cycle
Review throughput and contribution backlog come from GitHub. Export them with the GitHub CLI at the start of the run so the agents read stable JSON files instead of paging an API mid-run. The same applies to QA debt: the accessibility sweep and visual QA workflows on this site already write findings files; point this report at those folders and the trend comes for free.
    # Design-review pull requests (up to 500, any state); narrow to the last 90 days when computing
    gh pr list --repo org/product-web --label "design-review" --state all --limit 500 \
      --json number,title,createdAt,closedAt,labels > design-ops/output/design-prs.json

    # Contribution requests to the design system
    gh issue list --repo org/design-system --label "contribution-request" --state all --limit 200 \
      --json number,title,createdAt,closedAt,state,labels > design-ops/output/contribution-requests.json

    # QA debt: previous workflow outputs already on disk
    ls agent-workflows/accessibility-sweep/*/findings.md agent-workflows/visual-qa/*/findings.md
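Once design-prs.json is on disk, the throughput figures can be computed as a team-level distribution rather than a per-person list. The sketch below is one way to do that, under stated assumptions: timeToCloseDistribution is a hypothetical name, records are assumed to carry the createdAt and closedAt fields from the export with an empty closedAt for items still open, and the percentile uses the nearest-rank method, which for an even count picks the lower of the two middle values.

```javascript
// Hypothetical helper: team-level time-to-close distribution, in days.
// Records are assumed to mirror the export fields createdAt and closedAt.
function timeToCloseDistribution(pullRequests) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000
  const days = pullRequests
    .filter((pr) => pr.closedAt) // still-open items have no close time yet
    .map((pr) => (new Date(pr.closedAt) - new Date(pr.createdAt)) / MS_PER_DAY)
    .sort((a, b) => a - b)
  if (days.length === 0) return { count: 0, median: null, p90: null }
  // Nearest-rank percentile: smallest value with at least p percent of the data at or below it.
  const percentile = (p) => days[Math.max(0, Math.ceil((p / 100) * days.length) - 1)]
  return { count: days.length, median: percentile(50), p90: percentile(90) }
}
```

The function never reads an author or reviewer field, so nothing per-person can leak into the figures it returns.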
1. Confirm the repo list
2. Run counting scripts
3. Export PRs and issues
4. Gather prior QA findings
5. Pull last cycle's report
6. Stage files for the run
Counts and exports are produced once per cycle so every agent reads stable files instead of paging live systems.
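The contribution backlog figures come from the same staged files. As a rough sketch, assuming records shaped like the contribution-requests.json export (number, title, createdAt, state, with state values such as OPEN and CLOSED), open requests can be aged against a fixed reporting date. The function name, the asOf parameter, and the 90-day default threshold are illustrative choices, not part of any saved command.

```javascript
// Hypothetical helper: age open contribution requests as of a fixed reporting date.
// Passing asOf explicitly keeps a rerun of the same cycle reproducible.
function ageOpenRequests(issues, asOf, staleAfterDays = 90) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000
  return issues
    .filter((issue) => String(issue.state).toUpperCase() === "OPEN")
    .map((issue) => {
      const ageDays = Math.floor((new Date(asOf) - new Date(issue.createdAt)) / MS_PER_DAY)
      return { number: issue.number, title: issue.title, ageDays, stale: ageDays > staleAfterDays }
    })
    .sort((a, b) => b.ageDays - a.ageDays) // oldest first
}
```

Taking the reporting date as an argument, instead of reading the clock, means the appendix can state exactly which date the ages were computed against.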
Section 8
Step 3: the workflow prompt
With scripts and exports in place, the prompt asks for the fan-out and names every input explicitly. Naming the files is what keeps the agents from inventing numbers; requiring a computation link per figure is what makes the report auditable.
    Run this as a workflow. Produce the design ops report for this cycle and write it to
    design-ops/reports/2026-Q2.md.

    Inputs:
    - design-ops/output/token-usage.json and component-usage.json
      (from scripts/count-token-usage.mjs and count-component-usage.mjs)
    - design-ops/output/design-prs.json and contribution-requests.json (gh exports)
    - agent-workflows/accessibility-sweep/*/findings.md and agent-workflows/visual-qa/*/findings.md
    - design-ops/reports/2026-Q1.md (previous cycle, for trends)

    Dispatch one agent per source: adoption, QA debt, review throughput, contribution backlog.
    Each agent computes its numbers only from its named inputs, compares with the previous cycle,
    and returns figures plus a one-line note on how each figure was computed.

    Then dispatch a synthesis agent to write the report using design-ops/report-template.md.
    Every number must link to its computation in the appendix. Report throughput as distributions
    across the team, never per person. Do not rank or name individual designers. Flag anything
    that needs a prioritization decision.

    Do not modify any source files. The report is a draft until the design ops lead signs it.
Section 9
Step 4: the synthesis subagent
The synthesis agent is the one most worth defining carefully, because it is the one that turns counts into claims. Its definition lives in .claude/agents/ and travels with the repo, so the same editorial rules apply every cycle.
    ---
    name: design-ops-synthesizer
    description: Writes the design ops report narrative from per-source agent outputs. Read-only over sources; writes only the report draft.
    tools: Read, Write
    ---

    You write the cycle report from the structured outputs of the adoption, QA debt,
    throughput, and contribution agents.

    Rules:
    - Every number in the report must carry a footnote reference to the script, export, or
      findings file it came from.
    - Compare with the previous cycle wherever the previous report has a matching figure;
      say "no prior figure" otherwise.
    - Report throughput and workload as team-level distributions. Never name, rank, or compare
      individual designers. If an input contains per-person data, aggregate it before reporting.
    - Separate observations (what the numbers show) from recommendations (what to prioritize),
      and label recommendations as proposals for the prioritization conversation.
    - Keep the narrative under 900 words; the appendix can be as long as it needs to be.
    - Mark the document as DRAFT until a human removes the marker.
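Two of these rules, the DRAFT marker and the word budget, are mechanical enough to check with a script before the lead reads the draft. The following is a minimal sketch, not part of the agent definition: checkDraft is a hypothetical name, and it assumes the appendix begins at a line starting with the word "Appendix" (optionally behind markdown heading marks), which may not match your report template.

```javascript
// Hypothetical check for two of the synthesizer's rules: DRAFT marker and word budget.
// Assumes the appendix starts at a line that begins with "Appendix".
function checkDraft(reportText, maxNarrativeWords = 900) {
  const problems = []
  if (!/\bDRAFT\b/.test(reportText)) problems.push("missing DRAFT marker")
  const lines = reportText.split("\n")
  const appendixStart = lines.findIndex((line) => /^#*\s*Appendix\b/i.test(line))
  // Everything before the appendix counts as narrative; with no appendix, the whole text does.
  const narrative = (appendixStart === -1 ? lines : lines.slice(0, appendixStart)).join(" ")
  const wordCount = narrative.split(/\s+/).filter(Boolean).length
  if (wordCount > maxNarrativeWords) {
    problems.push("narrative is " + wordCount + " words; limit is " + maxNarrativeWords)
  }
  return problems
}
```

An empty result only means these two rules hold; the editorial rules about observations, recommendations, and tone still need a human read.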
Section 10
Case study: a five-squad org tracking adoption quarter over quarter
A product organization with five squads and one platform-design pair wanted a defensible adoption number after a year of design system investment. The first cycle took the longest because the counting scripts had to be written and the repo list had to be agreed; the run itself finished in about 40 minutes across six repositories.
The first quarter's report put token coverage at 61 percent, with two repositories below 45 percent and one legacy admin app dragging the average. Because every number linked to the script and the per-repo JSON, the squad leads argued about priorities instead of arguing about the measurement. Two quarters later the same saved command reported 78 percent coverage, and the report's appendix showed the gain came almost entirely from the two low repos — which is what the prioritization conversation had funded.
The number nobody expected: the contribution agent showed nine open component requests older than 90 days, all from the same two squads with the lowest coverage. The synthesis agent flagged the correlation as a question, not a conclusion, and the team's follow-up found those squads had stopped requesting because earlier requests had stalled.
Section 11
Case study: replacing a day of screenshots and spreadsheets
A design ops lead at a 60-person product company spent roughly a working day each month assembling a leadership update: screenshots of the accessibility backlog, a hand-built spreadsheet of design-review PRs, and a paragraph of narrative written at 6 pm. The numbers were broadly right and entirely untraceable, and a pointed question in one leadership review — where does the 40 percent figure come from? — was the trigger to change the process.
After two cycles of tuning, the saved /design-ops-report command produced the draft in about 35 minutes of run time. The lead's monthly effort dropped to roughly two hours: an hour checking the draft against spot samples and an hour rewriting the recommendations in their own voice. The traceability turned out to matter more than the time saved; the same pointed question the next quarter was answered by clicking the footnote.
The lead also kept one manual element on purpose: a short qualitative section on what designers were saying about the system, gathered in conversations. The workflow assembles evidence; it does not replace listening.
Section 12
Case study: an agency reporting QA debt on a retainer
A design agency maintaining a client's marketing site and product surface under a retainer used the report to make their QA work visible. Each month, the QA debt agent trended findings from the agency's accessibility sweeps and visual QA runs: opened, resolved, and outstanding, by severity.
The first client-facing report showed 34 findings resolved and 12 outstanding, with the outstanding list dominated by issues that needed client-side content decisions rather than agency fixes. That distinction — visible because every finding linked back to its source run — reframed a tense conversation about velocity into a shared backlog discussion, and the retainer was renewed with a clearer split of responsibilities.
The agency stripped the throughput section entirely for the client version. The saved command takes a flag for that: internal cycles include process metrics, client cycles include only the work and the debt.
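How such a flag might select sections can be sketched in a few lines. Everything named here is an assumption for illustration: the audience values "internal" and "client", the section identifiers, and the choice to treat only throughput as a process metric (the one section this case study says was stripped).

```javascript
// Hypothetical section selection for an audience flag.
const SECTIONS = ["adoption", "qa-debt", "throughput", "contributions"]
// Assumption: throughput is the only process metric withheld from client reports.
const PROCESS_SECTIONS = new Set(["throughput"])

function sectionsFor(audience) {
  if (audience !== "internal" && audience !== "client") {
    // Fail loudly rather than guess which version of the report to produce.
    throw new Error("Unknown audience: " + audience)
  }
  return audience === "internal"
    ? [...SECTIONS]
    : SECTIONS.filter((name) => !PROCESS_SECTIONS.has(name))
}
```

Rejecting unknown audience values matters more than it looks: a typo that silently fell back to the internal report would send process metrics to a client.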
Section 13
Good vs bad report content
The failure mode of automated reporting is confident numbers with no provenance and recommendations with no owner. Spot-check the draft against the raw outputs before signing it, and tighten the synthesis instructions when the narrative drifts toward claims the evidence cannot carry.
Bad: Design system adoption is improving across the org
Good: Token coverage rose from 61% to 78% across six repos (count-token-usage.mjs, appendix A1); the gain is concentrated in checkout-web and admin-app

Bad: Designers are slow to close review requests
Good: Median time-to-close for design-review PRs was 4.2 days this cycle vs 6.8 last cycle (design-prs.json, appendix A3); reported as a team distribution

Bad: We should invest more in the design system
Good: Proposal for the prioritization conversation: fund the two component requests older than 90 days from the lowest-coverage squads (appendix A4)
A trustworthy figure names its computation; a trustworthy recommendation names the decision it feeds.
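Part of the spot-check can be automated: scan the draft for lines that state a figure without naming an appendix entry. The sketch below is deliberately crude and rests on two assumptions: that references look like "appendix A1", as in the examples above, and that figures worth flagging are percentages or day counts. The name findUnsourcedFigures is hypothetical.

```javascript
// Hypothetical lint: flag lines that state a figure without naming an appendix entry.
// Assumes references look like "appendix A1"; adjust both patterns to your template.
function findUnsourcedFigures(reportText) {
  // A figure here means a number followed by %, "percent", or "day(s)".
  const figure = /\d+(?:\.\d+)?(?:%|\s*(?:percent|days?)\b)/i
  const reference = /appendix\s+A\d+/i
  return reportText
    .split("\n")
    .map((text, index) => ({ line: index + 1, text }))
    .filter(({ text }) => figure.test(text) && !reference.test(text))
}
```

A line-level check will miss a figure whose reference sits on the next line and will not validate that the cited entry exists, so treat its output as a list of places to look, not a pass or fail.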
Section 14
What this report cannot prove
Coverage numbers measure code, not experience. A repository can be 90 percent on tokens and still ship a confusing flow, and a squad with low adoption may be doing the most valuable design work in the company. The report informs where to spend systems effort; it does not grade teams, and it cannot decide priorities by itself.
Throughput numbers are especially easy to misread. Cycle time reflects review capacity, dependency tangles, and how work is sliced at least as much as it reflects effort. That is why the workflow reports distributions, refuses per-person figures, and labels every recommendation as input to a conversation a human leads.
- It cannot measure design quality or user outcomes; pair it with research and product metrics.
- It cannot attribute causes; it shows correlations and asks questions.
- It cannot be used to evaluate individual designers; the workflow is built to refuse that, and the team should be too.
- It cannot set priorities; it equips the people who do.
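The refusal of per-person figures can be backed by a guard that runs on each slice before synthesis. This is a sketch under assumptions: the list of key names is a guess at what exports commonly carry and should be extended to match your own data, and findPerPersonFields is a hypothetical name.

```javascript
// Hypothetical guard: report where slice data still carries per-person fields.
// The key list is an assumption; extend it to match your exports.
const PER_PERSON_KEYS = new Set([
  "author", "assignee", "assignees", "reviewer", "reviewers", "user", "login",
])

function findPerPersonFields(value, path = "slice") {
  if (Array.isArray(value)) {
    return value.flatMap((item, index) => findPerPersonFields(item, path + "[" + index + "]"))
  }
  if (value === null || typeof value !== "object") return []
  return Object.entries(value).flatMap(([key, child]) =>
    PER_PERSON_KEYS.has(key)
      ? [path + "." + key] // report the offending path, do not descend further
      : findPerPersonFields(child, path + "." + key)
  )
}
```

A non-empty result is a reason to stop the run and aggregate the input, which is cheaper than discovering a name in a report that has already been circulated.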
Section 15
The reusable reporting workflow
Save the orchestration as /design-ops-report once the first cycle's draft survives a leadership review, and keep the scripts, exports, and subagent definitions in the same repo so the numbers stay comparable across cycles. The cadence does the rest: the report stops being a heroic monthly effort and becomes a routine artifact the prioritization conversation starts from.
1. Agree the repo list, the evidence sources, and the cadence; write them into design-ops/repos.json.
2. Run the counting scripts: token usage and component usage per repo.
3. Export the cycle's PR, issue, and contribution data with the GitHub CLI.
4. Fan out one agent per source: adoption, QA debt, throughput, contribution backlog.
5. Run the synthesis agent against the report template; every figure links to its computation.
6. Review the draft: spot-check numbers, rewrite recommendations in your own voice, add the qualitative section.
7. Sign and circulate; file the report next to the previous cycles for the trend.
8. Save the run to .claude/workflows/ as /design-ops-report and rerun next cycle.