Section 1
Why token migrations stall
Almost every team that adopts semantic tokens already has a codebase full of the old way: raw hex codes, one-off pixel values, Tailwind arbitrary values, and a legacy token set that half the files import. The new tokens are designed, documented, and approved, and then the migration sits in the backlog because it touches three hundred files and nobody can spend a month on find-and-replace.
Plain find-and-replace does not work because the same value means different things in different places. The same gray is a border in one file, disabled text in another, and a chart gridline in a third. The mapping needs judgment per usage, which is exactly why it never gets done by script alone.
This workflow gets the migration done in a controllable half day: a script holds the file list and the approved mapping, agents do the per-file judgment, a reviewer agent checks every diff, and a visual recapture proves the UI did not change where it was not supposed to.
Section 2
When to reach for this workflow
Reach for it when the new token set is finished and the remaining work is applying it: hex codes to semantic tokens, legacy token names to new names, or hard-coded values to scale steps. The mapping decisions should mostly be made before the run starts.
It is an Advanced workflow because agents write to many files and the failure mode is subtle visual regression rather than a build error. The guardrails, especially the reviewer gate and the recapture, are not optional.
- A finished semantic token set waiting to be adopted across the codebase.
- Two token sets that need to become one after a merger or a platform consolidation.
- A dark mode project blocked because colors are hard-coded instead of semantic.
- A Style Dictionary or Tailwind theme change that renames tokens many files import.
Section 3
The orchestration pattern: a script that owns the file list
This is a Claude Code dynamic workflow in its most literal form. Claude writes a JavaScript orchestration script that runs in the background and owns the plan: the file list, the token map, the queue of pending files, and the results of every per-file agent. None of that intermediate state passes through Claude's own context, which is what makes a 300-file migration possible in one run.
The script loops the file list and dispatches one migration agent per file, up to sixteen concurrently, with a hard ceiling of a thousand agents per run. Each agent gets one file and the mapping, nothing more. Because the script tracks which files are done, the run is resumable: if it stops at file 180, it restarts at file 181 instead of file 1.
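The Step 3 sketch batches sixteen files at a time. That leaves slots idle while the slowest file in each batch finishes. A small pool can keep up to sixteen agents busy and record progress after every file. This is a minimal sketch under stated assumptions:

- `runPool` is hypothetical.
- The `worker` callback would wrap the migration and reviewer agents for one file.
- The `save` callback would write migration/progress.json.

None of these are part of Claude Code.

```javascript
// Hypothetical helper: run `worker` over files with at most `limit` in flight,
// skipping files already marked "done" so a restarted run resumes where it stopped.
async function runPool(files, progress, limit, worker, save = async () => {}) {
  const queue = files.filter((f) => progress[f] !== "done")
  let inFlight = 0
  let peak = 0
  // Chain saves so two lanes finishing together never write the progress file at once.
  let saveChain = Promise.resolve()
  const persist = () => (saveChain = saveChain.then(() => save({ ...progress })))

  async function lane() {
    // JavaScript is single-threaded, so shift() never hands the same file to two lanes.
    while (queue.length > 0) {
      const file = queue.shift()
      inFlight++
      peak = Math.max(peak, inFlight)
      try {
        await worker(file)
        progress[file] = "done"
      } catch {
        // A failed file is recorded, not fatal; the next run retries it.
        progress[file] = "failed"
      } finally {
        inFlight--
      }
      await persist()
    }
  }

  // Start at most `limit` lanes; each pulls the next file as soon as its current one finishes.
  await Promise.all(Array.from({ length: Math.min(limit, queue.length) }, () => lane()))
  return { peak, progress }
}
```

Unlike the batch loop, a slow file only holds up its own lane. The other fifteen keep pulling work.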
You trigger the dynamic workflow by including the word "workflow" in the prompt or by running with /effort ultracode. When the run works end to end, save it to .claude/workflows/ so that the next migration, or the next batch of this one, is a single command.
Figure: the orchestration loop. Its steps are labeled Build file list, Load token map, Dispatch per-file agent, Reviewer checks diff, Accept or send back, Mark file done, and Recapture visuals, with the label "feeds next cycle".
The script loops the file list; every file passes through a migration agent and a reviewer agent before it counts as done.
Section 4
Step 1: freeze the token map first
The single most important artifact in this workflow is the mapping file. It records every legacy value and token name, the new semantic token it becomes, and the cases where context matters. Agents apply the map; they should not be inventing it mid-run.
Where one old value maps to several new tokens depending on usage, say so explicitly and describe how to tell the difference. Where the right answer is genuinely unclear, set the value's target to "ask". The agent will then flag each usage instead of guessing.
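Agents apply the map without second-guessing it, so a malformed entry becomes the same mistake across hundreds of files. A short check before the map is frozen can catch the mechanical problems. The sketch assumes the entry shape used in the example that follows: a `to` field that may be `"ask"`, plus an optional `note`. `validateTokenMap` is a hypothetical helper.

```javascript
// Hypothetical pre-run check; assumes entries shaped { to: string, note?: string },
// where to === "ask" means "flag usages, do not replace".
function validateTokenMap(map) {
  const problems = []
  const has = (key) => Object.prototype.hasOwnProperty.call(map, key)
  for (const [from, entry] of Object.entries(map)) {
    if (!entry || typeof entry.to !== "string" || entry.to.trim() === "") {
      problems.push(`${from}: missing "to"`)
      continue
    }
    if (entry.to === "ask") {
      // Without a note, the agent has no candidates to reason about when it flags a usage.
      if (!entry.note) problems.push(`${from}: "ask" entry has no note describing the candidates`)
      continue
    }
    if (entry.to === from) problems.push(`${from}: maps to itself`)
    // A target that is also a source key means the map is not final (A -> B -> C).
    else if (has(entry.to)) problems.push(`${from}: target ${entry.to} is itself remapped`)
  }
  return problems
}
```

Run it against the map before freezing it, as with the example below. An empty list means the map is structurally sound, not that its decisions are right.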
{
"#1A73E8": { "to": "color.action.primary", "note": "Buttons and links. If used as a chart series color, use color.dataviz.1 instead." },
"#5F6368": { "to": "ask", "note": "Used for both secondary text and borders. Map text usages to color.text.secondary, borders to color.border.default." },
"gray-500": { "to": "color.text.secondary" },
"brand-blue-600": { "to": "color.action.primary" },
"rgba(0,0,0,0.12)": { "to": "color.border.subtle" },
"px-[18px]": { "to": "px-4", "note": "Off-scale spacing. 18px maps down to space.4 (16px) per the migration decision log." },
"shadow-card-legacy": { "to": "shadow.raised" },
"fontSize: 13": { "to": "text.body.sm", "note": "13px collapses into the 14px small body step." }
}
Section 5
Step 2: the migration workflow prompt
The prompt names the scope, the map, the rules, and the guardrails. The most important rules are the negative ones: do not restructure components, do not fix unrelated issues in passing, and never guess when the map says ask.
Run this as a workflow.
Migrate hard-coded values and legacy tokens to the new semantic token set.
Map: migration/tokens-map.json (the only source of truth for replacements)
Scope: src/**/*.{tsx,ts,css} excluding src/legacy and generated files
For each file, dispatch one agent that:
- Replaces values and legacy tokens exactly as the map specifies
- Flags any value found in the file but missing from the map, without changing it
- Flags every usage of a value the map marks as ask, with its judgment and reasoning
- Makes no other changes: no formatting, no renames, no refactors
After each file, dispatch a reviewer agent on the diff. The reviewer rejects diffs that change anything beyond mapped values or that resolve an ask without flagging it. Rejected files go back through migration once with the reviewer notes.
Track progress in migration/progress.json so the run is resumable. Write all flags to migration/flags.md. When all files are done, run scripts/visual-recapture.mjs and stop for human review. Work on a branch; never commit to main.
Section 6
Step 3: the orchestration script
Claude generates the orchestration script when you ask for the workflow, but reviewing its shape before the run starts is worth ten minutes. The sketch below uses an agent() pseudo-API to show the loop, the reviewer gate, and the resumable progress file. It is a sketch of the structure, not a library to install.
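import { readFile, writeFile, appendFile } from "node:fs/promises"
import { glob } from "glob"
// Sketch: agent(prompt, options) is a pseudo-API standing in for the workflow runtime's subagent dispatch.
// Assumed to resolve with { text, diff, flags }, where flags is an array of strings. It is not an npm package.
const CONCURRENCY = 16
const map = await readFile("migration/tokens-map.json", "utf8")
// The generated-file pattern is an assumption; match it to how the repo marks generated code.
const files = await glob("src/**/*.{tsx,ts,css}", { ignore: ["src/legacy/**", "src/**/*.generated.*"] })
// A missing progress file means a fresh run.
const progress = JSON.parse(await readFile("migration/progress.json", "utf8").catch(() => "{}"))
// Skip files already done or already handed to a human on an earlier run.
const pending = files.filter((f) => !(f in progress))
// One migration pass plus one review; notes carries reviewer feedback into the retry.
async function migrateAndReview(file, notes = "") {
  const result = await agent(
    "Migrate this file using the token map. Change only mapped values.\nFile: " + file +
      "\nMap:\n" + map + (notes ? "\nReviewer notes:\n" + notes : ""),
    { model: "sonnet" }
  )
  const review = await agent(
    "Review the diff for " + file + ". Reject anything beyond mapped token replacements. " +
      "Respond APPROVE or REJECT with reasons.\n" + result.diff,
    { model: "sonnet" }
  )
  return { flags: result.flags, approved: review.text.startsWith("APPROVE"), notes: review.text }
}
for (let i = 0; i < pending.length; i += CONCURRENCY) {
  const batch = pending.slice(i, i + CONCURRENCY)
  const outcomes = await Promise.all(
    batch.map(async (file) => {
      let attempt = await migrateAndReview(file)
      // One retry with the reviewer's notes; a second rejection goes to a human.
      if (!attempt.approved) attempt = await migrateAndReview(file, attempt.notes)
      const flags = [...attempt.flags]
      if (!attempt.approved) flags.push(file + ": rejected twice, needs human review. " + attempt.notes)
      return { file, status: attempt.approved ? "done" : "flagged", flags }
    })
  )
  // Write state once per batch, after every agent in it has finished, so writes never overlap.
  for (const o of outcomes) progress[o.file] = o.status
  await writeFile("migration/progress.json", JSON.stringify(progress, null, 2))
  const newFlags = outcomes.flatMap((o) => o.flags)
  if (newFlags.length > 0) await appendFile("migration/flags.md", newFlags.join("\n") + "\n")
}
// Next, per the prompt: run scripts/visual-recapture.mjs and stop for human review.
Section 7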
Step 4: the reviewer agent is the safety margin
The migration agent is optimized to make changes; the reviewer agent is optimized to refuse them. Keeping those two jobs in separate agents is what makes the loop trustworthy: the reviewer has no memory of why a change felt reasonable, only the diff and the rules.
The reviewer rejects diffs that touch unmapped values, resolve an ask silently, reformat code, or change logic. Rejected files get one retry with the reviewer notes attached; if they fail again, they land in the flags file for a human.
---
name: migration-reviewer
description: Reviews a single-file diff from the token migration. Approves only diffs that replace mapped values exactly; rejects anything else. Read-only.
tools: Read, Grep
---
You review one diff from a design token migration. The only acceptable changes are replacements listed in migration/tokens-map.json.

Reject the diff if it:
- Changes a value that is not in the map
- Resolves an ask entry without flagging it
- Reformats, reorders, or renames anything
- Changes component logic, props, or markup structure

Respond with APPROVE, or REJECT followed by a numbered list of reasons specific enough for the migration agent to act on. Nothing else.
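The reviewer's core rule can also be checked mechanically, as a cheap pre-filter before the agent reads the diff. In the hypothetical `checkDiffAgainstMap` below:

- The file is compared line by line, before and after the migration.
- Every mapped legacy value and its target are masked with the same placeholder.
- Any changed line whose masked forms still differ is reported.

The sketch assumes that replacements keep line counts intact and that map targets appear literally in the code. It does not model the context-dependent alternatives in the map notes (such as `color.dataviz.1`). A failure here should therefore route the diff to the reviewer agent or a human rather than count as a final verdict.

```javascript
// Hypothetical pre-filter for the reviewer gate: does a diff stay inside the token map?
// Assumes map entries shaped { to: "<token>" | "ask", note?: string }.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

function buildMasker(map) {
  const placeholder = new Map()
  for (const [from, { to }] of Object.entries(map)) {
    // ask values are never masked, so resolving one silently shows up as a mismatch.
    if (to === "ask") continue
    placeholder.set(from, `\u0000${to}\u0000`)
    placeholder.set(to, `\u0000${to}\u0000`)
  }
  if (placeholder.size === 0) return (line) => line
  // Longest needles first so "px-[18px]" wins over any shorter overlapping key.
  const needles = [...placeholder.keys()].sort((a, b) => b.length - a.length)
  // A single regex pass never rescans replaced text, so placeholders are not masked twice.
  const re = new RegExp(needles.map(escapeRegExp).join("|"), "g")
  return (line) => line.replace(re, (match) => placeholder.get(match))
}

function checkDiffAgainstMap(before, after, map) {
  const oldLines = before.split("\n")
  const newLines = after.split("\n")
  if (oldLines.length !== newLines.length) {
    return { approve: false, reasons: ["line count changed: possible reformat or restructure"] }
  }
  const mask = buildMasker(map)
  const reasons = []
  oldLines.forEach((line, i) => {
    if (line !== newLines[i] && mask(line) !== mask(newLines[i])) {
      reasons.push(`line ${i + 1}: change goes beyond mapped replacements`)
    }
  })
  return { approve: reasons.length === 0, reasons }
}
```

A pass means only that every changed line is explained by the map. The reviewer agent still judges context, such as whether a blue was actually a chart series color.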
Figure: one file's path through the reviewer gate. Its steps are labeled Migration agent edits file, Reviewer reads the diff, Approve or reject, Retry with reviewer notes, Second failure goes to flags.md, and Mark file done, with the label "feeds next cycle".
Every per-file diff passes through the reviewer. One rejection earns a retry with notes; a second sends the file to a human via the flags file.
Section 8
Step 5: recapture and compare before merging
A token migration is supposed to be visually invisible except where the new tokens intentionally differ. The only way to know is to look. The final stage of the workflow re-runs a Playwright capture script across key routes and viewports and compares against captures taken before the run started.
Expected differences, like a slightly different primary blue, are listed in the migration brief so the comparison agent can separate intended change from regression. Anything else is a finding that blocks the merge.
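Once the before and after screenshots exist (captured with Playwright, for example via `page.screenshot()`), each route and viewport pair reduces to a mismatch ratio from whatever image-diff step the team uses. The hypothetical `classifyCaptures` helper below sorts those results into unchanged, intended, and regression buckets. It assumes the brief lists expected differences by route, optionally narrowed to one viewport. The default tolerance of 0.1% of pixels is an arbitrary placeholder for anti-aliasing noise, not a recommended value.

```javascript
// Hypothetical classifier for recapture results. Each capture is assumed to carry a
// mismatchRatio from an image-diff step not shown here (0.002 = 0.2% of pixels differ).
function classifyCaptures(captures, expected, tolerance = 0.001) {
  // expected: entries from the migration brief, e.g. { route: "/dashboard", viewport: "mobile" };
  // an entry without a viewport covers every viewport of that route.
  const isExpected = (c) =>
    expected.some((e) => e.route === c.route && (e.viewport === undefined || e.viewport === c.viewport))
  const result = { unchanged: [], intended: [], regressions: [] }
  for (const c of captures) {
    const label = `${c.route} @ ${c.viewport}`
    if (c.mismatchRatio <= tolerance) result.unchanged.push(label)
    else if (isExpected(c)) result.intended.push(label)
    else result.regressions.push(label)
  }
  return result
}
```

Any entry in `regressions` is the kind of finding that blocks the merge.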
Each artifact in the run answers a different trust question before the branch merges:
- migration/tokens-map.json: what every legacy value becomes, decided by humans before the run.
- migration/progress.json: which files are done, making the run resumable.
- Reviewer verdicts: whether each diff stayed inside the mapped changes.
- migration/flags.md: unmapped values and ask cases that need a human decision.
- Visual recapture: whether the UI changed anywhere it was not supposed to.
- Pull request: the human gate; nothing merges without review.
Section 9
Case study: hex codes to semantic tokens in a React and Tailwind app
A product team adopted a semantic token set built with Style Dictionary and exposed through the Tailwind theme, but the app still contained years of raw values. A pre-run audit counted 1,142 hard-coded color and spacing values across 327 files.
The mapping file took the design lead and a frontend engineer one afternoon: 64 distinct values, 9 of them marked ask because the same gray served several roles. The run dispatched 327 migration agents and 327 reviewer passes over roughly four hours; 21 diffs were rejected once and fixed on retry, and 38 usages landed in the flags file for human mapping.
The recapture across 14 routes and three viewports showed differences only where the new palette intentionally shifted, plus one real regression: a chart legend that had relied on an unmapped opacity value. It was fixed before the pull request merged, and the team kept the saved workflow to migrate two smaller internal apps the following month.
Section 10
Case study: consolidating two token sets after a merger
Two companies merged and brought two design systems with overlapping but differently named tokens: brand.primary in one, color-accent-500 in the other, with values that were close but not identical. Product teams were already mixing them in the same files.
The hard work was the mapping workshop, where the combined design team chose a winner for each pair and recorded the decision and its reasoning in the map notes. The run itself covered 412 files across both codebases in an afternoon, with the reviewer gate catching seven diffs where an agent had also tried to update neighboring spacing it considered inconsistent.
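Recording each workshop decision once and generating the map entries from it keeps the notes and the map from drifting apart. The decision shape and `buildConsolidationMap` below are hypothetical. Pairs the workshop deferred fall back to the map's "ask" convention, so agents flag those usages instead of picking a side.

```javascript
// Hypothetical helper: turn workshop decisions into tokens-map entries.
// Each decision pairs one token from each system; winner is null when the workshop deferred it.
function buildConsolidationMap(decisions) {
  const map = {}
  for (const { a, b, winner, reason = "" } of decisions) {
    const entry = winner
      ? { to: winner, note: reason }
      : { to: "ask", note: `Undecided between ${a} and ${b}. ${reason}`.trim() }
    for (const name of [a, b]) {
      // The winning name needs no entry; mapping it to itself would be a no-op.
      if (name !== winner) map[name] = { ...entry }
    }
  }
  return map
}
```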
The flags file became the agenda for the second workshop: 26 token usages that did not fit either system cleanly. Three months later the legacy token packages were deleted, which the team described as the first visible proof internally that the two products were becoming one.
Section 11
Case study: dark mode that needed semantic tokens first
A team wanted dark mode, but the codebase used raw light-palette hex values everywhere, so theming had nothing to attach to. The migration was scoped as the prerequisite: replace literals with semantic tokens first, add the dark values to the token source second.
The first run migrated 289 files to semantic tokens with no visual change intended, verified by the recapture. The second, much smaller change added dark values in the Style Dictionary source and a theme toggle. Dark mode then appeared across the app without touching component files again.
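The second change could stay out of component files because components refer to semantic names, and only the token source knows each theme's values. The sketch below models that with plain objects. It is not Style Dictionary's API or output format, and the token names and values are made up for illustration. It also records which tokens lack a value for the requested theme, the kind of gap the next paragraph describes.

```javascript
// Made-up token source: semantic name -> value per theme (not Style Dictionary's format).
const tokens = {
  "color.text.primary": { light: "#1F1F1F", dark: "#E8EAED" },
  "color.surface.default": { light: "#FFFFFF", dark: "#202124" },
  "shadow.raised": { light: "0 1px 3px rgba(0,0,0,0.2)" }, // no dark value decided yet
}

// A component style refers only to token names, never to literal values.
const cardStyle = {
  color: "color.text.primary",
  background: "color.surface.default",
  boxShadow: "shadow.raised",
}

// Resolve names for one theme; fall back to the light value and record the gap for design review.
function resolveStyle(style, tokenSource, theme, missing = []) {
  const resolved = {}
  for (const [prop, name] of Object.entries(style)) {
    const values = tokenSource[name]
    if (!values) throw new Error(`Unknown token ${name}`)
    if (values[theme] === undefined) missing.push(`${name} (${theme})`)
    resolved[prop] = values[theme] ?? values.light
  }
  return resolved
}
```

Switching `theme` from "light" to "dark" changes every resolved value while `cardStyle` itself stays untouched.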
The flags file did most of the design work: 47 usages where the light-mode value was fine but no sensible dark equivalent existed, mostly shadows, overlays, and illustration backgrounds. Those went to the design team as a worklist rather than surfacing as bugs after launch.
Section 12
Good vs bad migration output
The quality bar for this workflow is restraint. A good per-file change is boring: mapped values replaced, everything else untouched, judgment calls flagged instead of made. Spot-check early diffs against this table before letting the run continue.
Bad: Replaced colors and also cleaned up the component while in there.
Good: Replaced #1A73E8 with color.action.primary in 3 places; no other changes.

Bad: Mapped the ambiguous gray to text.secondary everywhere to keep things consistent.
Good: Flagged 4 usages of #5F6368 marked ask: 3 look like borders, 1 is disabled text; left unchanged.

Bad: Migrated 300 files, build passes, ready to merge.
Good: Migrated 300 files, 38 flags need decisions, recapture shows one unexpected diff on /reports.

Good migration output is narrow, traceable to the map, and honest about uncertainty.
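Of these comparisons, the final run report is the easiest to make mechanical. A summary generated from the run's own counts cannot quietly say "ready to merge" while flags or unexpected diffs are still open. The sketch below assumes the script has already collected those counts; `summarizeRun` and its input shape are hypothetical.

```javascript
// Hypothetical run summary in the "good" style: counts plus what still needs a person.
function summarizeRun({ migrated, flags, regressions }) {
  const flagPart =
    flags.length === 0
      ? "no flags"
      : flags.length === 1
        ? "1 flag needs a decision"
        : `${flags.length} flags need decisions`
  const diffCount =
    regressions.length === 1 ? "one unexpected diff" : `${regressions.length} unexpected diffs`
  const diffPart =
    regressions.length === 0
      ? "recapture shows no unexpected diffs"
      : `recapture shows ${diffCount} on ${regressions.join(", ")}`
  return `Migrated ${migrated} files, ${flagPart}, ${diffPart}`
}
```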
Section 13
What this workflow cannot decide
The workflow applies a mapping; it does not author one. Deciding that two near-identical blues become one token, or that 18px spacing rounds down rather than up, is design judgment that has to happen before the run and stays accountable to people.
It also cannot guarantee visual safety beyond the routes and states it recaptures, and it cannot verify non-web surfaces like emails or native views unless they are added to the capture set. The pull request review remains the final gate, and it should be a real one.
- It cannot make token naming and consolidation decisions; the map encodes decisions humans already made.
- It cannot prove zero visual change on routes and states that were never captured.
- It cannot resolve ask cases; flagged usages wait for a person.
- It should never run against main; branch, review, and merge like any other large change.
Section 14
The reusable migration workflow
Token migrations are rarely one-off. New brands, new platforms, and new themes keep arriving, and the saved workflow plus a fresh mapping file is most of the work the second time. Keep the steps below with the repo so the next run starts from a known shape.
1. Audit current usage and freeze the token map, including ask cases, with the design team.
2. Capture before screenshots of key routes and viewports on the current main.
3. Build the file list and start the workflow on a branch, with progress tracked for resumability.
4. Loop per-file migration agents that change only mapped values and flag everything else.
5. Run the reviewer agent on every diff; retry once on rejection, then flag for a human.
6. Collect flags and unmapped values into a decision list for the design team.
7. Recapture screenshots and compare against the before set, separating intended change from regression.
8. Open the pull request with the map, flags, and comparison attached; merge only after human review.
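The audit in step 1 and the migration agent's "flag anything missing from the map" rule start from the same question: which literal values does the code contain that the map does not cover? A rough scanner can answer it before the run. The sketch below is hypothetical and has these limits:

- `auditSource` recognizes only hex colors, rgb()/rgba() literals, and bracketed Tailwind arbitrary values.
- Its regular expressions are assumptions, not a complete grammar.
- It assumes map keys use upper-case hex.
- Named legacy tokens such as `gray-500` would need their own patterns.

```javascript
// Hypothetical audit helper. The patterns are rough assumptions, not a full CSS/Tailwind grammar.
// Order matters: arbitrary values come first, so bg-[#fff] is counted once, not also as #fff.
const PATTERNS = [
  /\b[a-z-]+-\[[^\]\s]+\]/g, // Tailwind arbitrary values, e.g. px-[18px]
  /rgba?\([^)]*\)/g, // rgb() and rgba() literals
  /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g, // hex colors
]

function auditSource(source, map) {
  const taken = [] // [start, end) ranges already counted
  const counts = new Map()
  for (const re of PATTERNS) {
    for (const m of source.matchAll(re)) {
      const start = m.index
      const end = start + m[0].length
      // Skip matches inside a range an earlier pattern already counted.
      if (taken.some(([s, e]) => start < e && s < end)) continue
      taken.push([start, end])
      // Hex is case-insensitive, so normalize it to upper case before counting.
      const value = m[0].startsWith("#") ? m[0].toUpperCase() : m[0]
      counts.set(value, (counts.get(value) ?? 0) + 1)
    }
  }
  const mapped = (v) => Object.prototype.hasOwnProperty.call(map, v)
  const total = [...counts.values()].reduce((sum, n) => sum + n, 0)
  return { total, counts, unmapped: [...counts.keys()].filter((v) => !mapped(v)) }
}
```

Summing `total` across files gives the audit count. The `unmapped` values are the candidates to decide on before the map is frozen.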