Agentic Design School

Section 01

Why localization risk is cheapest to find before translation starts

Localization problems are rarely discovered by the people who created them. They surface months later, when a German translator asks why a sentence is split across three strings, when the Japanese release truncates every button, or when the Arabic build mirrors the layout but not the icons. By then the fixes compete with a launch date, and the team pays late-stage prices for early-stage mistakes.

Most of these problems are visible long before a single word is translated. Hard-coded strings can be found in code. Concatenated sentences can be found in how strings are assembled. Layouts that cannot absorb 30 to 40 percent more text — the typical expansion from English to German or Finnish — can be found by re-rendering screens with pseudo-localized strings. Date, number, and currency assumptions can be found by reading the formatting code. None of this requires a translator; it requires patient inspection at a scale humans rarely have time for.

That is the job of this workflow, a structured readiness review in three parts: surface agents inspect screens and code, an expansion agent confirms the layout risks visually with pseudo-localized screenshots, and a ranking pass turns the findings into a prioritized fix list. It flags risk; it does not make language calls. Translators and in-market reviewers own the words, the tone, and the cultural fit.

Section 02

When to reach for this workflow

Run it when a localization push is planned but not yet started: a new market on the roadmap, a contract that requires specific languages, or a product that has been English-only long enough for assumptions to harden. Run it again per product area as new surfaces are built, because internationalization debt accrues exactly like accessibility debt — one component at a time.

It is an Intermediate workflow because the inspection is broad and the evidence is mixed: code reading, string catalog analysis, and visual confirmation with pseudo-localized rendering all feed the same findings list.

A first localization push into one or more new locales with a date attached.
An expansion into a right-to-left locale, where layout assumptions carry the most risk.
A product area with a history of truncation and overflow bugs in non-English builds.
A design system team deciding which components need expansion-safe variants.

Section 03

What the review looks for

The review covers eight risk classes, and each class has a different best source of evidence. String handling and formatting risks live in code; expansion and truncation risks live in rendered layouts; iconography and color-meaning risks live in the design files and need human cultural review to settle.

Table: Risk classes and where the evidence lives

1. Hard-coded strings: user-facing text outside the string catalog. Found by code search, fixed by extraction.
2. Concatenation: sentences assembled from fragments or with embedded variables that break under different word orders. Found in code, fixed with ICU MessageFormat.
3. Text expansion: layouts that clip or wrap badly with 30–40% longer strings. Found by pseudo-localized rendering.
4. Truncation: fixed widths, single-line labels, and CSS truncation on translatable text. Found in code and confirmed visually.
5. Dates, numbers, currency: manual formatting instead of locale-aware APIs such as Intl. Found in code.
6. RTL readiness: physical CSS properties, direction-dependent icons, unmirrored layouts. Found in code and design files.
7. Iconography and color meaning: symbols and colors whose meaning shifts across cultures. Flagged for in-market review, not decided by the agent.
8. Locale-specific legal lines: consent, pricing, and tax copy that must change per market. Flagged for legal and local review.

Each risk class maps to a check the workflow can run and a decision a human still owns.
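To make the concatenation risk concrete, here is a minimal sketch that contrasts a fragment-assembled sentence with a whole-sentence lookup. The message table, the function names, and the `{count}` placeholder are illustrative assumptions, not the project's catalog format; a real project would normally hand this to an ICU MessageFormat library. `Intl.PluralRules` is the built-in API that selects the plural category for a locale.

```javascript
// Risky: English word order and English plural rules are baked into the code.
function itemsInCartConcatenated(count) {
  return "You have " + count + " item" + (count === 1 ? "" : "s") + " in your cart"
}

// Safer: each locale owns a complete sentence per plural category.
// Hypothetical message table, for illustration only.
const messages = {
  en: { one: "You have {count} item in your cart", other: "You have {count} items in your cart" },
  de: { one: "Sie haben {count} Artikel im Warenkorb", other: "Sie haben {count} Artikel im Warenkorb" },
}

function itemsInCart(locale, count) {
  // select() returns a category name such as "one" or "other"
  const category = new Intl.PluralRules(locale).select(count)
  const template = messages[locale][category] ?? messages[locale].other
  return template.replace("{count}", String(count))
}

const english = itemsInCart("en", 3) // "You have 3 items in your cart"
const german = itemsInCart("de", 3) // "Sie haben 3 Artikel im Warenkorb"
```

The point of the second form is that a translator can reorder or rephrase the whole sentence without any change to the code.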

Section 04

The orchestration pattern: fan-out per surface and locale risk

The review runs as a Claude Code dynamic workflow. Including the word workflow in the prompt, or running with /effort ultracode, has Claude write a JavaScript orchestration script that runs in the background, holding the screen inventory, the string catalog, and per-surface findings in its own variables rather than in Claude's context. It dispatches one subagent per surface — checkout, settings, onboarding, and so on — plus dedicated agents for cross-cutting risks: formatting, RTL, and the pseudo-localization rendering pass.

Up to sixteen subagents run concurrently and a run can dispatch up to a thousand, so a product area with a dozen surfaces and three cross-cutting checks fits comfortably in one resumable run. When the review settles into a shape the team trusts, save it to .claude/workflows/ as /l10n-readiness (or to ~/.claude/workflows/ for a personal copy) and rerun it per product area or per release. Subagent definitions — the surface inspector, the expansion renderer — live in .claude/agents/ as markdown files so each run holds every surface to the same standard.
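The fan-out shape can be sketched in plain JavaScript. This is not the script Claude writes and not a Claude Code API: `runPool`, `inspectSurface`, and the surface names are hypothetical stand-ins that only illustrate bounded concurrency, with the limit of sixteen taken from the figure above.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight at once.
// Results come back in input order. A failed task is recorded rather than
// thrown, so one bad surface does not abort the whole run.
async function runPool(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0
  async function lane() {
    while (next < items.length) {
      const index = next++
      try {
        results[index] = { ok: true, value: await worker(items[index]) }
      } catch (error) {
        results[index] = { ok: false, error: String(error) }
      }
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane())
  await Promise.all(lanes)
  return results
}

// Hypothetical stand-in for dispatching one subagent per surface.
async function inspectSurface(surface) {
  return { surface, findings: [] }
}

const surfaces = ["checkout", "settings", "onboarding"]
const run = runPool(surfaces, 16, inspectSurface) // a promise of per-surface results
```

Keeping the per-surface results in the script's own array, rather than in a conversation, mirrors the idea described above of holding findings in variables.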

The expansion agent is the only one that touches a browser. It swaps the string catalog for a pseudo-localized version, re-renders the key screens with Playwright, captures screenshots, and reports which layouts clipped, overflowed, or pushed controls out of reach. Everything else is reading.
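One way the clipping check could work is sketched below as a pure function over box measurements. In a Playwright run the numbers would be read from the page, for example with `page.evaluate` collecting each element's `scrollWidth`, `clientWidth`, and right edge; here they are hand-written sample data, and the field names, selectors, and `findBreakage` itself are assumptions. Comparing `scrollWidth` with `clientWidth` is a common heuristic for overflowing content, not a complete layout test.

```javascript
// Classify measured elements from one pseudo-localized screen.
// "clipped": the content is wider than the element's own box.
// "out-of-view": the element's right edge falls beyond the viewport.
function findBreakage(measurements, viewportWidth) {
  const problems = []
  for (const m of measurements) {
    if (m.scrollWidth > m.clientWidth) {
      problems.push({ selector: m.selector, kind: "clipped", overflowPx: m.scrollWidth - m.clientWidth })
    } else if (m.right > viewportWidth) {
      problems.push({ selector: m.selector, kind: "out-of-view", overflowPx: m.right - viewportWidth })
    }
  }
  return problems
}

// Hypothetical measurements for a 390 px wide render.
const sample = [
  { selector: ".plan-card h3", scrollWidth: 212, clientWidth: 160, right: 180 },
  { selector: ".save-button", scrollWidth: 96, clientWidth: 96, right: 412 },
  { selector: ".nav-title", scrollWidth: 80, clientWidth: 80, right: 120 },
]
const breakage = findBreakage(sample, 390)
// breakage holds two entries: the clipped heading (52 px) and the button pushed 22 px out of view
```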

Diagram: Localization readiness review run

1. Inventory surfaces and string catalog
2. Surface agents inspect screens and code
3. Formatting and RTL agents check cross-cutting risks
4. Expansion agent re-renders with pseudo-localized strings
5. Merge and rank findings by impact and cost-to-fix-late
6. Human review and locale sign-off plan

Surface agents and cross-cutting agents inspect in parallel; the expansion agent confirms visually; findings merge into one ranked list.

Section 05

Step 1: pseudo-localize before you translate

Pseudo-localization replaces every string in the catalog with a longer, accented, bracketed version so that expansion and clipping problems become visible without waiting for real translations. Run through the script below, Settings becomes [!! Séttïñgš~~~ !!]. It also exposes hard-coded strings by omission: any text that still reads as plain English after the swap never went through the catalog.

The script below is deliberately small. It pads strings by roughly 35 percent, wraps them in markers so truncation is obvious in screenshots, and preserves placeholders so the app still runs.

scripts/pseudo-localize.mjs (excerpt)

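import { readFile, writeFile } from "node:fs/promises"

const map = { a: "å", e: "é", i: "ï", o: "ö", u: "û", n: "ñ", s: "š", c: "ç" }
const PAD = 0.35 // pad by roughly 35 percent of the source length

// Split into literal text and top-level {...} placeholders, tracking brace depth so
// nested ICU arguments like {count, plural, one {# item} other {# items}} stay whole.
function splitPlaceholders(value) {
  const parts = []
  let depth = 0
  let current = ""
  for (const ch of value) {
    if (ch === "{") {
      if (depth === 0) { parts.push(current); current = "" }
      depth += 1
    }
    current += ch
    if (ch === "}" && depth > 0) {
      depth -= 1
      if (depth === 0) { parts.push(current); current = "" }
    }
  }
  parts.push(current)
  return parts
}

function pseudo(value) {
  // Placeholders pass through untouched; only literal text is accented.
  const transformed = splitPlaceholders(value)
    .map((part) => (part.startsWith("{") ? part : part.replace(/[a-z]/g, (ch) => map[ch] || ch)))
    .join("")
  const padding = "~".repeat(Math.ceil(value.length * PAD))
  return "[!! " + transformed + padding + " !!]"
}

const catalog = JSON.parse(await readFile("locales/en.json", "utf8"))
const out = Object.fromEntries(Object.entries(catalog).map(([key, value]) => [key, pseudo(value)]))

await writeFile("locales/en-XA.json", JSON.stringify(out, null, 2))
console.log("Wrote locales/en-XA.json with " + Object.keys(out).length + " pseudo-localized strings")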
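The "exposed by omission" check can be sketched as a filter over the text visible on a rendered screen. The sketch assumes the `[!! ` and ` !!]` markers produced by the script above; the list of rendered strings is made-up sample data, and `findCatalogBypasses` is a hypothetical helper. User-entered data such as names would also be flagged, so the output is a list of candidates to inspect rather than confirmed defects.

```javascript
// Text that went through the catalog carries the markers added by pseudo().
const looksPseudo = (text) => text.startsWith("[!! ") && text.endsWith(" !!]")

// Strings with no letters (prices, counters, separators) are not translatable copy.
const hasLetters = (text) => /\p{L}/u.test(text)

// Return visible strings that still read as plain source text under the pseudo-locale.
function findCatalogBypasses(visibleTexts) {
  return visibleTexts
    .map((text) => text.trim())
    .filter((text) => text.length > 0 && hasLetters(text) && !looksPseudo(text))
}

// Hypothetical text collected from one screen rendered with en-XA.
const rendered = ["[!! Séttïñgš~~~ !!]", "Email is required", "42", "  ", "[!! Såvé~~ !!]"]
const bypasses = findCatalogBypasses(rendered)
// bypasses contains only "Email is required"
```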

Section 06

Step 2: the workflow prompt

The prompt names the surfaces in scope, the string catalog, the pseudo-locale, and the screens the expansion agent should render. It also states the boundary: the run reports risk and proposed fixes; it does not rewrite copy and it does not decide what is culturally appropriate in any market.

Localization readiness workflow prompt

Run this as a workflow.

Review the checkout, account, and onboarding areas for localization readiness ahead of the German, Japanese, and Arabic launches.

Inputs:
- Source under src/, string catalog at locales/en.json, pseudo-locale at locales/en-XA.json (from scripts/pseudo-localize.mjs)
- Key screens listed in l10n/screens.json with their routes and states
- Locale risk notes in l10n/locale-notes.md (expansion factors, RTL, legal lines per market)

Dispatch one agent per surface to find: hard-coded strings, concatenated or fragment-assembled sentences, embedded variables without ICU MessageFormat, fixed widths and single-line truncation on translatable text, and any text rendered inside images.
Dispatch a formatting agent to find manual date, number, and currency formatting that should use locale-aware APIs.
Dispatch an RTL agent to find physical CSS properties, direction-dependent icons, and layouts that will not mirror.
Dispatch an expansion agent to start the app with the en-XA locale, render each screen in l10n/screens.json with Playwright at 1440 and 390 px, save screenshots to l10n/output/pseudo/, and report every layout that clips, overflows, or pushes a control out of view.

Each finding needs: surface, file or screen, risk class, affected locales, user impact, estimated cost to fix after translation has started, and a concrete fix. Merge into l10n/findings.md ranked by impact and cost-to-fix-late.

Flag iconography, color meaning, and legal copy questions for in-market review — do not resolve them. Do not change any code or copy in this run.

Section 07

Step 3: the surface inspector subagent

Defining the surface inspector once keeps every surface held to the same checklist, in this run and in the rerun after fixes. It is read-only; fixes happen later, under review, in passes.

.claude/agents/l10n-surface-inspector.md

---
name: l10n-surface-inspector
description: Inspects one product surface for internationalization risk - string handling, truncation, expansion fragility, and formatting assumptions. Read-only.
tools: Read, Grep, Glob
---

You inspect one surface (a set of routes and their components) for localization readiness.

Check for:
- User-facing strings not in the catalog (hard-coded JSX text, aria-labels, alt text, validation messages, email templates)
- Sentences built by concatenating fragments or interpolating variables without ICU MessageFormat plural and select handling
- Fixed widths, nowrap, single-line truncation, or character-count assumptions on translatable text
- Text baked into images or SVGs
- Manual date, number, or currency formatting instead of Intl or the project i18n library
- Physical CSS properties (left/right margins, paddings, positions) on layout that should mirror in RTL

For each finding report: file and line, risk class, affected locales, user impact, cost to fix late (low / medium / high), and a concrete fix.
Do not rewrite copy. Do not assess translation quality or cultural appropriateness; flag those for in-market review. Do not modify files. Return findings as a JSON array and nothing else.
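Because the inspector is asked to return a JSON array and nothing else, the orchestration side can validate that output before merging it. The sketch below shows one way to do so; the field names are assumptions derived from the report list in the agent definition, not a schema the source specifies, and `parseFindings` is a hypothetical helper.

```javascript
// Hypothetical field names mirroring the report list in the agent definition.
const REQUIRED = ["file", "line", "riskClass", "affectedLocales", "userImpact", "lateCost", "fix"]
const LATE_COST = ["low", "medium", "high"]

// Parse raw agent output and split it into usable findings and rejects with reasons.
function parseFindings(raw) {
  let data
  try {
    data = JSON.parse(raw)
  } catch {
    return { findings: [], rejects: [{ reason: "output is not valid JSON" }] }
  }
  if (!Array.isArray(data)) {
    return { findings: [], rejects: [{ reason: "output is not a JSON array" }] }
  }
  const findings = []
  const rejects = []
  for (const item of data) {
    const isObject = item !== null && typeof item === "object"
    const missing = isObject ? REQUIRED.filter((key) => !(key in item)) : REQUIRED
    if (missing.length > 0) {
      rejects.push({ item, reason: "missing: " + missing.join(", ") })
    } else if (!LATE_COST.includes(item.lateCost)) {
      rejects.push({ item, reason: "lateCost must be low, medium, or high" })
    } else {
      findings.push(item)
    }
  }
  return { findings, rejects }
}
```

Rejected items keep their reason so a rerun of that one surface can be requested instead of silently dropping the output.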

Section 08

Step 4: rank by impact and cost to fix late

Localization findings are ranked on two axes: how badly a real user in the target locale is affected, and how much more expensive the fix becomes once translation has started. A hard-coded string is cheap to extract today and expensive to chase after 40,000 words have been sent to translators; a clipped tooltip in a rarely used settings panel can wait.

The ranked list is where humans take over. The localization lead, the design lead, and engineering walk the high-impact, high-late-cost findings and decide what blocks the translation kickoff, what ships as a known issue, and what changes the design instead of the code — sometimes the right fix for an expansion problem is a shorter label or a different layout, and that is a design call.

Block translation kickoff: hard-coded strings, concatenation, missing plural handling — anything that changes the string catalog.
Fix before launch: expansion breakage on core flows, formatting of money and dates, RTL mirroring on navigation.
Schedule: truncation on secondary surfaces, icon and illustration reviews per market.
Decide with humans: tone, cultural fit, legal copy, and whether a layout should change instead of a string.
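The two-axis ranking and the four buckets above can be sketched as a small scoring pass. The numeric weights, the threshold of 6, and the risk class identifiers are illustrative assumptions; the only rules taken from the text are that catalog-changing findings block kickoff and that tone, cultural fit, and legal copy go to humans.

```javascript
const LEVEL = { low: 1, medium: 2, high: 3 }

// Hypothetical identifiers for classes that change the string catalog.
const CATALOG_CHANGING = new Set(["hard-coded-string", "concatenation", "missing-plural"])
// Hypothetical identifiers for classes that need a human decision rather than a fix.
const HUMAN_DECISION = new Set(["iconography", "color-meaning", "legal-copy", "tone"])

// Score each finding on impact times late cost, assign a bucket, and sort highest first.
function triage(findings) {
  return findings
    .map((f) => {
      const score = LEVEL[f.userImpact] * LEVEL[f.lateCost]
      let bucket = "schedule"
      if (HUMAN_DECISION.has(f.riskClass)) bucket = "decide-with-humans"
      else if (CATALOG_CHANGING.has(f.riskClass)) bucket = "block-kickoff"
      else if (score >= 6) bucket = "fix-before-launch"
      return { ...f, score, bucket }
    })
    .sort((a, b) => b.score - a.score)
}

// Made-up findings for illustration.
const ranked = triage([
  { id: "tooltip", riskClass: "truncation", userImpact: "low", lateCost: "low" },
  { id: "inline-validation", riskClass: "hard-coded-string", userImpact: "high", lateCost: "high" },
  { id: "price-format", riskClass: "formatting", userImpact: "high", lateCost: "medium" },
  { id: "thumbs-up", riskClass: "iconography", userImpact: "medium", lateCost: "low" },
])
// Order: inline-validation (9), price-format (6), thumbs-up (2), tooltip (1)
```

The score only orders the list; which items actually block the kickoff remains the call of the people walking it.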

Diagram: Findings triage flow

1. Collect merged findings
2. Score user impact
3. Score cost to fix late
4. Mark kickoff blockers
5. Schedule pre-launch fixes
6. Route human-decision items

Findings are scored on user impact and cost-to-fix-late, then routed into the buckets the team walks together.

Section 09

Case study: 38 layouts broke under pseudo-localization

A B2B SaaS product preparing German and Japanese launches ran the review across its four core areas. The surface agents and the expansion agent finished in just over two hours, most of it the rendering pass across 61 screens at two widths.

The expansion agent reported 38 layouts that broke under the pseudo-locale: clipped buttons in the settings sidebar, a pricing table whose column headers wrapped onto the data rows, and a stepper component whose labels were truncated to uselessness at 390 pixels. Twenty-nine of the 38 traced back to four shared components, which turned a frightening number into a focused fix: the design system team shipped expansion-safe variants of those four components and the count dropped to six on the rerun.

The surface agents separately found 214 hard-coded strings — about half in validation messages and empty states that had never been designed, only written inline — and 19 concatenated sentences that would not survive German word order. All of those were fixed before the catalog went to translators, which the localization vendor later estimated saved a round of rework across both languages.

Section 10

Case study: hard-coded currency in an ecommerce checkout

An ecommerce team expanding from the US into three European markets ran the review on checkout only. The formatting agent found prices assembled by string concatenation — a dollar sign glued to a number formatted with toFixed(2) — in eleven places, including the order confirmation email, plus a tax line that assumed sales tax was added at display time rather than included in the price, which is wrong for VAT markets.

The fix replaced the manual formatting with Intl.NumberFormat keyed by locale and currency, and moved the tax presentation decision into the pricing service where it belonged. The findings list also flagged the legal lines — withdrawal rights and terms copy that differ by market — for the legal team rather than proposing wording, which is exactly where the workflow's job ends.
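A minimal sketch of that replacement is below. The function names and the sample amount are illustrative and are not the team's code; `Intl.NumberFormat` with `style: "currency"` is the standard built-in API.

```javascript
// Risky: assumes a dollar sign, a dot decimal separator, and symbol-first order.
function priceConcatenated(amount) {
  return "$" + amount.toFixed(2)
}

// Locale-aware: separators, currency symbol, and symbol position come from locale data.
function formatPrice(amount, locale, currency) {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount)
}

const us = formatPrice(1234.5, "en-US", "USD") // "$1,234.50"
const de = formatPrice(1234.5, "de-DE", "EUR") // comma decimal, symbol after the amount
const fr = formatPrice(1234.5, "fr-FR", "EUR") // comma decimal, symbol after the amount
```

The German and French results contain non-breaking space characters, so tests that compare formatted prices against hand-typed strings can fail even when the output looks identical.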

The team's own estimate was that catching the tax-display assumption before launch avoided a pricing-display defect in three markets at once; it had passed every existing test because every existing test assumed US conventions.

Section 11

Case study: RTL readiness before an Arabic launch

A productivity app preparing an Arabic launch ran the review with the RTL agent as the priority. The agent found roughly 160 physical CSS properties that should have been logical properties, but more usefully it grouped them: two-thirds were in three legacy layout components, and the rest were scattered one-offs that a codemod could handle.
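The kind of scan behind that count can be sketched as a line-based search that pairs each physical property with its logical counterpart. This is a heuristic under stated assumptions, not the agent's method: `findPhysicalProperties` and the sample CSS are hypothetical, the property table is deliberately partial, physical values such as `text-align: left` are not covered, and a selector like `.left:hover` would be a false positive. A real codemod would more likely use a CSS parser.

```javascript
// Physical properties and the logical replacements that follow text direction.
const LOGICAL = {
  "margin-left": "margin-inline-start",
  "margin-right": "margin-inline-end",
  "padding-left": "padding-inline-start",
  "padding-right": "padding-inline-end",
  "border-left": "border-inline-start",
  "border-right": "border-inline-end",
  left: "inset-inline-start",
  right: "inset-inline-end",
}

// Scan CSS source line by line and report each physical declaration with a suggestion.
function findPhysicalProperties(css) {
  // The lookbehind keeps "left" inside "margin-left" from matching a second time.
  const pattern = /(?<![\w-])(margin-left|margin-right|padding-left|padding-right|border-left|border-right|left|right)\s*:/g
  const hits = []
  css.split("\n").forEach((text, index) => {
    for (const match of text.matchAll(pattern)) {
      hits.push({ line: index + 1, property: match[1], suggestion: LOGICAL[match[1]] })
    }
  })
  return hits
}

// Made-up stylesheet for illustration.
const css = [
  ".sidebar { margin-left: 16px; }",
  ".badge { position: absolute; right: 8px; }",
  ".title { text-align: left; }",
].join("\n")
const hits = findPhysicalProperties(css)
// Two hits: margin-left on line 1 and right on line 2
```

Grouping the hits by file or component afterwards is what turns a raw count into the kind of focused list described above.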

The harder findings were not CSS. Direction-dependent icons — back arrows, progress chevrons, a timeline that read left to right — were flagged for design review, and the expansion agent's RTL render showed the main navigation mirrored correctly while the onboarding illustrations did not. The team's in-market reviewer made the calls on which icons needed mirroring and which were conventionally left as-is, a distinction no script and no agent should make alone.

The launch shipped with a known-issues list of four P3 items, all documented from the findings file, and the saved /l10n-readiness command became part of the team's definition of done for new surfaces.

Section 12

Good vs bad findings

A localization finding is useful when a developer can fix it without re-deriving the analysis and a designer can see who is affected and how badly. Vague findings about text being too long help nobody.

Table: Finding quality comparison

Bad: Some strings are not translated
Good: Hard-coded string: validation message in CheckoutForm.tsx line 142 is inline JSX, not in locales/en.json. Affects all locales; high cost to fix after translation kickoff.

Bad: German text might not fit
Good: Expansion: PlanCard title clips at 1 line under en-XA at 390px (l10n/output/pseudo/plans-390.png). Affects de and fi; fix is a 2-line clamp in the shared card component.

Bad: The app does not support RTL
Good: RTL: SidebarNav uses margin-left and a hard-coded chevron-right icon. Blocks Arabic; switch to margin-inline-start and a direction-aware icon token.

A fix-ready finding names the file, the risk class, the affected locales, and the late cost.

Section 13

What this review cannot prove

A clean readiness review does not mean the product is localized; it means the obvious structural traps have been found before they get expensive. Translation quality, tone, cultural appropriateness of imagery and color, legal compliance per market, and how the product actually feels to someone working in Arabic or Japanese are human judgments made by translators, in-market reviewers, and local users.

Pseudo-localization is also a model, not a language. It approximates expansion and exposes catalog gaps, but real German compounds, real Japanese line-breaking, and real Arabic shaping behave differently, so the pre-launch pass with real translations still matters.

It cannot judge translation quality or tone; that belongs to translators and reviewers.
It cannot decide cultural fit of icons, color, or imagery; flag and hand over.
It cannot certify legal copy per market.
It cannot replace testing with real translated builds and in-market users before launch.

Section 14

The reusable readiness workflow

Save the run to .claude/workflows/ once the findings format works for your team, keep the pseudo-localization script and the screen inventory in the repo, and rerun the command per product area and again after the fix passes. Readiness becomes a repeatable check instead of a pre-launch panic.

Localization readiness workflow

1. Inventory the surfaces, routes, and key screens in scope; list them in l10n/screens.json.
2. Generate the pseudo-locale from the string catalog with scripts/pseudo-localize.mjs.
3. Fan out one read-only inspector agent per surface for strings, concatenation, truncation, and embedded text.
4. Run the formatting and RTL agents across the same scope.
5. Run the expansion agent: render key screens under the pseudo-locale, capture screenshots, report breakage.
6. Merge findings ranked by user impact and cost to fix late; separate human-decision items.
7. Review with localization, design, and engineering: what blocks kickoff, what ships, what changes design.
8. Fix in passes, rerun the workflow, and hand the product to translators and in-market reviewers.
