Design-as-Code: Tokens, .pen, .op and the Diffable Design File

Section 01

The design file nobody else can read

Most product teams still keep their most important design decisions inside a file that only one application can open. The canvas tool renders it beautifully, but everything outside that tool sees an opaque blob. Git can store it, but a commit that touches it says only that a binary file changed. Code review cannot see what moved. CI cannot lint it. Six months later, nobody can answer the simple question: when did the primary color change, who changed it, and why?

The cost shows up as a translation tax. A designer adjusts a color, a spacing step, or a component state in the canvas. Someone then has to notice the change, export or re-inspect it, re-type values into CSS or a theme file, and hope nothing was missed. Every hop is a chance for drift. Most design-versus-implementation arguments are not disagreements about taste; they are two copies of the same decision quietly diverging.

Agents make this old problem urgent. A coding agent can read your repository, your CSS, your components, and your Markdown design system in seconds. It cannot open a proprietary binary design file directly. It sees only what a vendor API or plugin chooses to return, which is usually a partial copy. If the source of truth for design lives in a format the agent cannot open, the agent will improvise the missing decisions, and you will spend review cycles correcting work that was never grounded in your actual system.

Section 02

Design-as-code: decisions as tokens, surfaces as text

Design-as-code is a simple inversion. Instead of treating the design tool's file as the original and everything else as an export, the design decisions and, increasingly, the design surfaces themselves live as text files in the same repository as the implementation.

Decisions live as design tokens: structured JSON in the Design Tokens Community Group (DTCG) format, transformed into CSS custom properties, platform constants, or theme files. Surfaces live as text too, in one of three families. The first family is token output: CSS variables and theme files, small and semantically dense. The second is text-based vector design files, such as Pencil's .pen format and OpenPencil's .op format, which store the canvas as a JSON tree of nodes, components, and variables. The third skips the design-file question entirely: HTML and component code as the design surface, where the artifact is already the implementation.

The property that makes all of this matter is diffability. A change to a text file produces a line-level diff. That single property unlocks everything teams already rely on for code: review before merge, history and blame, revert, branching, CI checks, and search. It also unlocks agent participation. An agent can read the contract before generating UI, make a change, and prove in a diff exactly what it touched. Reviewing an agent's design change stops being an act of faith and becomes a pull request.

This article owns the format and artifact layer: what the files look like, what the diffs look like, and what is honestly true about each format as of June 2026. For how to structure token taxonomies and turn them into agent instructions, audits, and prompts, read Design Tokens Are Agent Instructions first; this piece deliberately does not repeat it.

Tokens are the contract: named decisions in DTCG JSON, aliased from primitives to semantics.
Generated CSS variables are the runtime: built from the token source, never edited by hand.
.pen and .op are design surfaces as JSON: canvases an agent can open, edit, and commit.
HTML-as-canvas tools skip the format question: the artifact is already markup.
Diffability is the unlock: review, history, CI, and agent changes you can actually inspect.

Projects to inspect

Design Tokens Are Agent Instructions
The companion article on token taxonomy, DESIGN.md guidance, token-aware prompting, and audit gates.

Section 03

What exists publicly, as of June 2026

The pieces of this workflow are no longer speculative. The token interchange layer stabilized recently, and two text-based design-file formats with agent access are usable today. The dates matter, so here is the verified state as of June 2026.

The DTCG design tokens specification reached its first stable version, 2025.10, in late October 2025, after years as an evolving draft. Draft revisions dated March and May 2026 show the format is still moving, so pin your expectations to 2025.10 and treat anything newer as provisional. Style Dictionary, the most widely used token build tool, is on version 5 and uses DTCG JSON as its base format; full coverage of the 2025.10 revision, including object-value dimensions and the expanded color spaces, is still being completed, so check the changelog before you depend on a specific feature. Tokens Studio, the token-management plugin and platform many design teams use, defaults to the DTCG token format and syncs token sets to git providers. Figma announced extended variable collections and native DTCG-conformant variable export and import at its Schema 2025 event; as of June 2026 that capability is described as rolling out, and community plugins such as TokensBrücke and GitFig cover the gap by pushing Figma variables into repositories today.

On the design-surface side, Pencil stores its documents as .pen files: a JSON object tree of frames, text, paths, components, and variables that lives in your repository, with an MCP server bundled into its VS Code and Cursor extension so agents can read and edit the canvas directly. Pencil is a proprietary tool that is free during early access as of June 2026. OpenPencil is an open-source counterpart under the MIT license; its .op files are human-readable JSON with typed variables, multi-axis themes, MCP access, and support for multiple agents working on the same document. It is a different project from the similarly named open-pencil repository that aims to read .fig files; do not conflate the two. HTML-as-canvas tools form the third family and get full treatment in their own articles, so they appear here only as a pointer.

The direction of travel is consistent: the design-tool ecosystem is converging on DTCG JSON as the interchange format, and even canvas-first vendors are adding git-shaped workflows. That convergence is what makes it reasonable to put the token file, not the canvas, at the center of your system.

Projects to inspect

DTCG Design Tokens Format Module
The token JSON specification: stable version 2025.10, with drafts continuing.
Style Dictionary DTCG support
How Style Dictionary v5 consumes DTCG JSON and what is still in progress.
Pencil: the .pen format
Developer documentation for the JSON-based .pen design file.
OpenPencil
MIT-licensed AI-native vector design tool with .op JSON files and MCP access.

Diagram: Token flow, one source, three surfaces, one gate.
A DTCG token JSON source file feeds generated CSS variables, a .pen or .op design file, and agent-generated components, all flowing into a pull-request review gate; dashed arrows show the agent reading the token source before writing any surface.

Section 04

Case study: one token set, one card, one diff

The fastest way to test the diffability argument is to run it. For this article I executed the token leg end to end in a small scratch repository that mirrors this site's own setup: the same oklch CSS-variable pattern this site uses in app/globals.css, the same semantic naming discipline, just reduced to a handful of tokens so every artifact fits in the article.

The run had four steps. First, define a small DTCG token file with primitive colors (warm paper, ink, school blue, lab green), semantic aliases (background, foreground, action.primary), and two dimension tokens (interactive radius, card padding). Second, sync that file into a generated tokens.css of CSS custom properties using a small Node script written by the agent from a one-paragraph prompt. Third, build a small enrolment-card UI fragment that consumes only var(--token) references: no hex values, no magic numbers. Fourth, change exactly one semantic token, regenerate, and look at what git says happened.

The honest accounting: the whole loop took three working iterations and roughly twenty minutes. The token file and the UI fragment each landed in one pass. The sync script did not. Its first version walked the token tree and emitted CSS, but it copied DTCG alias strings such as {color.base.paper} straight into the output. That is not valid CSS, so the browser silently dropped those declarations and the card rendered with no background and a default-styled button: a quiet failure, not a loud one. The fix in the second version was to resolve aliases into var() references to the primitive variables, which keeps the alias relationship visible in the generated CSS instead of flattening it away. That failure is worth remembering: alias resolution is exactly the kind of detail an agent will get subtly wrong if your prompt does not state what aliases should become in the output.
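One way to make that quiet failure loud is a check over the generated file itself. The sketch below is hypothetical (the function name and the line-by-line parsing are assumptions, not part of the case-study script): it flags any custom-property declaration whose value still contains DTCG curly-brace alias syntax, which is exactly what the first sync version emitted.

```javascript
// Hypothetical guard for generated CSS. DTCG alias syntax such as
// {color.base.paper} is not valid CSS, and browsers drop the declaration
// without an error, so this check reports every declaration that contains it.
function findLeakedAliases(css) {
  const problems = [];
  css.split("\n").forEach((line, index) => {
    // Only inspect custom-property declarations of the form "--name: value;".
    const decl = line.match(/^\s*(--[\w-]+)\s*:\s*(.*?);?\s*$/);
    if (decl && /\{[^}]*\}/.test(decl[2])) {
      problems.push({ line: index + 1, property: decl[1], value: decl[2] });
    }
  });
  return problems;
}
```

Run against the first version's output, a declaration such as `--color-background: {color.base.paper};` would be reported with its line number instead of rendering as a missing background.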

There was also a trade-off that does not show up in a happy-path demo. The generated names follow the token path, such as --color-action-primary, while this site's existing runtime layer uses shorter shadcn-style names such as --primary. Adopting a generated token layer in an existing codebase means either renaming your tokens to match what is already deployed, or maintaining a small mapping layer, or migrating the components. None of those options is free, and pretending otherwise is how token projects stall.
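The mapping-layer option can be as small as one extra generated block that points the names components already use at the generated token variables. This is a sketch under an assumption: only `--primary` is named in this article, so the other legacy names in the hypothetical `LEGACY_TO_TOKEN` table are illustrative shadcn-style guesses.

```javascript
// Hypothetical mapping from names components already use (shadcn-style) to
// the generated token variables. Only --primary is confirmed by the case study.
const LEGACY_TO_TOKEN = {
  "--background": "--color-background",
  "--foreground": "--color-foreground",
  "--primary": "--color-action-primary",
  "--primary-foreground": "--color-action-primary-foreground",
  "--radius": "--dimension-radius-interactive",
};

// Emit a :root block that aliases each legacy name to its token variable, so
// existing components keep working while surfaces migrate one at a time.
function buildMappingCss(mapping) {
  const lines = Object.entries(mapping).map(
    ([legacy, token]) => `  ${legacy}: var(${token});`
  );
  return [
    "/* mapping layer: legacy names -> generated tokens */",
    ":root {",
    ...lines,
    "}",
    "",
  ].join("\n");
}
```

Components keep reading `var(--primary)`, and the mapping line for a name can be deleted once every surface that used it has moved to the token name.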

Section 05

The token source: a small DTCG file

This is the entire token source for the case study. It uses the DTCG 2025.10 shape: $type set at the group level, $value on each token, color values expressed as a color space plus components, dimensions as value-and-unit objects, and aliases written as dotted paths in curly braces.

Two design choices matter more than the syntax. Primitives carry no meaning and exist only to be aliased; semantics carry the product meaning and are the only names the UI is allowed to reference. And every token has a $description, because the description is what an agent reads when it has to decide whether a token applies to the surface it is building.

tokens/school.tokens.json (DTCG format)

{
  "color": {
    "$type": "color",
    "base": {
      "paper": {
        "$value": { "colorSpace": "oklch", "components": [0.97, 0.018, 86] },
        "$description": "Warm paper background. Primitive; alias through a semantic token."
      },
      "ink": {
        "$value": { "colorSpace": "oklch", "components": [0.16, 0.015, 260] },
        "$description": "High-contrast ink. Primitive."
      },
      "school-blue": {
        "$value": { "colorSpace": "oklch", "components": [0.43, 0.17, 263] },
        "$description": "Deep school blue. Primitive."
      },
      "lab-green": {
        "$value": { "colorSpace": "oklch", "components": [0.44, 0.08, 166] },
        "$description": "Lab green. Primitive."
      }
    },
    "background": {
      "$value": "{color.base.paper}",
      "$description": "Default page background."
    },
    "foreground": {
      "$value": "{color.base.ink}",
      "$description": "Default text color."
    },
    "action": {
      "primary": {
        "$value": "{color.base.school-blue}",
        "$description": "Primary actions, active navigation, emphasis."
      },
      "primary-foreground": {
        "$value": { "colorSpace": "oklch", "components": [0.99, 0.006, 86] },
        "$description": "Text on primary actions."
      }
    }
  },
  "dimension": {
    "$type": "dimension",
    "radius": {
      "interactive": {
        "$value": { "value": 8, "unit": "px" },
        "$description": "Buttons, inputs, cards."
      }
    },
    "space": {
      "card-padding": {
        "$value": { "value": 24, "unit": "px" },
        "$description": "Internal padding for cards and panels."
      }
    }
  }
}
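Both rules above are cheap to check mechanically. A minimal sketch, assuming only the DTCG conventions this file uses (any object with `$value` is a token, and `$type` may be inherited from a parent group); `lintTokens` is a hypothetical name, and it checks descriptions and types but not the primitive-versus-semantic rule.

```javascript
// Sketch of a token-source lint for the DTCG shape used above.
function lintTokens(tokens) {
  const issues = [];

  function walk(node, path, inheritedType) {
    const type = node.$type ?? inheritedType;
    if ("$value" in node) {
      const name = path.join(".");
      if (!node.$description) issues.push(`${name}: missing $description`);
      if (!type) issues.push(`${name}: no $type on the token or any parent group`);
      return;
    }
    for (const [key, child] of Object.entries(node)) {
      // Skip $type, $description, and other reserved $-prefixed properties.
      if (!key.startsWith("$") && child !== null && typeof child === "object") {
        walk(child, [...path, key], type);
      }
    }
  }

  walk(tokens, [], undefined);
  return issues;
}
```

On the school token file this returns an empty list, because every token carries a `$description` and inherits `$type` from its top-level group.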

Section 06

The sync prompt and the generated CSS

The sync step is where most teams reach for Style Dictionary, and in production you probably should: it is mature, extensible, and already speaks DTCG. For a case study this small, a dedicated script makes the mechanics visible, and writing it through an agent is itself a useful exercise in being precise about what the output must look like.

This is the prompt that produced the script, after the first iteration's alias bug was found. The added sentence about aliases is the entire difference between version one and version two. Everything else the agent inferred correctly from the token file and the target example.

Sync prompt given to the agent

Write scripts/sync-tokens.mjs.

Input: tokens/school.tokens.json (DTCG format: $type on groups, $value on tokens).
Output: ui/tokens.css, a single :root block of CSS custom properties.

Rules:
- Token names become kebab-case custom properties from the token path,
  e.g. color.action.primary -> --color-action-primary.
- oklch color values become oklch(L C H) strings.
- Dimension values become "<value><unit>".
- Aliases like {color.base.paper} must be emitted as var(--color-base-paper),
  not copied literally and not flattened to the raw value. The alias
  relationship has to stay visible in the generated CSS.
- Start the file with a comment marking it as generated. Never edit it by hand.
- Print the file path when done. No dependencies beyond node:fs.
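To make the mechanics concrete, here is a minimal sketch of the core transform that prompt describes. It is not the script the agent produced in the case study: it handles only the value shapes in tokens/school.tokens.json (oklch colors without alpha, value-and-unit dimensions, whole-value aliases), and the function names are illustrative.

```javascript
// Sketch of a DTCG -> CSS custom-property transform, limited to the token
// shapes used in tokens/school.tokens.json.
const HEADER = "/* generated from tokens/school.tokens.json -- do not edit by hand */";

// ["color", "action", "primary"] -> "--color-action-primary"
function cssVarName(path) {
  return "--" + path.join("-");
}

function toCssValue(value, type) {
  if (typeof value === "string") {
    const alias = value.match(/^\{([^}]+)\}$/);
    // Keep the alias visible: {color.base.paper} -> var(--color-base-paper)
    if (alias) return `var(${cssVarName(alias[1].split("."))})`;
    return value;
  }
  if (type === "color" && value.colorSpace === "oklch") {
    return `oklch(${value.components.join(" ")})`;
  }
  if (type === "dimension") return `${value.value}${value.unit}`;
  throw new Error(`Unsupported ${type} value: ${JSON.stringify(value)}`);
}

// Depth-first walk; a group's $type is inherited by its descendants.
function collectDeclarations(node, path, inheritedType, out) {
  const type = node.$type ?? inheritedType;
  if ("$value" in node) {
    out.push(`  ${cssVarName(path)}: ${toCssValue(node.$value, type)};`);
    return;
  }
  for (const [key, child] of Object.entries(node)) {
    if (!key.startsWith("$")) collectDeclarations(child, [...path, key], type, out);
  }
}

function tokensToCss(tokens) {
  const lines = [];
  collectDeclarations(tokens, [], undefined, lines);
  return [HEADER, ":root {", ...lines, "}", ""].join("\n");
}
```

Wiring it to disk takes a few lines with `readFileSync` and `writeFileSync` from `node:fs`: parse the token file, pass the object to `tokensToCss`, and write the result to ui/tokens.css. On the school token file, the sketch emits the same declarations, in the same order, as the generated ui/tokens.css in this article.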

Section 07

The runtime layer: generated CSS variables

The generated output is deliberately boring, and that is the point. It is the same pattern this site uses in app/globals.css, where Tailwind v4 theme variables map onto :root custom properties holding oklch values: a thin runtime layer that components consume through var() references and that the build regenerates whenever the source changes.

Notice that the semantic variables reference the primitive variables instead of repeating the raw color. That preserves the decision structure at runtime: when you read --color-action-primary in devtools or in a diff, you can see which primitive it points at, not just a number you have to reverse-engineer.

ui/tokens.css (generated)

/* generated from tokens/school.tokens.json -- do not edit by hand */
:root {
  --color-base-paper: oklch(0.97 0.018 86);
  --color-base-ink: oklch(0.16 0.015 260);
  --color-base-school-blue: oklch(0.43 0.17 263);
  --color-base-lab-green: oklch(0.44 0.08 166);
  --color-background: var(--color-base-paper);
  --color-foreground: var(--color-base-ink);
  --color-action-primary: var(--color-base-school-blue);
  --color-action-primary-foreground: oklch(0.99 0.006 86);
  --dimension-radius-interactive: 8px;
  --dimension-space-card-padding: 24px;
}
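Because the semantic variables point at primitives, the chain can also be followed mechanically, for example in a review script. A sketch, assuming the simple one-declaration-per-property form of the generated file; `resolveChain` is a hypothetical helper, and it does not handle fallbacks such as `var(--a, red)` or `var()` nested inside other functions.

```javascript
// Parse "--name: value;" declarations into a Map of name -> value.
function parseDeclarations(css) {
  const decls = new Map();
  for (const m of css.matchAll(/(--[\w-]+)\s*:\s*([^;]+);/g)) {
    decls.set(m[1], m[2].trim());
  }
  return decls;
}

// Follow plain var(--x) references until a non-reference value is reached.
function resolveChain(css, name) {
  const decls = parseDeclarations(css);
  const chain = [name];
  const seen = new Set([name]);
  let value = decls.get(name);
  while (value !== undefined) {
    const ref = value.match(/^var\((--[\w-]+)\)$/);
    if (!ref) break;
    if (seen.has(ref[1])) throw new Error(`Cycle at ${ref[1]}`);
    seen.add(ref[1]);
    chain.push(ref[1]);
    value = decls.get(ref[1]);
  }
  return { chain, value };
}
```

For `--color-action-primary`, the chain is the semantic name followed by `--color-base-school-blue`, ending at its oklch value.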

Section 08

Change one token, read the diff

With the source committed, the experiment is a single edit: the semantic action.primary token moves from the school-blue primitive to the lab-green primitive, and the sync script runs again. This is the entire diff git produced.

Read what it says, and what it does not say. Two files changed, one meaningful line in each. The token source records the decision: primary actions are now lab green. The generated CSS records the consequence. The UI fragment, the enrolment card built earlier, does not appear in the diff at all, because it only ever referenced var(--color-action-primary). Its kicker text and button picked up the new color at render time without a single component edit. That is the contract working: decisions change in one place, surfaces follow, and the review shows exactly the blast radius.

Compare that with the same change in a hand-off workflow: a style edit inside a binary file, a notification, a re-export, and a manual edit to CSS that may or may not catch every usage. Or with the failure mode design-as-code prevents: a teammate or an agent hot-fixing the new green directly inside three components, leaving the old blue live in two others. The diff is small precisely because the structure is right; a noisy diff is usually a symptom of decisions living in the wrong layer.

git diff after the one-token change

diff --git a/tokens/school.tokens.json b/tokens/school.tokens.json
--- a/tokens/school.tokens.json
+++ b/tokens/school.tokens.json
@@ -30,5 +30,5 @@
     "action": {
       "primary": {
-        "$value": "{color.base.school-blue}",
+        "$value": "{color.base.lab-green}",
         "$description": "Primary actions, active navigation, emphasis."
       },

diff --git a/ui/tokens.css b/ui/tokens.css
--- a/ui/tokens.css
+++ b/ui/tokens.css
@@ -7,5 +7,5 @@
   --color-background: var(--color-base-paper);
   --color-foreground: var(--color-base-ink);
-  --color-action-primary: var(--color-base-school-blue);
+  --color-action-primary: var(--color-base-lab-green);
   --color-action-primary-foreground: oklch(0.99 0.006 86);
   --dimension-radius-interactive: 8px;

 ui/enrol-card.html: no changes
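The same alias structure lets you ask about blast radius before you edit anything: which tokens depend, directly or through other aliases, on a given token? A minimal sketch, assuming whole-value DTCG aliases of the form `{a.b.c}`; the helper names are hypothetical, and the sketch only covers dependencies inside the token file, not component usage.

```javascript
// Flatten the token tree into a Map of "dotted.path" -> $value.
function listTokenValues(node, path = [], out = new Map()) {
  if ("$value" in node) {
    out.set(path.join("."), node.$value);
    return out;
  }
  for (const [key, child] of Object.entries(node)) {
    if (!key.startsWith("$") && child !== null && typeof child === "object") {
      listTokenValues(child, [...path, key], out);
    }
  }
  return out;
}

// Breadth-first search over alias references pointing at `target`.
function findDependents(tokens, target) {
  const all = listTokenValues(tokens);
  const result = new Set();
  const queue = [target];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const [name, value] of all) {
      if (value === `{${current}}` && !result.has(name)) {
        result.add(name);
        queue.push(name);
      }
    }
  }
  return [...result];
}
```

Before the change, `findDependents` for `color.base.school-blue` lists `color.action.primary`; after it, school blue has no dependents in the source, and the diff above shows the consequence.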

Section 09

.pen and .op: the design surface as a text file

Tokens cover decisions. The newer development is that the design surface itself, the canvas with frames, vectors, text, and components, can also be a text file in your repository. Two formats anchor that idea today: Pencil's .pen and OpenPencil's .op. To be clear about evidence: the token-and-CSS case study above was executed for real; the excerpt below was not produced by either tool in this run, because both require their own editors. It is an illustrative reconstruction of the documented structure, kept deliberately small, and it should be read as a description of the format families rather than a verbatim file.

Both formats share the same shape: a JSON document containing a tree of nodes (frames, text, shapes), reusable components, and named variables for colors, dimensions, and text styles. Pencil documents live in the codebase and are edited inside VS Code or Cursor, with a bundled MCP server so an agent can read the document, add or modify nodes, and bind properties to variables; the format is documented for developers, and the tool is free during early access as of June 2026. OpenPencil's .op files are described by the project as human-readable and git-friendly JSON, with typed variables, multi-axis themes, MCP access, code export to several frameworks, and support for multiple agents editing the same document; the project is MIT-licensed.

The connection to tokens is the variables block. If the .pen or .op variables carry the same names and values as your DTCG source, the design canvas and the runtime CSS are two projections of one decision set, and a token change can propagate to both through the same kind of sync step. That is the full version of the diagram earlier in this article: one source, multiple surfaces, one review gate.
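A sync step toward the canvas could look like the sketch below. Treat its target shape as an assumption: it mirrors the reconstructed .op-style excerpt in this section rather than the documented .pen or .op schema. It resolves aliases to final values because that excerpt stores plain values, and it handles only oklch colors and value-and-unit dimensions. Names keep the full token path with slashes, so radius.interactive arrives as dimension/radius/interactive; matching the shorter names in the excerpt would need the same kind of mapping discussed in the case study.

```javascript
// Flatten DTCG tokens into "dotted.path" -> { type, value }, inheriting $type.
function flattenTokens(node, path, inheritedType, out) {
  const type = node.$type ?? inheritedType;
  if ("$value" in node) {
    out.set(path.join("."), { type, value: node.$value });
    return out;
  }
  for (const [key, child] of Object.entries(node)) {
    if (!key.startsWith("$") && child !== null && typeof child === "object") {
      flattenTokens(child, [...path, key], type, out);
    }
  }
  return out;
}

function formatCanvasValue(value, type) {
  if (type === "color") return `oklch(${value.components.join(" ")})`;
  if (type === "dimension") return `${value.value}${value.unit}`;
  throw new Error(`Unsupported type: ${type}`);
}

// Hypothetical projection into an .op-style "variables" object.
function toCanvasVariables(tokens) {
  const flat = flattenTokens(tokens, [], undefined, new Map());
  const resolve = (name, depth = 0) => {
    if (depth > 10) throw new Error(`Alias chain too deep at ${name}`);
    const token = flat.get(name);
    if (!token) throw new Error(`Unknown token: ${name}`);
    const alias = typeof token.value === "string" && token.value.match(/^\{(.+)\}$/);
    return alias ? resolve(alias[1], depth + 1) : token;
  };
  const variables = {};
  for (const name of flat.keys()) {
    const { type, value } = resolve(name);
    variables[name.split(".").join("/")] = { type, value: formatCanvasValue(value, type) };
  }
  return variables;
}
```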

The honest caveat is diff noise. A token or variable change in a .pen or .op file is a clean, reviewable diff, just like the CSS one above. A layout change that moves forty nodes is technically a text diff but practically unreadable; you review it the way you review a large generated file, by trusting the tool's render and a screenshot, not by reading every line. Diffable does not mean every diff is pleasant. The claim worth making is narrower and still valuable: the decisions you most need to review (variables, component definitions, and bindings) produce diffs a human can actually read.

Illustrative .op-style document excerpt (reconstructed, not tool output)

{
  "name": "enrol-card",
  "variables": {
    "color/background": { "type": "color", "value": "oklch(0.97 0.018 86)" },
    "color/action/primary": { "type": "color", "value": "oklch(0.43 0.17 263)" },
    "radius/interactive": { "type": "dimension", "value": "8px" }
  },
  "nodes": [
    {
      "type": "frame",
      "name": "EnrolCard",
      "fill": { "variable": "color/background" },
      "cornerRadius": { "variable": "radius/interactive" },
      "children": [
        { "type": "text", "name": "Kicker", "content": "Next cohort",
          "fill": { "variable": "color/action/primary" } },
        { "type": "frame", "name": "ReserveButton",
          "fill": { "variable": "color/action/primary" },
          "cornerRadius": { "variable": "radius/interactive" } }
      ]
    }
  ]
}
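Reviewing the variables block separately from the node tree is easy to automate. This sketch assumes the reconstructed shape above (a top-level `variables` object whose entries have `type` and `value`), which is not a documented schema. It ignores `nodes` and component definitions entirely, and it compares entries by their JSON serialization, so it assumes a stable key order within each variable.

```javascript
// Summarize only variable-level changes between two .op-style documents.
function diffVariables(before, after) {
  const a = before.variables ?? {};
  const b = after.variables ?? {};
  const changes = [];
  for (const name of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const was = a[name] ? JSON.stringify(a[name]) : undefined;
    const now = b[name] ? JSON.stringify(b[name]) : undefined;
    if (was === now) continue;
    if (was === undefined) changes.push(`added   ${name} = ${b[name].value}`);
    else if (now === undefined) changes.push(`removed ${name}`);
    else changes.push(`changed ${name}: ${a[name].value} -> ${b[name].value}`);
  }
  return changes;
}
```

Attaching this summary and a rendered screenshot to the pull request gives reviewers the decision-level view, even when the node-tree diff runs to hundreds of lines.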

Section 10

Old world vs design-as-code, kept honest

Comparison boards about new workflows tend to flatter the new thing. This one tries not to. The left column is the hand-off workflow built around a binary design file and a vendor workspace; the right is the design-as-code workflow built around text artifacts in the repository. The contrast is real, but it is about specific properties, not about one side being obsolete.

What the binary-file workflow genuinely lacks is external diffability and direct agent access: outside the design tool, a change is opaque, and an agent can only work on a copy of the decision fetched through an API or plugin. What the design-as-code workflow genuinely lacks is everything Figma has spent a decade building: real-time multiplayer for non-technical collaborators, mature component and prototyping tooling, an enormous plugin ecosystem, and the place where stakeholders already are. Figma also offers in-tool branching and version history, and has announced DTCG-conformant variable export, so the gap is narrowing from their side too. The fair statement, as of June 2026, is that text-based formats win on review, history, CI, and agent participation, and the canvas-first tools win on collaboration breadth and editing ergonomics. Most teams will run both for a while, with tokens as the bridge.

Diagram: The same design change, two workflows.
A comparison board contrasts the hand-off workflow built on a binary design file with the design-as-code workflow built on text artifacts, across file format, location, the cost of a primary color change, review and history, and agent participation; a footer notes Figma's continuing strengths in multiplayer collaboration and component tooling.

Section 11

Good diffs and bad diffs

Once design artifacts are text, the diff becomes a design-quality signal in its own right. A good diff is small, lives in the layer where the decision belongs, and reads like a sentence: action.primary now points at lab-green. A bad diff is large, touches the consuming surfaces instead of the source, or changes generated files without changing their source.

Three smells are worth training yourself and your agents to notice. First, semantic changes appearing only in components: if a rebrand shows up as thirty edited TSX files and no token change, the decision was implemented as sprawl, and the next change will cost just as much. Second, generated files drifting from their source: a hand-edit to tokens.css that is not reflected in the token JSON will be silently overwritten by the next sync, which is why the generated header comment and a CI freshness check both matter. Third, giant node-tree diffs presented for line-by-line review: for .pen, .op, or any generated canvas file, the reviewable unit is the variable and component definitions plus a rendered screenshot, and pretending otherwise just trains reviewers to approve without reading.

Good: a one-line change to a semantic token, plus the regenerated output, plus a screenshot.
Good: a new component token introduced in the source before the component that uses it.
Bad: raw hex values appearing in component code in the same PR that claims to follow the system.
Bad: edits to generated CSS with no corresponding token change.
Bad: a 2,000-line canvas-file diff with no rendered before/after attached.

Section 12

Templates and prompts you can reuse

Two artifacts make this workflow stick: a prompt fragment that forces the agent to read the token contract before writing any surface, and a pull-request template that makes token changes reviewable by people who were not in the room. The token-audit prompt and verification script live in the companion tokens article; these two are specific to the design-as-code loop.

The CI idea is intentionally minimal: fail the build if component code contains raw hex or arbitrary spacing values, and fail it if regenerating the token output produces a file that differs from the committed one. The first check stops sprawl; the second stops generated files from drifting away from their source. Both are a few lines of script, and both give the agent a hard signal it will respect more reliably than prose.
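Both checks can be sketched as pure functions that any CI runner can call. The sketch is hypothetical: the function names and patterns are assumptions you would tune (a `#abc` fragment link or an intentional 1px hairline will trip them), and file discovery and exit codes are left to the runner. The freshness check assumes you can regenerate the CSS in memory with the same sync step the build uses.

```javascript
// Patterns for raw values that should only appear in token files.
const RAW_VALUE_PATTERNS = [
  { name: "hex color", re: /#[0-9a-fA-F]{3,8}\b/g },
  { name: "color function", re: /\b(?:rgba?|hsla?|oklch|oklab)\(/g },
  { name: "px length", re: /\b\d+(?:\.\d+)?px\b/g },
];

// Report every raw value found in component source, with file and line.
function findRawValues(source, file = "<input>") {
  const findings = [];
  source.split("\n").forEach((line, i) => {
    for (const { name, re } of RAW_VALUE_PATTERNS) {
      for (const m of line.matchAll(re)) {
        findings.push(`${file}:${i + 1}: ${name} "${m[0]}"`);
      }
    }
  });
  return findings;
}

// Freshness: the committed file must equal a fresh regeneration, ignoring
// line-ending style and trailing whitespace at the end of the file.
function isFresh(committedCss, regeneratedCss) {
  const normalize = (s) => s.replace(/\r\n/g, "\n").trimEnd();
  return normalize(committedCss) === normalize(regeneratedCss);
}
```

Run the raw-value scan over component files only; the token JSON and the generated CSS legitimately contain raw values.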

Read-tokens-first prompt fragment + token-change PR template

## Prompt fragment: read the contract before writing UI

Before creating or editing any UI:
1. Read tokens/*.tokens.json and the generated tokens.css.
2. Use var(--token) references only. No raw hex, rgb, oklch, px radii,
   or spacing values in component code.
3. If a needed decision has no token, stop and propose the token first
   (name, type, value, description) instead of hardcoding it.
4. After changes, run the token sync and the no-raw-values check, and
   include the diff of any token files you touched in your report.

## PR template: token change

Decision: <which semantic token changed and why>
Source diff: <token JSON lines>
Regenerated: <generated files rebuilt, yes/no>
Surfaces affected: <where the change is visible; attach before/after screenshots>
Not affected: <what intentionally did not change>
Checks: sync clean / no raw values / contrast reviewed for affected states
Risks for human review: <anything needing design judgment>

Section 13

Risks, limits, and anti-patterns

Design-as-code is a structural improvement, not a guarantee of good design. The clearest limits are worth stating plainly, because the failure modes are quieter than the demos.

Tokens do not make decisions; they record them. A perfectly synced token pipeline will faithfully propagate a low-contrast color to every surface at once. Contrast, hierarchy, and state coverage still need checks and human judgment, and nothing in this article should be read as an accessibility claim about your output. The tooling is also young at the edges: the DTCG spec is stable at 2025.10 but still evolving, Style Dictionary's coverage of the newest revision is still completing, Pencil is in early access, and OpenPencil is a young open-source project. Build the workflow so that any single tool can be replaced: the token JSON and the generated CSS are the durable assets, and they outlive whichever editor or build step you choose this year.
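A rough contrast screen can at least catch the worst cases before a human looks. The sketch below is an approximation under stated assumptions: it converts oklch to linear sRGB with Björn Ottosson's published OKLab matrices, clamps out-of-gamut channels instead of gamut-mapping them, and then applies the WCAG 2.x relative-luminance and contrast-ratio formulas. It is a screening signal, not an accessibility claim.

```javascript
// oklch (L, C, hue in degrees) -> clamped linear sRGB, via OKLab.
function oklchToLinearSrgb(L, C, hDeg) {
  const h = (hDeg * Math.PI) / 180;
  const a = C * Math.cos(h);
  const b = C * Math.sin(h);
  const l = (L + 0.3963377774 * a + 0.2158037573 * b) ** 3;
  const m = (L - 0.1055613458 * a - 0.0638541728 * b) ** 3;
  const s = (L - 0.0894841775 * a - 1.291485548 * b) ** 3;
  return [
    4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
    -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
    -0.0041960863 * l - 0.7034186147 * m + 1.707614701 * s,
  ].map((c) => Math.min(1, Math.max(0, c))); // crude clamp, not gamut mapping
}

// WCAG relative luminance from linear RGB channels.
function relativeLuminance([L, C, h]) {
  const [r, g, b] = oklchToLinearSrgb(L, C, h);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(fg, bg) {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}
```

Feed it the oklch components from the token source for each foreground and background pair the UI actually uses, and re-run it whenever a semantic token is retargeted, as in the school-blue to lab-green change.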

There are also organizational anti-patterns. Migrating every existing component to a new variable naming scheme in one heroic PR usually stalls; the naming-mismatch trade-off from the case study is better handled by mapping or by migrating surface by surface. Treating the design tool as dead is another: if your designers, PMs, and stakeholders live in Figma, ripping it out to satisfy an architecture diagram trades a real collaboration surface for a theoretical one. And handing an agent write access to the token source without a review gate inverts the entire benefit; the point of diffable artifacts is that changes are reviewed, not that they are automatic.

Do not claim accessibility from token consistency; check contrast and states on the affected surfaces.
Do not hand-edit generated files; fix the source and re-run the sync.
Do not big-bang rename existing CSS variables to match a new token scheme; map or migrate incrementally.
Do not review giant canvas-file diffs line by line; review variables, components, and rendered output.
Do not let agents commit token changes without the same PR gate humans use.
Do not anchor the workflow to one vendor's format; keep DTCG JSON as the durable layer.

Section 14

A same-day workflow

You do not need a platform team or a migration project to start. The case study above is small enough to reproduce in an afternoon inside an existing project, and the order matters less than keeping each step reviewable.

Start with the decisions you change most often, usually brand colors, status colors, radius, and a couple of spacing steps. Resist the urge to tokenize everything on day one; a small contract that is actually enforced beats a complete one that is ignored.

Pick the source of truth: a DTCG-format tokens file committed to the repository.
Name semantic tokens for the decisions that change, aliased from a small primitive set.
Generate the runtime layer: CSS variables built by Style Dictionary or a small sync script, marked as generated.
Require var(--token) references in components, and add the no-raw-values and sync-freshness checks to CI.
Point your agent at the contract: the read-tokens-first prompt fragment plus DESIGN.md guidance.
Route every token change through a pull request with the template above and before/after screenshots.
If you adopt a .pen or .op canvas, bind its variables to the same token names so the canvas and the runtime stay two projections of one source.

Sources

Sources & further reading

DTCG Design Tokens Format Module (drafts and stable 2025.10)
The token JSON specification this article's token file follows.
W3C DTCG: specification reaches first stable version
The October 2025 announcement of the 2025.10 stable version.
Style Dictionary: DTCG support
Style Dictionary v5 documentation on DTCG JSON as its base format.
Tokens Studio: token format
Tokens Studio documentation on the DTCG (W3C) token format default.
Figma forum: native variable export
Discussion of Figma's announced DTCG-conformant variable export and current workarounds.
Pencil: the .pen format
Developer documentation for Pencil's JSON-based design file.
OpenPencil
MIT-licensed open-source vector design tool with .op JSON files, typed variables, and MCP access.
Tailwind CSS theme variables
The runtime CSS-variable token layer used by this site's own globals.css.
GitFig
A plugin that syncs Figma variables with GitHub, retrofitting diffability onto the canvas workflow.
