Section 1
The most-asked question is the wrong first question
Which agent should I use? is the question designers ask first, and it is the question with the shortest shelf life. Every fact that feeds the answer — versions, features, pricing, even whether a tool will exist next month — moves faster than any article can track. The comparison this site's source material relied on was verified in May 2026; by the first of June, four of its claims were already wrong. One of the four platforms in this article's title stops serving individual users seventeen days after the publication date.
That is not a reason to skip the comparison. Teams still have to pick something, justify the pick to whoever pays for it, and live with the consequences. It is a reason to change what the comparison is for. The honest version of this article does three things: it tells you what is actually different between Claude Code, Codex CLI, OpenCode, and Gemini CLI as of a stated date; it gives you a decision framework built on the differences that change slowly; and it shows why the decision matters less than the thing you bring to it — a portable design harness that moves between platforms in minutes.
A note on method before anything else. Every volatile fact in this article — pricing, version numbers, context windows, the Gemini CLI transition — carries a last-verified date of June 1, 2026, and should be re-checked against the linked primary source before you rely on it. Stable facts — file names, open standards, the structure of the decision — are stated plainly. The difference between those two categories is itself one of the most useful things this article can teach.
Section 2
Platform versus harness: what actually ports
The biggest shift in this space during 2025 and 2026 was convergence on open standards. All four platforms now read a project instruction file: AGENTS.md natively in Codex CLI and OpenCode, via a one-line import in Claude Code's CLAUDE.md, and via a setting in Gemini CLI. All four support Agent Skills — the SKILL.md folder format — and three of them will discover skills from the shared .agents/skills/ location. All four speak MCP, so the same Playwright, Figma, or canvas server works everywhere once you translate the config syntax. The mechanics of those files are covered in the instruction-hierarchy article and the harness article on this site; this one assumes them rather than re-explaining them.
That convergence has a practical consequence that vendors do not advertise: the durable asset is not the platform, it is the harness. Your DESIGN.md, your tokens, your AGENTS.md, your skills, your example screenshots, and your review gates are plain files in your repository. They are the part that took weeks to get right, and they are the part that moves. What does not move is operational: permission and approval settings, sandbox flags, plans and usage windows, model selection, and platform-only conveniences like Claude Code's path-scoped rules or agent teams. Rebuilding that layer when you switch platforms is roughly an hour of work, not a migration project.
Hold on to that asymmetry, because it reframes the whole decision. If switching costs an hour and the harness carries the design knowledge, then choosing a platform is closer to choosing a text editor than choosing a design tool. You can be wrong, notice, and move — provided you did not let platform-specific config become the only place your design knowledge lives.
Projects to inspect
The harness on the left is the asset. Each platform needs only an adapter — an import line, a folder mapping, or a config translation — while permissions, pricing, and model selection have to be rebuilt per platform.
Section 3
The criteria that actually matter for design work
Feature checklists stopped being useful in 2026 because everyone checks every box. Instruction file: yes, four times. Skills: yes. MCP: yes. Subagents: yes. Image input: yes, with caveats. If you compare platforms by counting checkmarks you will conclude they are interchangeable, which is half true and entirely unhelpful. The differences that change a designer's week live one level down, in how each capability is operated and governed.
Five criteria carry most of the weight for design-to-code work. First, sandboxing and permission ergonomics: how the platform contains a mistake, and how often it interrupts you to ask. A screenshot-verify loop that needs approval for every browser action is unusable; an agent with unrestricted shell access on a client project is a different kind of unusable. Second, model access and context: which models you can use, whether you can swap them, and how much of a design system fits in one conversation. Third, the pricing model — not the headline number, but the shape: subscription usage windows, token credits, or bring-your-own-key. Fourth, design-specific surfaces: image input quality, browser and screenshot tooling, and the depth of the skill and plugin ecosystem you can borrow from. Fifth, governance and churn: who controls the platform, what is open source, and how exposed you are if the vendor changes direction.
Notice what is not on the list: model quality benchmarks. Not because quality does not matter, but because frontier models leapfrog each other every few months, every platform can run a current frontier model, and the published coding benchmarks measure tasks that look nothing like translating a design system into components. Your own harness run through a real brief is a better benchmark than any leaderboard, and it costs an afternoon.
- Sandboxing and permission ergonomics: containment when things go wrong, and how often the tool interrupts you when they do not.
- Model access and context: which models, whether you can swap them, and how much design context fits in one session.
- Pricing shape: subscription usage windows, metered token credits, or bring-your-own-key — each fails differently under heavy design iteration.
- Design surfaces: image input, browser and screenshot loops, and the skill, plugin, and MCP ecosystem you can adopt instead of building.
- Governance and churn: open source versus proprietary, and the realistic cost to you if the platform pivots or retires.
Section 4
Claude Code, as verified June 2026
Claude Code is Anthropic's agent: a proprietary client in front of proprietary models, on a v2.1.x release line that ships updates several times a week as of June 1, 2026. It reads a CLAUDE.md hierarchy for project instructions, supports path-scoped rules, and its docs recommend importing AGENTS.md when a repository uses the open convention — which is exactly how this site's repository wires it. Skills live in .claude/skills/ with the standard SKILL.md format plus extras such as allowed-tools and forked execution context. MCP support is full and mature: project-level .mcp.json, remote servers, and plugin marketplaces.
Its operational character is the combination of a granular permission system — allow, deny, and ask rules plus modes like plan and accept-edits — with a native sandbox runtime for shell commands that isolates filesystem and network access. Anthropic claims large reductions in permission prompts from that sandbox; treat the specific percentage as a vendor figure. Subagents and agent teams are first-class, each with its own context window and tool permissions, which matters for design work when you want a critique agent that can read but not write.
For designers specifically, Claude Code currently has the deepest ecosystem to borrow from: image input by paste or file path, browser integration and the common Playwright and Chrome DevTools MCP servers for screenshot-verify loops, and the largest collection of public design-oriented skills and plugins, including Anthropic's own frontend and brand-guideline skills. The trade-offs are plan-and-window economics — heavy iteration days can hit the five-hour usage windows on Pro — and full dependence on a single vendor's models and roadmap. As verified June 2026, Pro is US$20 per month and Max tiers are US$100 to US$200, with optional pay-as-you-go overflow at API rates; re-check the plan pages before budgeting.
Section 5
Codex CLI, as verified June 2026
Codex CLI is OpenAI's terminal agent. The client itself is open source under Apache-2.0, with the latest stable release in the rust-v0.136 line as of June 1, 2026, while the models and the service behind it are proprietary. It reads AGENTS.md natively with a root-to-current-directory merge, takes settings from config.toml, discovers Agent Skills from .agents/skills/ at both repository and user level, and supports MCP servers declared as tables in the same config file. If you have read older comparisons claiming Codex lacks skills or has limited MCP support, discard them: both claims were true once and are now stale, which is a recurring theme in this category.
Codex's clearest differentiator is its sandbox model. It separates what the agent is physically allowed to touch — read-only, workspace-write, or full access, enforced at the operating-system level — from when it must stop and ask, which is a separate approval policy. That separation is the easiest of the four to explain to a security reviewer, and the easiest to reason about when you hand an agent a client codebase. Subagents are supported, and Codex Cloud runs tasks in isolated cloud sandboxes when you want parallel or background work off your machine.
For design work, Codex handles image input — screenshots and design specs attached in the composer — and added image generation and editing in 2026, with browser loops available through the same MCP servers everyone else uses. Its design-specific skill ecosystem is thinner than Claude Code's, but because it reads the shared .agents/skills/ location, community design skills written to the open standard work without modification. On cost: as verified June 2026, Codex usage is included with ChatGPT plans from Plus at US$20 per month upward, metered since April 2026 as token-based credits in rolling five-hour windows rather than per message, with API-key billing as the alternative. The shape matters: visual iteration burns tokens quickly, so the credit meter is something to watch during your evaluation week.
Section 6
OpenCode, as verified June 2026
OpenCode is the open-source option, MIT-licensed and maintained by Anomaly, with the repository now at anomalyco/opencode and a v1.15.x release line shipping several patches a week as of June 1, 2026. Its position is structural rather than feature-based: the agent is free, and you bring the models — your own API keys for any major provider, the curated OpenCode Zen models on pay-as-you-go credits, a flat Go plan at roughly US$10 per month for open-weight models, or whatever free promotional models are on the roster that month. That flexibility is the reason to pick it and the work you sign up for: model selection, cost tracking, and judging each provider's data handling become your job.
It is also the most accommodating platform for an existing harness. OpenCode reads AGENTS.md as its primary instruction file and falls back to CLAUDE.md if that is what a repository has; it discovers skills from .opencode/skills/, .claude/skills/, and .agents/skills/; and MCP servers and granular permissions live in opencode.json, including wildcard rules per tool and per subagent. There is no OS-level sandbox runtime equivalent to Claude Code's or Codex's — boundaries are policy-based, enforced by the permission system rather than the operating system — which is the main thing to weigh if you work on untrusted code or under strict compliance requirements.
For designers, OpenCode has no first-party design surface; vision support depends entirely on the model you attach, and the value comes from compatibility. A design harness built for Claude Code or Codex ports in without edits, the same MCP servers drive screenshot loops, and per-agent model overrides let you run cheap models for drafts and a frontier model for the final pass. As verified June 2026, the free promotional model roster changes without notice — never build a cost case on it.
Section 7
Gemini CLI to Antigravity: a deprecation in real time
Gemini CLI deserves its place in this comparison for two reasons: Google's models offer the largest context windows of the four ecosystems, which is genuinely useful when a whole design system needs to fit in one conversation, and its multimodal handling of screenshots, photos, and whiteboard captures is strong. It reads GEMINI.md context files, can be pointed at AGENTS.md through the context.fileName setting, supports Agent Skills and MCP, and was still shipping weekly releases at the end of May 2026.
It is also being retired for most readers of this article. Around Google I/O 2026, Google announced that Gemini CLI and the Gemini Code Assist IDE extensions stop serving requests for individual, AI Pro, and AI Ultra users on June 18, 2026; only Gemini Code Assist Standard and Enterprise organizations keep the existing CLI. The replacement is Antigravity CLI, a rewrite inside Google's Antigravity agent platform, which according to the transition announcement retains Agent Skills, hooks, subagents, and extensions as plugins. As of this article's verification date, Antigravity CLI's configuration details, sandbox model, and pricing enforcement had not been independently verified against its own documentation — treat any Antigravity specifics you read anywhere, including here, as provisional until you have checked the official docs yourself.
The transition is worth dwelling on, because it is the strongest real-world evidence for this article's thesis. Gemini CLI was open source, actively maintained, shipping weekly, and backed by one of the largest companies on earth — and none of that prevented a three-month-old setup guide from becoming migration homework. Release activity is not a proxy for longevity. If your design harness had been written into GEMINI.md files, custom Gemini extensions, and platform-specific settings, the migration would be real work. If it lived in AGENTS.md, .agents/skills/, and a documented MCP server list, the migration is an afternoon: install the successor, point it at the same files, rebuild the permission layer, re-run your evaluation brief. Deprecation risk is now a first-class selection criterion, and the mitigation is not picking the immortal vendor — there isn't one — it is keeping the harness portable.
Section 8
The comparison, in one matrix
The matrix below condenses the four platform profiles into the dimensions that matter for design work. Read it with the last-verified strip in mind: the rows about governance, file names, and standards change slowly; the rows about pricing, versions, and the Gemini transition are volatile and dated June 1, 2026. If you are reading this more than a quarter later, treat the volatile rows as a list of things to re-check rather than facts to rely on.
Two readings of the matrix are worth calling out. Read horizontally, the instruction-file, skills, and MCP rows are nearly identical — that is the convergence on open standards, and it is why the harness ports. Read vertically, each column has a distinct operational personality: Claude Code pairs the deepest design ecosystem with subscription-window economics; Codex CLI pairs the clearest sandbox story with token-credit metering inside ChatGPT plans; OpenCode trades vendor polish for freedom and self-responsibility; and the Gemini column is, for individuals, a migration in progress.
Operational differences between the four platforms, weighted for design-to-code work. Every cell last verified 2026-06-01; pricing, versions, and the Gemini-to-Antigravity transition are volatile.
Section 9
A decision flow you can defend
Turning the criteria into a sequence keeps the decision honest, because it forces the boring questions to come before the interesting ones. The flow below is ordered by how much each question constrains the answer, not by how much fun it is to debate.
Start with money you already spend: most design teams sit inside organizations that already pay for Claude or ChatGPT, and the agent included in that plan is the cheapest one to evaluate seriously. Then ask about safety rails, because they are non-negotiable when they apply at all — client repositories, regulated environments, shared machines. Then model flexibility: if you need to swap providers, run open-weight models, or control exactly which model sees client work, OpenCode is the platform built around that need. Then context size, for the minority of teams whose design system genuinely cannot be chunked — and check the current numbers on official model pages rather than trusting any article, this one included. Last, churn tolerance: how much would it cost you, in real hours, if your platform announced its retirement next quarter? The Gemini CLI transition is what that question looks like when it stops being hypothetical.
The flow ends where the article began: whatever the answers point to, keep the harness in the open formats so the decision stays reversible. A platform choice you can revisit in an afternoon is a low-stakes choice, and low-stakes choices do not deserve months of deliberation.
Five questions in order of how much they constrain the answer, each pointing at a recommendation, with the portability reminder as the footer.
Section 10
Case study: the same brief and harness on Claude Code and OpenCode
To test how much the platform actually changes the output, we ran one design brief through two platforms with an identical harness. The brief: a marketing hero plus a three-item feature grid for a school landing section, built against this site's DESIGN.md tokens, responsive at 360, 768, and 1280 pixels, semantic tokens only, and consistent with the written direction — editorial, bookish, no gradient decoration. The harness: the same AGENTS.md project rules, the same DESIGN.md, and the same prototype skill, with Claude Code reading the shared rules through its CLAUDE.md import and OpenCode reading the same files directly. Honesty notes first: the Claude Code leg is traced from a real session in this repository's harness. The OpenCode leg is an honestly reproduced trace — the harness portability was verified by inspection (OpenCode's documented discovery order picks up this repository's AGENTS.md, CLAUDE.md fallback, and existing skill folders without any edits), but the generation run itself was reconstructed from documentation and from equivalent runs rather than executed in this environment, so it gets no quality score and its cost figures are estimates. Codex CLI and Gemini/Antigravity were not run at all and are described in this article from documentation only.
The Claude Code run took three iterations over roughly forty minutes. The first pass produced a structurally correct section that reused the site's heading and badge components and stayed inside the token system — the harness doing its job — but invented a two-tone split background that the design system does not use, and set the feature grid at an even three columns on tablet where the content wanted two. The second pass fixed both against the written direction. The third pass was triggered by the review gates, not by taste: the design audit passed, but the accessibility check flagged a heading-level skip introduced when the hero title was demoted to make room for an eyebrow label. Total usage stayed comfortably inside a Pro plan's five-hour window; the session consumed a noticeable but not alarming share of it, and the same work metered through the API would have cost a few dollars at June 2026 rates.
The reproduced OpenCode leg is most useful for what it confirms and what it changes. Confirmed: the harness loads without modification — same instruction file, same skill, same MCP-driven screenshot loop — which is the portability claim this article needed to prove, and the part that was genuinely verified. Changed: the operational texture. Model choice becomes your decision, and with it the cost profile; running a frontier model through your own API key for a session of this size lands in the low single-digit dollars at June 2026 list prices, while a cheaper model cuts the cost and, in our experience with equivalent briefs, raises the iteration count. Permissions had to be re-declared in opencode.json because nothing carries over from Claude Code's settings, and there is no OS sandbox underneath the policy rules. The honest conclusion from the pair: the harness determined what good looked like and most of how fast both runs got there; the platform determined how the work was contained, metered, and paid for. That is the right division of labor, and it is the reason the harness deserves more of your investment than the platform decision does.
Brief: hero + feature grid against DESIGN.md tokens, responsive at 360/768/1280, semantic tokens only — identical for both legs
Claude Code (executed): 3 iterations, ~40 minutes; misses were an invented split background, a tablet grid call, and a heading-level skip caught by the accessibility gate
OpenCode (reproduced): harness loads unchanged — AGENTS.md, skills, MCP servers verified by inspection; generation trace reconstructed, not executed; no quality score assigned
Cost shape: Claude Code drew on plan usage windows; OpenCode costs depend on the model behind your own key — low single-digit dollars per session at June 2026 list prices, labeled as an estimate
Rebuilt per platform: permission rules and sandbox posture; nothing in the design harness needed editing
What each leg of the case study established. The Claude Code leg was executed in this repository; the OpenCode leg is a reproduced trace with its portability claims verified by file inspection and its costs labeled as estimates.
Section 11
The same harness, four config surfaces
Here is what the portability looks like as files. The block below shows one repository carrying a single source of design truth and the thin per-platform adapters around it. The shared layer is three things: AGENTS.md for project rules, a skills folder in the open location, and DESIGN.md plus tokens for the design system. Everything else is an adapter measured in lines, not days.
Two practical notes. First, keep one instruction file as the source and make every other file a pointer or import — duplicated instructions drift apart within weeks, and an agent reading the stale copy will follow it confidently. Second, check the discovery rules of the platform you are adopting rather than assuming: as verified June 2026, Codex CLI and OpenCode read .agents/skills/ directly, OpenCode also reads .claude/skills/, and Claude Code wants its own folder, which a symlink or a small sync script handles. Gemini CLI accepts AGENTS.md through a setting; what Antigravity CLI accepts should be confirmed against its own documentation once you are on it.
my-product/
├── AGENTS.md # the source of truth: stack, design rules pointer, commands
├── DESIGN.md # design system: tokens, components, anti-patterns
├── tokens/ # DTCG-format token JSON
├── .agents/
│ └── skills/
│ └── design-prototype/
│ └── SKILL.md # read natively by Codex CLI and OpenCode
│
│ # --- per-platform adapters ---
├── CLAUDE.md # Claude Code: one line — "@AGENTS.md"
├── .claude/
│ ├── skills/ # symlink or copy of .agents/skills/
│ └── settings.json # permissions: NOT portable, rebuild here
├── .codex/
│ └── config.toml # Codex: MCP servers + sandbox/approval settings
├── opencode.json # OpenCode: MCP servers, permissions, model choice
└── GEMINI.md # Gemini CLI: pointer to AGENTS.md
# (or set context.fileName = "AGENTS.md" in settings)
# Antigravity CLI: verify equivalents on its own docsSection 12
One MCP server, four config dialects
MCP is the layer that makes design tooling portable: the same Playwright server that drives a screenshot-verify loop, or the same Figma server that reads design context, works on every platform. What differs is only where and how you declare it. The snippet below registers the same Playwright MCP server in each platform's config format, as documented in June 2026.
The server is identical in all four; only the syntax changes. This is worth showing precisely because it is boring — translating these declarations is the entire MCP migration cost when you switch platforms. Keep a plain-text list of the servers your design workflow depends on, with the command and any environment variables, and you can regenerate any of these blocks in minutes.
// Claude Code — .mcp.json (project scope)
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}
# Codex CLI — ~/.codex/config.toml
[mcp_servers.playwright]
command = "npx"
args = ["@playwright/mcp@latest"]
// OpenCode — opencode.json
{
"mcp": {
"playwright": {
"type": "local",
"command": ["npx", "@playwright/mcp@latest"],
"enabled": true
}
}
}
// Gemini CLI — .gemini/settings.json
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}Section 13
What it costs, honestly and with dates
Pricing is the most volatile fact in this article and the one most likely to be quoted without its date, so here it is with the date attached: every figure below is a US-dollar list price as verified on June 1, 2026, every platform changed its pricing or packaging in the eight weeks before that date, and you should read the linked pricing pages before any budgeting conversation.
Claude Code is included with Claude plans: Pro at US$20 per month and Max tiers at US$100 and US$200, with usage shared with the Claude apps in five-hour rolling windows and optional pay-as-you-go overflow at API rates; pure API metering is the alternative. Codex is included with ChatGPT plans from Plus at US$20 per month, Business at US$30 per user, and Pro at US$200, metered since April 2026 as token-based credits in five-hour windows, with API billing as the alternative. OpenCode's agent is free; you pay for models — your own provider keys at those providers' rates, OpenCode Zen pay-as-you-go credits, a Go plan at roughly US$10 per month for open-weight models, or promotional free models that change without notice. Google reset its consumer AI plans around mid-May 2026 to a range running from roughly US$8 to US$200 per month, with a free tier with usage limits continuing for Antigravity; those figures came from launch coverage rather than a direct read of Google's pricing pages and should be treated with corresponding caution.
More useful than the numbers is the shape of each model under design workloads. Visual iteration is token-heavy: screenshots in, screenshots out, long design-system context, repeated regeneration. Subscription windows fail by making you wait — you hit the window mid-iteration and the loop stalls. Token credits and API metering fail by surprising you — the loop never stalls but the meter runs, and a heavy review day costs multiples of a normal one. Bring-your-own-key fails by becoming nobody's job — costs spread across personal keys until finance asks a question no one can answer. None of these is wrong; pick the failure mode your team can manage, and measure your own usage during the evaluation week rather than extrapolating from anyone else's.
Section 14
Anti-patterns, limits, and what this comparison cannot tell you
The first anti-pattern is choosing on benchmarks. Published coding leaderboards measure algorithmic problem-solving, not whether an agent respects a spacing scale or notices that a card pattern already exists. The second is trusting feature checklists, including the matrix in this article: a checkmark says a capability exists, not that it is pleasant to operate, and the differences that cost you an afternoon live in the operating, not the existing. The third is hard-coding your design knowledge into platform-specific config — GEMINI.md files no other tool reads, permission rules treated as documentation, skills in a proprietary location — which converts a reversible choice into a hostage situation. The fourth is quoting prices, versions, or context windows without dates. Every stale comparison on the internet was accurate the day it was written.
There are also limits to what this article can claim. Only Claude Code was executed against the case-study brief in this repository; the OpenCode leg was reproduced and clearly labeled, and Codex CLI and Gemini/Antigravity are described from their documentation. That means this article can compare operational models, file formats, and pricing shapes with confidence, but it cannot tell you which platform produces the best design output for your stack — and you should be skeptical of anyone who claims to know that across four platforms, two of which they have not run against your harness. Antigravity CLI specifics are provisional until its documentation has been read directly. And the fairness caveat applies in both directions: naming Gemini CLI's retirement is not a judgment of Google's models or of Antigravity, which may turn out to be the strongest design platform of the lot; it is a fact about churn that any of the four vendors could generate next.
Finally, the portability argument has its own limit. A harness ports; a team's muscle memory does not. Keyboard habits, permission reflexes, the shape of a good prompt for a particular model — those take a week or two to rebuild after a switch, and during that week output quality dips. Portability makes switching cheap, not free. The point is to keep the cost low enough that you can act on new information, not to pretend the cost is zero.
- Do not choose on coding benchmarks; run your own harness through a real brief instead.
- Do not treat feature checklists as comparisons; the differences live in ergonomics, pricing shape, and governance.
- Do not let design knowledge live only in platform-specific files; keep the source of truth in the open formats.
- Do not quote prices, versions, or context windows without a verification date — and do not trust anyone else's that lack one.
- Do not read platform descriptions written from documentation as if they were tested results; this article labels which is which.
Section 15
A 30-minute evaluation you can run with your own harness
The practical close is not a recommendation — it is a procedure, because the honest answer to which platform changes with your constraints and with the calendar. If you have a harness, you can evaluate any platform against your own work in about thirty minutes, and the same checklist doubles as a migration checklist when a platform forces your hand the way Gemini CLI just did.
Run it on the platform your existing subscription already includes, then on one alternative — the comparison teaches you more than either run alone. Keep the notes. The next time pricing changes, a feature you rely on moves, or a deprecation notice lands, you re-run the same thirty minutes instead of re-litigating the whole decision.
The last item on the checklist is the one that keeps this article useful after its facts age: put the platforms you depend on into a re-verification loop. This site does that through its tool-watch process — a recurring research pass over each platform's changelog, pricing page, and announcement feed that flags changes for review rather than letting articles silently rot. You do not need anything that formal: a calendar reminder to re-read four pricing pages and four changelogs once a quarter, and a habit of writing last verified next to every volatile fact in your own internal docs, captures most of the value.
# Platform evaluation — ~30 minutes with an existing harness
## Setup (10 min)
- [ ] Install the CLI; authenticate with the plan or API key you would actually use
- [ ] Confirm it reads your AGENTS.md (or add the one-line adapter: CLAUDE.md import,
GEMINI.md pointer, or context.fileName setting)
- [ ] Confirm it discovers your skills folder; map or symlink if needed
- [ ] Declare your MCP servers in the platform's config format
- [ ] Set permissions/sandbox to the strictest mode you can work in
## Run (15 min)
- [ ] Run your standard evaluation brief (the same one every time)
- [ ] Note: iterations to acceptance, wall-clock time, interruptions for approvals
- [ ] Run your review gates (build, design audit, accessibility check)
- [ ] Check usage/cost: window consumed, credits burned, or tokens billed
## Decide (5 min)
- [ ] Record results next to previous platforms' results, with today's date
- [ ] Note anything that required editing the harness itself (it should be nothing)
- [ ] List what would NOT port if you adopted this platform and later left it
- [ ] Add the platform's changelog + pricing page to your re-verification loop
- [ ] Date-stamp every volatile fact you wrote down: "last verified <date>"Sources
Sources & further reading
- Claude Code changelog
Official release history for Claude Code; the source for version and feature-cadence claims.
- Using Claude Code with your Pro or Max plan
Anthropic's plan and usage-window article; the primary source for Claude Code pricing as of June 2026.
- Codex pricing and credits
OpenAI's official page for Codex inclusion in ChatGPT plans and the token-credit metering introduced in April 2026.
- Codex sandboxing and approvals
Documentation of Codex CLI's read-only, workspace-write, and full-access sandbox modes and the separate approval policy.
- openai/codex
The open-source Codex CLI repository: releases, AGENTS.md handling, and skills documentation.
- OpenCode documentation
OpenCode's docs hub covering rules, agents, permissions, skills, MCP, and the Zen and Go pricing options.
- anomalyco/opencode
OpenCode source repository and release history.
- Transitioning Gemini CLI to Antigravity CLI
Google's transition announcement, including the June 18, 2026 cutoff for individual, AI Pro, and AI Ultra users.
- google-gemini/gemini-cli
Gemini CLI repository: releases, GEMINI.md context files, skills, and MCP configuration.
- Google Antigravity documentation
Documentation for the Antigravity platform that succeeds Gemini CLI for individual users.
- agents.md
The open AGENTS.md convention that makes project instructions portable across agents.
- Agent Skills
The open SKILL.md standard adopted by all four platforms covered in this article.
- Model Context Protocol
The MCP specification — the shared integration layer for design tooling such as Playwright, Figma, and canvas servers.

