Agentic Design School
Module 2 of 5
45–55 minutes

Motion and Storytelling with Agents

Remotion and Hyperframes

The two working approaches in practice: Remotion compositions as React components the agent can write and revise, and hyperframe-style keyframe sequences for lighter-weight motion — with the project structure that keeps either maintainable.

Duration: 45–55 minutes

Slides: 14 slides with notes and narration

Learning objectives

  • Set up a Remotion project an agent can work in: compositions, props, and assets.
  • Use hyperframe-style sequences for lighter motion without a full video pipeline.
  • Brief motion work in terms of timing, hierarchy, and emphasis rather than tool operations.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 14 · 16:9

Two tools, one principle — motion as code

Motion and Storytelling with Agents · Module 2 of 5

  • Remotion: video as React components an agent can write and revise
  • Hyperframes: video as plain HTML with timing attributes — no build step
  • The project structure, tokens, and briefs that keep either maintainable
  • Reviewing and rendering motion as part of the loop you already run

Module 1 made the case for code-defined video. This module is about actually building it — with two stacks that get you to an MP4 from a brief.

Slide notes

Module 1 argued that video becomes a different kind of artifact when it is defined in code: diffable, regenerable, fed by the same tokens as the product. This module is the practical half of that argument — the two stacks that currently have a credible agent story, and the working habits that make either of them maintainable past the first clip.

Name the two upfront so nobody spends the module waiting for a verdict. Remotion is the established React-based framework: compositions are components, frames are computed from a frame number, and rendering scales from a laptop to distributed cloud functions. Hyperframes is the newer HTML-native option built explicitly for agents: a composition is a plain HTML file with timing attributes, animated with the web's existing libraries, rendered with headless Chrome. Both are real, both ship agent skill packs, and the choice between them is structural rather than qualitative.

Also set the level. This is not a programming module. The agent writes the composition code; the designer's job is the brief, the project structure decisions, and the review. Where code appears on slides it is there to be read, not memorised — the point is knowing what a healthy composition looks like when you review one.

Narration for this slide

Welcome back. Module 1 made the argument that video defined in code is a different kind of artifact — one an agent can write, you can review, and your team can re-render when the product changes. This module is where we build it. We will look at the two stacks that currently have a credible agent story: Remotion, where a video is a set of React components, and Hyperframes, where a video is a plain HTML file with timing attributes. We will cover the project structure that keeps either maintainable, how to brief a scene in design language rather than tool operations, and how to review and render the result. Two tools, one principle: motion as code.

Slide 2 of 14 · 16:9

Remotion in one slide

A Remotion video is a React component evaluated once per frame. Everything else follows from that.

  • A Composition registers the video: id, width, height, fps, durationInFrames
  • useCurrentFrame() gives the component the frame number; motion is computed from it
  • Sequence time-shifts scenes: from frame 0, from frame 180, and so on
  • interpolate() and spring() turn frame numbers into positions, opacities, scales
  • Render with npx remotion render — locally, on a server, or distributed on Lambda

Every frame is a pure function of the frame number. That is what makes rendering deterministic — and what makes the agent's job a normal coding job.

Slide notes

Keep the model small: a Remotion project is a React app whose components receive a frame number instead of user events. The Composition declares the video's metadata — dimensions, frame rate, duration in frames — and points at a component. Inside that component, useCurrentFrame() returns which frame is being rendered, and helpers like interpolate() and spring() turn that number into the visual state for that instant. A 30-second clip at 30 fps is 900 evaluations of the same component.
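To make the frame-pure idea concrete, here is a minimal TypeScript sketch. It does not use Remotion: `lerpClamped` and `titleOpacity` are hypothetical stand-ins written for illustration, not Remotion's `interpolate()`. The only point is that the visual state is computed from the frame number and nothing else.

```typescript
// Hypothetical stand-in for a frame-pure helper; NOT Remotion's interpolate().
// Maps a frame in [startFrame, endFrame] linearly onto [from, to], clamped at both ends.
function lerpClamped(
  frame: number,
  startFrame: number,
  endFrame: number,
  from: number,
  to: number
): number {
  if (endFrame <= startFrame) {
    throw new Error("endFrame must be greater than startFrame");
  }
  const t = Math.min(1, Math.max(0, (frame - startFrame) / (endFrame - startFrame)));
  return from + (to - from) * t;
}

// Settings taken from the example in the notes: 30 seconds at 30 fps.
const FPS = 30;
const DURATION_IN_FRAMES = 30 * FPS; // 900 evaluations of the same function

// The "component": visual state is a pure function of the frame number.
function titleOpacity(frame: number): number {
  // Fade in over the first half second (frames 0 to 15).
  return lerpClamped(frame, 0, FPS / 2, 0, 1);
}

// Rendering amounts to evaluating that function once per frame.
const opacities = Array.from({ length: DURATION_IN_FRAMES }, (_, frame) => titleOpacity(frame));
```

Because `titleOpacity` reads no clock and keeps no state, asking for frame 400 twice gives the same answer twice, which is the property that makes rendering deterministic.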

Sequence is the structural primitive worth calling out for designers: it shifts a child component in time, so a video becomes a stack of scenes with explicit start frames and durations — readable structure, not a hidden timeline. That is what the agent edits when you ask for the second scene to start a second later.
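The scene-stacking idea can be sketched as plain arithmetic. The example below is an illustration, not Remotion code: `Scene`, `placeScenes`, and the scene names are hypothetical, and the `from` value is simply the number a Sequence-style wrapper would receive.

```typescript
// Hypothetical scene list; names and durations are illustrative only.
interface Scene {
  id: string;
  durationInSeconds: number;
}

interface PlacedScene extends Scene {
  from: number; // start frame
  durationInFrames: number;
}

// Lay scenes end to end, with an optional extra delay (in seconds) before a given scene.
function placeScenes(
  scenes: Scene[],
  fps: number,
  delayBefore: Record<string, number> = {}
): PlacedScene[] {
  let cursor = 0;
  return scenes.map((scene) => {
    cursor += Math.round((delayBefore[scene.id] ?? 0) * fps);
    const durationInFrames = Math.round(scene.durationInSeconds * fps);
    const placed: PlacedScene = { ...scene, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return placed;
  });
}

const scenes: Scene[] = [
  { id: "title", durationInSeconds: 6 },
  { id: "feature", durationInSeconds: 8 },
];

const original = placeScenes(scenes, 30);
// "Start the second scene a second later" becomes a one-value change.
const revised = placeScenes(scenes, 30, { feature: 1 });
```

At 30 fps the second scene starts at frame 180 in the original layout and at frame 210 after the one-second delay, which is the kind of change that shows up as a single readable line in a diff.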

Two facts to date-stamp. As of June 2026 Remotion's stable line is 4.x with a large ecosystem — studio preview, an embeddable player, templates, and official agent skills. And the frame-pure model has a cost: animation libraries that run on wall-clock time, GSAP being the common one, do not fit naturally; you describe motion in Remotion's own vocabulary. That trade-off is the seam the comparison later in the module runs along.

Narration for this slide

Here is Remotion in one idea: a video is a React component that gets evaluated once per frame. A Composition registers the video — its size, frame rate, and duration. Inside, the component asks for the current frame number and computes what the screen should look like at that instant, using helpers like interpolate and spring. Scenes are stacked with Sequence, each starting at an explicit frame. Because every frame is a pure function of that number, rendering is deterministic — and writing the video is a normal coding job, which is exactly why an agent can do it. The cost is that motion lives in Remotion's own vocabulary, not in libraries like GSAP. Hold that thought.

Slide 3 of 14 · 16:9

Project structure an agent can navigate

The agent's output quality tracks what it can find. Structure the project so scenes, data, and brand assets are obvious.

  • One file per scene, named for what it shows — not Comp1, Comp2
  • A root file that registers compositions and sequences scenes in order
  • Content as data: titles, claims, and figures in props or a JSON file, not hard-coded in markup
  • Tokens and brand assets in one place the compositions import from
  • An instruction file (CLAUDE.md or equivalent) stating durations, easing defaults, and what not to invent

A motion project is a small design system: scenes are components, content is data, and the brand lives in tokens. Treat it that way from the first clip.

Slide notes

This slide is the maintainability argument. The first clip an agent produces will work almost regardless of structure. The fifth clip — the one where you change a product name across three videos, or re-skin everything after a brand refresh — only stays cheap if the project was structured for it. The structure is the same one designers already know from component libraries: one scene per file, named for its content; a root file that registers and sequences them; and content pulled out into props or a data file so a copy change is a data change, not an edit inside markup.
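A small sketch of what "content as data" can look like. The `SceneContent` shape, the scene names, the second headline, and the `renameProduct` helper are all assumptions made for illustration; a real project might hold the same data in a JSON file.

```typescript
// Hypothetical content record for one clip; field names are illustrative.
interface SceneContent {
  scene: string; // matches the scene file name
  kicker?: string;
  headline: string;
}

const clipContent: SceneContent[] = [
  { scene: "title-card", kicker: "New in Studio", headline: "Timeline editing without the timeline" },
  { scene: "feature-overview", headline: "Edit clips with Studio" }, // placeholder copy for the example
];

// Renaming the product is a data transformation, not an edit inside scene markup.
function renameProduct(content: SceneContent[], oldName: string, newName: string): SceneContent[] {
  const swap = (text: string): string => text.split(oldName).join(newName);
  return content.map((entry) => {
    const next: SceneContent = { ...entry, headline: swap(entry.headline) };
    if (entry.kicker !== undefined) {
      next.kicker = swap(entry.kicker);
    }
    return next;
  });
}

const renamed = renameProduct(clipContent, "Studio", "Workbench");
```

The scene components never change in this scenario; only the data they are fed does, which is what keeps the fifth clip cheap.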

The tokens point matters most. If the compositions import colours, type, and spacing from the same token source as the product — or at minimum from one shared stylesheet — then a brand change propagates by re-rendering. If colours are hard-coded per scene, every video is a snapshot again, which is the exact failure code-defined video was supposed to remove.

The instruction-file point connects back to the harness material from earlier courses and Module 1: scene duration defaults, easing personality, reduced-motion behaviour, and the rule against inventing copy belong in the project's standing instructions, not in every brief. Remotion ships CLAUDE.md and AGENTS.md guidance in its own repository, and Hyperframes installs its skills into the project — both are signals that the frameworks expect this file to exist.

Narration for this slide

Whichever stack you pick, the structure decides whether the second and fifth videos stay cheap. Treat the motion project like a small design system. One scene per file, named for what it shows. A root file that registers the compositions and puts the scenes in order. Content — the headlines, the claims, the figures — held as data, so a copy change never means digging through markup. Brand colours and type imported from tokens, so a rebrand is a re-render, not a re-edit. And an instruction file that records your duration defaults, your easing personality, and the rule that the agent never invents copy. The agent can only reuse what it can find.

Slide 4 of 14 · 16:9

Hyperframe-style sequences: when a full video project is too much

Sometimes the need is a ten-second loop on a landing page, not a pipeline. Hyperframes treats video as an HTML document.

  • A composition is a plain HTML file — no bundler, no scaffold, openable in a browser
  • Timing lives in markup: data-start, data-duration, data-track-index on each element
  • Motion is a normal GSAP timeline (or Lottie, CSS, Three.js) the renderer pauses and seeks per frame
  • CLI loop: init, preview with hot reload, lint, inspect, render to MP4
  • Built for agents: non-interactive CLI plus an installable skill pack

If your team's motion language is already HTML, CSS, and GSAP, hyperframe-style sequences remove more friction than they add — especially for short, recurring clips.

Slide notes

Hyperframes, published by HeyGen, is the lighter-weight answer for teams whose motion needs are real but small: short product clips, animated cards, looping walkthroughs. The composition is an ordinary HTML file. Elements carry data attributes that say when they appear and for how long; appearance is plain CSS; movement is a standard GSAP timeline created paused. The renderer's trick is the seek: before capturing each frame with headless Chrome it pauses the animation runtime and seeks it to that frame's exact time, which is what makes wall-clock libraries render frame-accurately — the thing Remotion's frame-pure model deliberately avoids needing.
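The seek step is easier to picture with a toy model. Nothing below is Hyperframes or GSAP code: `PausedTimeline`, `createToyTimeline`, and `captureFrames` are hypothetical names, and the "capture" is just reading a number. The sketch only shows the pattern of never letting the clock run and seeking to each frame's exact time instead.

```typescript
// Simplified model of seek-based rendering; every name here is hypothetical.
interface PausedTimeline {
  seek(timeInSeconds: number): void;
  currentValue(): number; // for example, an element's x position at the seeked time
}

// A toy timeline: moves a value from 0 to 100 over `duration` seconds, linearly.
function createToyTimeline(duration: number): PausedTimeline {
  let time = 0;
  return {
    seek(t) {
      time = Math.min(duration, Math.max(0, t));
    },
    currentValue() {
      return (time / duration) * 100;
    },
  };
}

// The renderer seeks to frame / fps and then captures, once per frame.
function captureFrames(timeline: PausedTimeline, fps: number, durationInSeconds: number): number[] {
  const totalFrames = Math.round(fps * durationInSeconds);
  const captured: number[] = [];
  for (let frame = 0; frame < totalFrames; frame++) {
    timeline.seek(frame / fps);
    captured.push(timeline.currentValue());
  }
  return captured;
}

const capturedValues = captureFrames(createToyTimeline(2), 30, 2);
```

Running `captureFrames` twice produces identical output regardless of how long each capture takes, which is why a wall-clock library can still render frame-accurately under this pattern.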

The agent story is the strongest part of the pitch: the CLI is non-interactive by default, there is an installable skill pack that teaches the agent the framework's conventions, and lint and inspect commands give the agent checks it can run on its own output. As of June 2026 the project is pre-1.0 (0.6.x), Apache-2.0 licensed in the repository, requires Node 22 or later plus FFmpeg, and renders on a single machine — fine for clips, the wrong tool for thousands of personalised renders.

The phrase hyperframe-style in the module title is deliberate: even if the specific tool changes, the pattern — HTML as the composition format, timing in attributes, an existing animation runtime seeked per frame — is the durable idea, and it is the lowest floor for a designer who wants to read or hand-tweak a composition without learning React.

Narration for this slide

Not every motion need justifies a full video project. Sometimes it is a ten-second loop or a short product clip, and for that, the hyperframe-style approach is the lighter answer. A composition is just an HTML file. Timing lives in data attributes on the elements — when each thing appears, for how long, on which track. The motion is a normal GSAP timeline, the same one a front-end developer would write for a landing page, and the renderer pauses it and seeks to each frame before capturing. No build step, a command-line loop built for agents, and a skill pack that teaches the agent the conventions. As of mid 2026 it is pre-1.0 and renders on one machine — but for short, recurring clips, it is the lowest-friction path from brief to MP4.

Slide 5 of 14 · 16:9

The two stacks, side by side

Same principle, different materials. The structural differences — not the feature lists — are what should drive the choice.

Side-by-side diagram of the Remotion and Hyperframes stacks. Remotion: authored as React components where every frame is a function of the frame number, animated with interpolate and spring, rendered via local CLI, server-side API, or distributed Lambda, fitting React teams and rendering at volume. Hyperframes: authored as plain HTML with data-start, data-duration, and data-track-index attributes and no build step, animated with GSAP, Lottie, Three.js, or CSS seeked to each frame's time, rendered with headless Chrome and FFmpeg on a single machine, fitting agent-first authoring and GSAP-heavy motion. A shared band underneath lists what both have in common: diffable compositions in the repository styled from design tokens, agent skill packs, one-line render commands, the MP4 treated as a build output, and review through the same gates as any other agent artifact.
Remotion authors in React with frame-pure animation and distributed rendering; Hyperframes authors in HTML with seek-based animation runtimes and single-machine rendering. The shared band is the part that matters most: both produce diffable, token-driven compositions an agent can write and your existing review gates can hold.

The shared band is the real story: both stacks make video a diffable, token-driven artifact. The columns just decide which vocabulary your team writes it in.

Slide notes

Walk the diagram column by column, then land on the shared band. Authoring: Remotion is React components where every frame is computed from the frame number; Hyperframes is plain HTML with timing in data attributes and no build step. Animation runtime: Remotion wants motion described in its own frame-pure helpers, interpolate and spring; Hyperframes pauses and seeks existing runtimes — GSAP, Lottie, Three.js, CSS — so the web's animation vocabulary carries over directly. Rendering: Remotion runs locally, server-side, or distributed across Lambda, which is a structural advantage at volume; Hyperframes drives headless Chrome and FFmpeg on one machine, which is fine for clips and not for personalisation at scale.

Then the fit lines: Remotion suits React teams, embedded players, and render volume; Hyperframes suits agent-first authoring, GSAP-heavy motion language, and lighter recurring clips. Neither line is a quality judgment.

Spend the most time on the shared band, because it is the reason this course exists: in both stacks the composition is diffable text in the repository, styled from design tokens, written by an agent that has loaded a skill pack, rendered with one command, and reviewed through the same gates as any other artifact. The MP4 is a build output. Choosing between the columns is a real decision; missing the shared band is missing the point.

Narration for this slide

Here are the two stacks side by side. On the left, Remotion: React components, frames computed from a frame number, motion in interpolate and spring, rendering that scales from your laptop to distributed cloud functions. On the right, Hyperframes: plain HTML with timing in data attributes, motion in GSAP or Lottie or CSS seeked frame by frame, rendered with headless Chrome on one machine. The fit lines are honest — React teams and render volume on the left, agent-first authoring and GSAP-heavy motion on the right. But look at the band underneath, because that is the actual story. Both give you a diffable composition in your repository, styled from your tokens, written by an agent, and reviewed through the gates you already have. The columns decide the vocabulary. The band is the principle.

Slide 6 of 14 · 16:9

Choosing between them: the facts that decide it

Feature lists age fast. These are the structural and licensing facts, date-stamped, that actually decide the choice for most teams.

| | Remotion | Hyperframes |
|---|---|---|
| Authoring | React components (TSX), build step required | Plain HTML + CSS, no build step |
| Animation | Frame-pure interpolate() and spring() | GSAP, Lottie, Three.js, CSS; seeked per frame |
| Rendering | CLI, server-side, distributed Lambda | Headless Chrome + FFmpeg, single machine |
| Licence (June 2026) | Source-available; free up to 3 employees, paid Company License above | Apache 2.0 in the repository; no company-size thresholds |
| Maturity (June 2026) | 4.x, large ecosystem, ~47k stars | 0.6.x, pre-1.0, fast-moving, ~19k stars |
| Agent integration | Official AI docs and skill pack | Skill pack, plugins, non-interactive CLI |

If your product is React and you will render at volume, Remotion. If your motion language is HTML and GSAP and the prompt-writers are designers, Hyperframes. Re-verify the licence terms on the day you decide.

Slide notes

The licensing row is the one most teams discover late, so spell it out. Remotion is not open source: it is source-available under its own licence, free for individuals, non-profits, and companies with up to three employees, and above that threshold a paid Company License applies — per-seat tiers for people working in the Studio and per-render pricing with a monthly minimum for automated rendering, as published on the official licence page as of June 2026. None of this is hidden, but the fact that npm install works leads teams to assume MIT-style terms, and that assumption fails exactly when adoption succeeds. Hyperframes' repository describes Apache 2.0 with no thresholds; it is owned by a commercial company and pre-1.0, so read the LICENSE file directly and re-check at upgrade time.

The maturity row cuts the other way. Remotion's 4.x line is stable, with hundreds of releases, a large ecosystem, and production users; Hyperframes is at 0.6.x and shipping rapidly, which means commands and behaviour can drift between versions — pin the version you build on. Both projects ship first-party agent support: Remotion documents agent prompting and publishes a skill pack, Hyperframes installs a set of skills and plugins built for non-interactive use.

Give the honest tie-breakers from the related article: React product, embedded previews, or render volume points to Remotion; HTML-and-GSAP motion language, designer prompt-writers, and cost sensitivity points to Hyperframes. Teams that cannot say which sentence describes them should build the same short clip in both — it takes an afternoon and settles it with evidence.

Narration for this slide

When you have to choose, these are the rows that decide it. Authoring and animation are about vocabulary: React and frame-pure helpers on one side, HTML and the web's existing animation libraries on the other. Rendering is about scale: Remotion can distribute a render across cloud functions, Hyperframes runs on one machine. And then licensing — the row teams discover too late. As of June 2026, Remotion is source-available: free up to three employees, a paid company licence above that. Hyperframes describes itself as Apache 2.0 with no thresholds, but it is pre-1.0 and moving fast, so pin your version and read the licence file. If you genuinely cannot decide, build the same thirty-second clip in both. It takes an afternoon.

Slide 7 of 14 · 16:9

Briefing a scene: timing, hierarchy, emphasis

The brief is design language, not tool operations. The agent translates it into frames or data attributes — that part is its job.

  • Shots with explicit durations that sum to the clip's total length
  • On-screen text written out word for word — never "something punchy here"
  • Hierarchy per shot: what the viewer must read first, second, and not at all
  • Emphasis and pacing: what holds, what moves, how long a headline stays readable
  • References and constraints: easing personality, reduced motion, no invented copy

The single biggest quality lever is whether the brief contains the actual words that will appear on screen. Mood adjectives produce placeholder enthusiasm.

Slide notes

This is the module's centre of gravity for designers, because it is the part that does not transfer from the tools. A motion brief is a shot list with intent, not a prompt asking for something premium and dynamic. Each shot gets a start and end time, and the durations must sum to the clip's total — that arithmetic constraint alone removes a whole class of pacing drift. Each shot gets its on-screen text written out in full; the traced runs behind this course show that an agent asked to make it feel premium fills the screen with confident placeholder copy, and every word of it has to be corrected later anyway.

Hierarchy and emphasis are the parts a motion specialist would normally carry in their head, so write them down: what the viewer reads first, what supports it, what is allowed to move and what must hold still long enough to be read. Pacing rules can be crude and still work — a headline needs roughly a second per six words on screen before anything else demands attention.

Constraints close the brief: the easing personality from your motion system, reduced-motion behaviour, the rule that the agent never invents copy or statistics, and the instruction to run the project's checks before declaring the scene ready. Notice that nothing in this brief mentions Remotion or Hyperframes. That is deliberate — the same brief drives either stack, which is what makes the skill durable when the tools change.
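The arithmetic constraint and the crude pacing rule from these notes can be checked mechanically before the brief ever reaches the agent. This is a sketch under assumptions: the `Brief` and `Shot` shapes and the `checkBrief` helper are hypothetical, every shot is assumed to carry text, and the six-words-per-second figure is the rough rule quoted above rather than a measured value.

```typescript
// Hypothetical shape for a motion brief; field names are illustrative.
interface Shot {
  name: string;
  durationInSeconds: number;
  onScreenText: string[]; // written out word for word
}

interface Brief {
  totalSeconds: number;
  shots: Shot[];
}

// Crude pacing rule from the notes: roughly one second per six words on screen.
const WORDS_PER_SECOND = 6;

function checkBrief(brief: Brief): string[] {
  const problems: string[] = [];
  const sum = brief.shots.reduce((total, shot) => total + shot.durationInSeconds, 0);
  if (Math.abs(sum - brief.totalSeconds) > 1e-6) {
    problems.push(`Shot durations sum to ${sum}s, but the clip is ${brief.totalSeconds}s`);
  }
  for (const shot of brief.shots) {
    const words = shot.onScreenText.join(" ").split(/\s+/).filter(Boolean).length;
    if (words === 0) {
      problems.push(`${shot.name}: on-screen text is missing`);
    } else if (words / WORDS_PER_SECOND > shot.durationInSeconds) {
      problems.push(`${shot.name}: ${words} words will not be readable in ${shot.durationInSeconds}s`);
    }
  }
  return problems;
}

const brief: Brief = {
  totalSeconds: 10,
  shots: [
    {
      name: "shot-01-title",
      durationInSeconds: 6,
      onScreenText: ["New in Studio", "Timeline editing without the timeline"],
    },
    { name: "shot-02-demo", durationInSeconds: 3, onScreenText: [] },
  ],
};

const problems = checkBrief(brief);
```

Here the check reports two problems: the shots sum to nine seconds against a ten-second clip, and the second shot has no copy written out. Neither check says anything about hierarchy or emphasis, which remain the designer's call.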

Narration for this slide

Here is the skill that stays yours: briefing a scene. A good motion brief is a shot list with intent. Every shot gets a duration, and the durations add up to the length of the clip. Every piece of on-screen text is written out, word for word — the moment you write something punchy here, the agent invents copy you will spend the rest of the session removing. Then hierarchy: what the viewer reads first, what supports it, what holds still. Then emphasis and pacing — how long a headline needs to stay readable. And constraints: your easing, reduced motion, no invented claims. Notice the brief never mentions a tool. Timing, hierarchy, emphasis. The agent does the translation into frames.

Slide 8 of 14 · 16:9

What the agent writes from that brief

The same six-second title card in each stack. Read the structure, not the syntax: timing, tokens, and motion are all visible in the text.

Hyperframes (HTML excerpt) and Remotion (TSX excerpt)
<!-- Hyperframes: timing in markup, motion in a paused GSAP timeline -->
<div id="root" data-composition-id="shot-01-title"
     data-start="0" data-duration="6" data-width="1920" data-height="1080">
  <div class="kicker" data-start="0.2" data-duration="5.8">New in Studio</div>
  <h1 class="headline">Timeline editing<br/>without the timeline</h1>
</div>

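// Remotion: timing from the frame number, motion from spring()
import { Sequence, spring, useCurrentFrame, useVideoConfig } from "remotion";

// Inside the TitleCard component: fps comes from the composition's config
const frame = useCurrentFrame();
const { fps } = useVideoConfig();
const lineY = spring({ frame: frame - 9, fps, config: { damping: 200 } });

// In the root file: the shot runs six seconds, written here for 30 fps
<Sequence from={0} durationInFrames={6 * 30}>
  <TitleCard kicker="New in Studio" line1="Timeline editing" line2="without the timeline" />
</Sequence>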

In both versions the timing structure, the copy, and the token usage are readable in the diff — which is exactly what you review.

Slide notes

The point of putting code on a slide in a design course is narrow: a designer reviewing motion work needs to be able to read the structure, not write it. Walk what is visible in each excerpt. In the Hyperframes version, the shot's duration and each element's entrance time are data attributes in the markup — the entire timing structure is readable without running anything — and the headline copy is right there to check against the brief. The motion itself lives in a GSAP timeline created paused, which is omitted here for space but is the same timeline a front-end developer would write for a landing page.

In the Remotion version, the same six seconds are expressed as frame arithmetic: the Sequence starts at frame zero and lasts six seconds' worth of frames, and the staggered reveal is two spring-driven transforms offset by a couple of frames. Same intent, different vocabulary. Neither is harder in any absolute sense; the question from the comparison slide — which vocabulary does your team already speak — is the one that matters.

What to take into your review habit: check the copy against the brief word for word, check that durations match the shot list, check that colours and type come from tokens or the shared stylesheet rather than hard-coded values, and be suspicious of any animation code that is registered but never advanced — that defect is covered properly two slides from now.

Narration for this slide

Here is what the agent actually writes from that brief — the same six-second title card in both stacks. In the Hyperframes version, look at the markup: the shot lasts six seconds, the kicker appears at zero point two, and the headline copy is sitting right there for you to check against the brief. In the Remotion version, the same scene is frame arithmetic: a Sequence that starts at frame zero, springs that lift the headline lines a few frames apart. You are not expected to write either of these. You are expected to read them — to check the words, the durations, and that the colours come from tokens. That is what reviewing motion as code looks like.

Slide 9 of 14 · 16:9

Design tokens and brand assets inside motion projects

The brand reaches the video the same way it reaches the product: through tokens, shared styles, and named assets — not per-clip decisions.

  • Colours, type, and spacing imported from the token source or one shared stylesheet
  • Logos, product imagery, and music in a named assets folder the agent is told about
  • Motion tokens too: standard durations, easing curves, stagger values
  • Per-clip content injected as props or variables, so templates re-skin without edits
  • Harness rule: nothing hard-coded that the design system already defines

When the brand changes, a token-driven motion project re-renders. A hard-coded one becomes a folder of stale MP4s with your old logo in them.

Slide notes

This slide extends the design-system discipline into the motion project, and the argument is economic. The whole case for code-defined video is that it stays current with the product. That only holds if the compositions consume the brand the same way the product does: colours and type from tokens or a shared theme stylesheet, logos and imagery from a named assets directory, music and voice files treated as versioned assets rather than email attachments. In the Hyperframes case the compositions can link the same CSS custom properties the marketing site uses; in Remotion the tokens arrive as imported constants or a theme object passed through props.

Motion tokens deserve their own line because they are the part teams forget: a small set of standard durations, easing curves, and stagger values, written into the project's instruction file, is what makes ten clips produced over six months feel like one brand rather than ten experiments. Module 4 builds the full motion system; here the point is just that the agent should be applying values, not inventing them per clip.
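A motion token set can be very small. The sketch below is illustrative only: the token names, the millisecond values, and the easing string are placeholders rather than recommendations, and `staggeredEntrances` is a hypothetical helper showing the agent applying values instead of inventing them.

```typescript
// Hypothetical motion tokens; the values are placeholders, not recommendations.
const motionTokens = {
  durationMs: { quick: 200, standard: 400, slow: 800 },
  easing: { standard: "cubic-bezier(0.2, 0, 0, 1)" },
  staggerMs: 60,
} as const;

type DurationName = keyof typeof motionTokens.durationMs;

interface Entrance {
  element: string;
  delayMs: number;
  durationMs: number;
  easing: string;
}

// Build staggered entrances from token values, so no clip invents its own numbers.
function staggeredEntrances(elements: string[], duration: DurationName): Entrance[] {
  return elements.map((element, index) => ({
    element,
    delayMs: index * motionTokens.staggerMs,
    durationMs: motionTokens.durationMs[duration],
    easing: motionTokens.easing.standard,
  }));
}

const titleCard = staggeredEntrances(["kicker", "headline-line-1", "headline-line-2"], "standard");
```

If the stagger value changes in the token file, every clip built this way picks it up on the next render, which is the same propagation argument the slide makes for colour and type.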

The templating point closes the loop: keep per-clip content — headlines, figures, product names — as props or composition variables so the same scene template serves many clips. The traced project behind the related article runs this way: the chapter videos and the short feature clips share templates and differ only in the variables fed in. That is the structure that makes the third video nearly free.

Narration for this slide

The brand should reach the video exactly the way it reaches the product: through tokens. Colours, type, and spacing come from the token source or one shared stylesheet. Logos, product shots, and music live in a named assets folder the agent knows about. And add motion tokens — standard durations, easing curves, stagger values — so ten clips made across six months still feel like one brand. Keep the per-clip content as variables, so the same scene template serves many videos. The payoff is simple: when the brand changes, a token-driven project re-renders. A hard-coded one becomes a folder of stale MP4s with your old logo in them.

Slide 10 of 14 · 16:9

Reviewing motion: beyond "does it play"

A clip that plays smoothly can still be wrong. Review the composition and the frames, not just the preview.

  • Copy check: every on-screen word matches the brief — no invented claims or figures
  • Timing check: shot durations match the shot list; nothing overlaps on the same track
  • Dead-animation check: compare early and late frames — a registered timeline that never advances renders a still image
  • Token check: no hard-coded colours or fonts outside the design system
  • Pacing and legibility: headlines stay on screen long enough to read — a human call, not a lint rule
  • Accessibility: reduced-motion behaviour, caption space, contrast at the rendered size

Lint and frame inspection catch the mechanical failures. Pacing is the failure no automated gate catches — budget a human pass for it on every clip.

Slide notes

Reviewing motion has two layers, and teams new to this skip the first one. The composition review is a code-and-content review: does every on-screen word match the brief, do the durations match the shot list, do colours and type come from tokens, is there anything on the same track that overlaps. Both stacks give the agent checks it can run on its own output before you look — Hyperframes ships lint and inspect commands, and on the Remotion side the type checker plus a scrub through the Studio covers similar ground. Make running those checks part of the brief, not a favour you ask afterwards.
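The overlap part of the timing check is simple enough to sketch. This is not the Hyperframes lint command, whose rules are not described here; `TimedElement` and `findTrackOverlaps` are hypothetical, modelled on the start, duration, and track attributes from the earlier slide.

```typescript
// Hypothetical timing record mirroring the start / duration / track idea from the slides.
interface TimedElement {
  id: string;
  track: number;
  start: number; // seconds
  duration: number; // seconds
}

// Returns pairs of element ids that overlap in time on the same track.
function findTrackOverlaps(elements: TimedElement[]): Array<[string, string]> {
  const overlaps: Array<[string, string]> = [];
  const sorted = [...elements].sort((a, b) => a.track - b.track || a.start - b.start);
  sorted.forEach((a, index) => {
    for (const b of sorted.slice(index + 1)) {
      if (b.track !== a.track) break; // sorted by track, so no later match is possible
      if (b.start < a.start + a.duration) {
        overlaps.push([a.id, b.id]);
      }
    }
  });
  return overlaps;
}

const overlaps = findTrackOverlaps([
  { id: "kicker", track: 1, start: 0, duration: 3 },
  { id: "headline", track: 1, start: 2.5, duration: 3 },
  { id: "logo", track: 2, start: 0, duration: 6 },
]);
```

In this example the kicker and the headline collide on track 1 for half a second, while the logo on track 2 is left alone because overlapping across tracks is the normal case.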

The frame review catches the failures that are invisible in code. The most instructive one from the project behind the related article is the dead animation: a GSAP timeline that is registered but never advanced, so the clip renders as a still image for its full duration. It passed code review and only showed up when early and late frames were actually compared — the project's audit found the defect in 8 of its 190 templates. That is why a frame-comparison step belongs in the gate, not just a play-through.
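A frame-comparison step can be as blunt as checking whether anything changed between an early and a late frame. The sketch below assumes the frames have already been captured as raw pixel bytes; how they are captured is outside its scope, and `framesDiffer` and `isSuspectedDeadAnimation` are hypothetical names, not part of either tool.

```typescript
// A frame here is raw pixel bytes (for example RGBA values) of a fixed size.
function framesDiffer(early: Uint8Array, late: Uint8Array, minChangedBytes = 1): boolean {
  if (early.length !== late.length) {
    throw new Error("Frames must have the same dimensions");
  }
  let changed = 0;
  for (let i = 0; i < early.length; i++) {
    if (early[i] !== late[i]) changed++;
    if (changed >= minChangedBytes) return true;
  }
  return false;
}

// A scene is flagged when its early and late frames are byte-for-byte identical.
function isSuspectedDeadAnimation(early: Uint8Array, late: Uint8Array): boolean {
  return !framesDiffer(early, late);
}

// Tiny stand-in frames: four bytes each instead of a full image.
const earlyFrame = new Uint8Array([10, 20, 30, 255]);
const lateFrameStill = new Uint8Array([10, 20, 30, 255]);
const lateFrameMoved = new Uint8Array([10, 25, 30, 255]);

const stillIsDead = isSuspectedDeadAnimation(earlyFrame, lateFrameStill);
const movedIsDead = isSuspectedDeadAnimation(earlyFrame, lateFrameMoved);
```

A flagged scene is a prompt to look, not a verdict: a deliberately static hold will also match, so the check narrows the human review rather than replacing it. With lossy or antialiased captures, raising `minChangedBytes` is one way to ignore noise.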

Then the judgment layer. Agents reliably produce motion that is technically correct and slightly too fast, because nothing in lint measures whether a human had time to read the headline. Pacing, emphasis, and whether the clip actually lands its message are human calls, the same way hierarchy and tone are in a UI review. Accessibility closes the list: reduced-motion behaviour, room for captions, and contrast at the final rendered size, checked before the expensive render rather than after.

Narration for this slide

A clip that plays smoothly can still be wrong, so review in layers. First the composition: every word on screen matches the brief, durations match the shot list, nothing overlaps, colours come from tokens. The agent can run lint and inspection on its own output — make that part of the brief. Then the frames: compare early and late frames of every scene, because the sneakiest failure is the dead animation — a timeline that is registered but never advances, so your animated clip is actually a still image. And then the human layer: pacing. Agents produce motion that is technically correct and slightly too fast. No lint rule measures whether a person had time to read the headline. That pass is yours.

Slide 11 of 14 · 16:9

Render and export: formats, sizes, and where the output goes

Rendering is the slow, expensive step. The gates exist so that bad compositions never reach it.

  • Both stacks render from one command: npx remotion render or npx hyperframes render
  • Local renders take minutes, not seconds — budget the time, render drafts at lower quality
  • Remotion adds server-side and distributed Lambda rendering for volume; Hyperframes is single-machine today
  • Outputs: MP4 as the default; Remotion also covers WebM, ProRes, GIF, and image sequences
  • Decide the destination spec up front: aspect ratio, resolution, codec, captions, file-size ceiling
  • The MP4 lands in the repository or asset store as a build output — the composition stays the source of truth

Catch problems with lint, inspection, and a preview pass before rendering. A render queue full of rejected MP4s is the motion equivalent of reviewing pull requests by deploying them.

Slide notes

For a single short clip on a laptop, the two stacks feel the same at render time: one command drives a headless browser and FFmpeg, and the result takes minutes rather than seconds. That is the practical reason the review gates sit before the render — iterate on lint output, inspected frames, and the hot-reload preview, render drafts at lower quality settings while the composition is still moving, and reserve the full-quality render for a composition that has already passed human review.

The stacks diverge at volume. Remotion's frame-pure model lets it split a render across many short-lived cloud functions with Lambda, which makes per-customer personalisation and large batches practical. That is also exactly where the per-render pricing on the paid licence applies, so the teams that benefit most are also the ones who need to do the maths on it. Hyperframes renders on one machine as of June 2026; for release clips and walkthroughs that is a non-issue, but for thousands of personalised videos it is the wrong tool right now. Output formats follow the same pattern: both produce MP4; Remotion covers WebM, ProRes, GIF, and image sequences natively, while a Hyperframes pipeline typically adds an FFmpeg pass for anything beyond MP4. The traced project also uses an FFmpeg pass to normalise resolution, a reminder that pre-1.0 tools have sharp edges worth pinning versions against.
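Why frame purity makes distribution possible is easier to see as arithmetic. The sketch below is an illustration only and is not Remotion's Lambda API: `splitFrames` and `FrameRange` are hypothetical names, and the chunk size is an arbitrary assumption. The point is that when every frame depends only on its frame number, any contiguous range can be rendered by a separate worker and the pieces joined afterwards.

```typescript
// Hypothetical sketch: split a clip's frames into independent ranges.
// Each range could be handed to a separate worker because no frame
// depends on the one before it.
interface FrameRange {
  start: number; // first frame in the chunk (inclusive)
  end: number; // last frame in the chunk (inclusive)
}

function splitFrames(totalFrames: number, framesPerChunk: number): FrameRange[] {
  if (!Number.isInteger(totalFrames) || totalFrames < 0) {
    throw new Error("totalFrames must be a non-negative integer");
  }
  if (!Number.isInteger(framesPerChunk) || framesPerChunk < 1) {
    throw new Error("framesPerChunk must be a positive integer");
  }
  const ranges: FrameRange[] = [];
  for (let start = 0; start < totalFrames; start += framesPerChunk) {
    const end = Math.min(start + framesPerChunk, totalFrames) - 1;
    ranges.push({ start, end });
  }
  return ranges;
}

// A 30-second clip at 30 fps is 900 frames; 400 frames per chunk gives three ranges.
const chunks = splitFrames(900, 400);
```

A seek-based runtime can in principle be split the same way, but as the notes say, Hyperframes does not do this as of June 2026.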

Finally, destination discipline. Decide where the clip plays before briefing it — release notes page, social, in-product — because aspect ratio, resolution, captions, and file-size ceilings differ by destination, and retrofitting a 16:9 composition into a 9:16 cut is a redesign, not an export setting. The rendered file goes wherever your assets live; the composition stays in the repository as the thing your team maintains.
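One way to make the destination decision explicit is to write it down as data next to the composition. The following is a sketch under stated assumptions: the `DestinationSpec` shape, the field names, and the example values are hypothetical, not a format either stack defines. It shows the distinction the paragraph draws: a different size at the same aspect ratio is an export concern, while a different aspect ratio means a new layout.

```typescript
// Hypothetical sketch: a destination spec held as data, plus a check
// that separates "rescale at export" from "redesign the layout".
interface Size {
  width: number;
  height: number;
}

interface DestinationSpec extends Size {
  name: string;
  codec: string;
  captions: boolean;
  maxFileSizeMb: number; // illustrative ceiling, not a platform rule
}

function gcd(a: number, b: number): number {
  while (b !== 0) {
    const rest = a % b;
    a = b;
    b = rest;
  }
  return a;
}

/** Reduced aspect ratio, e.g. 1920x1080 gives "16:9". */
function aspectRatio(size: Size): string {
  const d = gcd(size.width, size.height);
  return `${size.width / d}:${size.height / d}`;
}

/** Lists what stands between a composition and a destination. */
function mismatches(composition: Size, destination: DestinationSpec): string[] {
  const problems: string[] = [];
  const from = aspectRatio(composition);
  const to = aspectRatio(destination);
  if (from !== to) {
    problems.push(`Aspect ${from} vs ${to} for ${destination.name}: needs a redesigned layout`);
  } else if (composition.width !== destination.width || composition.height !== destination.height) {
    problems.push(`Same aspect, different size for ${destination.name}: rescale at export`);
  }
  return problems;
}
```

Running the check before the brief is written, rather than after the render, is the whole point of deciding the destination first.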

Narration for this slide

Rendering is the step that costs real time, so treat it as the end of the loop, not the middle. Both stacks render from a single command, and on a laptop a short clip takes minutes — which is exactly why you iterate on the preview and the inspected frames, not on finished MP4s. At volume the stacks part ways: Remotion can distribute a render across cloud functions, which is also where its per-render pricing applies; Hyperframes renders on one machine. Decide the destination before you brief — aspect ratio, resolution, captions, file size — because changing those after the fact is a redesign, not an export setting. And remember what gets maintained: the composition is the source of truth. The MP4 is just a build output.

Slide 12 of 14 · 16:9

Worked example: a thirty-second product update video, built by brief

A traced run from a first-party Hyperframes pipeline: four shots, written brief, agent-generated compositions, and the gates that caught the failures.

Stage | What happened
Brief | 4 shots, ~30 seconds, 1920×1080; on-screen text written out, durations summing to total, tokens named
Generated | 4 shot compositions plus a root timeline, reusing the project's theme CSS variables
Gate catch 1 | Lint flagged a timing overlap: shot 2 started before shot 1's exit finished on the same track
Gate catch 2 | One shot inherited a dead animation from its template: timeline registered but never advanced
Iterations | 3 composition passes, roughly 45–60 minutes brief-to-approval (labelled estimate)
Render | Local CLI render plus an FFmpeg pass to normalise resolution; a pre-1.0 sharp edge, fixed in the pipeline, not by hand

Both failures were fixed by editing the composition. No video file was ever edited — that discipline is the entire economic argument for motion as code.

Slide notes

This trace comes from the first-party project documented in the school's motion article: a Hyperframes-based pipeline that turns written material into rendered explainer videos. Be transparent about its status the way the article is — the prompts, composition structure, and gate output are working material from that project, the iteration and time figures are labelled estimates reconstructed from its history rather than a fresh benchmark, and the render figures come from the project's existing local renders.

Walk the trace in order. The brief was written as a shot plan: an opening title card, two claim-plus-visual shots, a closing card, each with explicit durations and the on-screen words written out. The agent produced one composition file per shot plus a root timeline, and pulled colours and type from the project's existing theme stylesheet rather than inventing them — because the brief told it to and the project structure made the tokens findable. The first pass through the gates caught two real defects: a timing overlap that lint flagged, and a dead animation inherited from a library template, which only the frame-comparison audit caught — the same defect that audit had previously found in 8 of the project's 190 templates.
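The timing overlap in the trace is the kind of defect a mechanical check finds easily. The sketch below shows the idea only; it is not the Hyperframes lint implementation, and the `Clip` shape, the `findOverlaps` name, and the example timings are assumptions made for illustration.

```typescript
// Hypothetical sketch: report clips that start before an earlier clip
// on the same track has finished. Times are in seconds.
interface Clip {
  id: string;
  track: number;
  start: number;
  duration: number;
}

function findOverlaps(clips: Clip[]): Array<[string, string]> {
  // Sort by track, then by start time, so each track is scanned in order.
  const sorted = clips.slice().sort((a, b) => a.track - b.track || a.start - b.start);
  const overlaps: Array<[string, string]> = [];
  let latest: Clip | undefined; // clip with the latest end seen so far on this track
  for (const clip of sorted) {
    if (latest && latest.track === clip.track) {
      const latestEnd = latest.start + latest.duration;
      if (clip.start < latestEnd) overlaps.push([latest.id, clip.id]);
      if (clip.start + clip.duration > latestEnd) latest = clip;
    } else {
      latest = clip;
    }
  }
  return overlaps;
}

// Illustrative timings: shot-2 begins half a second before shot-1 ends.
const overlaps = findOverlaps([
  { id: "shot-1", track: 0, start: 0, duration: 8 },
  { id: "shot-2", track: 0, start: 7.5, duration: 8 },
  { id: "shot-3", track: 0, start: 15.5, duration: 8 },
  { id: "overlay", track: 1, start: 7, duration: 2 },
]);
```

Clips on different tracks are allowed to overlap, which is why the overlay is not reported.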

The numbers to leave people with: three composition passes and roughly an hour from brief to approved composition, a render measured in minutes, and an FFmpeg step in the pipeline to normalise resolution because the pinned pre-1.0 CLI rendered at double scale. Every correction in the trace was an edit to a text file. Nothing was fixed in a video editor, and that is the point of the whole module.
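Keeping the normalise step in the pipeline rather than doing it by hand can be as small as a function that builds the FFmpeg arguments. This is a sketch, not the traced project's actual command, which is not shown here: the function names and file names are hypothetical, and the only FFmpeg options used are the input flag, the `scale` video filter, and audio stream copy.

```typescript
// Hypothetical sketch: decide whether a render needs normalising and
// build the FFmpeg argument list that rescales it to the target size.
interface Size {
  width: number;
  height: number;
}

function needsNormalising(rendered: Size, target: Size): boolean {
  return rendered.width !== target.width || rendered.height !== target.height;
}

function ffmpegNormaliseArgs(input: string, output: string, target: Size): string[] {
  return [
    "-i", input, // rendered file from the CLI
    "-vf", `scale=${target.width}:${target.height}`, // rescale video to the brief's resolution
    "-c:a", "copy", // leave any audio stream untouched
    output,
  ];
}

// A 1920x1080 brief rendered at double scale comes out at 3840x2160.
const rendered: Size = { width: 3840, height: 2160 };
const target: Size = { width: 1920, height: 1080 };
const args = needsNormalising(rendered, target)
  ? ffmpegNormaliseArgs("draft.mp4", "final.mp4", target)
  : [];
```

The argument list would be passed to whatever process runner the pipeline already uses; the design choice is that the fix lives in versioned code next to the pinned CLI version.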

Narration for this slide

Let's trace a real one: a thirty-second product update video from a working Hyperframes pipeline. The brief was a shot plan — four shots, durations adding up, every on-screen word written out, tokens named. The agent generated four shot files and a root timeline, pulling colours and type from the project's theme. The gates earned their keep twice. Lint caught a timing overlap between the first two shots. And the frame audit caught a dead animation — a timeline that was registered but never advanced, inherited from a template. Three passes, roughly an hour from brief to approved composition, then a render in minutes plus one FFmpeg clean-up step. Here is the detail that matters most: every fix was an edit to a text file. Nobody ever opened a video editor.

Slide 13 of 14 · 16:9

Exercise: storyboard one scene and write its brief

Pick one scene from a video your team actually needs — a title card, a feature claim, a closing card — and brief it properly. Do not build it yet.

  • Sketch the scene as a rough storyboard frame: what is on screen at the start, middle, and end
  • Write the on-screen text in full — every word that will appear
  • Set the timing: scene duration, when each element enters and exits
  • State the hierarchy and emphasis: what is read first, what moves, what holds still
  • Add the constraints: tokens or stylesheet to use, easing personality, reduced-motion behaviour, no invented copy
  • Note which checks the agent must run before calling it ready

Keep the brief. Module 3 turns written scripts into narrated explainers, and this scene becomes one beat of that larger piece.

Slide notes

Keep the exercise to one scene, not a whole clip — the skill being practised is the brief, and one well-briefed scene teaches more than four vague ones. Steer participants towards a scene from a video their team genuinely keeps postponing: the title card of a release announcement, a single feature claim with a supporting visual, the closing call to action. The storyboard frame can be three rough rectangles on paper; its job is to force decisions about what is actually on screen, not to be drawable.

The parts people skip are the ones to insist on. Writing the on-screen text in full feels tedious and is the single biggest quality lever in the whole workflow. Setting durations that sum correctly forces the pacing conversation before the agent has the chance to default to too fast. The hierarchy line — read first, moves, holds still — is the design judgment the agent cannot supply.
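Two of those parts can be checked mechanically before the brief reaches an agent. The sketch below assumes the brief is held as data; the `Brief` and `Shot` shapes, the `checkBrief` name, and the placeholder copy are hypothetical and not a format either stack defines.

```typescript
// Hypothetical sketch: a shot plan as data, with checks that durations
// sum to the stated total and that no on-screen line was left blank.
interface Shot {
  name: string;
  durationSeconds: number;
  onScreenText: string[]; // every word that will appear, written out
}

interface Brief {
  totalSeconds: number;
  shots: Shot[];
}

function checkBrief(brief: Brief): string[] {
  const problems: string[] = [];
  const sum = brief.shots.reduce((total, shot) => total + shot.durationSeconds, 0);
  if (Math.abs(sum - brief.totalSeconds) > 1e-9) {
    problems.push(`Shot durations sum to ${sum}s but the brief says ${brief.totalSeconds}s`);
  }
  for (const shot of brief.shots) {
    if (shot.onScreenText.some((line) => line.trim() === "")) {
      problems.push(`Shot "${shot.name}" has on-screen text left blank`);
    }
  }
  return problems;
}

// Placeholder copy only; a real brief carries the exact approved words.
const exampleBrief: Brief = {
  totalSeconds: 30,
  shots: [
    { name: "title", durationSeconds: 6, onScreenText: ["Placeholder headline"] },
    { name: "claim-1", durationSeconds: 9, onScreenText: ["Placeholder claim one"] },
    { name: "claim-2", durationSeconds: 9, onScreenText: ["Placeholder claim two"] },
    { name: "closing", durationSeconds: 6, onScreenText: ["Placeholder call to action"] },
  ],
};
```

The hierarchy line has no equivalent check, which is consistent with the point above: it is the judgment the brief author supplies.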

If participants want to run the brief through an agent afterwards, the lowest-friction path is the hyperframe-style one from this module: install the skill pack, point the agent at the brief and the tokens, and ask for lint and frame inspection before any preview. But the exercise is complete without running anything. The brief is the artifact, and Module 3 picks it up: a narrated explainer is a sequence of exactly these scene briefs, hung off a script.

Narration for this slide

Your turn. Pick one scene from a video your team actually needs and keeps postponing — a title card, one feature claim, a closing card. Storyboard it roughly: what is on screen at the start, the middle, and the end. Then write the brief properly. Every on-screen word, written out. The duration, and when each element enters and leaves. The hierarchy — what gets read first, what moves, what holds still. The constraints: which tokens, what easing, what happens under reduced motion, and no invented copy. And name the checks the agent has to run before it tells you the scene is ready. Don't build it yet. Keep the brief — Module 3 will turn scenes like this into a narrated explainer.

Slide 14 of 14 · 16:9

Summary, and what comes next

  • Remotion: video as React components, frame-pure animation, rendering that scales to distributed Lambda — source-available, paid above three employees as of June 2026
  • Hyperframes: video as plain HTML with timing attributes, seek-based animation runtimes, single-machine rendering — Apache 2.0, pre-1.0 and fast-moving
  • Either way, the composition is diffable text fed by your tokens; the MP4 is a build output
  • Brief in design language — timing, hierarchy, emphasis, exact words — and let the agent translate
  • Review in layers: lint and frame inspection for the mechanical failures, a human pass for pacing

Module 3 moves from single clips to narrated explainers and decks: the script becomes the primary artifact, and slides, narration, and video are generated from the same source.

Slide notes

Recap along the structure of the module. The two stacks answer the same question with different materials: Remotion computes every frame from a frame number inside React components and scales its rendering further than anything else in this space; Hyperframes treats the composition as an HTML document, seeks existing animation runtimes frame by frame, and has the lowest floor for agent-first authoring. The licensing and maturity facts are the ones to re-verify on the day a team commits — they were accurate as of June 2026 and both projects move.

Then the parts that are stack-independent, because they are the durable skills: a project structured like a small design system, with scenes as components, content as data, and the brand arriving through tokens; briefs written in timing, hierarchy, emphasis, and exact words; reviews layered from automated checks through frame comparison to the human pacing pass; and rendering treated as the expensive final step that only approved compositions reach.

Preview Module 3 concretely. The next module changes the unit of work from a clip to a piece of communication: narrated explainers and decks generated from the same content source as the product or course material. The script becomes the primary artifact, slides and narration are produced from it rather than alongside it, and the editing pass keeps the result sounding like a person. The scene brief from this module's exercise becomes one beat of that script.

Narration for this slide

Let's close. Two stacks, one principle. Remotion gives you video as React components, frame-pure animation, and rendering that scales to distributed cloud functions — source-available, and paid above three employees as of mid 2026. Hyperframes gives you video as plain HTML, the web's own animation libraries seeked frame by frame, and the lowest-friction agent loop — Apache 2.0, but pre-1.0 and moving fast. Either way, the artifact is diffable text fed by your tokens, and the MP4 is just a build output. You brief in timing, hierarchy, and exact words; you review in layers; you render last. In Module 3 the unit of work grows: narrated explainers and decks, generated from a script that becomes the primary artifact. See you there.

Module transcript
Module 2, narrated slide by slide

Slide 1 · Two tools, one principle: motion as code

Welcome back. Module 1 made the argument that video defined in code is a different kind of artifact — one an agent can write, you can review, and your team can re-render when the product changes. This module is where we build it. We will look at the two stacks that currently have a credible agent story: Remotion, where a video is a set of React components, and Hyperframes, where a video is a plain HTML file with timing attributes. We will cover the project structure that keeps either maintainable, how to brief a scene in design language rather than tool operations, and how to review and render the result. Two tools, one principle: motion as code.

Slide 2 · Remotion in one slide

Here is Remotion in one idea: a video is a React component that gets evaluated once per frame. A Composition registers the video — its size, frame rate, and duration. Inside, the component asks for the current frame number and computes what the screen should look like at that instant, using helpers like interpolate and spring. Scenes are stacked with Sequence, each starting at an explicit frame. Because every frame is a pure function of that number, rendering is deterministic — and writing the video is a normal coding job, which is exactly why an agent can do it. The cost is that motion lives in Remotion's own vocabulary, not in libraries like GSAP. Hold that thought.

Slide 3 · Project structure an agent can navigate

Whichever stack you pick, the structure decides whether the second and fifth videos stay cheap. Treat the motion project like a small design system. One scene per file, named for what it shows. A root file that registers the compositions and puts the scenes in order. Content — the headlines, the claims, the figures — held as data, so a copy change never means digging through markup. Brand colours and type imported from tokens, so a rebrand is a re-render, not a re-edit. And an instruction file that records your duration defaults, your easing personality, and the rule that the agent never invents copy. The agent can only reuse what it can find.

Slide 4 · Hyperframe-style sequences: when a full video project is too much

Not every motion need justifies a full video project. Sometimes it is a ten-second loop or a short product clip, and for that, the hyperframe-style approach is the lighter answer. A composition is just an HTML file. Timing lives in data attributes on the elements — when each thing appears, for how long, on which track. The motion is a normal GSAP timeline, the same one a front-end developer would write for a landing page, and the renderer pauses it and seeks to each frame before capturing. No build step, a command-line loop built for agents, and a skill pack that teaches the agent the conventions. As of mid 2026 it is pre-1.0 and renders on one machine — but for short, recurring clips, it is the lowest-friction path from brief to MP4.

Slide 5 · The two stacks, side by side

Here are the two stacks side by side. On the left, Remotion: React components, frames computed from a frame number, motion in interpolate and spring, rendering that scales from your laptop to distributed cloud functions. On the right, Hyperframes: plain HTML with timing in data attributes, motion in GSAP or Lottie or CSS seeked frame by frame, rendered with headless Chrome on one machine. The fit lines are honest — React teams and render volume on the left, agent-first authoring and GSAP-heavy motion on the right. But look at the band underneath, because that is the actual story. Both give you a diffable composition in your repository, styled from your tokens, written by an agent, and reviewed through the gates you already have. The columns decide the vocabulary. The band is the principle.

Slide 6 · Choosing between them: the facts that decide it

When you have to choose, these are the rows that decide it. Authoring and animation are about vocabulary: React and frame-pure helpers on one side, HTML and the web's existing animation libraries on the other. Rendering is about scale: Remotion can distribute a render across cloud functions, Hyperframes runs on one machine. And then licensing — the row teams discover too late. As of June 2026, Remotion is source-available: free up to three employees, a paid company licence above that. Hyperframes describes itself as Apache 2.0 with no thresholds, but it is pre-1.0 and moving fast, so pin your version and read the licence file. If you genuinely cannot decide, build the same thirty-second clip in both. It takes an afternoon.

Slide 7 · Briefing a scene: timing, hierarchy, emphasis

Here is the skill that stays yours: briefing a scene. A good motion brief is a shot list with intent. Every shot gets a duration, and the durations add up to the length of the clip. Every piece of on-screen text is written out, word for word — the moment you write something punchy here, the agent invents copy you will spend the rest of the session removing. Then hierarchy: what the viewer reads first, what supports it, what holds still. Then emphasis and pacing — how long a headline needs to stay readable. And constraints: your easing, reduced motion, no invented claims. Notice the brief never mentions a tool. Timing, hierarchy, emphasis. The agent does the translation into frames.

Slide 8 · What the agent writes from that brief

Here is what the agent actually writes from that brief — the same six-second title card in both stacks. In the Hyperframes version, look at the markup: the shot lasts six seconds, the kicker appears at zero point two, and the headline copy is sitting right there for you to check against the brief. In the Remotion version, the same scene is frame arithmetic: a Sequence that starts at frame zero, springs that lift the headline lines a few frames apart. You are not expected to write either of these. You are expected to read them — to check the words, the durations, and that the colours come from tokens. That is what reviewing motion as code looks like.

Slide 9 · Design tokens and brand assets inside motion projects

The brand should reach the video exactly the way it reaches the product: through tokens. Colours, type, and spacing come from the token source or one shared stylesheet. Logos, product shots, and music live in a named assets folder the agent knows about. And add motion tokens — standard durations, easing curves, stagger values — so ten clips made across six months still feel like one brand. Keep the per-clip content as variables, so the same scene template serves many videos. The payoff is simple: when the brand changes, a token-driven project re-renders. A hard-coded one becomes a folder of stale MP4s with your old logo in them.

Slide 10 · Reviewing motion: beyond "does it play"

A clip that plays smoothly can still be wrong, so review in layers. First the composition: every word on screen matches the brief, durations match the shot list, nothing overlaps, colours come from tokens. The agent can run lint and inspection on its own output — make that part of the brief. Then the frames: compare early and late frames of every scene, because the sneakiest failure is the dead animation — a timeline that is registered but never advances, so your animated clip is actually a still image. And then the human layer: pacing. Agents produce motion that is technically correct and slightly too fast. No lint rule measures whether a person had time to read the headline. That pass is yours.

Slide 11 · Render and export: formats, sizes, and where the output goes

Rendering is the step that costs real time, so treat it as the end of the loop, not the middle. Both stacks render from a single command, and on a laptop a short clip takes minutes — which is exactly why you iterate on the preview and the inspected frames, not on finished MP4s. At volume the stacks part ways: Remotion can distribute a render across cloud functions, which is also where its per-render pricing applies; Hyperframes renders on one machine. Decide the destination before you brief — aspect ratio, resolution, captions, file size — because changing those after the fact is a redesign, not an export setting. And remember what gets maintained: the composition is the source of truth. The MP4 is just a build output.

Slide 12 · Worked example: a thirty-second product update video, built by brief

Let's trace a real one: a thirty-second product update video from a working Hyperframes pipeline. The brief was a shot plan — four shots, durations adding up, every on-screen word written out, tokens named. The agent generated four shot files and a root timeline, pulling colours and type from the project's theme. The gates earned their keep twice. Lint caught a timing overlap between the first two shots. And the frame audit caught a dead animation — a timeline that was registered but never advanced, inherited from a template. Three passes, roughly an hour from brief to approved composition, then a render in minutes plus one FFmpeg clean-up step. Here is the detail that matters most: every fix was an edit to a text file. Nobody ever opened a video editor.

Slide 13 · Exercise: storyboard one scene and write its brief

Your turn. Pick one scene from a video your team actually needs and keeps postponing — a title card, one feature claim, a closing card. Storyboard it roughly: what is on screen at the start, the middle, and the end. Then write the brief properly. Every on-screen word, written out. The duration, and when each element enters and leaves. The hierarchy — what gets read first, what moves, what holds still. The constraints: which tokens, what easing, what happens under reduced motion, and no invented copy. And name the checks the agent has to run before it tells you the scene is ready. Don't build it yet. Keep the brief — Module 3 will turn scenes like this into a narrated explainer.

Slide 14 · Summary, and what comes next

Let's close. Two stacks, one principle. Remotion gives you video as React components, frame-pure animation, and rendering that scales to distributed cloud functions — source-available, and paid above three employees as of mid 2026. Hyperframes gives you video as plain HTML, the web's own animation libraries seeked frame by frame, and the lowest-friction agent loop — Apache 2.0, but pre-1.0 and moving fast. Either way, the artifact is diffable text fed by your tokens, and the MP4 is just a build output. You brief in timing, hierarchy, and exact words; you review in layers; you render last. In Module 3 the unit of work grows: narrated explainers and decks, generated from a script that becomes the primary artifact. See you there.