Slide 1 — Two tools, one principle — motion as code
Welcome back. Module 1 made the argument that video defined in code is a different kind of artifact — one an agent can write, you can review, and your team can re-render when the product changes. This module is where we build it. We will look at the two stacks that currently have a credible agent story: Remotion, where a video is a set of React components, and Hyperframes, where a video is a plain HTML file with timing attributes. We will cover the project structure that keeps either maintainable, how to brief a scene in design language rather than tool operations, and how to review and render the result. Two tools, one principle: motion as code.
Slide 2 — Remotion in one slide
Here is Remotion in one idea: a video is a React component that gets evaluated once per frame. A Composition registers the video — its size, frame rate, and duration. Inside, the component asks for the current frame number and computes what the screen should look like at that instant, using helpers like interpolate and spring. Scenes are stacked with Sequence, each starting at an explicit frame. Because every frame is a pure function of that number, rendering is deterministic — and writing the video is a normal coding job, which is exactly why an agent can do it. The cost is that motion lives in Remotion's own vocabulary, not in libraries like GSAP. Hold that thought.
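To make "a pure function of the frame number" concrete, here is a minimal sketch in plain TypeScript. It deliberately does not import Remotion: `lerpFrame` and `headlineAt` are hypothetical stand-ins written for this illustration, and the fifteen-frame fade and 24-pixel lift are assumed values, not anything the library prescribes.

```typescript
// Hand-rolled stand-in for a frame-pure helper; this is NOT Remotion's own API.
type FrameRange = readonly [number, number];

// Linearly map `frame` from an input range to an output range, clamped at both ends.
function lerpFrame(frame: number, input: FrameRange, output: FrameRange): number {
  const [inStart, inEnd] = input;
  const [outStart, outEnd] = output;
  const progress = Math.min(1, Math.max(0, (frame - inStart) / (inEnd - inStart)));
  return outStart + (outEnd - outStart) * progress;
}

interface HeadlineStyle {
  opacity: number;
  translateY: number; // pixels below the resting position
}

// The whole "scene": the same frame number always yields the same style.
function headlineAt(frame: number): HeadlineStyle {
  return {
    opacity: lerpFrame(frame, [0, 15], [0, 1]),      // assumed 15-frame fade
    translateY: lerpFrame(frame, [0, 15], [24, 0]),  // assumed 24px lift
  };
}
```

Because nothing in `headlineAt` depends on a clock or on earlier frames, a renderer can evaluate frames in any order and get the same picture, which is the property the narration relies on.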
Slide 3 — Project structure an agent can navigate
Whichever stack you pick, the structure decides whether the second and fifth videos stay cheap. Treat the motion project like a small design system. One scene per file, named for what it shows. A root file that registers the compositions and puts the scenes in order. Content — the headlines, the claims, the figures — held as data, so a copy change never means digging through markup. Brand colours and type imported from tokens, so a rebrand is a re-render, not a re-edit. And an instruction file that records your duration defaults, your easing personality, and the rule that the agent never invents copy. The agent can only reuse what it can find.
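One way to picture "content held as data, brand imported from tokens" is the sketch below. Every name and value in it (`brandTokens`, `clipCopy`, `sceneProps`, the colours, the copy) is hypothetical and chosen only for illustration; a real project would import tokens from its own source.

```typescript
// Hypothetical project shape: names, colours, and copy are illustrative only.
interface BrandTokens {
  colour: { background: string; text: string; accent: string };
  fontFamily: string;
}

interface SceneCopy {
  sceneId: string;   // matches the scene file name, e.g. "title-card"
  headline: string;  // exact on-screen words, taken from the brief
  kicker?: string;
}

// In a real project this would be imported from the token source.
const brandTokens: BrandTokens = {
  colour: { background: "#0b1020", text: "#ffffff", accent: "#5b8cff" },
  fontFamily: "Inter, sans-serif",
};

// Copy lives here, not inside scene markup, so a copy change is a data change.
const clipCopy: SceneCopy[] = [
  { sceneId: "title-card", headline: "Faster exports", kicker: "Product update" },
  { sceneId: "closing-card", headline: "Available today" },
];

// Scenes receive copy and tokens together; unknown scene ids fail loudly.
function sceneProps(sceneId: string): SceneCopy & { tokens: BrandTokens } {
  const copy = clipCopy.find((entry) => entry.sceneId === sceneId);
  if (copy === undefined) {
    throw new Error(`No copy registered for scene "${sceneId}"`);
  }
  return { ...copy, tokens: brandTokens };
}
```

The loud failure is a design choice: if the agent adds a scene without registering its copy, the build breaks instead of the agent inventing a headline.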
Slide 4 — Hyperframe-style sequences: when a full video project is too much
Not every motion need justifies a full video project. Sometimes it is a ten-second loop or a short product clip, and for that, the hyperframe-style approach is the lighter answer. A composition is just an HTML file. Timing lives in data attributes on the elements: when each thing appears, for how long, on which track. The motion is a normal GSAP timeline, the same one a front-end developer would write for a landing page, and the renderer pauses it and seeks to each frame before capturing. There is no build step, the command-line loop is built for agents, and a skill pack teaches the agent the conventions. As of mid-2026 it is pre-1.0 and renders on one machine, but for short, recurring clips it is the lowest-friction path from brief to MP4.
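The "pause, seek, capture" loop can be sketched without any renderer at all. The code below is a model of the idea, not Hyperframes code: `TimedElement`, `seekTimes`, and `visibleAt` are hypothetical names, and the fields only mirror the three things the narration says the attributes carry (start, duration, track).

```typescript
// Hypothetical model of timed elements; field names are NOT the tool's attribute names.
interface TimedElement {
  id: string;
  startSec: number;
  durationSec: number;
  track: number;
}

// One timestamp per output frame: the renderer would seek the paused timeline to each.
function seekTimes(durationSec: number, fps: number): number[] {
  const frameCount = Math.round(durationSec * fps);
  return Array.from({ length: frameCount }, (_, i) => i / fps);
}

// Which elements are on screen at time t (start inclusive, end exclusive).
function visibleAt(elements: TimedElement[], t: number): string[] {
  return elements
    .filter((el) => t >= el.startSec && t < el.startSec + el.durationSec)
    .map((el) => el.id);
}

const loopElements: TimedElement[] = [
  { id: "logo", startSec: 0, durationSec: 10, track: 0 },
  { id: "headline", startSec: 1, durationSec: 4, track: 1 },
];
```

For a ten-second loop at an assumed 30 frames per second, `seekTimes(10, 30)` yields 300 timestamps, and `visibleAt(loopElements, 2)` lists both elements.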
Slide 5 — The two stacks, side by side
Here are the two stacks side by side. On the left, Remotion: React components, frames computed from a frame number, motion in interpolate and spring, rendering that scales from your laptop to distributed cloud functions. On the right, Hyperframes: plain HTML with timing in data attributes, motion in GSAP or Lottie or CSS seeked frame by frame, rendered with headless Chrome on one machine. The fit lines are honest — React teams and render volume on the left, agent-first authoring and GSAP-heavy motion on the right. But look at the band underneath, because that is the actual story. Both give you a diffable composition in your repository, styled from your tokens, written by an agent, and reviewed through the gates you already have. The columns decide the vocabulary. The band is the principle.
Slide 6 — Choosing between them: the facts that decide it
When you have to choose, these are the rows that decide it. Authoring and animation are about vocabulary: React and frame-pure helpers on one side, HTML and the web's existing animation libraries on the other. Rendering is about scale: Remotion can distribute a render across cloud functions, Hyperframes runs on one machine. And then licensing — the row teams discover too late. As of June 2026, Remotion is source-available: free up to three employees, a paid company licence above that. Hyperframes describes itself as Apache 2.0 with no thresholds, but it is pre-1.0 and moving fast, so pin your version and read the licence file. If you genuinely cannot decide, build the same thirty-second clip in both. It takes an afternoon.
Slide 7 — Briefing a scene: timing, hierarchy, emphasis
Here is the skill that stays yours: briefing a scene. A good motion brief is a shot list with intent. Every shot gets a duration, and the durations add up to the length of the clip. Every piece of on-screen text is written out, word for word. The moment you leave a placeholder like "something punchy here", the agent invents copy you will spend the rest of the session removing. Then hierarchy: what the viewer reads first, what supports it, what holds still. Then emphasis and pacing: how long a headline needs to stay readable. And constraints: your easing, reduced motion, no invented claims. Notice the brief never mentions a tool. Timing, hierarchy, emphasis. The agent does the translation into frames.
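Two of those rules are mechanical enough to check before the brief ever reaches the agent: durations must add up, and copy must be real words. The sketch below assumes a hypothetical brief shape (`MotionBrief`, `BriefShot`) and a small, illustrative placeholder list; it is one possible pre-flight check, not part of either tool.

```typescript
// Hypothetical brief shape for a pre-flight check; not a format either tool defines.
interface BriefShot {
  label: string;
  durationSec: number;
  onScreenText: string[]; // exact words, in reading order
}

interface MotionBrief {
  clipLengthSec: number;
  shots: BriefShot[];
}

// Illustrative list of phrases that signal copy was left for the agent to invent.
const PLACEHOLDER = /\b(tbd|todo|lorem ipsum|something punchy)\b/i;

function briefProblems(brief: MotionBrief): string[] {
  const problems: string[] = [];
  const total = brief.shots.reduce((sum, shot) => sum + shot.durationSec, 0);
  if (Math.abs(total - brief.clipLengthSec) > 1e-6) {
    problems.push(`Shot durations total ${total}s but the clip is ${brief.clipLengthSec}s`);
  }
  for (const shot of brief.shots) {
    for (const line of shot.onScreenText) {
      if (line.trim() === "" || PLACEHOLDER.test(line)) {
        problems.push(`${shot.label}: placeholder copy "${line}"`);
      }
    }
  }
  return problems;
}
```

An empty result does not mean the brief is good, only that these two failure modes are absent; hierarchy and pacing still need a person.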
Slide 8 — What the agent writes from that brief
Here is what the agent actually writes from that brief: the same six-second title card in both stacks. In the Hyperframes version, look at the markup: the shot lasts six seconds, the kicker appears at zero point two seconds, and the headline copy is sitting right there for you to check against the brief. In the Remotion version, the same scene is frame arithmetic: a Sequence that starts at frame zero, springs that lift the headline lines a few frames apart. You are not expected to write either of these. You are expected to read them: to check the words, the durations, and that the colours come from tokens. That is what reviewing motion as code looks like.
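When you read the frame-based version against a brief written in seconds, the only arithmetic you need is seconds times frame rate. The sketch below assumes 30 frames per second, which the narration does not state; `toFrame` and `cueSheet` are hypothetical helpers for a reviewer, not part of either stack.

```typescript
// Assumption: the composition runs at 30 fps. Change this to match the project.
const ASSUMED_FPS = 30;

// Convert a time from the brief into the frame number you expect to see in code.
function toFrame(seconds: number, fps: number = ASSUMED_FPS): number {
  return Math.round(seconds * fps);
}

interface BriefCue {
  element: string;
  atSec: number;
}

// Turn the brief's timings into a frame cue sheet to hold next to the code.
function cueSheet(cues: BriefCue[], fps: number = ASSUMED_FPS): Array<{ element: string; frame: number }> {
  return cues.map((cue) => ({ element: cue.element, frame: toFrame(cue.atSec, fps) }));
}

const titleCardCues: BriefCue[] = [
  { element: "kicker", atSec: 0.2 },
  { element: "shot end", atSec: 6 },
];
```

Under that assumption the six-second shot spans 180 frames and the kicker enters at frame 6, so those are the numbers to look for in the scene file.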
Slide 9 — Design tokens and brand assets inside motion projects
The brand should reach the video exactly the way it reaches the product: through tokens. Colours, type, and spacing come from the token source or one shared stylesheet. Logos, product shots, and music live in a named assets folder the agent knows about. And add motion tokens — standard durations, easing curves, stagger values — so ten clips made across six months still feel like one brand. Keep the per-clip content as variables, so the same scene template serves many videos. The payoff is simple: when the brand changes, a token-driven project re-renders. A hard-coded one becomes a folder of stale MP4s with your old logo in them.
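Motion tokens can be as small as one object. The values below are placeholders picked for illustration, and `motionTokens` and `staggeredStarts` are hypothetical names; the point is only that scenes read durations, easing, and stagger from one place instead of hard-coding them.

```typescript
// Hypothetical motion tokens; every value here is an illustrative placeholder.
const motionTokens = {
  durationSec: { quick: 0.2, standard: 0.4, slow: 0.8 },
  staggerSec: 0.1,
  // Cubic-bezier control points, in the same order CSS uses.
  easing: { enter: [0.16, 1, 0.3, 1], exit: [0.7, 0, 0.84, 0] },
} as const;

// Start times for `count` elements entering one after another.
// Rounded to milliseconds so floating-point noise never reaches a diff.
function staggeredStarts(
  firstStartSec: number,
  count: number,
  staggerSec: number = motionTokens.staggerSec,
): number[] {
  return Array.from(
    { length: count },
    (_, i) => Math.round((firstStartSec + i * staggerSec) * 1000) / 1000,
  );
}
```

With these placeholder values, three headline lines starting at 0.2 seconds enter at 0.2, 0.3, and 0.4 seconds; changing `staggerSec` once changes every clip that uses it.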
Slide 10 — Reviewing motion: beyond "does it play"
A clip that plays smoothly can still be wrong, so review in layers. First the composition: every word on screen matches the brief, durations match the shot list, nothing overlaps, colours come from tokens. The agent can run lint and inspection on its own output — make that part of the brief. Then the frames: compare early and late frames of every scene, because the sneakiest failure is the dead animation — a timeline that is registered but never advances, so your animated clip is actually a still image. And then the human layer: pacing. Agents produce motion that is technically correct and slightly too fast. No lint rule measures whether a person had time to read the headline. That pass is yours.
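The dead-animation check reduces to one comparison: if an early frame and a late frame of a scene are byte-for-byte identical, nothing moved. The sketch below assumes you already have raw pixel data for two sampled frames per scene; how those frames are captured is tool-specific and out of scope, and `SceneSamples` and `deadScenes` are hypothetical names.

```typescript
// Hypothetical audit input: raw pixel bytes sampled near the start and end of a scene.
interface SceneSamples {
  sceneId: string;
  early: Uint8Array;
  late: Uint8Array;
}

// True when two frames contain exactly the same bytes.
function framesIdentical(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Scenes whose early and late frames match are suspected still images.
function deadScenes(samples: SceneSamples[]): string[] {
  return samples
    .filter((scene) => framesIdentical(scene.early, scene.late))
    .map((scene) => scene.sceneId);
}
```

A scene that is intentionally static will also be flagged, so treat the result as a list to look at rather than a list of bugs.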
Slide 11 — Render and export: formats, sizes, and where the output goes
Rendering is the step that costs real time, so treat it as the end of the loop, not the middle. Both stacks render from a single command, and on a laptop a short clip takes minutes — which is exactly why you iterate on the preview and the inspected frames, not on finished MP4s. At volume the stacks part ways: Remotion can distribute a render across cloud functions, which is also where its per-render pricing applies; Hyperframes renders on one machine. Decide the destination before you brief — aspect ratio, resolution, captions, file size — because changing those after the fact is a redesign, not an export setting. And remember what gets maintained: the composition is the source of truth. The MP4 is just a build output.
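Writing the destination down as data makes "decide before you brief" checkable. The sketch below uses a hypothetical `RenderTarget` shape with illustrative values; it only shows why a 16:9 target and a 9:16 target cannot share one composition, since the framing differs and not just the export size.

```typescript
// Hypothetical destination spec; fields and values are illustrative.
interface RenderTarget {
  label: string;
  width: number;
  height: number;
  fps: number;
  maxFileSizeMb: number;
  burnedInCaptions: boolean;
}

// Greatest common divisor, used to reduce width:height to lowest terms.
function gcd(a: number, b: number): number {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

function aspectRatio(target: RenderTarget): string {
  const divisor = gcd(target.width, target.height);
  return `${target.width / divisor}:${target.height / divisor}`;
}

// Targets with the same ratio can share a layout; different ratios need a redesign.
function sameFraming(a: RenderTarget, b: RenderTarget): boolean {
  return aspectRatio(a) === aspectRatio(b);
}

const landscapeTarget: RenderTarget = { label: "website", width: 1920, height: 1080, fps: 30, maxFileSizeMb: 20, burnedInCaptions: false };
const verticalTarget: RenderTarget = { label: "social", width: 1080, height: 1920, fps: 30, maxFileSizeMb: 10, burnedInCaptions: true };
```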
Slide 12 — Worked example: a thirty-second product update video, built by brief
Let's trace a real one: a thirty-second product update video from a working Hyperframes pipeline. The brief was a shot plan — four shots, durations adding up, every on-screen word written out, tokens named. The agent generated four shot files and a root timeline, pulling colours and type from the project's theme. The gates earned their keep twice. Lint caught a timing overlap between the first two shots. And the frame audit caught a dead animation — a timeline that was registered but never advanced, inherited from a template. Three passes, roughly an hour from brief to approved composition, then a render in minutes plus one FFmpeg clean-up step. Here is the detail that matters most: every fix was an edit to a text file. Nobody ever opened a video editor.
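The timing overlap in that story is the kind of error a few lines of code can find. The sketch below is not the lint the pipeline actually ran; it is a hypothetical check, with assumed names `PlacedShot` and `overlappingPairs`, that flags any shot starting before the previous one on the same timeline has ended.

```typescript
// Hypothetical placement of shots on a single root timeline.
interface PlacedShot {
  id: string;
  startSec: number;
  durationSec: number;
}

// Returns [earlier, later] id pairs where the later shot starts before the earlier ends.
// Only neighbours in start order are compared, which suffices for sequential shots.
function overlappingPairs(shots: PlacedShot[]): Array<[string, string]> {
  const ordered = [...shots].sort((a, b) => a.startSec - b.startSec);
  const pairs: Array<[string, string]> = [];
  let previous: PlacedShot | undefined;
  for (const current of ordered) {
    if (previous !== undefined) {
      const previousEnd = previous.startSec + previous.durationSec;
      if (current.startSec < previousEnd - 1e-6) {
        pairs.push([previous.id, current.id]);
      }
    }
    previous = current;
  }
  return pairs;
}
```

If overlaps are intentional, for example a cross-fade, the check would need an allowed-overlap value; this version treats every overlap as a finding.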
Slide 13 — Exercise: storyboard one scene and write its brief
Your turn. Pick one scene from a video your team actually needs and keeps postponing — a title card, one feature claim, a closing card. Storyboard it roughly: what is on screen at the start, the middle, and the end. Then write the brief properly. Every on-screen word, written out. The duration, and when each element enters and leaves. The hierarchy — what gets read first, what moves, what holds still. The constraints: which tokens, what easing, what happens under reduced motion, and no invented copy. And name the checks the agent has to run before it tells you the scene is ready. Don't build it yet. Keep the brief — Module 3 will turn scenes like this into a narrated explainer.
Slide 14 — Summary, and what comes next
Let's close. Two stacks, one principle. Remotion gives you video as React components, frame-pure animation, and rendering that scales to distributed cloud functions. It is source-available, and paid above three employees as of mid-2026. Hyperframes gives you video as plain HTML, the web's own animation libraries seeked frame by frame, and the lowest-friction agent loop. It describes itself as Apache 2.0, but it is pre-1.0 and moving fast. Either way, the artifact is diffable text fed by your tokens, and the MP4 is just a build output. You brief in timing, hierarchy, and exact words; you review in layers; you render last. In Module 3 the unit of work grows: narrated explainers and decks, generated from a script that becomes the primary artifact. See you there.