Agentic Design School
Module 1 of 5
35–45 minutes

Motion and Storytelling with Agents

Video as an Agent Output

What changes when video is defined in code rather than assembled on a timeline: diffable, regenerable, data-driven motion that an agent can produce and revise — and the kinds of video work where this approach genuinely beats traditional tools.

Duration: 35–45 minutes

Slides: 12 slides with notes and narration

Learning objectives

  • Explain code-defined video and why it suits agent workflows.
  • Identify which video tasks fit the approach: product explainers, data stories, repeatable formats.
  • Recognise the tasks where timeline tools and motion specialists still win.
Slide deck

Work through the module

Each slide is shown in its 16:9 frame, exactly as it appears in the video version. Open the notes under any slide for the longer explanation, and the narration if you prefer to read along.

Slide 1 of 12 · 16:9

Video as an Agent Output

Motion and Storytelling with Agents · Module 1 of 5

  • Why timeline tools lock agents out, and what code-defined video changes
  • The formats where the approach genuinely wins — and the ones where it does not
  • What a revision actually costs in each model
  • The designer's role: script, structure, and judgment over keyframes

Video joins the diffable artifacts. This module is the argument for why; the rest of the course is the practice.

Slide notes

This is the framing module for the whole course, so resist the temptation to open a Remotion or Hyperframes project here. The job is to establish one idea clearly: when a video is defined as code rather than assembled on a timeline, it becomes the same kind of artifact as a component or a token file — readable, diffable, regenerable — and that single property is what lets an agent produce and revise it inside the loop the team already runs.

Name the audience honestly. This course is for product designers, brand and content designers, and design technologists who need motion output regularly but cannot justify a dedicated motion team. It is also useful to motion designers who want to see what changes in their own pipeline. It is not a claim that agents replace motion craft: this module spends a full slide on the work that still belongs with specialists.

Set the prerequisite expectation: the course assumes the working vocabulary from Agentic Design Fundamentals — brief, harness, gate, the designer-agent loop. If participants do not have that, the module still works, but the phrase 'same review loop as any other artifact' carries less weight.

Narration for this slide

Welcome to Motion and Storytelling with Agents. This first module is about a single shift: what happens when video stops being something you assemble on a timeline and becomes something you define in code. That sounds like an engineering detail. It is not. It decides whether an agent can produce video for you at all, what a revision costs, and whether your motion work stays current as the product changes. We will look at where this approach genuinely beats traditional tools, where it clearly does not, and what your role becomes when the keyframes are no longer yours to drag. Let's start with the tools most teams reach for today.

Slide 2 of 12 · 16:9

Timeline tools vs code-defined video: what each is for

Neither is wrong. They produce different kinds of artifact, and the artifact decides who — and what — can work on it.

| | Timeline tools (Premiere, After Effects) | Code-defined video (Remotion, Hyperframes) |
| --- | --- | --- |
| The artifact | A binary project file plus an exported MP4 | A text composition in the repository; the MP4 is a build output |
| Who can edit it | Whoever has the tool, the file, and the fonts | Anyone who can edit text, including a coding agent |
| Design system | Colours and type re-entered by hand | Tokens and theme files consumed directly |
| When the product changes | Reopen, re-edit, re-export | Edit the composition, re-render |
| Review | Watch the export and comment | Diff the composition, run checks, then watch |

A timeline edit produces a snapshot. A composition produces a living artifact that re-renders when the product moves.

Slide notes

Keep this comparison about the artifact, not about craft. After Effects and Premiere are superb instruments in the hands of motion designers, and nothing in this course argues otherwise. The structural problem for product teams is what the tools leave behind: a binary project file that only the original editor can realistically reopen, and an exported video that goes stale the moment the button moves, the pricing changes, or the brand refreshes.

The right-hand column describes a different kind of object. A Remotion composition is a React component; a Hyperframes composition is a plain HTML file with timing attributes. Both live in the repository next to the product, both can pull colours and type from the same token source the product uses, and both re-render from the terminal when something changes. The video file itself stops being the thing you maintain.
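
As a minimal sketch of what pulling from the token source can look like, the TypeScript below derives a title card's style from a shared token object instead of hard-coded values. The token names, the `DesignTokens` shape, and the `titleCardStyle` helper are all hypothetical; they are not part of Remotion, Hyperframes, or any particular design system.

```typescript
// Hypothetical token shape. In a real project this would be imported from
// the same source the product UI uses, not redefined next to the video.
interface DesignTokens {
  colorBrand: string;
  colorSurface: string;
  fontDisplay: string;
  motionDurationMs: number;
}

// Placeholder values for illustration only.
const designTokens: DesignTokens = {
  colorBrand: "#3b5bdb",
  colorSurface: "#ffffff",
  fontDisplay: "Inter, sans-serif",
  motionDurationMs: 400,
};

interface TitleCardStyle {
  color: string;
  backgroundColor: string;
  fontFamily: string;
  fadeInFrames: number;
}

// Every visual decision is read from the tokens, so a brand refresh changes
// the tokens and the next render picks it up without touching this code.
function titleCardStyle(tokens: DesignTokens, fps: number): TitleCardStyle {
  return {
    color: tokens.colorSurface,
    backgroundColor: tokens.colorBrand,
    fontFamily: tokens.fontDisplay,
    fadeInFrames: Math.round((tokens.motionDurationMs / 1000) * fps),
  };
}
```

Calling `titleCardStyle` with a refreshed token object returns the refreshed style; nothing in the composition itself needs re-theming by hand.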

The row that matters most for this course is 'who can edit it'. GUI timeline tools require GUI interaction, which an agent cannot perform — there is nothing for it to read and nothing for it to change. Text compositions are exactly the material coding agents are already good at. That is the entire reason video is becoming an agent output rather than remaining a specialist silo.

Narration for this slide

Here is the comparison that frames everything else. Timeline tools produce a binary project file and an exported video. Both are fine objects — until the product changes, at which point someone has to find the file, reopen it, re-edit, and re-export. Code-defined video produces a different artifact: a text composition that lives in your repository, uses your design tokens, and re-renders from one terminal command. The MP4 becomes a build output, not the thing you maintain. And notice the row about who can edit it. A timeline needs hands on a GUI. A composition is text — which means a coding agent can write it, and a reviewer can diff it.

Slide 3 of 12 · 16:9

Why agents and code-defined video fit

The same loop that produces components from a brief now produces motion from a brief: read, generate, revise, re-render.

  • Read — the agent reads your tokens, theme files, and existing compositions before writing anything
  • Generate — the composition is text: HTML with timing attributes, or a React component driven by the frame number
  • Revise — feedback edits the composition, never the rendered video
  • Re-render — the same composition produces the same frames, deterministically, on demand

Traditional video tools require GUI interaction an agent cannot perform. A composition is code, and code is what agents already work on.

Slide notes

Walk the four verbs and tie each back to the loop participants already know from UI work. Read is why the output can match the brand: the agent sees the actual token names and the actual theme stylesheet rather than guessing hex values. Generate is the production step — and the reason both major frameworks ship installable agent skill packs is that composition-writing is a learnable, procedural task. Revise is the discipline that makes the economics work: when pacing is wrong or a headline changes, you edit the composition and re-render; you never open the MP4 in an editor. Re-render is the property that keeps motion current — the rendered file is reproducible from source, the way a build is.

Determinism deserves a sentence of precision. Remotion makes every frame a pure function of the frame number. Hyperframes captures frames from a real browser page whose animation runtime is paused and seeked to the exact frame time. Different mechanisms, same outcome: the same composition yields the same frames, which is what makes review and regeneration trustworthy.
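
To make "a pure function of the frame number" concrete, here is a framework-free sketch in TypeScript. The helper names `fadeInOpacity` and `renderOpacities` are hypothetical, and the code deliberately uses no Remotion or Hyperframes API; it only illustrates the property that makes re-rendering reproducible.

```typescript
// A frame-driven value: the result depends only on the frame number and
// fixed parameters, never on wall-clock time or hidden state.
// Assumes durationFrames > 0.
function fadeInOpacity(
  frame: number,
  startFrame: number,
  durationFrames: number
): number {
  const progress = (frame - startFrame) / durationFrames;
  return Math.min(1, Math.max(0, progress)); // clamp to the 0..1 range
}

// "Rendering" here just evaluates the function for every frame. Running it
// twice produces identical output, which is the property review relies on.
function renderOpacities(totalFrames: number): number[] {
  return Array.from({ length: totalFrames }, (_, frame) =>
    fadeInOpacity(frame, 10, 20)
  );
}
```

Because nothing in `fadeInOpacity` reads a clock or mutable state, frame 20 has the same opacity on every machine and every run.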

Also name what this is not: it is not text-to-video generation. The agent is writing structured, reviewable code against your design system, not hallucinating pixels. That distinction matters for brand control and it matters for review.

Narration for this slide

So why do agents and code-defined video fit so well together? Because the workflow is the one you already run. The agent reads — your tokens, your theme files, your existing compositions. It generates — a composition that is just text, either HTML with timing attributes or a React component driven by the frame number. You give feedback, and it revises — by editing the composition, never by touching a rendered file. Then it re-renders, and the same composition produces the same frames every time. Compare that with a timeline tool, where the work happens through clicks and drags an agent simply cannot perform. This is not text-to-video generation. It is the agent writing reviewable code against your design system — and that is exactly what it is already good at.

Slide 4 of 12 · 16:9

Two paths to the same MP4

The GUI path blocks the agent at the door. The code path makes video another artifact in the existing review loop.

Two-lane diagram comparing video production paths. The top lane shows the GUI timeline path: a binary project file, a hand-edited timeline of clicks, drags, and keyframes, and an exported MP4 that goes stale, with a panel noting the agent is blocked because there is nothing it can read, change, or check. The bottom lane shows the code-defined path: a human-written brief and script, an agent-written composition stored as text in the repository using design tokens, a review gate of diffs, lint, and frame inspection, and a deterministic render producing the MP4 as a build output, with a dashed feedback line labelled edit the composition, never the video.
Top lane: the GUI timeline path ends in a stale export, and the agent is locked out at every step. Bottom lane: brief and script in, composition as code, review gate, deterministic render out — the same loop the team already runs for components, with the MP4 as a build output.

Same destination, different artifact. The bottom lane is the only one an agent can work in — and the only one a reviewer can diff.

Slide notes

Walk the top lane first because it describes what most teams do today. The project file is opaque to anyone but the tool; revisions are specialist time; review means watching the export and writing comments; and the moment the product changes, the video is a snapshot of something that no longer exists. The agent is blocked at every step, not because models lack capability but because there is no text to read, no file it can meaningfully change, and no check it can run.

Then the bottom lane, left to right, naming who owns each step. The brief and script are human — shots, durations, and the exact on-screen words, because vague mood briefs produce placeholder enthusiasm. The composition is agent-written text, in the repository, consuming the product's tokens. The review gate is the same shape as any artifact review: diff the composition, run lint and frame inspection, look at stills, judge pacing — and only then spend time on the slow render. The render is deterministic, so the MP4 is a reproducible build output.
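
One way to picture the human-written part of the brief is as structured data: shots, durations, and the exact words. The sketch below is an assumption about how such a shot list could be typed and converted to frame ranges; the `Shot` shape, the `placeShots` helper, and the example copy and durations are hypothetical placeholders, not taken from any real project.

```typescript
interface Shot {
  id: string;
  durationSeconds: number;
  onScreenText: string; // the exact words, written by a human before generation
}

interface PlacedShot extends Shot {
  startFrame: number; // inclusive
  endFrame: number; // exclusive
}

// Lay shots end to end and convert seconds to frames at the given frame rate.
function placeShots(shots: Shot[], fps: number): PlacedShot[] {
  let cursor = 0;
  return shots.map((shot) => {
    const frames = Math.round(shot.durationSeconds * fps);
    const placed: PlacedShot = {
      ...shot,
      startFrame: cursor,
      endFrame: cursor + frames,
    };
    cursor += frames;
    return placed;
  });
}

// Hypothetical brief: placeholder durations and copy, for illustration only.
const exampleShots: Shot[] = [
  { id: "title", durationSeconds: 4, onScreenText: "Introducing saved filters" },
  { id: "claim-1", durationSeconds: 8, onScreenText: "Save any view in one click" },
  { id: "claim-2", durationSeconds: 8, onScreenText: "Share it with your whole team" },
  { id: "close", durationSeconds: 4, onScreenText: "Available today" },
];
```

A brief in this shape leaves the agent nothing to invent: the words and the timing are inputs, and `placeShots(exampleShots, 30)` gives the frame ranges a composition would be built against.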

The dashed feedback line is the discipline to land hardest: edit the composition, never the video. That single habit is what makes programmatic motion cheaper than traditional motion over the life of a product, and it only holds if the team treats the composition, not the MP4, as the deliverable.

Narration for this slide

Here are the two paths side by side. The top lane is the familiar one: a binary project file, a hand-edited timeline, an exported video that starts going stale on the day it ships. The agent is blocked at every step — there is nothing it can read, nothing it can change, no check it can run. The bottom lane is the code-defined path. You write the brief and the script. The agent writes the composition — text, in your repository, using your tokens. It passes a review gate: diffs, lint, frame checks, a pacing pass. Then a deterministic render produces the MP4 as a build output. And the dashed line is the rule that makes the economics work: when something is wrong, you edit the composition. You never edit the video.

Slide 5 of 12 · 16:9

The formats that benefit

The approach pays off where video recurs, follows a structure, and tracks a product that keeps changing.

  • Product explainers and feature clips — 20 to 40 seconds, released with every notable change
  • Release notes as video — the same template re-rendered with new content each cycle
  • Data stories — charts and numbers animated from a data file, regenerated when the data updates
  • Social cuts and aspect-ratio variants — one composition, several output sizes
  • Animated specs — showing engineering how a transition should feel, in the product's own tokens

The common thread is reuse: a composition that renders once is a curiosity; one that re-renders every release is an asset.

Slide notes

The economic argument for code-defined video rests on amortisation, so every format on this slide shares the same shape: it recurs, it follows a structure, and it goes stale when the product changes. Feature clips and release-note videos are the clearest case — most product teams want them, few are staffed to produce them, and the content is mostly on-screen text, product UI, and brand-consistent motion, which is exactly what a composition built on templates and tokens does well.

Data stories deserve emphasis because they are where the 'data-driven' part of the module summary becomes literal. When the chart is drawn by code from a data file, updating the video means updating the data and re-rendering — there is no keyframe surgery. The same logic covers personalised or localised variants: one composition, many renders with different props or variables.
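
A small sketch of what "drawn by code from a data file" means: bar heights are computed from the data and the current frame, so new data produces a new animation with no keyframes to adjust. The `DataPoint` shape and the `barsAtFrame` helper are hypothetical, the growth curve is a plain linear ramp chosen for simplicity, and values are assumed to be non-negative.

```typescript
interface DataPoint {
  label: string;
  value: number; // assumed non-negative
}

interface Bar {
  label: string;
  heightPx: number;
}

// Bars grow linearly from zero to their final height over `growFrames`.
// The tallest bar always reaches `maxHeightPx`, whatever the data contains.
function barsAtFrame(
  data: DataPoint[],
  frame: number,
  growFrames: number,
  maxHeightPx: number
): Bar[] {
  const peak = data.reduce((max, point) => Math.max(max, point.value), 0);
  const progress = Math.min(1, Math.max(0, frame / growFrames));
  return data.map((point) => ({
    label: point.label,
    heightPx: peak > 0 ? (point.value / peak) * maxHeightPx * progress : 0,
  }));
}
```

Adding a quarter to the input array rescales every bar on the next render; the composition code does not change.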

Social cuts and aspect-ratio variants are a quieter win. A timeline edit at 16:9 has to be substantially re-laid-out for 9:16; a composition can be parameterised for both. Animated specs — short clips that show engineering how a transition should feel — are the format most product designers do not realise they want until they have one, because the alternative is describing easing in a comment thread.
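
Parameterising for several ratios can be as simple as deriving layout values from the output size. The sketch below assumes three common output sizes and a hypothetical `titleLayout` helper; the 8% and 6% proportions are arbitrary illustration values, not recommendations.

```typescript
interface OutputSize {
  label: string;
  width: number;
  height: number;
}

interface TitleLayout {
  fontSizePx: number;
  paddingPx: number;
  stacked: boolean; // true: text above the visual; false: side by side
}

const outputSizes: OutputSize[] = [
  { label: "landscape-16x9", width: 1920, height: 1080 },
  { label: "square-1x1", width: 1080, height: 1080 },
  { label: "portrait-9x16", width: 1080, height: 1920 },
];

// Derive layout from the output size so one composition serves every ratio.
// Sizes scale with the shorter edge; only portrait output stacks its content.
function titleLayout(size: OutputSize): TitleLayout {
  const shortEdge = Math.min(size.width, size.height);
  return {
    fontSizePx: Math.round(shortEdge * 0.08),
    paddingPx: Math.round(shortEdge * 0.06),
    stacked: size.height > size.width,
  };
}
```

Rendering the social cuts then means looping over `outputSizes` and rendering the same composition once per entry, instead of re-laying-out a timeline for each ratio.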

Narration for this slide

So where does this approach genuinely win? Wherever video recurs and tracks a moving product. Feature clips and release-note videos — twenty to forty seconds, shipped every cycle, mostly on-screen text and product UI. Data stories, where the chart is drawn from a data file, so updating the video means updating the data and re-rendering. Social cuts, where one composition renders out at several aspect ratios instead of being re-laid-out by hand. And animated specs — short clips that show engineering exactly how a transition should feel, built from the same tokens the product uses. The common thread is reuse. A composition that renders once is a curiosity. One that re-renders every release is an asset.

Slide 6 of 12 · 16:9

The formats that do not benefit

Being able to produce video as code does not mean every video should be produced that way.

  • Brand films and campaign hero pieces — taste-led, watched once, judged on craft you cannot lint
  • Character animation and heavy 3D — both frameworks can technically host it; you will fight them
  • Taste-led montage and edit-driven storytelling — rhythm cut to music is an editor's craft, not a template's
  • One-off internal demos — a screen recording is faster and honest about what it is
  • "Show the real product under real latency" — record the real product; a stylised reconstruction is less honest

If the value of the piece is craft, feel, or a one-time moment, brief a motion specialist. The pipeline is for the video work you currently postpone.

Slide notes

This slide protects the course's credibility, so deliver it without hedging. The economic argument from the previous slide cuts both ways: a one-off piece has nothing to amortise, and a piece whose entire value is craft will look like a developer's idea of cinema if you force it through a code pipeline. Brand films, campaign heroes, character work, and anything needing colour-grade-level finishing belong with motion specialists and their tools.

Two of the bullets are about honesty rather than craft. A one-off internal demo that will be watched twice is a screen recording — cheap, fast, and clear about what it is. And when the message is 'watch how the real product behaves', recording the real product is more truthful than a stylised reconstruction, however polished.

The useful framing for a team is a sorting question, not a tool debate: is this piece's value in its reuse and its accuracy to the product, or in its craft and its moment? The first kind goes through the pipeline this course builds. The second kind gets briefed to a specialist — and Module 5 covers what a good handover to that specialist looks like, because the two paths are complementary, not competing.

Narration for this slide

Now the honest half of the argument. Some video should not go through this pipeline. Brand films and campaign hero pieces are taste-led, watched once, and judged on craft no lint rule can measure. Character animation and heavy 3D will technically run in these frameworks, but you will fight the tools and the result will show it. Edit-driven storytelling — rhythm cut to music — is an editor's craft. And two cases are about honesty rather than craft: a one-off internal demo is better as a screen recording, and if the point is to show how the real product behaves, record the real product. The sorting question is simple. If the value is reuse and accuracy, use the pipeline. If the value is craft or a one-time moment, brief a specialist.

Slide 7 of 12 · 16:9

Cost and iteration: what a revision actually costs

The headline saving is not the first version. It is every revision after it, for the life of the product.

| | Timeline workflow | Code-defined workflow |
| --- | --- | --- |
| First version | Specialist days, or a contractor brief and a wait | A written brief, an agent run, a review pass; typically under a day |
| Copy or pricing change | Reopen the project, re-edit, re-export | Edit one line of the composition, re-render |
| Brand or token refresh | Manual re-theme of every affected video | Tokens update at the source; affected clips re-render |
| New aspect ratio | Substantial re-layout per ratio | Parameterise the composition, render each size |
| Who does the revision | Whoever owns the project file | Anyone on the team, with the agent doing the edit |

Timeline costs are roughly flat per revision. Composition costs are front-loaded, then each revision is an edit and a render.

Slide notes

Frame this as cost structure, not a race. A skilled motion designer can produce a far better first version of many pieces than a first agent run will. The structural difference is what happens afterwards: in the timeline model every revision costs roughly the same as the last, because each one is hands-on-tool time, and it queues behind whoever owns the project file. In the composition model the cost is front-loaded — writing the brief, setting up tokens and templates, establishing the gates — and then each revision is a small text edit plus a render.

Be honest about the front-loading. The first composition a team produces is not dramatically cheaper than commissioning the work, once you count the brief-writing, the gate setup, and the inevitable iteration on pacing. The second one is much cheaper, and the tenth is close to free apart from render time. That is also why the previous two slides matter: the formats that benefit are the recurring ones, because recurrence is what pays back the setup.
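
The shape of that argument can be written down as a two-line cost model. The sketch below is illustrative only: the `CostModel` shape and both helpers are hypothetical, and any hours a team plugs in should be its own measurements, since no benchmark figures are implied here.

```typescript
interface CostModel {
  firstVersionHours: number; // for compositions this includes setup: brief, tokens, gates
  perRevisionHours: number;
}

// Cumulative cost after the first version plus `revisions` revisions.
function cumulativeHours(model: CostModel, revisions: number): number {
  return model.firstVersionHours + revisions * model.perRevisionHours;
}

// Smallest revision count at which the composition workflow's cumulative
// cost is strictly lower than the timeline workflow's, or null if that
// never happens within `maxRevisions`.
function breakEvenRevisions(
  timeline: CostModel,
  composition: CostModel,
  maxRevisions = 100
): number | null {
  for (let n = 0; n <= maxRevisions; n++) {
    if (cumulativeHours(composition, n) < cumulativeHours(timeline, n)) {
      return n;
    }
  }
  return null;
}
```

With placeholder inputs of 16 hours plus 4 per revision for a timeline edit, and 20 hours plus 0.5 per revision for a composition, the composition is more expensive until the second revision and cheaper from then on. With a one-off piece there are no revisions to recover the setup, which is the same point the format slides make.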

Keep the claims conservative and unbenchmarked. As of June 2026 there is no neutral published benchmark comparing these workflows end to end; the figures teams should trust are their own, which is exactly what the worked example and the exercise later in this module set up.

Narration for this slide

Let's talk about cost honestly, because the headline saving is not where people expect it. The first version is not the big win — writing a good brief, setting up tokens, and reviewing pacing takes real time, and a motion specialist may well produce a better first cut. The difference is every revision after that. In the timeline model, a copy change, a price update, or a brand refresh means reopening the project and re-exporting, every single time, by whoever owns the file. In the composition model, those same changes are a one-line edit and a render, and anyone on the team can ask the agent to make them. Flat cost per revision versus front-loaded setup and near-free revisions. That is the real comparison.

Slide 8 of 12 · 16:9

The designer's role: script, structure, and judgment

When the keyframes are no longer yours to drag, your leverage moves upstream and downstream of the generation step.

  • Script — the exact on-screen words and narration; the single biggest quality lever in the workflow
  • Structure — shots, durations, hierarchy, and what each second is for
  • System — the tokens, motion durations, and templates the agent must draw from
  • Judgment — pacing, emphasis, and honesty; no automated gate measures whether a human had time to read the headline

An agent asked to "make it feel premium" fills the screen with placeholder enthusiasm. An agent given the words, the shots, and the tokens produces something reviewable.

Slide notes

This slide answers the question hanging over the module: if the agent writes the composition, what is the designer for? The answer mirrors the rest of agentic design work — leverage moves from production to intent and critique — but motion sharpens it, because the failure mode of a vague motion brief is so visible. Ask for 'something premium and energetic' and you get generic motion graphics with invented copy. Give the agent the actual headline, the shot list with durations, and the token source, and you get something worth reviewing.

The script is worth singling out as the artifact designers underinvest in. In this workflow the words on screen and in narration are written before any composition exists, and they are the part the audience actually consumes. Module 3 treats the script as a first-class artifact in its own right; plant that here.

Judgment covers what no gate measures. Lint catches timing overlaps; frame inspection catches text overflow and dead animations; nothing automated knows whether the pacing gives a viewer time to read, whether the emphasis lands on the right claim, or whether the clip is honest about what the product does. That pass is human, it is design work, and it is the part of the old craft that transfers entirely intact.

Narration for this slide

So what is your job, if you are not the one dragging keyframes? Four things. The script — the exact words on screen and in narration, written before any composition exists. That is the single biggest quality lever in this whole workflow. The structure — shots, durations, and what each second is for. The system — the tokens, motion durations, and templates the agent has to draw from instead of inventing its own. And judgment — the pass no automated check can do: is the pacing readable, is the emphasis on the right claim, is the clip honest? Ask an agent to make it feel premium and you get placeholder enthusiasm. Give it the words, the shots, and the tokens, and you get something you can genuinely review.

Slide 9 of 12 · 16:9

The artifact discipline: the composition is the deliverable

This is the same shift the rest of agentic design went through: the deliverable becomes a text artifact the team reviews and keeps.

  • The composition lives in version control next to the product it describes
  • The MP4 is a build output — re-renderable, replaceable, never hand-edited
  • Date and version renders; an undated clip in a shared folder is a future argument
  • HTML artifacts are already becoming standard design deliverables; video compositions extend the same pattern

Treat the composition as the deliverable and the video stays current. Treat the MP4 as the deliverable and you have rebuilt the old problem with new tools.

Slide notes

This slide connects the module to a wider pattern the school has documented: a growing share of agent-assisted design work ships as self-contained HTML artifacts — prototypes, slides, review boards, motion studies — because agents are better at producing working HTML and CSS than at driving proprietary canvases, and a browser is the one renderer everyone has. A Hyperframes composition is literally that pattern applied to video: an HTML file that is both the editable source of truth and the thing the renderer captures. A Remotion composition is the same idea expressed in React.

The discipline that keeps the pattern healthy is the same one that keeps HTML prototypes from becoming unreviewable debt: generate against the design system, not alongside it; keep the artifact in version control; and let the decision, not the file, be the thing that graduates. For video specifically, that means the composition is committed, reviewed, and maintained, while renders are treated as build outputs that can be regenerated and should be dated.
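
Dating renders is easy to automate. The sketch below shows one possible naming scheme, assuming the render script knows the composition's identifier and the source revision it was built from; the `renderFileName` helper and the scheme itself are hypothetical.

```typescript
// Build a render filename that records what was rendered, from which
// revision of the composition, and on which day.
function renderFileName(
  compositionId: string,
  sourceRevision: string,
  renderedAt: Date
): string {
  const date = renderedAt.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `${compositionId}_${sourceRevision}_${date}.mp4`;
}
```

A file named this way can always be traced back to the composition revision that produced it, which is what makes it safe to delete and regenerate.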

The failure mode to warn about: a team adopts the tools, produces compositions, and then starts hand-tweaking exported MP4s in an editor because it feels faster in the moment. At that point the composition and the published video diverge, the next re-render silently reverts the tweak, and the team has rebuilt the staleness problem with extra steps.

Narration for this slide

One more idea before the worked example, and it is a discipline rather than a tool. The composition is the deliverable. It lives in version control next to the product, it gets reviewed like any other artifact, and the MP4 it produces is a build output — replaceable, re-renderable, never hand-edited. This is the same pattern the wider field is converging on: HTML artifacts are already becoming standard design deliverables, and a video composition is that pattern with a timeline attached. The failure mode is tempting and quiet: someone tweaks an exported MP4 by hand because it feels faster, the source and the published video diverge, and the next re-render silently undoes the fix. Hold the line. Edit the composition.

Slide 10 of 12 · 16:9

Worked example: one feature clip, two ways

A 27-second feature announcement traced through the code-defined pipeline, set against what the same clip costs as a commissioned timeline edit.

| | Commissioned timeline edit | Agent-produced composition (traced run) |
| --- | --- | --- |
| The brief | A paragraph of intent and a links doc | 4 shots with durations, the exact on-screen words, the token source, gates named |
| Production | Specialist or contractor time, scheduled | Agent wrote 4 shot files plus a root timeline, reusing the project's theme tokens |
| What review caught | Comments on the exported draft | Lint caught a timing overlap; a frame check caught a dead animation inherited from a template |
| Time to approved | Days to weeks, mostly waiting | 3 composition passes, roughly 45–60 minutes brief-to-approval (labelled estimate) |
| The next revision | Reopen, re-edit, re-export | Edit the composition, re-render; one FFmpeg pass to fix a CLI scaling quirk |

The gates did the work a producer normally does: the timing overlap and the dead animation were caught before anyone watched a render.

Slide notes

The right-hand column comes from a real, first-party traced run documented in the school's article on Remotion and Hyperframes: a 27-second landscape feature announcement built as a Hyperframes composition — an opening title card, two claim-plus-visual shots, and a closing card. The brief specified shots, durations, and the exact on-screen text, named the design-system source, and told the agent to run lint and inspect before declaring the work ready. Say clearly that the iteration and time figures are labelled estimates reconstructed from that project's history, not a fresh benchmark, and that the left-hand column is a typical commissioning pattern rather than a measured control.

The two gate catches are the teaching point. Lint flagged a timing overlap where the second shot started before the title card's exit finished on the same track. A frame-comparison check caught a dead animation — a GSAP timeline registered but never advanced, so the 'animated' shot was a still image for its full duration — a defect invisible in code review and inherited from a template the project's own audit had already flagged. Both were fixed by editing compositions, not by touching any video file.
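
The two classes of check described above can be sketched in a few lines. This is not the Hyperframes lint or inspect implementation, whose internals are not described here; the `TimedClip` shape, both function names, and the idea of comparing precomputed frame fingerprints are assumptions made for illustration.

```typescript
interface TimedClip {
  id: string;
  track: number;
  startFrame: number; // inclusive
  endFrame: number; // exclusive
}

// Timing lint: report pairs of clips on the same track whose frame ranges
// intersect, such as a shot starting before the previous exit has finished.
function findOverlaps(clips: TimedClip[]): Array<[string, string]> {
  const overlaps: Array<[string, string]> = [];
  const sorted = [...clips].sort(
    (a, b) => a.track - b.track || a.startFrame - b.startFrame
  );
  sorted.forEach((a, i) => {
    for (const b of sorted.slice(i + 1)) {
      if (b.track !== a.track) break; // sorted by track: no more candidates
      if (b.startFrame < a.endFrame) overlaps.push([a.id, b.id]);
    }
  });
  return overlaps;
}

// Dead-animation check: a shot is "dead" if every sampled frame matches the
// first one, meaning the animation never advanced. Frames are represented
// by fingerprints (for example, a hash of the rendered pixels).
function isDeadAnimation(frameFingerprints: string[]): boolean {
  if (frameFingerprints.length < 2) return false; // too few samples to judge
  return frameFingerprints.every((fp) => fp === frameFingerprints[0]);
}
```

Both checks run against the composition or a handful of sampled frames, which is why they can fire before anyone spends time on a full render.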

Also keep the friction in the story: the pinned pre-1.0 CLI rendered the 1080p viewport at double scale, so the pipeline adds an FFmpeg downscale pass. Pre-1.0 tooling has sharp edges; pin versions and expect the occasional surprise. That honesty is what makes the rest of the comparison credible.
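
For orientation, a downscale pass of this kind usually amounts to FFmpeg's `scale` video filter. The sketch below only builds the argument list; it is an assumption about what such a pass could look like, not the command the traced project used, and the `downscaleArgs` helper and file names are hypothetical.

```typescript
// Build FFmpeg arguments that scale a render to a target size.
// `-vf scale=W:H` applies FFmpeg's scale filter to the video stream;
// `-c:a copy` passes any audio stream through without re-encoding.
function downscaleArgs(
  inputPath: string,
  outputPath: string,
  width: number,
  height: number
): string[] {
  return [
    "-i", inputPath,
    "-vf", `scale=${width}:${height}`,
    "-c:a", "copy",
    outputPath,
  ];
}
```

Keeping the pass in a script next to the composition means the workaround is versioned, reviewed, and easy to delete once the CLI no longer needs it.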

Narration for this slide

Let's make this concrete with a real trace. The clip: a twenty-seven-second feature announcement — title card, two claim shots, closing card. The brief gave the agent four shots with durations, the exact on-screen words, and the token source, and told it to run the lint and inspect gates before calling it ready. The agent produced four shot files and a root timeline using the project's theme tokens. The gates earned their keep: lint caught a timing overlap, and a frame check caught a dead animation — a timeline registered but never advanced, so the shot was secretly a still image. Both fixed by editing the composition. Roughly forty-five to sixty minutes from brief to approval, as a labelled estimate — against the days to weeks a commissioned edit usually takes, mostly spent waiting.

Slide 11 of 12 · 16:9

Exercise: the video work you keep postponing

No tools yet. List the recurring video needs your team has and does not meet, and sort them against this module's criteria.

  • List three recurring video needs your team postpones or skips — be specific about audience and where each would play
  • For each, note how often it would recur and what makes it go stale: copy, pricing, UI, data, brand
  • Sort each one: pipeline candidate, specialist brief, or honest screen recording
  • Pick the strongest pipeline candidate and draft its shot list — durations and the exact on-screen words
  • Keep the page; this candidate becomes your working example through Modules 2 and 3

If nothing on your list recurs, the honest conclusion is that you do not need this pipeline yet — that is a valid result.

Slide notes

The exercise is deliberately analogue and deliberately about demand rather than tooling. Most product teams have a backlog of video they vaguely intend to make — the feature walkthrough, the onboarding clip, the quarterly data story — and never staff. Surfacing that backlog is the point: it tells each participant whether this course's pipeline solves a problem they actually have, and which format should be their first composition.

The sorting step applies slides five and six directly. Recurring, structured, product-tracking formats are pipeline candidates. Craft-led or one-time pieces get marked for a specialist brief. Anything whose honest answer is 'just record the screen' gets marked exactly that, without embarrassment. Expect — and welcome — lists where only one of the three items is a genuine pipeline candidate.

The shot-list step is the bridge to the rest of the course. Writing durations and exact on-screen words for one candidate is most of a motion brief already; participants will reuse it when Module 2 sets up a real project, and again in Module 3 when narration enters. If running this live, compare lists across the room — the overlap is usually striking, and it makes the case for shared templates better than any slide can.

Narration for this slide

Time to make this yours, and you will not need any tools. Write down three recurring video needs your team postpones or quietly skips — the feature walkthrough that never gets made, the release video that happened once, the data story that lives in a static chart. For each one, note how often it would recur and what makes it go stale. Then sort them: pipeline candidate, specialist brief, or honest screen recording. Take the strongest pipeline candidate and draft its shot list — durations and the exact words on screen. Keep that page. It becomes your working example for the next two modules. And if nothing on your list recurs, that is a real finding too: you may not need this pipeline yet.

Slide 12 of 12 · 16:9

Summary, and what comes next

  • Code-defined video turns motion into a text artifact: diffable, token-aware, regenerable from the terminal
  • Agents fit because the work is reading and writing code, not driving a GUI — read, generate, revise, re-render
  • The approach wins on recurring, structured, product-tracking formats; craft-led and one-off pieces still belong with specialists or a screen recording
  • Costs are front-loaded into the brief, tokens, and gates; revisions then drop to an edit and a render
  • The designer owns the script, the structure, the system, and the judgment — and the composition, not the MP4, is the deliverable

Module 2 gets practical: Remotion compositions and hyperframe-style sequences, the project structure an agent can navigate, and how to brief a scene.

Slide notes

Recap by connecting the bullets rather than repeating them: the artifact change (text instead of binary) is what lets the agent in; the agent being in is what changes the cost structure; the cost structure is what decides which formats are worth doing this way; and the designer's role shifts to the parts no gate can check. If participants remember one discipline from the module, it should be the dashed line on the diagram — edit the composition, never the video.

Preview Module 2 concretely. It introduces the two working approaches by name and in practice: Remotion compositions as React components the agent can write and revise, and hyperframe-style HTML sequences for lighter-weight motion — along with the project structure, design tokens, and briefing patterns that keep either maintainable. The shot list participants drafted in the exercise becomes the input for that module's setup work.

If time allows, take the temperature on the exercise results: how many people found at least one genuine pipeline candidate, and what it was. The answers are useful for pitching Module 2 at the right level, and they tend to confirm the module's central claim — the demand for recurring product video almost always exceeds the team's capacity to make it the old way.

Narration for this slide

Let's close the module. Code-defined video turns motion into a text artifact — diffable, built from your tokens, re-renderable from the terminal — and that is what lets an agent produce and revise it inside the loop you already run. The approach wins on recurring, structured formats that track a changing product; brand films, character work, and one-off demos still belong elsewhere. The cost is front-loaded into the brief, the tokens, and the gates, and then revisions become an edit and a render. Your job is the script, the structure, the system, and the judgment — and the composition, not the MP4, is the deliverable. In Module 2 we open the tools: Remotion and hyperframe-style sequences, and how to brief them. See you there.

Module transcript
Module 1, narrated slide by slide

Slide 1 · Video as an Agent Output

Welcome to Motion and Storytelling with Agents. This first module is about a single shift: what happens when video stops being something you assemble on a timeline and becomes something you define in code. That sounds like an engineering detail. It is not. It decides whether an agent can produce video for you at all, what a revision costs, and whether your motion work stays current as the product changes. We will look at where this approach genuinely beats traditional tools, where it clearly does not, and what your role becomes when the keyframes are no longer yours to drag. Let's start with the tools most teams reach for today.

Slide 2 · Timeline tools vs code-defined video: what each is for

Here is the comparison that frames everything else. Timeline tools produce a binary project file and an exported video. Both are fine objects — until the product changes, at which point someone has to find the file, reopen it, re-edit, and re-export. Code-defined video produces a different artifact: a text composition that lives in your repository, uses your design tokens, and re-renders from one terminal command. The MP4 becomes a build output, not the thing you maintain. And notice the row about who can edit it. A timeline needs hands on a GUI. A composition is text — which means a coding agent can write it, and a reviewer can diff it.

Slide 3 · Why agents and code-defined video fit

So why do agents and code-defined video fit so well together? Because the workflow is the one you already run. The agent reads — your tokens, your theme files, your existing compositions. It generates — a composition that is just text, either HTML with timing attributes or a React component driven by the frame number. You give feedback, and it revises — by editing the composition, never by touching a rendered file. Then it re-renders, and the same composition produces the same frames every time. Compare that with a timeline tool, where the work happens through clicks and drags an agent simply cannot perform. This is not text-to-video generation. It is the agent writing reviewable code against your design system — and that is exactly what it is already good at.
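
To make "a component driven by the frame number" concrete, here is a minimal sketch in plain TypeScript. It is not Remotion or Hyperframes code: the names `lerpClamped`, `titleCardAt`, and `renderStyles`, the 15-frame fade, and the 24-pixel offset are all assumptions chosen for illustration. The point it shows is that a shot can be a pure function of the frame, which is why the same composition yields the same frames on every render.

```typescript
// Hypothetical sketch: a shot reduced to a pure function of the frame number.
// None of these names come from Remotion or any other library.

interface TitleCardStyle {
  opacity: number;    // 0 (hidden) to 1 (fully visible)
  translateY: number; // vertical offset in pixels
}

// Linear interpolation, clamped so frames outside [start, end] hold the end values.
function lerpClamped(
  frame: number,
  start: number,
  end: number,
  from: number,
  to: number
): number {
  if (end <= start) throw new Error("end must be greater than start");
  const t = Math.min(1, Math.max(0, (frame - start) / (end - start)));
  return from + (to - from) * t;
}

// Assumed motion: the title fades in and slides up over the first 15 frames.
function titleCardAt(frame: number): TitleCardStyle {
  return {
    opacity: lerpClamped(frame, 0, 15, 0, 1),
    translateY: lerpClamped(frame, 0, 15, 24, 0),
  };
}

// "Rendering" here is just evaluating the function once per frame.
function renderStyles(totalFrames: number): TitleCardStyle[] {
  return Array.from({ length: totalFrames }, (_, frame) => titleCardAt(frame));
}
```

Because `titleCardAt` reads nothing but its argument, a reviewer can diff a change to the fade length as a one-number edit, and two renders of the same source cannot drift apart.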

Slide 4 · Two paths to the same MP4

Here are the two paths side by side. The top lane is the familiar one: a binary project file, a hand-edited timeline, an exported video that starts going stale on the day it ships. The agent is blocked at every step — there is nothing it can read, nothing it can change, no check it can run. The bottom lane is the code-defined path. You write the brief and the script. The agent writes the composition — text, in your repository, using your tokens. It passes a review gate: diffs, lint, frame checks, a pacing pass. Then a deterministic render produces the MP4 as a build output. And the dashed line is the rule that makes the economics work: when something is wrong, you edit the composition. You never edit the video.

Slide 5 · The formats that benefit

So where does this approach genuinely win? Wherever video recurs and tracks a moving product. Feature clips and release-note videos — twenty to forty seconds, shipped every cycle, mostly on-screen text and product UI. Data stories, where the chart is drawn from a data file, so updating the video means updating the data and re-rendering. Social cuts, where one composition renders out at several aspect ratios instead of being re-laid-out by hand. And animated specs — short clips that show engineering exactly how a transition should feel, built from the same tokens the product uses. The common thread is reuse. A composition that renders once is a curiosity. One that re-renders every release is an asset.
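
The data-story case can be sketched the same way. The example below assumes a simple bar chart whose bars grow over a fixed number of frames; `DataPoint`, `barHeightsAt`, and every parameter are hypothetical names, not part of any framework. It illustrates why updating the video reduces to updating the data: the frames are computed from the data, so new data plus a re-render gives a new chart.

```typescript
// Hypothetical sketch: bar heights derived from data and the current frame.

interface DataPoint {
  label: string;
  value: number;
}

// Returns the pixel height of each bar at a given frame.
// Bars are scaled against the largest value and grow linearly over growFrames.
function barHeightsAt(
  data: DataPoint[],
  frame: number,
  growFrames: number,
  maxHeightPx: number
): number[] {
  if (growFrames <= 0) throw new Error("growFrames must be positive");
  const peak = Math.max(...data.map((d) => d.value));
  const progress = Math.min(1, Math.max(0, frame / growFrames));
  // Guard against empty or all-zero data, where scaling has no meaning.
  return data.map((d) => (peak > 0 ? (d.value / peak) * maxHeightPx * progress : 0));
}
```

In a real project the `data` array would be loaded from the data file the chart is drawn from; how that file is read is left out here because the source does not specify it.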

Slide 6 · The formats that do not benefit

Now the honest half of the argument. Some video should not go through this pipeline. Brand films and campaign hero pieces are taste-led, watched once, and judged on craft no lint rule can measure. Character animation and heavy 3D will technically run in these frameworks, but you will fight the tools and the result will show it. Edit-driven storytelling — rhythm cut to music — is an editor's craft. And two cases are about honesty rather than craft: a one-off internal demo is better as a screen recording, and if the point is to show how the real product behaves, record the real product. The sorting question is simple. If the value is reuse and accuracy, use the pipeline. If the value is craft or a one-time moment, brief a specialist.

Slide 7 · Cost and iteration: what a revision actually costs

Let's talk about cost honestly, because the headline saving is not where people expect it. The first version is not the big win — writing a good brief, setting up tokens, and reviewing pacing takes real time, and a motion specialist may well produce a better first cut. The difference is every revision after that. In the timeline model, a copy change, a price update, or a brand refresh means reopening the project and re-exporting, every single time, by whoever owns the file. In the composition model, those same changes are a one-line edit and a render, and anyone on the team can ask the agent to make them. Flat cost per revision versus front-loaded setup and near-free revisions. That is the real comparison.
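
The comparison above is simple arithmetic, and it can help to run it with your own numbers. The sketch below assumes both models can be approximated as a first-version cost plus a constant cost per revision; `CostModel`, `totalHours`, and `breakEvenRevisions` are hypothetical names, and any hours you plug in are your own estimates, not figures from this module.

```typescript
// Hypothetical sketch: first-version cost plus a constant cost per revision.

interface CostModel {
  firstVersionHours: number; // brief, setup and first cut
  hoursPerRevision: number;  // each later change
}

// Total hours spent after a given number of revisions.
function totalHours(model: CostModel, revisions: number): number {
  return model.firstVersionHours + model.hoursPerRevision * revisions;
}

// Smallest revision count at which the composition model costs no more
// than the timeline model, or null if that never happens within maxRevisions.
function breakEvenRevisions(
  timeline: CostModel,
  composition: CostModel,
  maxRevisions = 1000
): number | null {
  for (let n = 0; n <= maxRevisions; n++) {
    if (totalHours(composition, n) <= totalHours(timeline, n)) return n;
  }
  return null;
}
```

If the break-even count is higher than the number of revisions a clip will realistically see, that clip is probably not a pipeline candidate, which is the same conclusion the sorting question reaches.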

Slide 8 · The designer's role: script, structure, and judgment

So what is your job, if you are not the one dragging keyframes? Four things. The script — the exact words on screen and in narration, written before any composition exists. That is the single biggest quality lever in this whole workflow. The structure — shots, durations, and what each second is for. The system — the tokens, motion durations, and templates the agent has to draw from instead of inventing its own. And judgment — the pass no automated check can do: is the pacing readable, is the emphasis on the right claim, is the clip honest? Ask an agent to make it feel premium and you get placeholder enthusiasm. Give it the words, the shots, and the tokens, and you get something you can genuinely review.

Slide 9 · The artifact discipline: the composition is the deliverable

One more idea before the worked example, and it is a discipline rather than a tool. The composition is the deliverable. It lives in version control next to the product, it gets reviewed like any other artifact, and the MP4 it produces is a build output — replaceable, re-renderable, never hand-edited. This is the same pattern the wider field is converging on: HTML artifacts are already becoming standard design deliverables, and a video composition is that pattern with a timeline attached. The failure mode is tempting and quiet: someone tweaks an exported MP4 by hand because it feels faster, the source and the published video diverge, and the next re-render silently undoes the fix. Hold the line. Edit the composition.

Slide 10 · Worked example: one feature clip, two ways

Let's make this concrete with a real trace. The clip: a twenty-seven-second feature announcement — title card, two claim shots, closing card. The brief gave the agent four shots with durations, the exact on-screen words, and the token source, and told it to run the lint and inspect gates before calling it ready. The agent produced four shot files and a root timeline using the project's theme tokens. The gates earned their keep: lint caught a timing overlap, and a frame check caught a dead animation — a timeline registered but never advanced, so the shot was secretly a still image. Both fixed by editing the composition. Roughly forty-five to sixty minutes from brief to approval, as a labelled estimate — against the days to weeks a commissioned edit usually takes, mostly spent waiting.
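
The two gates in this trace are easier to trust once you see how little they need to do. The sketch below is an assumption about what such checks could look like, not the actual lint or inspect tooling used in the trace: `Shot`, `findOverlaps`, and `isDeadAnimation` are hypothetical names. The first function flags any two shots whose frame ranges intersect; the second samples an animated value across a shot and reports it as dead if it never changes.

```typescript
// Hypothetical sketch of two review gates; not the real lint or inspect tools.

interface Shot {
  id: string;
  startFrame: number;
  durationFrames: number;
}

// Timing lint: returns every pair of shots whose frame ranges intersect.
// A shot that ends on the exact frame the next one starts does not overlap.
function findOverlaps(shots: Shot[]): Array<[string, string]> {
  const overlaps: Array<[string, string]> = [];
  shots.forEach((a, i) => {
    shots.slice(i + 1).forEach((b) => {
      const aEnd = a.startFrame + a.durationFrames;
      const bEnd = b.startFrame + b.durationFrames;
      if (a.startFrame < bEnd && b.startFrame < aEnd) {
        overlaps.push([a.id, b.id]);
      }
    });
  });
  return overlaps;
}

// Frame check: samples an animated value on every frame in the range.
// If it never moves away from its first value, the "animation" is a still.
function isDeadAnimation(
  sample: (frame: number) => number,
  startFrame: number,
  endFrame: number,
  epsilon = 1e-6
): boolean {
  const first = sample(startFrame);
  for (let frame = startFrame + 1; frame <= endFrame; frame++) {
    if (Math.abs(sample(frame) - first) > epsilon) return false;
  }
  return true;
}
```

Both checks work only because the composition is text with explicit timings. There is nothing equivalent to run against a binary project file or an exported MP4.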

Slide 11 · Exercise: the video work you keep postponing

Time to make this yours, and you will not need any tools. Write down three recurring video needs your team postpones or quietly skips — the feature walkthrough that never gets made, the release video that happened once, the data story that lives in a static chart. For each one, note how often it would recur and what makes it go stale. Then sort them: pipeline candidate, specialist brief, or honest screen recording. Take the strongest pipeline candidate and draft its shot list — durations and the exact words on screen. Keep that page. It becomes your working example for the next two modules. And if nothing on your list recurs, that is a real finding too: you may not need this pipeline yet.

Slide 12 · Summary, and what comes next

Let's close the module. Code-defined video turns motion into a text artifact — diffable, built from your tokens, re-renderable from the terminal — and that is what lets an agent produce and revise it inside the loop you already run. The approach wins on recurring, structured formats that track a changing product; brand films, character work, and one-off demos still belong elsewhere. The cost is front-loaded into the brief, the tokens, and the gates, and then revisions become an edit and a render. Your job is the script, the structure, the system, and the judgment — and the composition, not the MP4, is the deliverable. In Module 2 we open the tools: Remotion and hyperframe-style sequences, and how to brief them. See you there.