Behind “ANCESTRA”: combining Veo with live-action filmmaking

Jun 13, 2025

We partnered with Darren Aronofsky, Eliza McNitt and a team of more than 200 people to make a film using Veo and live-action filmmaking.

Kory Mathewson

Senior Research Scientist, Google DeepMind

Today, Eliza McNitt’s short film, “ANCESTRA,” premieres at the Tribeca Festival. It’s the story of a mother, and what happens when her child is born with a hole in its heart. Inspired by the dramatic events of McNitt's own birth, the film portrays a mother's love as a cosmic, life-saving force.

This is the first of three short films produced in partnership between our team at Google DeepMind and Primordial Soup, a new venture dedicated to storytelling innovation founded by director Darren Aronofsky. Together, we founded this partnership to put the world’s best generative AI into the hands of top filmmakers, to advance the frontiers of storytelling and technology.

“ANCESTRA” combined live-action scenes with sequences generated by Veo, our state-of-the-art video generation model. McNitt described her experience working with our technology: “Veo is another lens through which I get to imagine the universe around me.”

To create “ANCESTRA”, Google DeepMind assembled a multidisciplinary creative team of animators, art directors, designers, writers, technologists and researchers who worked closely with more than 200 experts in traditional filmmaking and production, a live-action crew and cast, plus an editorial team, visual effects (VFX) artists, sound designers and music composers.

Bringing our most advanced generative models to the screen

McNitt wrote the script for “ANCESTRA,” worked with a storyboard artist to visualize the live-action scenes, and collaborated with our team to generate imagery for sequences that could benefit from AI generation.

We used Gemini to develop our prompts, and used Veo and our image generation model, Imagen, to create a series of potential shots, organized by mood, color and emotion. Here’s a breakdown of how we planned and created the AI elements of the film:

Gemini: Our team uploaded photos that McNitt’s father took on the day she was born, and asked Gemini to describe these photos in precise aesthetic detail. These descriptions became the prompts for creating new images and videos.
Imagen: We generated the film's key concept art, defining the overall look, style and mood. These images became the starting point for our videos.
Veo: We animated the generated images and wrote additional text prompts for guiding the action and movement to create the final shots.
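
The three steps above can be read as a simple pipeline: describe a reference photo, generate a concept image from that description, then animate the image with a motion prompt. Below is a minimal sketch of that flow, not our production tooling. The names `plan_shots`, `Shot`, and the three stand-in functions are hypothetical and exist only for illustration; in practice each stand-in would be replaced by a call to Gemini, Imagen, or Veo.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Shot:
    description: str  # aesthetic description of a reference photo (Gemini step)
    image: str        # handle to the generated concept image (Imagen step)
    video: str        # handle to the animated result (Veo step)


def plan_shots(
    photos: List[str],
    describe: Callable[[str], str],
    generate_image: Callable[[str], str],
    animate: Callable[[str, str], str],
    motion_prompt: str,
) -> List[Shot]:
    """Run each photo through describe -> generate_image -> animate."""
    shots = []
    for photo in photos:
        description = describe(photo)           # photo -> text prompt
        image = generate_image(description)     # text prompt -> concept image
        video = animate(image, motion_prompt)   # image + motion prompt -> video
        shots.append(Shot(description, image, video))
    return shots


# Hypothetical stand-ins so the sketch runs without any model access.
def fake_describe(photo: str) -> str:
    return f"description of {photo}"


def fake_generate_image(prompt: str) -> str:
    return f"image<{prompt}>"


def fake_animate(image: str, motion: str) -> str:
    return f"video<{image} | {motion}>"


shots = plan_shots(
    ["birth_photo_1.jpg"],
    fake_describe,
    fake_generate_image,
    fake_animate,
    motion_prompt="slow push-in",
)
print(shots[0].video)
```

Passing the model calls in as functions keeps each stage swappable, so a different description, image, or motion prompt can be tried without changing the rest of the flow.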

Developing new Veo capabilities together

While Veo made it possible to generate scenes that combined live-action acting and generative footage of a realistic newborn baby, it also posed new challenges. For example, McNitt wanted the generated video to match the quality and color of her live-action scenes. She also needed to control the camera motion and subject matter of the generated video. To meet these challenges, we developed several new Veo capabilities to enable greater personalization, precise motion matching, and the ability to blend live-action and generative footage.

Personalized video generation

We aimed to generate videos that felt as intimate and personal as the story itself. For example, McNitt wanted to generate footage of a realistic-looking baby in utero, while controlling the art direction, composition and motion. So we fine-tuned an Imagen model to match the style of reference images. Then, we worked with Gemini to craft and refine prompts to generate realistic images of a baby in the womb. Finally, we turned those images into animated scenes using Veo’s image-to-video capability.

By fine-tuning an Imagen model, we maintained specific and consistent art direction between different scenes of the AI-generated baby.

A grid of four distinct generated images of a baby drifting in a dimly-lit, murky environment — her face with closed eyes, detail of her foot, back of the head, and chest.

Motion matched video generation

In one scene, McNitt wanted to take the viewer on a journey through the human body, eventually landing in the womb to show a baby being born via C-section. To follow this precise camera motion, we created a virtual, 3D model of a human body and recorded a draft shot of the scene by moving a virtual camera through this model. Then we used Veo to track the draft shot’s motion and generate new videos using that same movement. We guided the generated video with text prompts, until we achieved the shot McNitt had in mind.

McNitt mapped out her desired camera motion using a virtual model of the human body. Then we used Veo’s motion matching to generate a video with that same movement.

In another scene, McNitt wanted to show an array of organic holes closing up, alluding to the hole in the baby’s heart. So, we gave Veo reference videos of this motion and prompted it to motion match across different shots. Producing these sequences with computer-generated imagery (CGI) alone would have been complex and time-intensive, and controlling the motion with text prompts alone would have been difficult. With Veo’s help, we could produce high-quality scenes in just a few minutes.

We gave Veo an input video with the desired motion. Then, Veo combined the reference motion with a text prompt to generate a new motion-matched scene.

Blending traditional filmmaking and generative video

Imagery of babies produced using traditional VFX runs the risk of looking uncanny, and it's challenging and time-consuming for directors to get the exact performance they have in mind. So, for the birth, we composed the actor’s performance and generated a realistic-looking newborn to fit the scene. First, we gave Veo the live-action footage, a text prompt describing the scene, and a defined area for adding the baby. Then, using Veo’s “add object” capability, we generated the baby directly into the live-action footage, keeping everything else consistent, and refined the shot with traditional VFX and color grading.

We added a generated newborn baby to live-action footage and refined the final shot with VFX and color grading.

Adding generative video to traditional workflows

Many scenes in the film use multiple AI-generated images and videos that are seamlessly composited using traditional filmmaking workflows. For example, we created a scene showing complex textures on the inside of a recently hatched crocodile egg at sunset. To construct this shot, we combined multiple generated videos and images with traditional VFX compositing techniques.

This shot captures the point of view from inside a cracking crocodile egg at sunset, with the protective mother crocodile nearby. We used Veo and Imagen to generate the key visual elements, which were then seamlessly composited in a traditional VFX pipeline to bring this specific creative vision to life.

Partnering with the film industry to tell new stories

“ANCESTRA” is the first of three films we're making with Primordial Soup. Each film in this partnership is directed by an emerging filmmaker who is mentored by Darren Aronofsky and supported by our team.

Many amazing movies have been created with live-action filmmaking, CGI and VFX toolkits. Generative AI can complement existing creative and production workflows, empowering filmmakers to overcome practical limitations with difficult-to-capture or prohibitively expensive scenes.

By working with artists, we ensure that the tools we’re building are useful and rooted in the needs of professional filmmakers. Collaborating with visionaries like McNitt and Aronofsky helps us explore the creative potential of today's technologies and imagine what we could create next.
