New generative media models and tools, built with and for creators
Over the past year, we’ve made incredible progress in enhancing the quality of our generative media technologies. We’ve been working closely with the creative community to explore how generative AI can best support the creative process, and to make sure our AI tools are as useful as possible at each stage.
Today, we’re introducing Veo, our latest and most advanced video generation model, and Imagen 3, our highest quality text-to-image model yet.
We’re also sharing some of our recent collaborations with filmmaker Donald Glover and his creative studio, Gilga, as well as new demo recordings from artists Wyclef Jean and Marc Rebillet, and songwriter Justin Tranter, made with help from our Music AI Sandbox.
Veo: our most capable video generation model
Veo generates high-quality 1080p videos in a wide range of cinematic and visual styles, with clips that can extend beyond a minute. With an advanced understanding of natural language and visual semantics, it generates video that closely matches a user’s creative vision, accurately capturing the tone of a prompt and rendering details from longer prompts.
The model provides an unprecedented level of creative control, and understands cinematic terms like “timelapse” or “aerial shots of a landscape”. Veo creates footage that’s consistent and coherent, so people, animals and objects move realistically throughout shots.
Examples of Veo’s high-quality video generation capabilities. All videos were generated by Veo and have not been modified.
To discover how Veo can best support the storyteller’s creative process, we’re inviting a range of filmmakers and creators to experiment with the model. These collaborations also help us improve the way we design, build and deploy our technologies to make sure creators have a voice in how they’re developed.
Here's a preview of our work with filmmaker Donald Glover and his creative studio, Gilga, who experimented with Veo for a film project.
Veo builds upon years of our generative video model work, including Generative Query Network (GQN), DVD-GAN, Imagen Video, Phenaki, WALT, VideoPoet and Lumiere, combining architecture, scaling laws and other novel techniques to improve quality and output resolution.
With Veo, we’ve improved techniques for how the model learns to understand what's in a video, renders high-definition images, simulates the physics of our world and more. These learnings will fuel advances across our AI research and enable us to build even more useful products that help people interact and communicate in new ways.
Starting today, select creators can access Veo in a private preview in VideoFX by joining our waitlist. In the future, we’ll also bring some of Veo’s capabilities to YouTube Shorts and other products.
Learn more about Veo’s capabilities.
Imagen 3: our highest quality text-to-image model
Over the last year, we’ve made incredible progress improving the quality and fidelity of our image generation models and tools.
Imagen 3 is our highest quality text-to-image model. It produces photorealistic, lifelike images with an incredible level of detail and far fewer distracting visual artifacts than our prior models.
Imagen 3 better understands natural language and the intent behind your prompt, and incorporates small details from longer prompts. The model’s advanced understanding helps it master a range of styles.
It’s also our best model yet for rendering text, which has been a challenge for image generation models. This capability opens up possibilities for generating personalized birthday messages, title slides in presentations and more.
Starting today, select creators can access Imagen 3 in a private preview in ImageFX by joining our waitlist. Imagen 3 will be coming soon to Vertex AI.
Learn more about Imagen 3’s capabilities.
Our collaborations with the music community
As part of our continued exploration into the role AI can play in art and music creation, we’re partnering with YouTube to collaborate with some amazing musicians, songwriters and producers.
These collaborations are also informing the development of our generative music technologies, including Lyria, our most advanced model for AI music generation.
As part of this work, we’ve been developing a suite of music AI tools called Music AI Sandbox. These tools are designed to open a new playground for creativity, allowing people to create new instrumental sections from scratch, transform sound in new ways and much more.
Today, we’re continuing that experimentation in music with Grammy-winning musician Wyclef Jean, Grammy-nominated songwriter Justin Tranter and electronic musician Marc Rebillet — who are releasing new demo recordings on their YouTube channels, created with help from our music AI tools.
Responsible from design to deployment
We’re mindful about not only advancing the state of the art, but doing so responsibly. So we’re taking measures to address the challenges raised by generative technologies and to help people and organizations work responsibly with AI-generated content.
For each of these technologies, we’ve been working with the creative community and other external stakeholders, gathering insights and listening to feedback to help us improve and deploy our technologies in safe and responsible ways.
We’ve been conducting safety tests, applying filters, setting guardrails, and putting our safety teams at the center of development. Our teams are also pioneering tools, such as SynthID, which can embed imperceptible digital watermarks into AI-generated images, audio, text and video. And starting today, all videos generated by Veo on VideoFX will be watermarked by SynthID.
The creative potential for generative AI is immense and we can’t wait to see how people around the world will bring their ideas to life with our new models and tools.