New generative media models and tools, built with and for creators
Over the past year, we’ve made incredible progress in enhancing the quality of our generative media technologies. We’ve been working closely with the creative community to explore how generative AI can best support the creative process, and to make sure our AI tools are as useful as possible at each stage.
Today, we’re introducing Veo, our latest and most advanced video generation model, and Imagen 3, our highest quality text-to-image model yet.
We’re also sharing some of our recent collaborations with filmmaker Donald Glover and his creative studio, Gilga, as well as new demo recordings from artists Wyclef Jean and Marc Rebillet, and songwriter Justin Tranter, made with help from our Music AI Sandbox.
Veo: our most capable video generation model
Veo generates high-quality 1080p videos in a wide range of cinematic and visual styles, with clips that can extend beyond a minute. With an advanced understanding of natural language and visual semantics, it generates video that closely matches a user’s creative vision, accurately capturing the tone of a prompt and rendering details from longer prompts.
The model provides an unprecedented level of creative control, and understands cinematic terms like “timelapse” or “aerial shots of a landscape”. Veo creates footage that’s consistent and coherent, so people, animals and objects move realistically throughout shots.
Examples of Veo’s high-quality video generation capabilities. All videos were generated by Veo and have not been modified.
To discover how Veo can best support the storyteller’s creative process, we’re inviting a range of filmmakers and creators to experiment with the model. These collaborations also help us improve the way we design, build and deploy our technologies to make sure creators have a voice in how they’re developed.
Here's a preview of our work with filmmaker Donald Glover and his creative studio, Gilga, who experimented with Veo for a film project.
Veo builds upon years of our generative video model work, including Generative Query Network (GQN), DVD-GAN, Imagen Video, Phenaki, WALT, VideoPoet and Lumiere, combining architecture, scaling laws and other novel techniques to improve quality and output resolution.
With Veo, we’ve improved techniques for how the model learns to understand what's in a video, renders high-definition images, simulates the physics of our world and more. These learnings will fuel advances across our AI research and enable us to build even more useful products that help people interact and communicate in new ways.
Starting today, select creators can access Veo in a private preview in VideoFX by joining our waitlist. In the future, we’ll also bring some of Veo’s capabilities to YouTube Shorts and other products.
Learn more about Veo’s capabilities.
Imagen 3: our highest quality text-to-image model
Over the last year, we’ve made incredible progress improving the quality and fidelity of our image generation models and tools.
Imagen 3 is our highest quality text-to-image model. It produces photorealistic, lifelike images with an incredible level of detail and far fewer distracting visual artifacts than our prior models.
Imagen 3 better understands natural language and the intent behind your prompt, and incorporates small details from longer prompts. The model’s advanced understanding helps it master a range of styles.
It’s also our best model yet for rendering text, which has been a challenge for image generation models. This capability opens up possibilities for generating personalized birthday messages, title slides in presentations and more.
Starting today, select creators can access Imagen 3 in a private preview in ImageFX by joining our waitlist. Imagen 3 will be coming soon to Vertex AI.
Learn more about Imagen 3’s capabilities.
Our collaborations with the music community
As part of our continued exploration into the role AI can play in art and music creation, we’re partnering with YouTube to collaborate with some amazing musicians, songwriters and producers.
These collaborations are also informing the development of our generative music technologies, including Lyria, our most advanced model for AI music generation.
As part of this work, we’ve been developing a suite of music AI tools called Music AI Sandbox. These tools are designed to open a new playground for creativity, allowing people to create new instrumental sections from scratch, transform sound in new ways and much more.
Today, we’re continuing that experimentation in music with Grammy-winning musician Wyclef Jean, Grammy-nominated songwriter Justin Tranter and electronic musician Marc Rebillet — who are releasing new demo recordings on their YouTube channels, created with help from our music AI tools.
Responsible from design to deployment
We’re mindful about not only advancing the state of the art, but doing so responsibly. So we’re taking measures to address the challenges raised by generative technologies and to help people and organizations work responsibly with AI-generated content.
For each of these technologies, we’ve been working with the creative community and other external stakeholders, gathering insights and listening to feedback to help us improve and deploy our technologies in safe and responsible ways.
We’ve been conducting safety tests, applying filters, setting guardrails, and putting our safety teams at the center of development. Our teams are also pioneering tools, such as SynthID, which can embed imperceptible digital watermarks into AI-generated images, audio, text and video. And starting today, all videos generated by Veo on VideoFX will be watermarked by SynthID.
The creative potential for generative AI is immense and we can’t wait to see how people around the world will bring their ideas to life with our new models and tools.