State-of-the-art video and image generation with Veo 2 and Imagen 3
Earlier this year, we introduced our video generation model, Veo, and our latest image generation model, Imagen 3. Since then, it’s been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI and creatives are using VideoFX and ImageFX to tell their stories. Together with collaborators ranging from filmmakers to businesses, we’re continuing to develop and evolve these technologies.
Today we're introducing a new video model, Veo 2, and the latest version of Imagen 3, both of which achieve state-of-the-art results. These models are now available in VideoFX, ImageFX and our newest Labs experiment, Whisk.
Veo 2: state-of-the-art video generation
Veo 2 creates incredibly high-quality videos in a wide range of subjects and styles. In head-to-head comparisons judged by human raters, Veo 2 achieved state-of-the-art results against leading models.
It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall. Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects and Veo 2 will deliver — at resolutions up to 4K, and extended to minutes in length. Ask for a low-angle tracking shot that glides through the middle of a scene, or a close-up shot on the face of a scientist looking through her microscope, and Veo 2 creates it. Suggest “18mm lens” in your prompt and Veo 2 knows to craft the wide angle shot that this lens is known for, or blur out the background and focus on your subject by putting "shallow depth of field" in your prompt.
While video models often “hallucinate” unwanted details — extra fingers or unexpected objects, for example — Veo 2 produces these less frequently, making outputs more realistic.
Our commitment to safety and responsible development has guided Veo 2. We have been intentionally measured in growing Veo’s availability, so we can help identify, understand and improve the model’s quality and safety while slowly rolling it out via VideoFX, YouTube and Vertex AI.
Just like the rest of our image and video generation models, Veo 2 outputs include an invisible SynthID watermark that helps identify them as AI-generated, helping reduce the chances of misinformation and misattribution.
Today, we're bringing our new Veo 2 capabilities to our Google Labs video generation tool, VideoFX, and expanding the number of users who can access it. Visit Google Labs to sign up for the waitlist. We also plan to expand Veo 2 to YouTube Shorts and other products next year.
Imagen 3: state-of-the-art image generation
We've also improved our Imagen 3 image-generation model, which now generates brighter, better composed images. It can now render more diverse art styles with greater accuracy — from photorealism to impressionism, from abstract to anime. This upgrade also follows prompts more faithfully, and renders richer details and textures. In side-by-side comparisons of outputs by human raters against leading image generation models, Imagen 3 achieved state-of-the-art results.
Starting today, the latest Imagen 3 model will globally roll out in ImageFX, our image generation tool from Google Labs, to more than 100 countries. Visit ImageFX to get started.
Whisk: a fun new tool that lets you prompt with images to visualize your ideas
Whisk, our newest experiment from Google Labs, lets you input or create images that convey the subject, scene and style you have in mind. Then, you can bring them together and remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.
Under the hood, Whisk combines our latest Imagen 3 model with Gemini’s visual understanding and description capabilities. The Gemini model automatically writes a detailed caption of your images, and it then feeds those descriptions into Imagen 3. This process allows you to easily remix your subjects, scenes and styles in fun, new ways.
Whisk is launching in the U.S. today. Read more about Whisk and try it out at labs.google/Whisk.