New generative models and tools, built with and for creators

May 14, 2024

At Google I/O 2024 we are introducing Veo, our most capable model for generating HD video, and Imagen 3, our text-to-image model. We are also sharing new demo recordings created with Music AI Sandbox.

Douglas Eck

Senior Research Director

Eli Collins

VP, Product Management

The image shows a costume designer working in her studio. The image carries the text "Bringing creative ideas to life".

Over the past few years, we have made substantial progress in improving the quality of our generative media technologies. Throughout, we have worked closely with the creative community to find out how AI can best support their creative process and how our AI tools can be useful at every stage.

Today we are introducing Veo, our most advanced video generation model, and Imagen 3, our highest-quality text-to-image model to date.

We are also sharing work from our recent collaboration with the artist Donald Glover and his creative studio Gilga, as well as new demo recordings that the artists Wyclef Jean and Marc Rebillet and the songwriter Justin Tranter created with the help of Music AI Sandbox.

Veo: our most capable video generation model

Veo generates high-quality video at 1080p resolution and offers a wide range of cinematic and visual styles.

Thanks to advances in understanding natural language and visual semantics, Veo can turn a prompt into a video that comes very close to the user's creative vision. Veo not only understands the prompt; it can also render details from longer prompts and capture the tone of the input.

The model also understands cinematic terms such as "timelapse" or "aerial shots of a landscape", which opens up new possibilities for creative control. The resulting footage is consistent and coherent: people, animals and objects move realistically through the shots.

To find out how Veo can best support the creative process, we invited a range of filmmakers and creators to experiment with the model. This helped us improve how we design, build and deploy our technologies, and ensure that creators are involved and heard in the development process.

Here is a preview of our work with filmmaker Donald Glover and his creative studio Gilga, who tried out Veo for a film project.

Veo builds on years of work on video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere, with latency and output resolution improved through tuning of the architecture, scaling laws and other modern techniques.

With Veo, we have improved the techniques by which the model understands what is happening in a video, renders image and sound in HD, simulates the physical forces of our world, and more. These learnings will benefit our AI research and enable us to build even more useful products that help people communicate and interact in new ways.

Starting today, Veo is available in private preview through VideoFX. In the future, we will also bring some of Veo's capabilities to YouTube Shorts and other products.

Learn more about Veo's capabilities.

Imagen 3: our high-quality image generation model

Over the past year, we have made substantial progress in improving the quality and fidelity of our image generation models and tools.

Imagen 3 is our highest-quality text-to-image model to date. It produces highly detailed images that look as lifelike and realistic as photographs, with far fewer distracting visual artifacts than our previous models.

A close-up portrait of a gray wolf with intense yellow eyes. The wolf has a thick, gray and brown fur coat and a black nose. It is looking directly at the viewer with a calm but alert expression. The background is a blurred blue and gray sky.

Prompt: A close up of a sleek wolf perched regally in front of gray background, in a high-resolution photograph with detailed fine details, isolated on a plain stock photo with color grading in the style of a hyper-realistic style.

A large jellyfish with long, flowing tentacles drifts through the ocean. The jellyfish has a round, translucent bell with brown stripes and a cluster of frilly oral arms underneath. It is surrounded by blue water and a coral reef is visible in the background.

Prompt: Close-up of a jellyfish pulsating through crystal-clear water, tentacles trailing, vibrant coral reef background, macro photography, stock photo, high resolution, very detailed, soft lighting, professional color grading, shallow depth of field, sharp focus, taken with a DSLR camera in the style of professional photographers.

A wide river winds through a deep gorge carved into a lush, green mountain range under a clear blue sky. The river is calm and reflects the surrounding landscape. The sun shines brightly, casting shadows on the slopes and highlighting the textures of the rocks.

Prompt: View from above of beautiful river canyon with trees, showcasing its stunning natural beauty with green mountains and blue waters. The photo captures the vastness of nature's creation in the style of its creation.

Prompt: Shot in the style of DSLR camera with the polarizing filter. A photo of two hot air balloons floating over the unique rock formations in Cappadocia, Turkey. The colors and patterns on these balloons contrast beautifully against the earthy tones of the landscape below. This shot captures the sense of adventure that comes with enjoying such an experience.

Prompt: A pair of well-worn hiking boots, caked in mud and resting on a rocky trail. The head of a squirrel is poking out of one of the boots, and it looks lazily at the camera, a little king of its shoe. The laces of both boots fall loosely to the ground. There's a mountainous landscape in the background. Cinematic movie still, high quality DSLR photo.

Prompt: Three women stand together laughing, with one woman slightly out of focus in the foreground. The sun is setting behind the women, creating a lens flare and a warm glow that highlights their hair and creates a bokeh effect in the background. The photography style is candid and captures a genuine moment of connection and happiness between friends. The warm light of golden hour lends a nostalgic and intimate feel to the image.

Imagen 3 understands natural language better, and with it the intent behind your prompts. The model's advanced comprehension helps it master a range of styles and take details from longer prompts into account.

A photo of a black man with short hair and beard smiling. In background there are blurry trees and buildings.

Prompt: A photo of a man with short hair and beard smiling at the camera. The background is blurry and it shows trees and buildings in light colors.

Prompt: A view of a person's hand as they hold a little clay figurine of a bird in their hand and sculpt it with a modeling tool in their other hand. You can see the sculptor's scarf. Their hands are covered in clay dust. a macro DSLR image highlighting the texture and craftsmanship.

An abstract sketch: A blur of expressive lines and energy captures the dynamic movement of a dancer in a gestural charcoal drawing. Sketch on aged parchment paper.

Prompt: Abstract sketch: A blur of expressive lines and energy captures the dynamic movement of a dancer in a gestural charcoal drawing. Sketch on aged parchment paper.

Prompt: Elephant amigurumi walking in savanna, a professional photograph, blurry background.

A girl in white dress stands on the bank of an endless lake, holding flowers and looking at the sky full of pink clouds. The sky is reflected by the water surface, creating a beautiful anime scene. There are small hills covered with wildflowers around her, adding to its beauty. Anime style background, purple blue tone, soft light, warm colors, dreamy atmosphere, and romantic emotions.

Prompt: The girl in white dress stood on the bank of an endless lake, holding flowers and looking at the sky full of pink clouds. The sky is reflected by the water surface, creating a beautiful anime scene. There were small hills covered with wildflowers around her, adding to its beauty. Anime style background, purple blue tone, soft light, warm colors, dreamy atmosphere, and romantic emotions.

Prompt: A weathered, wooden mech robot covered in flowering vines stands peacefully in a field of tall wildflowers, with a small bluebird resting on its outstretched hand. Digital cartoon, with warm colors and soft lines. A large cliff with waterfall looms behind.

It is also our best model to date for rendering text, which has long been a challenge for image generation models. This opens up new possibilities, such as personalized birthday messages or titles for presentation slides, to name just a few examples.

The entrance to a large stone building with the inscription "Central Library" engraved above the entrance. The entrance is framed by two columns and has a large wooden door with glass panes.

Prompt: A photograph of a stately library entrance with the words "Central Library" carved into the stone

A detailed origami owl made of brown paper sits on a fir branch with its eyes closed. Its feathers are intricately folded and it has a calm expression. The background is blurred green foliage.

Prompt: An origami owl made of brown paper is perched on a branch of an evergreen tree. The owl is facing forward with its eyes closed, giving it a peaceful appearance. The background is a blur of green foliage, creating a natural and serene setting.

A felt robot stands in a sunlit forest clearing, with a felt owl sitting on its shoulder and a felt fox sitting at its feet. The robot is gray, with large round eyes and a slightly worried expression. The owl has large orange eyes and brown feathers. The fox has red fur and a bushy tail. The forest floor is covered with green moss and fallen leaves.

Prompt: Photo of a felt puppet diorama scene of a tranquil nature scene of a secluded forest clearing with a large friendly, rounded robot is rendered in a risograph style. An owl sits on the robots shoulders and a fox at its feet. Soft washes of color, 5 color, and a light-filled palette create a sense of peace and serenity, inviting contemplation and the appreciation of natural beauty

A pixel art illustration of the Space Shuttle STS-1 launching into a blue sky, leaving a trail of smoke and flames. The text "STS-1" appears at the bottom of the image.

Prompt: Pixel art of a space shuttle blasting of. Cape Canaveral in the background, blue skies, with plumes of smoke billowing out. "STS-1" is written below it.

Prompt: Word “light” made from various colorful feathers, black background

The word "Light" made of colorful feathers arranged on a black background.

A scene made entirely of clay showing an elderly woman wearing a flowing red top and a taupe skirt. She is walking along a straight path through a garden with lush plants growing on both sides. In her right hand she holds a large orange watering can, which she uses to water the plants.

Prompt: Claymation scene. A medium wide shot of an elderly woman. She is wearing flowing clothing. She is standing in a lush garden watering the plants with an orange watering can

Starting today, Imagen 3 is available to select creators in private preview in ImageFX, and you can join the waitlist. Imagen 3 is also coming soon to Vertex AI.

Learn more about Imagen 3's capabilities.

AI tools for music: our collaboration with the music community

As part of our ongoing efforts to better understand the role of AI in art and music, we are working with a number of outstanding musicians, songwriters and producers. This work takes place in partnership with YouTube.

This collaboration has also shaped the development of our music generation technologies. One example is Lyria, our most advanced family of models for AI music generation.

As part of this work, we have developed a suite of AI tools for music called Music AI Sandbox. These tools are meant to give creators a new playground for creativity, letting them create new instrumental sections from scratch, transform sounds in new ways, and more.

Today we are continuing our music experiments with Grammy winner Wyclef Jean, electronic musician Marc Rebillet and Grammy-nominated songwriter Justin Tranter. All three took part in the incubator and have released new demo recordings on their YouTube channels, created with the help of our AI tools for music.

Responsibility in design, development and deployment

We are aware that we must think not only about progress but also about the responsibility that comes with it. That is why we are taking measures to address the challenges that AI brings. People and organizations should be able to handle AI-generated content responsibly.

In developing all of these technologies, we have worked with the creative community and other stakeholders. We gathered insights and feedback, and used them to improve the safety of our technologies and strengthen their responsible use.

We have carried out safety tests, applied filters, set up guardrails and put our safety teams at the center of our development work. This has produced pioneering tools such as SynthID, which embeds imperceptible digital watermarks in AI-generated images, audio, text and video. Starting today, all videos generated by Veo in VideoFX are watermarked with SynthID.

The creative potential of generative AI is immense. We look forward to seeing how people around the world bring their ideas to life with our new models and tools.
