Whisk: Visualize and remix ideas using images and AI

Dec 16, 2024

Whisk is a new Google Labs experiment that lets you prompt using images for a fast and fun creative process.

Thomas Iljic

Director of Product Management, Google Labs

Nicole Brichtova

Product Manager, Google DeepMind

General summary

Whisk is a new generative AI tool that lets you create images by inputting images, not text. You can drag in images for the subject, scene, and style, and then remix them to create something unique. Whisk uses Gemini to automatically write a detailed caption of your images, which is then fed into Imagen 3 to generate the final image. This process captures the essence of your subject, not an exact replica, allowing you to easily remix your subjects, scenes, and styles in novel ways.

Summaries were generated by Google AI. Generative AI is experimental.

Bullet points

Whisk lets you create images using images, not just text prompts.
Drag and drop images for the subject, scene, and style to remix them.
Whisk uses Gemini to write detailed captions and Imagen 3 to generate images.
It captures the essence of your images, not exact replicas, for creative remixing.
Try out Whisk in the US at labs.google/whisk and share your feedback.

Basic explainer

Whisk is a new tool that lets you make pictures using AI. You can drag and drop images, and Whisk will use those images to create something new. It's like a remixer for pictures! You can change the subject, the scene, and the style of the image. It's not perfect, but it's a fun way to explore ideas and make new things. You can try it out if you live in the US.

On a vibrant yellow background, a collage of diverse AI-generated images—including a portrait, a landscape, and an anime character—surrounds the text "Prompt less, Play more" in bold black letters. This promotes "Whisk," an AI experiment from Google, invi

Today in the US, we’re launching our newest experiment in generative AI: Whisk. Instead of generating images with long, detailed text prompts, Whisk lets you prompt with images. Simply drag in images, and start creating.

Whisk lets you input one image for the subject, one for the scene and another for the style. Then, you can remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.

This image, set against a bright yellow background, showcases an example of how Whisk works. It features a large, detailed illustration on the right of a fantastical fish with a city built on its back, accompanied by smaller images on the left which are the inputs used to generate it. We see a submarine, a floating island, and a scenic landscape which are the subject, scene, and style image options used to generate the end result.

Whisk - fantastical fish - generated image example

Set against a bright yellow background, this image showcases a second example of how Whisk works. It displays a whimsical illustration of a walrus wearing a strawberry-patterned swimsuit and a flower crown. On the left-hand side are the input images: a walrus for the subject, a flower field for the scene and a pattern of cartoon-like clouds for the style.

Whisk - whimsical walrus - generated image example

Set against a bright yellow background, this image showcases how Whisk can generate an image onto an enamel pin. It features a colorful glazed doughnut with sprinkles. On the left-hand side, a realistic photo of a glazed doughnut sits next to a metallic cutout of a man; these serve as the subject and style for the final enamel pin generation.

Whisk - glazed doughnut with sprinkles - generated enamel pin example

Set against a bright yellow background, this image showcases a fantastical cat with horns. It has a sparkling, purple-hued coat and striking green eyes. This creature is resting on a large lily pad in a body of water with additional lily pads in the background. Three thumbnails show the image inputs used: a sparkly cat with horns for the subject, a nature scene with lilies for the scene and a landscape with a tree and clouds for the style.

Whisk - fantastical cat with horns - generated image example

Behind the scenes, the Gemini model automatically writes a detailed caption of your images. It then feeds those descriptions into Google’s latest image generation model, Imagen 3. This process captures your subject's essence, not an exact replica. That way, you can easily remix your subjects, scenes and styles in novel ways.

Since Whisk extracts only a few key characteristics from your image, it might generate images that differ from your expectations. For example, the generated subject might have a different height, weight, hairstyle or skin tone. We understand these features may be crucial for your project and Whisk may miss the mark, so we let you view and edit the underlying prompts at any time.
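The caption-then-generate flow described above, including the option to edit the underlying prompts, can be sketched roughly as follows. This is only an illustration, not Whisk's actual implementation: the function names, the `captioner` and `generator` callables, and the "Subject / Scene / Style" prompt layout are all assumptions made for the example. In a real setup the two callables would wrap a captioning model and an image generation model; here they are simple stand-ins so the sketch runs on its own.

```python
from typing import Callable, Dict, Optional

# The three image slots named in the post.
ROLES = ("subject", "scene", "style")


def caption_images(images: Dict[str, bytes],
                   captioner: Callable[[bytes], str]) -> Dict[str, str]:
    """Step 1: turn each supplied input image into a text caption."""
    return {role: captioner(images[role]) for role in ROLES if role in images}


def build_prompt(captions: Dict[str, str]) -> str:
    """Step 2: merge the captions into one text prompt (hypothetical layout)."""
    return " ".join(
        f"{role.capitalize()}: {captions[role]}."
        for role in ROLES
        if role in captions
    )


def remix(images: Dict[str, bytes],
          captioner: Callable[[bytes], str],
          generator: Callable[[str], str],
          edits: Optional[Dict[str, str]] = None) -> str:
    """Caption the inputs, apply any user edits, then generate from the prompt."""
    captions = caption_images(images, captioner)
    captions.update(edits or {})  # the user may overwrite any caption
    return generator(build_prompt(captions))


if __name__ == "__main__":
    # Stand-ins: the "image" bytes already hold a description, and the
    # "generator" just echoes the prompt it was given.
    fake_captioner = lambda data: data.decode("utf-8")
    fake_generator = lambda prompt: f"IMG[{prompt}]"

    inputs = {
        "subject": b"a walrus",
        "scene": b"a flower field",
        "style": b"cartoon clouds",
    }
    print(remix(inputs, fake_captioner, fake_generator))
    print(remix(inputs, fake_captioner, fake_generator,
                edits={"style": "enamel pin"}))
```

Because only text captions reach the generator in this sketch, any detail the captioner leaves out is lost, which mirrors why the post says the output captures the essence of a subject and not an exact replica.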

In our early testing with artists and creatives, people have been describing Whisk as a new type of creative tool — not a traditional image editor. We built it for rapid visual exploration, not pixel-perfect edits. It’s about exploring ideas in new and creative ways, allowing you to work through dozens of options and download the ones you love.

If you are based in the US, you can try it out today at labs.google/whisk and tell us what you think.

Google Labs is where we cook up experiments with the latest generative AI models like Gemini, Imagen and Veo. Our goal is to get feedback on new products and features as we work to shape technology together. You can stay up to date on Whisk and other experiments by signing up for our newsletter and following Google Labs on X, Reddit and Discord.
