How AI makes virtual try-on more realistic

Jun 14, 2023

Our new generative AI model uses a technique called diffusion to show you what clothes look like on a wide range of people and poses.

Ira Kemelmacher-Shlizerman

Principal Scientist, Shopping

Several pieces of colorful clothing hang on a rack. An icon overlaid says “try-on models.”

Today we announced virtual try-on for apparel, a new feature that shows you what clothes look like on real models with different body shapes and sizes. That includes those subtle but crucial details, like how something drapes, folds, clings, stretches and wrinkles. To do this, our shopping AI researchers developed a new generative AI model that produces lifelike portrayals of clothing on people.

Let's take a closer look at this new AI model and how exactly it powers our virtual try-on (VTO) feature.

Generating try-on images with AI

Perhaps the most popular reference for VTO dates back to the movie “Clueless.” We’ve come a long way since Cher’s closet, though. Current techniques like geometric warping can cut and paste a clothing image and then deform it to fit a silhouette. Even so, the final images never quite hit the mark: Clothes don’t realistically adapt to the body, and they have visual defects like misplaced folds that make garments look misshapen and unnatural.

So when we set out to build a new VTO feature, we were committed to generating every pixel of a garment from scratch to produce high-quality, realistic images. We found a way with our new diffusion-based AI model.

Our VTO technology produces high-quality, realistic images.

40 real models appear in a grid. They’re each wearing the same pink blouse.

To understand how this model works, let’s first explain diffusion. Diffusion is the process of gradually adding random “noise” to the pixels of an image until it becomes unrecognizable, and then removing that noise step by step until a clean image is reconstructed. Text-to-image models like Imagen use diffusion plus text from a large language model (LLM) to generate a realistic image based solely on the text you enter.
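As a rough illustration of the noising half of diffusion, here is a minimal NumPy sketch. The linear noise schedule, the step count and every function name are assumptions chosen for illustration, not details of the model described in this post. In a real system a trained neural network estimates the noise; the sketch reuses the true noise simply to show that the forward step can be inverted.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal kept at each step (assumed linear schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(image, t, alpha_bars, rng):
    """Forward step: blend the clean image with Gaussian noise at timestep t."""
    noise = rng.standard_normal(image.shape)
    noisy = np.sqrt(alpha_bars[t]) * image + np.sqrt(1.0 - alpha_bars[t]) * noise
    return noisy, noise

def recover_clean(noisy, noise_estimate, t, alpha_bars):
    """Invert the forward step, given an estimate of the noise that was added."""
    return (noisy - np.sqrt(1.0 - alpha_bars[t]) * noise_estimate) / np.sqrt(alpha_bars[t])

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(8, 8, 3))  # stand-in for a small RGB image
alpha_bars = make_alpha_bars()

noisy, noise = add_noise(image, t=500, alpha_bars=alpha_bars, rng=rng)
# A trained network would predict `noise`; here the true noise is used as a stand-in.
restored = recover_clean(noisy, noise, t=500, alpha_bars=alpha_bars)
```

The later the timestep, the smaller the share of the original image that survives, which is why the final steps look like pure noise.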

Inspired by Imagen, we decided to tackle VTO using diffusion, but with a twist. Instead of using text as input during diffusion, we use a pair of images: one of a garment and another of a person. Each image is sent to its own neural network (a U-net), and the two networks share information with each other in a process called “cross-attention” to generate the output: a photorealistic image of the person wearing the garment. This combination of image-based diffusion and cross-attention makes up our new AI model.
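To make the idea of cross-attention more concrete, here is a minimal NumPy sketch of one attention step between two sets of image features. It is a generic scaled dot-product attention, not the actual architecture: the feature sizes, the random projection matrices and the choice to let the person features query the garment features are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(person_feats, garment_feats, w_q, w_k, w_v):
    """Each person location gathers a weighted mix of garment features."""
    queries = person_feats @ w_q   # what each person location is looking for
    keys = garment_feats @ w_k     # what each garment location offers
    values = garment_feats @ w_v   # the garment information that gets passed along
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ values, weights

rng = np.random.default_rng(0)
dim = 16
person_feats = rng.standard_normal((64, dim))   # e.g. an 8x8 feature map, flattened
garment_feats = rng.standard_normal((64, dim))
# Random stand-ins for projection matrices that would normally be learned.
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

attended, weights = cross_attention(person_feats, garment_feats, w_q, w_k, w_v)
```

Each row of `weights` says how much a given location on the person draws from every location on the garment, which is one way the two networks could exchange information.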

Virtual try-on for apparel lets you render tops on real models who resonate with you.

An animation shows various models, all wearing the same pink blouse.

Training with Google’s Shopping Graph

To make our VTO feature as helpful and realistic as possible, we put the new AI model through rigorous training. But rather than training it with an LLM (like Imagen does), we tapped into Google’s Shopping Graph, the world’s most comprehensive data set of the latest products, sellers, brands, reviews and inventory.

We trained the model using many pairs of images, each including a person wearing a garment in two different poses — let’s say, an image of someone wearing a shirt standing sideways and another of them standing forward. In this scenario, the AI model learns to match the shape of the shirt in the sideways pose with the person in the forward pose, and vice versa, until it can generate realistic images of the shirt on that person from all angles. To take it up a notch, we repeated this process using millions of random image pairs of different garments and people. The result allows you to see what a top looks like on the model of your choice.
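The pairing scheme described above can be sketched in a few lines of Python. This is only an illustration of the "and vice versa" idea: the class, the field names and the file names are hypothetical, and how the real training pipeline represents its examples is not stated in this post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingExample:
    garment_image: str  # image the garment's appearance is taken from
    person_image: str   # image of the same person in the other pose

def examples_from_pair(image_a, image_b):
    """Turn one two-pose pair into two examples, one for each direction."""
    return [
        TrainingExample(garment_image=image_a, person_image=image_b),
        TrainingExample(garment_image=image_b, person_image=image_a),
    ]

# Hypothetical file names for the sideways and forward poses.
examples = examples_from_pair("shirt_sideways.jpg", "shirt_forward.jpg")
```

Using both directions means every photographed pair contributes two training examples.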

Our diffusion model sends each image to its own neural network (a U-net) to generate the output: a photorealistic image of the person wearing the garment.

Two images of models are shown: one in a pink blouse, the other in a black tank top. An illustration portrays the process of diffusion, where each image is sent to its own neural network. The result is an image of the woman in the tank top wearing the pink blouse.

Starting today, you can use virtual try-on for apparel on women’s tops from brands across Google’s Shopping Graph, including Anthropologie, LOFT, H&M and Everlane. Over time, the tool will get even more precise and expand to more of your favorite brands.

To learn more about the tech behind this feature, read our latest research paper.
