Ask a Techspert: What’s a world model?
We recently introduced Project Genie, an experimental research prototype that lets you create, explore and remix your own interactive worlds. Project Genie is powered by what’s called a “world model.” It’s currently available to Google AI Ultra subscribers in the U.S. who are 18 or older, with plans to expand further.
Now, you’ve probably heard of large language models, machine learning models, image generation models and so on — but “world model” might be a new one. To help explain the concept, we sat down with Googlers Shlomi Fruchter and Jack Parker-Holder.
Congratulations on the launch of Project Genie! What were your roles on the team?
Shlomi: Jack and I co-lead Genie development. I mostly focus on our next-generation video and world models, and I work with the team to research new improvements.
Jack: I'm a research scientist and co-lead of Genie. My job is mostly about dreaming up new capabilities for our models and then making sure there's a team, roadmap and plan to make it happen.
What exactly is Project Genie?
Jack: Project Genie is a tool where you can create your own world with characters and environments and explore them in real time. For example, journey to an alien planet, or dive underwater with sea creatures. Whatever you can think of.
Shlomi: The worlds that we typically want to simulate are variants of the world we live in because that’s what we know and care about. Genie is predicting what will happen based on the mechanics of such a variant world: "OK, if I'm going to walk into the room that looks like the image you provided of your room, how is it going to look when I walk around? How is the mirror going to look? How will the light reflect from the wooden floor?" All of those environment dynamics — if water is spilled or it rains — the model simulates end to end, with no game engine running in the background. And you can actually interact. If there's a ball on the floor, you can actually bump into it and it starts rolling, which is what you would expect to happen in reality. When the model is doing a good job, it looks realistic.
Is Genie the first world model?
Jack: There are actually lots of historical papers on world models, but one of the ones that popularized the idea is from 2018 from what was then called Google Brain. Google Brain was our deep learning and AI research team, and it’s now a part of Google DeepMind. That paper was by David Ha and Jürgen Schmidhuber; it was the first time someone trained a world model from a visual domain. That was what really popularized the term "world model" in the developer community.
What’s the difference between a world model and, say, a large language model?
Shlomi: Think about it like this: A language model is trying to predict the next word. From that, it learns a representation of the language. Later, we can teach it to have a full conversation with a person and even, maybe, think about a mathematical problem. Similarly, a world model tries to predict what's going to happen next in the world based on the sequence of actions that an agent is performing. Basically it’s simulating an entire environment, moment by moment, in reaction to an agent. Through this simple task the model learns a representation of the world.
So a world model predicts that world based on an environment it’s been trained on. And not only the world, but how things react in that world. Is that right?
Shlomi: Yes. A key piece of what’s happening in a world model is what we call “observation.” When we use this word, it has a narrow definition: visual observation. Observation more generally doesn’t have to be visual — you can observe how something tactile feels or the smell of something. But at this point in time, we’re talking about visuals.
Got it. How do you prompt Genie?
Jack: The best way to start prompting Genie is with an image or images — we often use Nano Banana for this — and some text. You can do just text, but it’s more entertaining to use a visual, too. For example, you can upload a photo of a dog on the beach, and the text can describe the dynamics of the scene — maybe something like how the sea is choppy.
What could we use world models for?
Jack: One application is training AI agents to learn how to do things in the real world. Giving them access to our actual world would be dangerous and costly, but if we could simulate it that would give us a testing ground. Another is education: You could use a world model to teach a classroom about science and history — imagine 35 kids in a class who aren't paying attention. Suddenly, the teacher brings up a world model on the board: "OK, we're going to walk around ancient Rome. What should we do? Let's go and ask that person what's happening." We can make sure the model is more historically accurate and make it an interactive experience. For science, you could explore underwater diving — we've got examples of this already.
That’s a field trip I’d like to go on.
Jack: Right! This new technology could be transformational for school and for vocational skills. Maybe I would have been amazing at disaster recovery, but how would I ever know? With this, you could get an idea of what it’s like to be an emergency responder or a firefighter without being put in danger or putting anyone else in danger. I see a lot of value in that.
Are there any other ways you think people will use this technology?
Shlomi: Genie is still a prototype, but we've heard from early testers how they’re using it to explore game ideas they're excited about. And filmmakers would like to use it to try out new environments for their movies. We envision that this will later develop into a new medium that blurs the lines between watching a film and playing a game. It would take us beyond passive watching into a more interactive space. Right now it's pretty basic, but it's just a glimpse of what can be done.