Understanding the world through language

May 11, 2022

Zoubin Ghahramani

Vice President, Google DeepMind

Language is at the heart of how people communicate with each other. It’s also proving to be powerful in advancing AI and building helpful experiences for people worldwide.

From the beginning, we set out to connect words in your search to words on a page so we could make the web’s information more accessible and useful. More than 20 years later, as the web changes and the ways people consume information expand from text to images, videos and more, the one constant is that language remains a surprisingly powerful tool for understanding information.

In recent years, we’ve seen an incredible acceleration in the field of natural language understanding. While our systems still don’t understand language the way people do, they’re increasingly able to spot patterns in information, identify complex concepts and even draw implicit connections between them. We’re even finding that many of our advanced models can understand information across languages or in non-language-based formats like images and videos.

Building the next generation of language models

In 2017, Google researchers developed the Transformer, the neural network that underlies major advancements like MUM and LaMDA. Last year, we shared our thinking on a new architecture called Pathways, which is loosely inspired by the sparse patterns of neural activity in the brain. When you read a blog post like this one, only the critical parts of your brain needed to process this information fire up — not every single neuron. With Pathways, we’re now able to train AI models to be similarly effective.
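To make the idea of sparse activation concrete, here is a minimal sketch in Python. It is not how Pathways works (this post does not describe that mechanism); it only illustrates the general pattern of scoring several sub-networks and running just the best-scoring few. The names top_k_route and sparse_forward, the toy "experts" and the gate scores are all hypothetical.

```python
import math


def top_k_route(gate_scores, k):
    """Return the indices of the k highest-scoring experts."""
    order = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return order[:k]


def sparse_forward(x, experts, gate_scores, k=1):
    """Run only the k selected experts and blend their outputs.

    The blend weights are a softmax over the selected experts' scores.
    Experts that are not selected are never called.
    """
    chosen = top_k_route(gate_scores, k)
    weights = [math.exp(gate_scores[i]) for i in chosen]
    total = sum(weights)
    output = sum((w / total) * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Hypothetical toy experts: each is just a small function of the input.
calls = []
experts = [
    lambda x: calls.append(0) or x + 1,
    lambda x: calls.append(1) or x * 2,
    lambda x: calls.append(2) or x - 3,
]

output, chosen = sparse_forward(10, experts, gate_scores=[0.1, 2.0, -1.0], k=1)
print(chosen, output, calls)
```

With k=1, only the expert with the highest gate score runs, so most of the "network" stays idle for this input.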

Using this system, we recently introduced PaLM, a new model that achieves state-of-the-art performance on challenging language modeling tasks. It can solve complex math word problems, and answer questions in new languages with very little additional training data.

PaLM also shows improvements in understanding and expressing logic. This is significant because it allows the model to express its reasoning through words. Remember your algebra problem sets? It wasn’t enough to just get the right answer; you had to explain how you got there. PaLM can be prompted to produce a “chain of thought” that explains its reasoning step by step. This emerging capability helps improve accuracy and our understanding of how a model arrives at answers.

Flow chart for the difference between "Standard Prompting" and "Chain of Thought Prompting"
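The difference between the two prompting styles can be sketched in a few lines of Python. This is an illustration only: the tennis-ball exemplar is a commonly used teaching example rather than something taken from this post, the function names are hypothetical, and no model is called. The code only builds the text that would be sent to a model.

```python
# A worked exemplar shown to the model before the real question.
EXEMPLAR_QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
EXEMPLAR_REASONING = (
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11."
)
EXEMPLAR_ANSWER = "The answer is 11."


def standard_prompt(question):
    """Exemplar shows only the final answer."""
    return f"Q: {EXEMPLAR_QUESTION}\nA: {EXEMPLAR_ANSWER}\n\nQ: {question}\nA:"


def chain_of_thought_prompt(question):
    """Exemplar shows the intermediate reasoning before the final answer."""
    return (
        f"Q: {EXEMPLAR_QUESTION}\n"
        f"A: {EXEMPLAR_REASONING} {EXEMPLAR_ANSWER}\n\n"
        f"Q: {question}\nA:"
    )


new_question = (
    "The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?"
)
print(standard_prompt(new_question))
print()
print(chain_of_thought_prompt(new_question))
```

The only change between the two prompts is the exemplar answer: the chain-of-thought version spells out the intermediate steps, which nudges the model to write out its own steps for the new question.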

Translating the languages of the world

Pathways-related models are enabling us to break down language barriers in a way never before possible. Nowhere is this clearer than in our recently added support for 24 new languages in Google Translate, spoken by over 300 million people worldwide — including the first indigenous languages of the Americas. The amazing part is that the neural model did this using only monolingual text with no translation pairs — which allows us to help communities and languages underrepresented by technology. Machine translation at this level helps the world feel a bit smaller, while allowing us to dream bigger.

Unlocking knowledge about the world across modalities

Today, people consume information through webpages, images, videos, and more. Our advanced language and Pathways-related models are learning to make sense of information stemming from these different modalities through language. With these multimodal capabilities, we’re expanding multisearch in the Google app so you can search more naturally than ever before. The saying goes that “a picture is worth a thousand words,” but it turns out that words are really the key to sharing information about the world.

"Scene exploration" GIF of a store shelf demonstrating multisearch

Improving conversational AI

Despite these advancements, understanding human language continues to be one of the most complex undertakings for computers.

In everyday conversation, we all naturally say “um,” pause to find the right words, or correct ourselves — and yet other people have no trouble understanding what we’re saying. That’s because people can react to conversational cues in as little as 200 milliseconds. Moving our speech model from data centers to run on the device made things faster, but we wanted to push the envelope even more.

Computers aren’t there yet — so we’re introducing improvements to responsiveness on the Assistant with unified neural networks, combining many models into smarter ones capable of understanding more — like when someone pauses but is not finished speaking. Getting closer to the fluidity of real-time conversation is finally possible with Google's Tensor chip, which is custom-engineered to handle on-device machine learning tasks super fast.

We’re also investing in building models that are capable of carrying more natural, sensible and specific conversations. Since introducing LaMDA to the world last year, we’ve made great progress, improving the model in key areas of quality, safety and groundedness — areas where we know conversational AI models can struggle. We’ll be releasing the next iteration, LaMDA 2, as a part of the AI Test Kitchen, which we’ll be opening up to small groups of people gradually. Our goal with AI Test Kitchen is to learn, improve, and innovate responsibly on this technology together. It’s still early days for LaMDA, but we want to continue to make progress and do so responsibly with feedback from the community.

Responsible development of AI models

While language is a remarkably powerful and versatile tool for understanding the world around us, we also know it comes with its limitations and challenges. In 2018, we published our AI Principles as guidelines to help us avoid bias, test rigorously for safety, design with privacy top of mind and make technology accountable to people. We’re investing in research across disciplines to understand the types of harms language models can cause, and to develop the frameworks and methods to ensure we bring in a diversity of perspectives and make meaningful improvements. We also build and use tools that can help us better understand our models (e.g., identifying how different words affect a prediction, tracing an error back to training data and even measuring correlations within a model). And while we work to improve underlying models, we also test rigorously before and after any kind of product deployment.

We’ve come a long way since introducing the world to the Transformer. We’re proud of the tremendous value that it and its predecessors have brought to everyday Google products like Search and Translate, and of the breakthroughs they’ve powered in natural language understanding. Our work advancing the future of AI is driven by something as old as time: the power language has to bring people together.
