110 new languages are coming to Google Translate
Google Translate breaks down language barriers to help people connect and better understand the world around them. We’re always applying the latest technologies so more people can access this tool: In 2022, we added 24 new languages using Zero-Shot Machine Translation, where a machine learning model learns to translate into another language without ever seeing an example. And we announced the 1,000 Languages Initiative, a commitment to build AI models that will support the 1,000 most spoken languages around the world.
Now, we’re using AI to expand the variety of languages we support. Thanks to our PaLM 2 large language model, we’re rolling out 110 new languages to Google Translate, our largest expansion ever.
Translation support for more than half a billion people
From Cantonese to Qʼeqchiʼ, these new languages represent more than 614 million speakers, opening up translations for around 8% of the world’s population. Some are major world languages with over 100 million speakers. Others are spoken by small communities of Indigenous people, and a few have almost no native speakers but active revitalization efforts. About a quarter of the new languages come from Africa, representing our largest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof. The list also includes seven new Indian languages: Awadhi, Bodo, Khasi, Kokborok, Marwadi, Santali and Tulu.
How we choose language varieties
There’s a lot to consider when adding new languages to Translate, from which varieties we offer to which specific spellings we use.
Languages have an immense amount of variation: regional varieties, dialects, different spelling standards. In fact, many languages have no one standard form, so it’s impossible to pick a “right” variety. Our approach has been to prioritize the most commonly used varieties of each language. For example, Romani is a language that has many dialects throughout Europe. Our models produce text that is closest to Southern Vlax Romani, a variety commonly used online. But that text also mixes in elements from other varieties, like Northern Vlax and Balkan Romani.
PaLM 2 was a key piece of the puzzle, helping Translate more efficiently learn languages that are closely related to each other, including languages close to Hindi, like Awadhi and Marwadi, and French creoles like Seychellois Creole and Mauritian Creole. As technology advances, and as we continue to partner with expert linguists and native speakers, we’ll support even more language varieties and spelling conventions over time.
Visit the Help Center to learn more about these newly supported languages. And get started translating at translate.google.com or in the Google Translate app on Android and iOS.