“L10n” - Localisation: Breaking down language barriers to unleash the benefits of the internet for all Indians
In July, at the Google for India event, we outlined our vision to make the Internet helpful for a billion Indians, and power the growth of India’s digital economy. One critical challenge we need to overcome is India’s vast linguistic diversity, with dialects changing every hundred kilometres. More often than not, one language doesn’t seamlessly map to another. A word in Bengali roughly translates to a full sentence in Tamil and there are expressions in Urdu which have no adequately evocative equivalent in Hindi.
This poses a formidable challenge for technology developers, who rely on commonly understood visual and spoken idioms to make tech products work universally.
We realised early on that there was no way to simplify this challenge - that there wasn’t any one common minimum that could address the needs of every potential user in this country. If we hoped to bring the potential of the internet within reach of every user in India, we had to invest in building products, content and tools in every popularly spoken Indian language.
India’s digital transformation will be incomplete if English proficiency continues to be the entry barrier for basic and potent uses of the Internet such as buying and selling online, finding jobs, using net banking and digital payments or getting access to information and registering for government schemes.
The work, though underway, is far from done. We are driving a 3-point strategy to truly digitize India:
- Invest in ML & AI efforts at Google’s research center in India, to make advances in machine learning and AI models accessible to everyone across the ecosystem.
- Partner with innovative local startups that are building solutions to cater to the needs of Indians in local languages.
- Drastically improve the experience of Google products and services for Indian language users.
And so today, we are happy to announce a range of features to help deliver an even richer language experience to millions across India.
Easily toggling between English and Indian language results
Four years ago we made it easier for people in states with a significant Hindi-speaking population to flip between English and Hindi results for a search query, by introducing a simple ‘chip’ or tab they could tap to see results in their preferred language. In fact, since the launch of this Hindi chip and other language features, we have seen more than a 10X increase in Hindi queries in India.
We are now making it easier to toggle Search results between English and four additional Indian languages: Tamil, Telugu, Bangla and Marathi.
People can now tap a chip to see Search results in their local language
Understanding which language content to surface, when
Typing in an Indian language in its native script is typically more difficult, and can often take three times as long, compared to English. As a result, many people search in English even if they really would prefer to see results in a local language they understand.
Search will show relevant results in more Indian languages
Over the next month, Search will start to show relevant content in supported Indian languages where appropriate, even if the local language query is typed in English. This functionality will also better serve bilingual people who are comfortable reading both English and an Indian language. It will roll out in five Indian languages: Hindi, Bangla, Marathi, Tamil, and Telugu.
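To make the idea of a local language query typed in English concrete, here is a minimal Python sketch that reports which script a query is written in. It is an illustration only and not how Search works: the function name `dominant_script` and the `SCRIPT_TO_LANGUAGES` table are hypothetical, and the sketch relies solely on Unicode character names from Python's standard library.

```python
import unicodedata
from collections import Counter

# Hypothetical table linking Unicode script names to the languages named in this post.
# Devanagari is shared by Hindi and Marathi, so script alone cannot tell them apart.
SCRIPT_TO_LANGUAGES = {
    "DEVANAGARI": ["Hindi", "Marathi"],
    "BENGALI": ["Bangla"],
    "TAMIL": ["Tamil"],
    "TELUGU": ["Telugu"],
    "LATIN": ["English", "a romanised Indian language"],
}


def dominant_script(query: str) -> str:
    """Return the most common script among the letters and marks in `query`."""
    counts = Counter()
    for ch in query:
        # Count only letters (L*) and combining marks (M*); skip spaces, digits, punctuation.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            # Unicode names start with the script, e.g. "DEVANAGARI LETTER MA".
            counts[name.split(" ")[0]] += 1
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for q in ["mausam kaisa hai", "मौसम कैसा है", "வானிலை"]:
        script = dominant_script(q)
        print(q, "|", script, "|", SCRIPT_TO_LANGUAGES.get(script, []))
```

A query such as "mausam kaisa hai" is reported as Latin script even though it is Hindi, which is exactly the case the feature above is meant to serve. Script detection alone cannot recover the intended language, so a real system would need a language-understanding step beyond this sketch.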
Enabling people to use apps in the language of their choice
Just like you use different tools for different tasks, we know (because we do it ourselves) people often select a specific language for a particular situation. Rather than guessing preferences, we launched the ability to easily change the language of Google Assistant and Discover to be different from the phone language. Today in India, more than 50 percent of the content viewed on Google Discover is in Indian languages. A third of Google Assistant users in India are using it in an Indian language, and since the launch of Assistant language picker, queries in Indian languages have doubled.
Maps will now enable people to select from nine Indian languages
We are now extending this ability to Google Maps, where users can quickly and easily change their Maps experience into one of nine Indian languages, by simply opening the app, going to Settings, and tapping ‘App language’. This will allow anyone to search for places, get directions and navigation, and interact with the Map in their preferred local language.
Homework help in Hindi (and English)
Meaning is also communicated with images, and this is where Google Lens can help. From street signs to restaurant menus, shop names to signboards, Google Lens lets you search what you see, get things done faster, and understand the world around you, using just your camera or a photo. In fact, more people use Google Lens in India every month than in any other country worldwide. As an example of its popularity, over 3 billion words have been translated in India with Lens in 2020.
Lens is particularly helpful for students wanting to learn about the world. If you’re a parent, you’ll be familiar with your kids asking you questions about homework. About stuff you never thought you’d need to remember, like... quadratic equations.
Google Lens can now help you solve math problems by simply pointing your camera
Now, right from the Search bar in the Google app, you can use Lens to snap a photo of a math problem and learn how to solve it on your own, in Hindi (or English). To do this, Lens first turns an image of a homework question into a query. Based on the query, we will show step-by-step guides and videos to help explain the problem.
Helping computer systems understand Indian languages at scale
At Google Research India, we have spent a lot of time helping computer systems understand human language. As you can imagine, this is quite an exciting challenge. The new approach we developed in India is called Multilingual Representations for Indian Languages (or ‘MuRIL’). Among the many benefits of this powerful multilingual model that scales across languages, MuRIL also supports transliterated text, such as Hindi written in Roman script, which was missing from previous models of its kind.
One of the many tasks MuRIL is good at is determining the sentiment of a sentence. For example, “Achha hua account bandh nahi hua” would previously be interpreted as having a negative meaning, but MuRIL correctly identifies this as a positive statement. Or take the ability to classify a person versus a place: ‘Shirdi ke sai baba’ would previously be interpreted as a place, which is wrong, but MuRIL correctly interprets it as a person.
MuRIL currently supports 16 Indian languages as well as English -- the highest coverage for Indian languages of any publicly available model of its kind.
MuRIL is free and open source, available on TensorFlow Hub
We are thrilled to announce that we have made MuRIL open source, and it is currently available to download from TensorFlow Hub, for free. We hope MuRIL will be the next big evolution for Indian language understanding, forming a better foundation for researchers, students, startups, and anyone else interested in building Indian language technologies, and we can’t wait to see the many ways the ecosystem puts it to use.
We’re sharing this to give a flavor of the depth of work that is underway, and that is still required, to make a universally potent and accessible Internet a reality. That said, the Internet in India is the sum of the work of millions of developers, content creators, news media and online businesses, and it is only when this effort is undertaken at scale by the entire ecosystem that we will help fulfil the truly meaningful promise of the billionth Indian coming online.