Australia Blog

A partnership with The University of Western Australia to improve speech technology for Aboriginal and Torres Strait Islander people's voices

Glenys Collard, Dr Ben Hutchinson, and Assoc Prof Celeste Rodriguez Louro

Today, Google Australia, together with the Language Lab at the University of Western Australia (UWA), announced a multi-year project to support and improve Aboriginal and Torres Strait Islander peoples’ interactions with technology; specifically, Automated Speech Recognition (ASR).

The first-of-its-kind collaboration in Australia, and the next partnership under the Digital Future Initiative, aims to build a high-quality speech dataset of Aboriginal English. UWA is taking the lead on Indigenous Data Governance for the dataset and has established a framework for responsible, culturally sensitive data collection, ensuring the data benefits local communities. The University of Western Australia will retain ownership of the dataset, control its licensing, and serve as steward of the collection to ensure its appropriate use.

Advancing speech recognition technology

ASR technologies are used in many Google products, including the Google Assistant, voice search, automated message dictation, and automated video transcription. External studies and Google’s own research have found that speakers of non-majority varieties of English often have a worse experience using products powered by ASR technology when compared to speakers of mainstream Englishes — underscoring the need for making ASR more inclusive.

In Australia, many Aboriginal and Torres Strait Islander people speak Aboriginal English. This variety of English has distinct linguistic features, including its pronunciation, lexical items, grammatical structures, and discourse markers. Many speakers of Aboriginal English also speak mainstream Australian English, and may “code switch”, moving from one linguistic variety to another depending on who they are addressing, when they are talking, and for what purpose. We've also learned that people may change their naturalistic voice patterns simply in order to be understood by voice products.

Previous studies have found, however, that when people change their voices so that computers can understand them, they may feel that the technology isn't designed for them. They can also blame themselves when the technology fails, experience drops in self-esteem and increases in self-consciousness, and rate the technology less favourably.

My collaborator Glenys Collard, a First Nations scholar of the Nyungar Nation in southwest Western Australia, notes that Aboriginal English plays an important role in encoding First Nations social and cultural identities in Australia. Asking First Nations people to speak differently is asking them to suppress part of who they are. Collard has long advocated for greater use of Aboriginal English in education and public health, to create more inclusive spaces for First Nations peoples. We are now working together to bring this vision to digital spaces by improving ASR.

Improving speech technologies for speakers of Aboriginal English raises many research questions. What are Aboriginal English speakers' current experiences with speech technologies? How would speakers like the technology to perform? And, since ASR technologies are only as good as the data they are trained on, how does variation in Aboriginal English interact with system performance?

We realised a novel approach was necessary to build a high-quality dataset of Aboriginal English for improving First Nations users’ experiences with products using ASR technology.

Reaching out to the community

To collect a rich corpus of Aboriginal English, we engaged with community members using tried-and-tested sociolinguistic research methods. Some of the questions we were interested in included, ‘How can we collect an interactional corpus of Aboriginal English in a way that is culturally safe for speakers?’ and ‘Who will own the data and make decisions about its future?’ We knew that addressing these questions was important to ensure community trust in the project.

To address these challenges, we turned to Glenys Collard and Assoc Prof Celeste Rodriguez Louro, of the Language Lab at the University of Western Australia (UWA). They have years of experience together conducting sociolinguistic research on Aboriginal English. Following Glenys Collard’s lead, they developed and used techniques for collecting speech data that are designed to suit the ways of being and doing of speakers of Aboriginal English. These techniques revolve around two aspects of interaction which matter to First Nations people: storytelling and group settings. Led by an Aboriginal fieldworker, these sessions elicit naturalistic language in Aboriginal English. For this project, however, and to protect the community's culture and heritage, we stayed away from unstructured yarning sessions. Instead, we engaged the services of an Aboriginal-owned company which designed culturally appropriate visual prompts to elicit yarning on everyday topics such as fishing or going to the shops.

To advise us on cultural safety and data governance, we set up an Indigenous Advisory Committee. We also submitted our project to an extensive ethics process at UWA, which included detailed data collection plans, notes on benefits for First Nations communities, and submitting letters of support from First Nations community members. One of our key premises is that, in addition to improving technology for First Nations people, the project also seeks to build capacity for First Nations researchers and communities, offering employment opportunities and remunerating people for their work.

Google noted that captioning on YouTube, which uses ASR technology, also presented issues for Aboriginal English speakers. We began our research project by studying Aboriginal English publicly accessible on YouTube. We found that YouTube videos containing Aboriginal English encompass a wide variety of domains and genres, providing further evidence that Aboriginal English is widely used online. Experts in the linguistic features of Aboriginal English watched and listened to over 100 hours of YouTube videos. The majority of the videos were unscripted, including conversations between Aboriginal English speakers and speakers of mainstream Australian English. These videos enabled us to start co-designing transcription guidelines, which will continue to grow along with our dataset. In our study of captions, we found serious misalignment between the speech in the videos and its captions. Our aim with this portion of the study is to ensure that speakers of Aboriginal English have access to reliable captioning, just like users of mainstream Englishes.
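Caption misalignment of this kind is commonly quantified with word error rate (WER): the number of substituted, inserted, and deleted words between a reference transcript and the ASR output, divided by the length of the reference. The sketch below is illustrative only, not the project's actual evaluation code, and the example sentences in the usage note are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance:
    (substitutions + insertions + deletions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, comparing a hypothetical reference transcript “he bin go down the shop” with a caption “he been go down to the shop” counts one substitution and one insertion over six reference words, a WER of roughly 0.33. A high WER on one variety of English but not another is one concrete signal of the kind of disparity described above.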

Innovation through partnership

Google’s responsibilities will include evaluating and improving ASR models, providing infrastructure for data collection, funding the project, and obtaining a licence to use the data commercially. The University of Western Australia will lead data collection efforts, interview community members, manage vendors, recruit participants and maintain the repository.

We're addressing the pipeline problems that plague traditional speech data collection. This project is not just about data collection, transcription, or outputs; it leverages community-centred methods and expertise in a culturally safe manner.

To ensure the safety of participant data, Google and UWA have conducted research to determine best practices for speech collection and to identify as many risks as possible.

My collaborator Assoc Prof Rodriguez Louro notes that the methods used in previous projects (i.e., unstructured yarning sessions) are not appropriate in this context. Cultural safety is the top priority, and previous speech data collections on Aboriginal English were not designed with technology applications in mind. We know the benefits of in-person and Aboriginal-led data collection, but we are adopting strict language elicitation methods (including visual prompts curated by an Aboriginal design company) to safeguard sensitive or sacred topics. We also have a strict protocol in place for the transcription and use of these data: each transcript is not only anonymised to protect people’s identities (as required by our Research Ethics Office) but also stripped of any spoken or transcribed text that mentions people, places, dates, or other cultural information unsafe for use by others.
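To illustrate what a redaction step of this kind can look like in code, the sketch below replaces deny-listed terms and four-digit years in a transcript with placeholder labels. This is a simplified assumption, not the project's actual protocol: in practice the sensitive terms are identified and curated by First Nations researchers, and the names and places used here are invented examples.

```python
import re

# Hypothetical deny-list for illustration only. In the real project,
# sensitive people, places, and cultural terms are curated by hand,
# not drawn from a fixed word list like this one.
SENSITIVE_TERMS = {
    "PERSON": ["auntie may"],       # invented example name
    "PLACE": ["the old mission"],   # invented example place
}

# Crude match for four-digit years (1900-2099); real dates need more care.
DATE_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def redact(transcript: str) -> str:
    """Replace deny-listed terms and years with bracketed labels."""
    out = transcript
    for label, terms in SENSITIVE_TERMS.items():
        for term in terms:
            out = re.sub(re.escape(term), f"[{label}]", out,
                         flags=re.IGNORECASE)
    return DATE_PATTERN.sub("[DATE]", out)
```

For instance, `redact("we saw Auntie May in 1998")` yields `"we saw [PERSON] in [DATE]"`. Automated redaction like this would only ever be a first pass; the protocol described above relies on human review to catch culturally sensitive content no word list can anticipate.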

The Aboriginal English Voices project has taken its first steps. We have a clear vision of a more inclusive future for language technologies, one that centres Indigenous perspectives and ways of being.

Our goal is to broaden the range of linguistic varieties used to train ASR systems by employing established sociolinguistic research methods while maintaining flexibility to respect participants' rich cultural traditions, adapt to the evolving technological landscape, and stay committed to advancing inclusive Indigenous Futures.