Today, Google and Howard University announced a new partnership called Project Elevate Black Voices (Project EBV), a first-of-its-kind collaboration to build a high-quality African-American English (AAE) speech dataset. The project will allow Howard University to share the dataset with those looking to improve speech technology while establishing a framework for responsible data collection that ensures the data benefits Black communities. Howard University will retain ownership of the dataset and its licensing and serve as steward of its responsible use.
Advancing speech recognition technology
External studies and Google’s own research have found that Black speakers in the United States often have a worse experience with automatic speech recognition (ASR) technology than white speakers do, underscoring the need for a technical solution that lowers word error rates.
We’ve learned that Black users in particular are changing their voice patterns away from AAE in order to be understood by voice products. In the community, this practice of leaning away from one’s accent is called “code switching.”
My colleague Dr. Gloria Washington, Associate Professor at Howard University and Principal Investigator for Project EBV, notes that by constantly accommodating their speech to technology, Black users often feel left out or “othered” on a daily basis. “The most important part of this is showing that at a base level, however you speak is celebrated and that you can actually use these voice assistant tools daily,” she says.
I found common ground with Dr. Washington during our initial conversations about Project EBV’s goals. We realized that we’re looking to tackle similar issues: Why do AAE speakers have to code switch with voice technology? Can we understand how these variations interact with speech technologies? How can we improve our tools and make them perform in ways the Black community can be proud of?
We identified a number of barriers to improving automatic speech recognition (ASR) performance. One issue was the lack of natural AAE speech found within speech data. Because Black users have been implicitly conditioned to change their voices when using ASR-based technology, the in-product data rarely contains organic speech.
Security and user privacy policies also serve as a self-imposed, positive constraint on collecting AAE speech data. Even when data is available, in-product AAE data is difficult to leverage. Although we’ve made strides in identifying AAE data with dialect classifiers to start improving the technology, code switching leaves AAE data underrepresented and insufficient to address the challenge.
We realized a novel approach was necessary to build a new, high-quality dataset of unaccommodated AAE for improving Black users’ experiences with ASR technology.
Reaching out to the community
Our team member, Darryl Wright, found that improving the speech recognition experience requires more data that is actually representative of the user base. That’s been a challenge for tech companies given concerns about trust. “You don’t want to go into an underserved community without understanding their challenges with the technology,” Wright says. “You instead want to collectively reimagine what these systems could look like when their voices are front and center.”
We turned to Howard University, which is home to world-renowned linguists and scientists and has a deep understanding of Black Americans nationally. Most of the linguistics and history teachers who had a stake in solidifying this as a viable field were already practicing at Howard. “Howard has a history in this space of making not only students, but also Black people feel comfortable being themselves,” Dr. Washington says.
Project EBV is unprecedented because the leaders of this data collection are part of the community. In the past, data collection for underrepresented groups took place through third-party vendors who weren’t always well versed in the culture.
Dr. Washington and the team at Howard are working closely with transcription vendors and AAE experts from Howard, like Dr. David Green.
“Translation guides will be important for understanding the nuances and differences of Black speech,” says Dr. Green. “AAE is considered something that is specific to a community. It’s not something people always want to share, at least not consciously. We’re helping participants draw from certain experiences and also how to bring out different regional dialects.”
Responsible innovation through partnership
Google’s responsibilities will include performing user experience research to collect input from the community, providing infrastructure for data collection, providing funding, and obtaining a license to use the data commercially. Howard will lead the coordination effort with other HBCUs, manage vendors, recruit participants and maintain the repository.
We’re solving the pipeline problem that plagues traditional speech collection. This project does not focus only on data collection, transcriptions, or the output; it leverages community-centered methods and expertise.
To ensure the safety of participant data, Google is conducting research to determine best practices for speech collection, identifying as many risks as possible, and aligning the work with Google’s AI Principles, which directly inspired Howard’s own set of best practices.
“I see this becoming a whole consortium of HBCUs that work together to understand and improve technology for Black people in a respectful way, and show that you can create technology that celebrates Black voices,” says Dr. Washington.
Project EBV is in its infancy, but the big-picture goals and Dr. Washington’s vision guide the team.
As owner of the dataset, Howard will take the lead on licensing it and sharing it with other parties who wish to contribute to this effort. Google also hopes to use the dataset to improve its own products, ensuring that our tools work for more people.
The ultimate goal is to allow people to use voice technology and express themselves authentically.