How we built Google Meet’s adaptive audio feature
At this year’s Cloud Next, the Google Meet team set up a demo room with several laptops lined up next to each other. Then the team brought customers into the space, and asked what they thought would happen if the laptops joined the same meeting at the same time.
“One of those customers was like, ‘Oh, it’s going to be the screams of a thousand dying souls,’” says Meet Product Manager Huib Kleinhout, referring to the howling echoes and audio feedback that come from two nearby devices on the same call. “Then he tried it — and silence. He was like, ‘How does this work? What kind of hardware do I need?’ We picked up a $200 Chromebook and showed him it could work on any machine.”
The demo was for adaptive audio in Meet, a feature that transforms multiple laptops in close proximity into a unified audio system, eliminating the ear-splitting screeches that have plagued virtual meeting attendees for years. Adaptive audio is available to Google Workspace customers with a Gemini for Workspace add-on. Try it out with our no-cost 60-day Gemini trial, available until December 3, 2024.
The Meet team tests new features like adaptive audio in their audio lab.
The team started working on adaptive audio after the world switched to video conferencing and eventually, hybrid work due to the pandemic. At the time, it was challenging to get new meeting room hardware due to supply chain shortages. “Plus, many organizations didn’t have enough video conferencing rooms to begin with, or they didn’t have the resources for dedicated meeting room equipment,” Huib says.
Teams needed to be able to create ad-hoc meeting spaces without the inconvenience of crowding around a single laptop. But enabling everyone to join from their own devices while silencing the “screams” is much harder than it sounds.
“Imagine a movie theater audio setup. You have multiple speakers around you, and it’s a nice audio experience because they’re all cabled to the same sound source, so they play out in an intended synchronicity,” Meet Software Engineering Manager Henrik Lundin says. “Now, if you have several devices in the room playing the same audio without synchronization, it would sound horrible. You’re getting multiple copies of the same audio — like you’re standing in a large cathedral. And likewise, when you speak in a room with multiple microphones on different devices, they pick up sound at the same time, but they’re not on the same clock.”
Then there’s the echo problem. You’ve probably noticed that you’ll sometimes get an echo of your own voice back when using video conferencing tools. “The reason that you don't get that all the time is because the devices that run meetings have an echo canceller inside,” Henrik says. “It's a signal processing algorithm that tries to figure out which part of the audio from the microphone signal is actually just coming from the speakers in the same device and which part of it is your voice. This gets 10x harder when you have multiple laptops in the same room playing the audio and feeding into each other’s microphones.”
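The adaptive filtering Henrik describes can be sketched in a few lines. This is not Meet’s implementation — just a minimal normalized-LMS echo canceller, the textbook form of the idea: the filter learns an estimate of the echo path from the loudspeaker signal to the microphone, subtracts that estimate, and keeps whatever is left (ideally, just your voice). Filter length and step size here are illustrative choices.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=64, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter: estimates which part of the
    microphone signal is just the loudspeaker's echo and subtracts it.
    A classroom sketch of the idea, not Meet's production algorithm."""
    w = np.zeros(filter_len)          # running estimate of the echo path
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        # Most recent far-end (loudspeaker) samples, newest first.
        x = far_end[max(0, n - filter_len + 1):n + 1][::-1]
        x = np.pad(x, (0, filter_len - len(x)))
        echo_estimate = w @ x
        e = mic[n] - echo_estimate    # residual: ideally only the near-end voice
        w += mu * e * x / (x @ x + eps)   # nudge the filter toward the true path
        out[n] = e
    return out
```

With one device this works well; as Henrik notes, with several laptops playing and re-capturing each other’s audio, each canceller faces many overlapping echo paths at once, which is why the problem gets so much harder.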
To solve this audio puzzle, the team spent a lot of time getting in the same room and figuring out how to get their laptops to know they were next to each other. At first, they tested having people join specific preset groups within the meeting. “This was obviously error prone, but it helped us test out the experience of synchronizing all the laptops’ microphones and speakers,” Henrik says.
Then they tried using ultrasound. By emitting high-frequency sounds undetectable to the human ear, the laptops can identify the presence of other laptops in close proximity and begin acting together as a group. This eliminated the need for users to manually configure their devices or select the room they were in. “But it was really tricky because the ultrasound needed to work reliably on any device, and be precise — if audio leaks from the room next door, it shouldn’t think you’re in the same room,” Henrik says. The team adopted a new type of ultrasound to increase accuracy, and tuned the frequency and volume to optimize reach without being audible.
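A detector for this kind of near-ultrasonic beacon can be sketched with the Goertzel algorithm, a single-bin DFT that is cheap enough to run continuously on microphone frames. Everything concrete below is an assumption for illustration — the beacon frequency, frame size, and threshold are made up, and this is not Meet’s actual signaling scheme, which the article notes uses a tuned, more accurate form of ultrasound.

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Power at one frequency bin (Goertzel algorithm): a cheap way to
    watch a single near-ultrasonic band without a full FFT."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def hears_beacon(frame, sample_rate=48000, beacon_hz=18500, threshold=1e4):
    # Hypothetical beacon at 18.5 kHz. The threshold must be tuned so a
    # faint signal leaking from the next room doesn't group the devices.
    return goertzel_power(frame, sample_rate, beacon_hz) > threshold
```

The thresholding step reflects the precision problem Henrik raises: detection has to be sensitive enough to group laptops on the same table, but not so sensitive that audio bleeding through a wall joins the wrong room.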
Once Meet detects that multiple laptops are present, adaptive audio activates automatically, synchronizing all the laptops’ microphones and speakers without turning any speakers off. It switches between microphones depending on who’s talking, to prevent feedback and echo. Additionally, Meet uses backend processing and a cloud denoiser to enhance audio quality and remove background noise before transmitting audio to other participants.
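The microphone-switching step can be illustrated with a deliberately simple heuristic: per audio frame, transmit only from the device whose mic hears the most energy, and mute the rest so they cannot re-capture what the nearby speakers are playing. This is a toy sketch, not Meet’s selection logic, and the noise floor value is an assumed placeholder.

```python
import numpy as np

def select_active_mic(frames, noise_floor=1e-4):
    """Pick at most one microphone per frame (illustrative heuristic,
    not Meet's actual algorithm): the loudest device is closest to the
    speaker; all other mics stay muted to avoid feedback loops."""
    energies = [float(np.mean(np.square(f))) for f in frames]
    best = int(np.argmax(energies))
    if energies[best] < noise_floor:
        return None          # nobody is talking; transmit nothing
    return best
```

A production system would add hysteresis and smoothing so the selection doesn’t flicker between devices mid-sentence, but the core idea — one live microphone per room at a time — is what breaks the feedback loop.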
Every day, meetings all across Google already use adaptive audio — many without participants even realizing it. “It’s one of those technologies that removes the cognitive load from the user. They don’t have to wonder if they’re in the right setup before they join a meeting,” Meet Interaction Design Lead Ahmed Aly says. “Regardless of how complex and marvelous the engineering behind it is, from the end user perspective, whenever they open their laptop and join a meeting, it just works.”
Looking ahead, the team continues to explore how they can help people connect more easily, especially when conferencing hardware or meeting rooms are unavailable. “We hope it provides more flexibility and improves meeting equity and participation,” Huib says. “You can be effective and represented — the camera and microphone are right in front of you, so you can be fully seen and heard, no matter where you’re seated.”