Noise-canceling headphones are very good at creating an auditory blank slate, but canceling only certain sounds around the wearer, while letting others through, remains a challenge for researchers. The latest version of Apple's AirPods Pro, for example, senses when the wearer is talking and automatically adjusts the volume, but it gives the user little control over whom they listen to and when.

Researchers at the University of Washington have developed an artificial intelligence system that allows headphone-wearing users to enroll a speaker by simply looking at them for three to five seconds. Called “Target Speech Hearing,” the system then cancels out all other surrounding sounds and plays only the speaker's voice in real time, even if the listener moves around in a noisy environment and no longer sees the speaker.

The research team presented their findings at the ACM CHI conference on Human Factors in Computing Systems in Honolulu on May 14. The code for the proof-of-concept device has been made publicly available for others to use. The system is not commercially available.

“When we think of AI, we tend to think of web-based chatbots that answer questions,” said Shyam Gollakota, a professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington and senior author of the paper. “But in this project, we're developing an AI that can modify the hearing of the person wearing the headphones based on their preferences. Our device can help a person hear one speaker clearly, even in a noisy environment with many people talking.”

To use the system, a person wearing commercial headphones fitted with microphones taps a button while facing the person speaking. Sound waves from the speaker's voice should then reach the microphones on both sides of the headset at the same time, within a 16-degree margin of error. The headphones send the signal to an on-board computer, where the team's machine learning software learns the desired speaker's voice patterns. The system locks onto that speaker's voice and continues to play it for the listener, even as the two people move around. The system's ability to focus on the enrolled voice improves as the speaker keeps talking, because the additional speech gives the system more training data.
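The article does not say how the headset checks that the listener is facing the speaker. As a rough illustration of the geometry only, the sketch below estimates the delay between the left and right microphone signals with a plain cross-correlation and converts it to an angle off straight ahead. The microphone spacing, the sample rate, the function names, and the reading of the 16-degree figure as plus or minus 16 degrees are all assumptions made for this example; this is not the team's implementation, which relies on learned models.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature
MIC_SPACING_M = 0.18        # assumed distance between the two microphones
TOLERANCE_DEG = 16.0        # the article's figure, read here as +/- 16 degrees


def estimate_delay_samples(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means the right channel lags behind the left one.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_samples, sample_rate_hz):
    """Convert an inter-microphone delay to an angle off straight ahead."""
    delay_s = delay_samples / sample_rate_hz
    # Far-field model: delay = spacing * sin(angle) / speed of sound.
    ratio = SPEED_OF_SOUND_M_S * delay_s / MIC_SPACING_M
    ratio = max(-1.0, min(1.0, ratio))  # keep asin's argument in range
    return math.degrees(math.asin(ratio))


def is_facing_speaker(left, right, sample_rate_hz, max_lag=30):
    """True if the estimated arrival angle is within the assumed tolerance."""
    lag = estimate_delay_samples(left, right, max_lag)
    return abs(arrival_angle_deg(lag, sample_rate_hz)) <= TOLERANCE_DEG
```

Under these assumptions, at a 48 kHz sample rate a delay of a few samples still counts as facing the speaker, while a delay of ten samples (roughly 23 degrees off-axis) does not. A real device would also have to cope with reverberation, head shadowing, and competing voices, which this sketch ignores.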

The researchers tested the system on 21 subjects, who on average rated the enrolled speaker's voice as nearly twice as clear as the unfiltered audio.

The research builds on the team's previous “semantic hearing” studies, which let users select specific sound classes, such as birds or voices, that they wanted to hear while canceling out other sounds in their environment.

Currently, the Target Speech Hearing system can enroll only one speaker at a time, and it can enroll a speaker only when no other loud voice is coming from the same direction as the target speaker's voice. Users who are not satisfied with the sound quality can enroll the speaker again to improve clarity.

The team hopes to extend the system to earphones and hearing aids in the future.

Co-authors on the paper are Bandhav Veluri, Malek Itani and Tuochao Chen, all doctoral students in the UW Allen School, and Takuya Yoshioka, director of research at AssemblyAI. This research was funded by a Moore Inventor Fellow award, a Thomas J. Cable Endowed Professorship and the UW CoMotion Innovation Gap Fund.

For more information, please contact [email protected].

