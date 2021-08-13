



Today, Google detailed SoundStream, an end-to-end neural audio codec that can deliver high-quality audio while encoding different types of sound, including clean speech, noisy and reverberant speech, music, and ambient sound. The company claims that it is the first AI-powered codec to work on both speech and music while running in real time on a smartphone's processor.

Audio codecs compress audio to reduce storage and bandwidth requirements. Ideally, the decoded audio should be perceptually indistinguishable from the original and introduce little latency. Most codecs rely on domain expertise and a carefully designed signal processing pipeline, but there has been interest in replacing these handcrafted pipelines with AI that can learn to encode on the fly.

Earlier this year, Google released Lyra, a neural audio codec trained to compress audio at low bitrates. SoundStream extends this work with a system consisting of an encoder, a decoder, and a quantizer. The encoder converts the audio into a coded signal, which is compressed using the quantizer and converted back to audio using the decoder. Once training is complete, the encoder and decoder can be run on separate clients to send audio over the internet, and the decoder can operate at any bitrate.
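To make the encoder, quantizer, and decoder roles concrete, here is a minimal toy sketch in Python. It is not Google's implementation: SoundStream's components are learned neural networks, whereas this example substitutes simple frame averaging for the encoder, a uniform scalar quantizer for the quantizer, and sample-and-hold for the decoder. The names `FRAME`, `LEVELS`, and all function names are hypothetical and chosen only for illustration.

```python
import math

FRAME = 4    # samples summarized per frame (hypothetical value)
LEVELS = 16  # codebook size, so each frame costs log2(16) = 4 bits

def encode(samples):
    """Toy 'encoder': average each full frame of FRAME samples into one value."""
    return [sum(samples[i:i + FRAME]) / FRAME
            for i in range(0, len(samples) - FRAME + 1, FRAME)]

def quantize(features):
    """Map each feature in [-1, 1] to the index of the nearest uniform level."""
    step = 2.0 / (LEVELS - 1)
    return [min(LEVELS - 1, max(0, round((f + 1.0) / step))) for f in features]

def dequantize(codes):
    """Turn level indices back into approximate feature values."""
    step = 2.0 / (LEVELS - 1)
    return [c * step - 1.0 for c in codes]

def decode(features):
    """Toy 'decoder': hold each feature value for FRAME output samples."""
    return [f for f in features for _ in range(FRAME)]

def bitrate_bps(sample_rate):
    """Bits per second sent over the wire for this toy scheme."""
    return (sample_rate / FRAME) * math.log2(LEVELS)

# The sender runs encode + quantize; only the integer codes are transmitted.
signal = [math.sin(2 * math.pi * 5 * n / 400) for n in range(400)]
codes = quantize(encode(signal))

# The receiver runs dequantize + decode to reconstruct an approximation.
restored = decode(dequantize(codes))
```

In this sketch only `codes` would cross the network, which is why the encoder and decoder can live on separate clients. Changing `LEVELS` or `FRAME` trades reconstruction quality against bitrate.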

Audio compression

In traditional audio processing pipelines, compression and enhancement (the removal of background noise) are typically performed by different modules. SoundStream, however, is designed to perform compression and enhancement at the same time. At 3kbps, SoundStream outperforms the popular Opus codec at 12kbps and approaches EVS quality at 9.6kbps, meaning it uses 3.2 to 4 times fewer bits, according to Google. In addition, SoundStream performs better than the current version of Lyra when compared at the same bitrate.
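The bitrate comparison comes down to simple division, which a few lines of Python can illustrate. The helper names below are hypothetical, and the only inputs are the bitrates quoted above (3kbps for SoundStream, 12kbps for Opus, 9.6kbps for EVS); the per-minute figure assumes 1 kilobit = 1,000 bits and 1 kilobyte = 1,000 bytes.

```python
def bit_savings(reference_kbps, candidate_kbps):
    """How many times fewer bits the candidate uses than the reference."""
    return reference_kbps / candidate_kbps

def kilobytes_per_minute(kbps):
    """Data needed for one minute of audio at a constant bitrate."""
    bits = kbps * 1000 * 60   # kilobits/s -> bits per minute
    return bits / 8 / 1000    # bits -> bytes -> kilobytes

SOUNDSTREAM_KBPS = 3.0
OPUS_KBPS = 12.0
EVS_KBPS = 9.6

opus_ratio = round(bit_savings(OPUS_KBPS, SOUNDSTREAM_KBPS), 2)  # 4.0
evs_ratio = round(bit_savings(EVS_KBPS, SOUNDSTREAM_KBPS), 2)    # 3.2

print(f"vs Opus: {opus_ratio}x fewer bits")
print(f"vs EVS:  {evs_ratio}x fewer bits")
print(f"One minute at 3kbps:  {kilobytes_per_minute(SOUNDSTREAM_KBPS)} kB")
print(f"One minute at 12kbps: {kilobytes_per_minute(OPUS_KBPS)} kB")
```

Under these assumptions, a minute of audio at 3kbps takes 22.5 kB, compared with 90 kB at 12kbps.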

Here is an audio sample before processing with SoundStream:

https://venturebeat.com/wp-content/uploads/2021/08/soundstream_noisy_speech_reference.wav

And here is the processed audio:

https://venturebeat.com/wp-content/uploads/2021/08/soundstream_noisy_speech_ss3kbps.wav

Google cautions that SoundStream is still experimental. However, the company plans to release an updated version of Lyra that incorporates SoundStream's components to achieve both higher audio quality and reduced complexity.

Efficient compression is required whenever audio needs to be transmitted, whether during video streaming or conference calls, Google research scientist Neil Zeghidour and staff research scientist Marco Tagliasacchi wrote in a blog post. They describe SoundStream as an important step in improving machine learning-driven audio codecs: it outperforms state-of-the-art codecs like Opus and EVS and can enhance audio on demand. By integrating SoundStream with Lyra, they added, developers can leverage existing Lyra APIs and tools, gaining both flexibility and superior sound quality.

