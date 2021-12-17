



Edward Miller is the founder and chief executive officer of LumenVox and is at the forefront of the company’s strategic vision.

In this era of Google Home, Siri and Alexa, with voices at our command, the technological marvel of automatic speech recognition (ASR) has brought a new level of sophisticated applications. These tools have advanced not only to improve our daily lives but also to become commercially practical and feasible across a wide range of fields.

From closed captioning and hands-free computing to virtual assistants that translate spoken language into text, and on to new ASR engines that offer toolsets such as voice biometrics and call progress analysis, speech recognition is expanding its language models innovatively and dramatically to serve an increasingly diverse and growing user base.

The emphasis is on innovation. Built on the evolving foundation of artificial intelligence and machine learning, new ASR software provides the significantly higher level of precision and intelligence needed to better capture, recognize and respond to the intent of users and customers, and it is redefining what speech recognition can do.

Disparity in speech recognition ability

However, ASR has a comprehension problem. Today’s speech recognition systems are imperfect at best and, at worst, discriminate against speakers with regional and other dialects, accents, patois, brogues, drawls and lilts.

The Washington Post recently brought together two research groups to investigate the “accent imbalance” of smart speakers, testing thousands of voice commands dictated by more than 100 people across 20 cities. They found that the systems “showed a marked disparity in how people in different parts of the United States are understood.”

The studies showed that people with Southern accents were 3% less likely to receive accurate responses from a Google Home device than people with Western accents, and that Alexa understood people with Midwestern accents 2% less often than people with East Coast accents.

Those are not big gaps, but as the article notes, “People with non-native accents faced the biggest setbacks. In one study that compared what Alexa thought it heard with what the test group actually said, the system showed that speech from that group showed about 30% more inaccuracies.” Think, for example, of speakers with Spanish, Middle Eastern or Asian accents.

According to a recent, important study in the Proceedings of the National Academy of Sciences, ASR systems, which use sophisticated machine learning algorithms to convert spoken language to text, are “powering popular virtual assistants, facilitating automated closed captioning and enabling digital dictation platforms for health care. Over the past several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems.”

However, the study raises a serious concern that these tools may not work equally well for all subgroups of the population. The researchers examined the ability of five state-of-the-art ASR systems, developed by Amazon, Apple, Google, IBM and Microsoft, to transcribe structured interviews. The corpus spans five U.S. cities and consists of nearly 20 hours of audio, matched on the age and gender of the speakers. “We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers.”
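To make those figures concrete: word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system’s transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that calculation, not the study’s own scoring code; the function name and the sample sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real evaluations usually normalize punctuation,
    numbers and spelling variants before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two of five reference words are misrecognized.
wer = word_error_rate("turn on the kitchen lights", "turn on a kitchen light")
```

On this scale, a WER of 0.35 means roughly one word error for every three words spoken, against roughly one in five at 0.19.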

This points to an important need for continued and enhanced innovation to address dialect and other speech disparities, so that these ASR solutions can support and better include the widest possible range of users.

A Scientific American article suggests that there are probably many culprits behind the disparity but strongly suggests that training data is the most likely one. The authors of the article suggest that the “standard” speech used to train speech recognition technology comes predominantly from one segment of the population.

“By using a speech corpus that is narrow in both the words used and the way they are said, systems exclude accents and other ways of speaking that have unique linguistic features. The disparities found in this study arose primarily because some speakers were twice as likely to be misunderstood as others, even when they said the same phrases.”

Using innovation to increase inclusiveness

Leading companies need to dig deeper to devise and develop innovative solutions for greater inclusiveness. By extending the ASR engine’s grammar, the set of word patterns that tells the ASR system what to expect a human to say, companies can go beyond the standard spelling of words to include custom pronunciations for specific dialects. These pronunciations can be grouped into a single file (a lexicon) and referenced from within the ASR engine’s grammar.
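The idea of a grammar that draws on a shared pronunciation lexicon can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor’s file format or API: the names `LEXICON`, `GRAMMAR`, `phrase_pronunciations` and `recognize` are hypothetical, the phoneme strings are rough ARPAbet-style illustrations, and a real engine scores audio probabilistically rather than matching strings exactly.

```python
from itertools import product
from typing import Optional

# Hypothetical lexicon: every word lists all pronunciations the engine
# should accept, including dialect variants (symbols are illustrative).
LEXICON = {
    "order": ["AO R D ER", "AO D AH"],
    "pecan": ["P IH K AA N", "P IY K AE N"],
    "pie": ["P AY"],
}

# Hypothetical grammar: the phrases the application expects to hear.
GRAMMAR = ["order pecan pie", "order pie"]


def phrase_pronunciations(phrase: str, lexicon: dict) -> set:
    """Expand a phrase into every phoneme sequence the lexicon allows."""
    options = [lexicon[word] for word in phrase.split()]
    return {" ".join(choice) for choice in product(*options)}


def recognize(phonemes: str, grammar: list, lexicon: dict) -> Optional[str]:
    """Return the first grammar phrase that matches the phoneme sequence."""
    for phrase in grammar:
        if phonemes in phrase_pronunciations(phrase, lexicon):
            return phrase
    return None


# Two different pronunciations resolve to the same grammar phrase.
first = recognize("AO R D ER P IH K AA N P AY", GRAMMAR, LEXICON)
second = recognize("AO D AH P IY K AE N P AY", GRAMMAR, LEXICON)
```

Because the variants live in the lexicon rather than in the grammar, supporting another dialect in this sketch means adding pronunciations in one place while the grammar itself stays unchanged.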

These innovations will allow any company to create a speech recognition system that can serve a vast, mixed and diverse user base with a single model, and to support evolving, unobtrusive network architectures and other hybrid environments, including financial institutions, call centers and more.

