These companies are shrinking the voice recognition ‘accent gap’
Speech recognition has come a long way since IBM’s Shoebox machine and Worlds of Wonder’s Julie doll. By the end of 2018, the Google Assistant will support over 30 languages. Qualcomm has developed on-device models that can recognize words and phrases with 95 percent accuracy. And Microsoft’s call center solution is able to transcribe conversations more accurately than a team of humans.
But despite the technological leaps and bounds made possible by machine learning, the voice recognition systems of today are at best imperfect — and at worst discriminatory. In a recent study commissioned by the Washington Post, popular smart speakers made by Google and Amazon were 30 percent less likely to understand non-American accents than the accents of native-born users. And corpora like Switchboard, a dataset used by companies such as IBM and Microsoft to gauge the error rates of voice models, have been shown to skew measurably toward speakers from particular regions of the country.
“Data is messy because data is reflective of humanity,” Rumman Chowdhury, global responsible AI lead at Accenture, told VentureBeat in an interview, “and that’s what algorithms are best at: finding patterns in human behavior.”
It’s called algorithmic bias: the degree to which machine learning models reflect prejudices in data or design. Countless reports have demonstrated the susceptibility of facial recognition systems — notably Amazon Web Services’ Rekognition — to bias. Bias has also been observed in automated systems that predict whether a defendant will commit future crimes, and even in the content recommendation algorithms behind apps like Google News.
Microsoft and other industry leaders such as IBM, Accenture, and Facebook have developed automated tools to detect and mitigate bias in AI algorithms, but few have been particularly vocal (pun intended) about solutions specific to voice recognition.
Two that have are Speechmatics and Nuance.
Addressing the ‘accent gap’
Speechmatics, a Cambridge, England-based tech firm that specializes in enterprise speech recognition software, embarked 12 years ago on an ambitious plan to develop a language pack more accurate and comprehensive than any on the market.
It would have its roots in statistical language modeling and recurrent neural networks, a type of machine learning model that processes sequences of inputs while retaining a memory of earlier steps. In 2014, it took a baby step toward its vision with a billion-word corpus for measuring progress in statistical language modeling, and in 2017, it reached another milestone: a partnership with the Qatar Computing Research Institute (QCRI) to develop Arabic speech-to-text services.
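In broad strokes, a recurrent language model of the kind Speechmatics describes predicts each next word from the words that came before it, carrying context forward in an internal state. The sketch below is a minimal, generic example in PyTorch; the layer sizes and vocabulary are illustrative placeholders, not Speechmatics' actual architecture.

# Minimal sketch of a recurrent language model (generic illustration, not Speechmatics' system)
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # recurrent layer carries state step to step
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, sequence_length) of word indices
        hidden, state = self.rnn(self.embed(tokens), state)
        return self.out(hidden), state  # scores for the next word at each position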
“We realized that we [needed] to come up with what we like to call ‘one model to rule them all’ — an accent-agnostic language pack that is just as accurate at transcribing [an] Australian accent as it is with Scottish,” Speechmatics CEO Benedikt von Thüngen said.
Speechmatics succeeded in July of this year. The language pack — dubbed Global English — is the result of thousands of hours of speech data from over 40 countries and “tens of billions” of words. It supports “all major” English accents for speech-to-text transcription, and it’s built on the back of Speechmatics’ Automatic Linguist, an AI-powered framework that learns the linguistic foundations of new languages by drawing on patterns identified in known ones.
“Say you have an American on one side of the conversation and an Australian on the other, but the American lived in Canada and picked up a Canadian accent,” Ian Firth, vice president of products at Speechmatics, explained in an interview. “Most systems have a difficult time handling those types of situations, but ours doesn’t.”
In tests, Global English has outperformed accent-specific language packs in Google’s Cloud Speech API and the English language pack in IBM’s Cloud. Von Thüngen claims that on the high end, it’s between 23 percent and 55 percent more accurate.
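Accuracy comparisons like these are typically expressed as word error rate (WER): the share of words a system substitutes, deletes, or inserts relative to a human reference transcript. The article doesn't specify Speechmatics' benchmark methodology, so the function below is only a minimal illustration of how WER is commonly computed.

def word_error_rate(reference: str, hypothesis: str) -> float:
    # Levenshtein distance over words (substitutions, insertions, deletions), divided by reference length
    ref, hyp = reference.split(), hypothesis.split()
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(ref)][len(hyp)] / len(ref)

# Example: splitting "heathrow" into two words costs one substitution plus one insertion,
# so word_error_rate("book a flight to heathrow", "book a flight to heath row") returns 0.4.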
Speechmatics isn’t the only company claiming to have narrowed the accent gap.
Burlington, Massachusetts-based Nuance says it employs several methods to ensure its voice recognition models understand speakers of the roughly 80 languages its products support equally well.
For its UK voice model, it sourced data from 20 defined dialect regions and included words particular to each dialect (e.g., “cob” for a bread roll), along with their pronunciations. The resulting language pack recognizes 52 different variations of the word “Heathrow.”
But it went one step further. Newer versions of Dragon, Nuance’s bespoke speech-to-text software suite, employ a machine learning model that switches automatically between several different dialect models depending on the user’s accent. Compared to older versions of the software without the model-switching neural network, it performs 22.5 percent better for English speakers with a Hispanic accent, 16.5 percent better for southern U.S. dialects, and 17.4 percent better for Southeast Asian speakers of English.
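Nuance hasn't published the internals of that model-switching network, but the general idea can be sketched as a lightweight accent classifier that routes each utterance to the best-matching dialect-specific recognizer. Everything in the snippet below (the function names, dialect tags, and classifier) is a hypothetical placeholder, not Nuance's API.

from typing import Callable, Dict

def transcribe_with_dialect_routing(
    audio: bytes,
    accent_classifier: Callable[[bytes], str],           # hypothetical: returns a tag like "en-US-southern"
    dialect_models: Dict[str, Callable[[bytes], str]],   # dialect tag -> dialect-specific recognizer
    fallback: str = "en-US-general",
) -> str:
    # Classify the speaker's accent, then hand the audio to the matching model.
    dialect = accent_classifier(audio)
    recognizer = dialect_models.get(dialect, dialect_models[fallback])
    return recognizer(audio)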
The more data, the better
Ultimately, the accent gap in voice recognition is a data problem. The higher the quantity and diversity of speech samples in a corpus, the more accurate the resulting model — at least in theory.
In the Washington Post’s test, Google Home speakers were 3 percent less likely to give accurate responses to people with Southern accents than those with Western accents, and Amazon’s Echo devices performed 2 percent worse with Midwest inflections.
An Amazon spokesperson told the Washington Post that Alexa’s voice recognition is continually improving over time, as more users speak to it with various accents. And Google in a statement pledged to “continue to improve speech recognition for the Google Assistant as we expand our datasets.”
Voice recognition systems will on some level improve as more people begin to use them regularly — nearly 100 million smart speakers will be sold globally by 2019, according to market research firm Canalys, and roughly 55 percent of U.S. households will own one by 2022.
Just don’t expect a silver bullet.
“With today’s technology, you’re not going to have the most accurate speech for every use case in the entire world,” Firth said. “The best you can do is make sure the accuracy is good for people who are trying to use it.”
Source: VentureBeat