Amazon scientist explains how Alexa resolves ambiguous requests
During a blockbuster press event last week, Amazon took the wraps off a redesigned Echo Show, Echo Plus, and Echo Spot, along with nine other new voice-activated accessories, peripherals, and smart speakers powered by Alexa. Also in tow: the Alexa Presentation Language, which lets developers build “multimodal” Alexa apps — skills — that combine voice, touch, text, images, graphics, audio, and video in a single interface.
Developing the frameworks that underlie it was easier said than done, according to Amazon senior speech scientist Vishal Naik. In a blog post today, he explained how Alexa leverages multiple neural networks — layered math functions that loosely mimic the human brain’s physiology — to resolve ambiguous requests. The work is also detailed in a paper (“Context Aware Conversational Understanding for Intelligent Agents with a Screen”) presented earlier this year at the Association for the Advancement of Artificial Intelligence (AAAI) conference.
“If a customer says, ‘Alexa, play Harry Potter,’ the Echo Show screen could display separate graphics representing a Harry Potter audiobook, a movie, and a soundtrack,” he explained. “If the customer follows up by saying ‘the last one,’ the system must determine whether that means the last item in the on-screen list, the last Harry Potter movie, or something else.”
Naik and colleagues evaluated three bidirectional long short-term memory (BiLSTM) neural networks — a category of recurrent neural network capable of learning long-term dependencies — with slightly different architectures. (The memory cells in LSTMs let the networks combine stored context with new inputs to improve prediction accuracy, and because the networks are bidirectional, they can draw on context from both earlier and later words in an utterance.)
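To make the description concrete, here is a minimal sketch of a bidirectional LSTM encoder in PyTorch. The class name, dimensions, and random inputs are illustrative assumptions for this article, not details of Amazon's models.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Toy bidirectional LSTM encoder: reads a sequence of word embeddings and
    produces, for each token, a hidden state that mixes left-to-right (past)
    and right-to-left (future) context."""
    def __init__(self, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, embeddings):            # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embeddings)    # (batch, seq_len, 2 * hidden_dim)
        return outputs

# Example: a batch of one 5-token utterance with 100-dimensional embeddings.
encoder = BiLSTMEncoder()
contextual_states = encoder(torch.randn(1, 5, 100))
print(contextual_states.shape)  # torch.Size([1, 5, 256])
```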
Sourcing data from the Alexa Meaning Representation Language, an annotated semantic-representation language released in June of this year, the team jointly trained the AI models to classify each command’s intent, which designates the action a customer wants Alexa to take, and to tag its slots, which designate the entities (an audiobook, a movie, or a smart home device trigger, for instance) the intent acts on. The models’ inputs were embeddings, or mathematical representations of words.
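That joint setup can be pictured as one shared encoder feeding two prediction heads, one for the utterance-level intent and one for per-token slots. The vocabulary size, label counts, and mean-pooled intent representation in the sketch below are assumptions made for illustration; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class JointIntentSlotModel(nn.Module):
    """Illustrative joint model: a shared BiLSTM feeds two heads, one that
    classifies the utterance's intent and one that tags each token's slot."""
    def __init__(self, vocab_size=10000, embed_dim=100,
                 hidden_dim=128, num_intents=20, num_slots=40):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)
        self.slot_head = nn.Linear(2 * hidden_dim, num_slots)

    def forward(self, token_ids):                     # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2 * hidden)
        intent_logits = self.intent_head(states.mean(dim=1))  # one label per utterance
        slot_logits = self.slot_head(states)                  # one label per token
        return intent_logits, slot_logits

# Joint training sums the two losses so both tasks shape the shared encoder.
model = JointIntentSlotModel()
tokens = torch.randint(0, 10000, (1, 4))  # e.g. a 4-token command
intent_logits, slot_logits = model(tokens)
loss = nn.functional.cross_entropy(intent_logits, torch.tensor([3])) \
     + nn.functional.cross_entropy(slot_logits.view(-1, 40), torch.randint(0, 40, (4,)))
```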
The first of the three neural networks considered both the aforementioned embeddings and the type of content displayed on Alexa devices with screens (encoded as a vector) when making its classifications. The second went a step further, taking into account not just the type of on-screen data but also the names of the specific on-screen items (e.g., “Harry Potter” or “The Black Panther” in addition to “Onscreen_Movie”). The third, meanwhile, used convolutional filters to weigh each name’s contribution to the final classification and based its predictions on the most relevant of the bunch.
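A rough sketch of how the first variant might fold a screen-context vector into a classification follows. The content-type labels (beyond “Onscreen_Movie”, which appears above), the one-hot encoding, and the simple concatenation scheme are assumptions for illustration, not Amazon's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical set of on-screen content types; "Onscreen_Movie" comes from the
# article, the rest (and the one-hot encoding) are illustrative assumptions.
SCREEN_TYPES = ["None", "Onscreen_Movie", "Onscreen_Audiobook", "Onscreen_Soundtrack"]

def screen_type_vector(screen_type: str) -> torch.Tensor:
    """One-hot vector describing what kind of content is currently on screen."""
    vec = torch.zeros(len(SCREEN_TYPES))
    vec[SCREEN_TYPES.index(screen_type)] = 1.0
    return vec

class ContextualIntentClassifier(nn.Module):
    """First-variant sketch: concatenate the encoded utterance with the
    screen-context vector before predicting the intent."""
    def __init__(self, utterance_dim=256, num_intents=20):
        super().__init__()
        self.classifier = nn.Linear(utterance_dim + len(SCREEN_TYPES), num_intents)

    def forward(self, utterance_encoding, screen_vec):
        return self.classifier(torch.cat([utterance_encoding, screen_vec], dim=-1))

# "Play Harry Potter" while a list of movies is displayed on screen.
clf = ContextualIntentClassifier()
logits = clf(torch.randn(1, 256), screen_type_vector("Onscreen_Movie").unsqueeze(0))
```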
To evaluate the three networks’ performance, the researchers established a benchmark that used hard-coded rules to factor in on-screen data. Given a command like “Play Harry Potter,” it might estimate a 50 percent probability that the request refers to the audiobook and a 10 percent probability that it refers to the soundtrack.
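Such a benchmark can be imagined as a small table of hand-written probabilities keyed to what is on screen. The toy version below is built around the article's 50 percent/10 percent example; the remaining numbers and the lookup structure are invented for illustration.

```python
# Toy rule-based baseline: hard-coded probabilities for each interpretation of
# an ambiguous request, chosen by the type of content currently on screen.
# The 50/10 split mirrors the article's example; everything else is assumed.
RULES = {
    "Onscreen_Audiobook": {"audiobook": 0.50, "movie": 0.40, "soundtrack": 0.10},
    "Onscreen_Movie":     {"audiobook": 0.20, "movie": 0.70, "soundtrack": 0.10},
}
DEFAULT = {"audiobook": 1 / 3, "movie": 1 / 3, "soundtrack": 1 / 3}

def rule_based_interpretation(utterance: str, screen_type: str) -> dict:
    """Return a probability for each reading of an ambiguous request."""
    return RULES.get(screen_type, DEFAULT)

print(rule_based_interpretation("Play Harry Potter", "Onscreen_Audiobook"))
# {'audiobook': 0.5, 'movie': 0.4, 'soundtrack': 0.1}
```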
In the end, when evaluated on four different data sets (slots with and without screen information and intents with and without screen information), all three of the AI models that considered on-screen data “consistently outperform[ed]” both the rule-based benchmark and a non-contextual, voice-only baseline. Just as important, their accuracy did not degrade on requests that arrived without any on-screen information.
“[We] verified that the contextual awareness of our models does not cause a degradation of non-contextual functionality,” Naik and team wrote. “Our approach is naturally extensible to new visual use cases, without requiring manual rule writing.”
In future research, they hope to explore additional context cues and to extend the visual features to encode the on-screen locations of objects when multiple object types (for example, books and movies) are displayed.
Source: VentureBeat