Amazon is gradually giving Alexa more AI – Fast Company

Posted: September 27, 2019 at 7:49 am

Amazon announced a large batch of new products on Wednesday, making it clear once again that it wants to spread its Alexa digital assistant into as many consumer tech categories as possible: not just smart speakers, but everything from earbuds to eyeglasses to rings. But there was another storyline woven into the announcements in Seattle. More artificial intelligence, specifically natural language AI, is finding its way into Alexa, and in more ways than before.

For starters, Amazon says it's been using neural networks to make Alexa's voice sound more human when it translates text (like your text messages) into speech. Rohit Prasad, who heads up Alexa machine learning and artificial intelligence, told me that this technology has allowed Amazon to take a totally different approach to generating speech.

In the past, Alexa's algorithms broke down language into word parts or vocal sounds, then tried to string them together as smoothly as possible. But it always sounded somewhat choppy and robotic. Now, Amazon is using neural networks that can generate whole sentences of speech in real time, says Prasad. This creates a vocal sound that's more fluid and more human-sounding. (Apple's Siri and Google's Assistant have also achieved more natural voices recently through similar means.)
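To make the contrast concrete, here is a purely illustrative Python sketch; the toy unit bank and the hypothetical model.generate call are assumptions for illustration, not Amazon's actual pipeline. Concatenative synthesis splices prerecorded fragments, and the seams between them are where the choppiness comes from, while a neural model emits the whole utterance in one pass.

import numpy as np

# Illustrative only: a toy "unit bank" of prerecorded waveform fragments.
unit_bank = {
    "heh": np.random.randn(800),   # stand-ins for recorded speech units
    "low": np.random.randn(900),
    "_": np.zeros(200),            # short silence inserted at joins
}

def concatenative_tts(units):
    # Old approach: look up each fragment and splice the waveforms
    # end to end; the joins are where the robotic sound creeps in.
    return np.concatenate([unit_bank[u] for u in units])

def neural_tts(text, model):
    # Newer approach (sketched): one trained network maps the whole
    # sentence to audio in a single pass, so prosody is planned across
    # the full utterance. `model.generate` is a hypothetical stand-in.
    return model.generate(text)

waveform = concatenative_tts(["heh", "_", "low"])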

It's this same natural language modeling that will very soon give Alexa completely different voices. Amazon says it will start with celebrities, with Samuel L. Jackson being the first. Amazon will sell Jackson-as-Alexa as an add-on service starting later this year.

Amazon's Jackson voice is at least partially driven by a natural language model. The model learns from Jackson's voice (he recorded a bunch of samples in a studio) to generate a voice that mimics his distinctive tone while providing the answers and information the assistant would normally provide. But Amazon also curated a set of complete Jackson utterances for the assistant to use when the time is right.

Jackson will likely be just the first of many celebrity voices that Amazon will offer as alternatives to the standard Alexa voice. (Google, meanwhile, let the Google Assistant talk like John Legend earlier this year, also thanks to advances in using AI to synthesize voices.)

Amazon also added some machine learning tricks to its Ring doorbell cams. In a new service Amazon is calling Doorbell Concierge, the devices will soon be able to detect various kinds of people who show up at the front door unannounced. The demo I saw featured three kinds of visitors: a guy delivering a package, a Girl Scout selling cookies, and an unidentified man. The Ring engaged them all in a short dialogue to find out what they wanted, and a neural network in the background used what they said to determine what kind of caller each one was. It did this based only on what they said, not on camera imagery. The categorization then told the Ring device what to say to each one. For instance, it told the delivery guy where to put the package after asking if he needed a signature. And it asked the unidentified man if he would like to leave his contact information.
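Amazon hasn't described the model, but a text-only visitor classifier of this general shape can be sketched in a few lines. Below is a minimal stand-in using scikit-learn; the labels, training phrases, and pipeline are illustrative assumptions, not Ring's actual system (which, per Amazon, uses a neural network rather than the simple classifier shown here).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-labeled set standing in for real doorstep dialogue transcripts.
utterances = [
    "I have a package for you, do you need to sign?",
    "Delivery for this address, where should I leave it?",
    "Would you like to buy some Girl Scout cookies?",
    "I'm selling cookies for my troop",
    "Is anyone home? I need to talk to the owner",
    "Hello? Hello, anybody there?",
]
labels = ["delivery", "delivery", "solicitor", "solicitor", "unknown", "unknown"]

# Classify callers from their words alone, mirroring the text-only
# categorization described above (no camera imagery involved).
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(utterances, labels)

# The predicted category would drive which scripted response the
# doorbell speaks next, e.g. telling a courier where to leave a box.
print(classifier.predict(["got a delivery here, need a signature?"]))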

The Ring video doorbell. [Photo: courtesy of Ring]

The new Concierge feature isn't quite ready for market yet. When it's released, it will likely be able to recognize a small set of caller types. But that set will probably grow.

Last year, Amazon expanded Alexa's hearing to detect more than just human commands. As part of its Guard home security mode, the sensitive microphone array used in Echo speakers began listening for the sounds of glass breaking and smoke alarms going off when nobody was in a home. Now Amazon has added the ability to listen for human-related sounds in the home while Guard is set to its away mode. These include the sounds of footsteps, coughing, and doors closing when there's supposed to be no one home. Alexa can send an alert to a user if it detects one of these sounds.

In all these cases, a deep learning model is taking the audio input from the microphones and flagging potentially dangerous sounds. Amazon could train the assistant to listen for many other types of sounds. For example, Alexa devices could begin listening for the sounds of falls or labored breathing in places where elderly people live. Whether Amazon moves in this direction is anybody's guess, but the fact that the company is steadily adding things that Alexa can listen for is telling.
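As a rough sketch of how such a pipeline is typically built, the snippet below converts microphone audio to a mel spectrogram and feeds it to a small convolutional classifier. The architecture, event labels, and threshold are assumptions for illustration; this is not Amazon's model, and the network here is untrained.

import torch
import torch.nn as nn
import torchaudio

# Assumed event vocabulary, loosely matching the sounds Guard listens for.
EVENTS = ["glass_break", "smoke_alarm", "footsteps", "cough", "door", "background"]

melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

classifier = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # pool over time and frequency
    nn.Flatten(),
    nn.Linear(32, len(EVENTS)),
)

def flag_events(waveform, threshold=0.8):
    # Return event labels whose score clears the (assumed) alert threshold.
    with torch.no_grad():
        features = melspec(waveform).unsqueeze(0)  # (batch, 1, mels, frames)
        probs = classifier(features).softmax(dim=-1).squeeze(0)
    return [event for event, p in zip(EVENTS, probs) if p > threshold]

# One second of silence stands in for a real microphone buffer.
print(flag_events(torch.zeros(1, 16000)))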

A relatively new area in natural language research is using neural networks to detect emotion through words and intonations. Amazon has been focusing on the sound of frustration in the voices of people talking to Alexa. When it detects frustration, Alexa may conclude that it's given an answer the user didn't like and then search for another way to answer. Prasad said Amazon has its own set of labeled recordings of people sounding frustrated, which it uses to train the neural networks.

But it's a hard problem. The assistant has to know how to react after detecting a frustrated person. And if it takes another stab at providing an answer, the assistant had better be fairly certain that the second answer is useful. And there are times when the assistant has to say, "Sorry, I don't have the answer."
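The article doesn't say how Alexa will weigh these options, but the decision logic might look something like the sketch below, where detect_frustration, answer_engine, and the confidence floor are hypothetical stand-ins rather than anything Amazon has described.

CONFIDENCE_FLOOR = 0.75  # assumed threshold for risking a second answer

def respond_to_followup(utterance, previous_answer, detect_frustration, answer_engine):
    # Decide whether to retry with a new answer, stand pat, or apologize.
    if not detect_frustration(utterance):
        return previous_answer  # user seems satisfied; nothing to fix

    # User sounds frustrated: look for a different answer, excluding
    # the one that apparently failed.
    alternative, confidence = answer_engine(utterance, exclude=previous_answer)
    if alternative is not None and confidence >= CONFIDENCE_FLOOR:
        return alternative

    # Better to admit defeat than to frustrate the user a second time.
    return "Sorry, I don't have the answer."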

"We are starting to experiment with these different ways of responding, and once this is launched, you will see many different flavors," Prasad said.

This kind of emotional awareness will likely start showing up in many kinds of assistants. Any assistant should be capable of knowing when it's done something wrong and be able to open up a feedback loop in order to get better.

The frustration detection feature will likely show up in Alexa next year.
