Google Duplex

DeepakSanaka 2,238 views 21 slides May 01, 2019
Slide 1
Slide 1 of 21
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21

About This Presentation

Google Duplex is the technology that gives Google Assistant the ability to make phones calls and sound 'human'. It allows certain users to make a restaurant reservation by phone, but instead of the user speaking directly to the restaurant employee, Google Duplex, with the help of Google Assi...


Slide Content

Google Duplex By Deepak Sanaka

Contents Introduction Abstract Context about Google Duplex Architecture DNNs and RNNs Closed domains and Vanishing gradient problem Process Flow

Introduction A long-standing goal of human-computer interaction has been to enable people to have a natural conversation with computers, as they would with each other. In recent years, we have witnessed a revolution in the ability of computers to understand and to generate natural speech, especially with the application of deep neural networks (e.g., Google voice search , WaveNet ).

Abstract Google Duplex, It is a new technology for conducting natural conversations to carry out “real world” tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine.

Defining a natural conversation A natural conversation can be described with the following characteristics: Speaker is exhibiting goal-directed, cooperative, rational behavior. Speaker is using the appropriate tone. Speaker can understand and control the conversational flow and use the right timin g.

What is Google Duplex? Google Duplex is an artificial intelligence (AI) chat agent that can carry out specific verbal tasks, such as making a reservation or appointment, over the phone. It works to conduct natural conversations to accomplish certain types of tasks.

Closed domain operation Google Duplex is not able to carry out random casual conversation. Rather, it was trained to autonomously handle three specific types of tasks: Scheduling a hair salon appointment, Making a restaurant reservation, and Asking about the business hours of a store.

How does Google Duplex model natural conversations? Duplex uses a deep neural network (DNN); in more complex cases, it makes use of a recurrent neural network (RNN) which is more expensive, but better at modeling language. At the core of Duplex is a recurrent neural network (RNN) designed to cope with these challenges, built using TensorFlow Extended (TFX).

Architecture Incoming sound is processed through an Automatic Speech Recognition (ASR) system. This produces text that is analyzed with context data and other inputs to produce a response text that is read aloud through the Text-to-Speech (TTS) system.

Deep Neural Networks (DNNs) DNNs involve an input layer, a hidden layer (the matrix of weights which is trained against data), and an output layer capable of producing what can be interpreted as a prediction or a classification.

Recurrent Neural Networks (RNNs) RNNs not only ingest the current input, they also ingest their past hidden state as well. This allows for them to learn sequential patterns. “Rolled up” RNN “Unrolled” RNN

DNNs versus RNNs DNNs are good at one-shot prediction—if a single observation is all it takes to produce suitable output. However, oftentimes, data comes in sequences, esp. for a language it arrives in a specific sequence. It’s for this reason that RNNs are used. Since it is very important to remember the context when conducting a longer human-like conversation, RNNs became one of the obvious, go-to choices to do the job.

Why closed domain operation is important? Closed domains are loosely defined as any setting that has a limited number of conceivable interactions. Any closed domain has a sort of closed (and well-worn) number of conversational paths and options. When a domain is closed, conversations are pigeonholed—the same sorts of conversations occur over and over, building up a stronger dataset for harder-to-reach features such as natural timing, knowing industry/trade slang, and so on.

Advantages of closed domain operation It has a number of advantages, but a major one is that it helps Duplex avoid the “ vanishing gradient problem ,” which is an issue for many DNNs and RNNs alike. It increases the sample size for particular conversational paths in Duplex’s training data.

Vanishing Gradient Problem When many hidden layers are stacked such as in a multi-layer DNN or between time steps in an RNN, the network begins to “forget” the past. As the network goes through multiple layers of words, the original context gets lost, so it fails to capture the relationship between the words that stand far apart in a conversation. This happens due to the underlying mechanics of backpropagation.

Illustration of vanishing gradients Given a closed domain, the number of times one has to look into the past is constrained. Vanishing gradients aren’t as much of an issue if you don’t need to remember much.

Understanding Nuances When many hidden layers are stacked such as in a multi-layer DNN or between time steps in an RNN, the network begins to “forget” the past. In the above example, we can see how the meaning of “OK for 4” changes in different contexts.

Process Flow

Conclusion Allowing people to interact with technology as naturally as they interact with each other has been a long standing promise. Google Duplex takes a step in this direction, making interaction with technology via natural conversation a reality in specific scenarios.

References https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html https://willowtreeapps.com/ideas/an-introduction-to-google-duplex-and-natural-conversations

Thank You