-
Publication No.: US20230335117A1
Publication Date: 2023-10-19
Application No.: US18186872
Filing Date: 2023-03-20
Applicant: Google LLC
Inventors: Shuo-yiin Chang, Guru Prakash Arumugam, Zelin Wu, Tara N. Sainath, Bo LI, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman
CPC Classification: G10L15/16, G10L15/22, G10L15/063, G10L2015/223
Abstract: A method includes receiving, as input to a speech recognition model, audio data corresponding to a spoken utterance. The method also includes performing, using the speech recognition model, speech recognition on the audio data by, at each of a plurality of time steps, encoding, using an audio encoder, the audio data corresponding to the spoken utterance into a corresponding audio encoding, and decoding, using a speech recognition joint network, the corresponding audio encoding into a probability distribution over possible output labels. At each of the plurality of time steps, the method also includes determining, using an intended query (IQ) joint network configured to receive a label history representation associated with a sequence of non-blank symbols output by a final softmax layer, an intended query decision indicating whether or not the spoken utterance includes a query intended for a digital assistant.
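The per-time-step structure described in this abstract (an ASR joint network and a separate IQ joint network that both consume a label history built from non-blank outputs) can be illustrated with a toy sketch. This is not Google's implementation. The additive tanh joint, the "mean of the last two non-blank embeddings" label history, the assumption that the IQ joint network also receives the audio encoding, the one-label-per-frame greedy decoding, and all names (`decode_with_iq`, `EXAMPLE_PARAMS`, etc.) are hypothetical choices made for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
BLANK = 0  # index of the blank label in the output vocabulary


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def joint(audio_enc, label_hist, w_audio, w_label, w_out) -> Vector:
    """RNN-T-style joint network: project both inputs, add them, apply tanh, project to logits."""
    proj_a = matvec(w_audio, audio_enc)
    proj_l = matvec(w_label, label_hist)
    hidden = [math.tanh(a + l) for a, l in zip(proj_a, proj_l)]
    return matvec(w_out, hidden)


def label_history(non_blank: List[int], emb: Matrix, dim: int) -> Vector:
    """Hypothetical stateless prediction network: mean embedding of the last two non-blank labels."""
    recent = non_blank[-2:]
    if not recent:
        return [0.0] * dim
    return [sum(emb[t][d] for t in recent) / len(recent) for d in range(dim)]


def decode_with_iq(audio_encodings: Sequence[Vector], p: Dict[str, Matrix]) -> Tuple[List[int], List[bool]]:
    """Greedy decoding with one label per time step (a simplification of full RNN-T search)."""
    non_blank: List[int] = []
    iq_decisions: List[bool] = []
    dim = len(audio_encodings[0])
    for enc in audio_encodings:
        hist = label_history(non_blank, p["emb"], dim)
        # ASR joint network: probability distribution over output labels.
        probs = softmax(joint(enc, hist, p["asr_wa"], p["asr_wl"], p["asr_out"]))
        label = max(range(len(probs)), key=probs.__getitem__)
        # IQ joint network: same inputs, two logits [not intended, intended].
        iq_probs = softmax(joint(enc, hist, p["iq_wa"], p["iq_wl"], p["iq_out"]))
        iq_decisions.append(iq_probs[1] > 0.5)
        if label != BLANK:
            non_blank.append(label)
    return non_blank, iq_decisions


# Toy, untrained weights (hypothetical): 2-dim encodings, vocabulary {blank, 1, 2}.
EXAMPLE_PARAMS = {
    "emb": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "asr_wa": [[1.0, 0.0], [0.0, 1.0]],
    "asr_wl": [[0.0, 0.0], [0.0, 0.0]],
    "asr_out": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "iq_wa": [[1.0, 0.0], [0.0, 1.0]],
    "iq_wl": [[0.5, 0.0], [0.0, 0.5]],
    "iq_out": [[0.0, 0.0], [1.0, 1.0]],
}
```

With these toy weights, `decode_with_iq([[2.0, -1.0], [-1.0, -1.0], [-1.0, 2.0]], EXAMPLE_PARAMS)` returns `([1, 2], [True, False, True])`. The sketch emits an IQ decision at every frame, mirroring the abstract's "at each of the plurality of time steps"; how those per-frame decisions are aggregated is not stated in the abstract.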
-
Publication No.: US11545147B2
Publication Date: 2023-01-03
Application No.: US16401349
Filing Date: 2019-05-02
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classification using neural networks. One method includes receiving audio data corresponding to an utterance; obtaining a transcription of the utterance; generating a representation of the audio data; generating a representation of the transcription of the utterance; providing (i) the representation of the audio data and (ii) the representation of the transcription of the utterance to a classifier that, based on a given representation of the audio data and a given representation of the transcription of the utterance, is trained to output an indication of whether the utterance associated with the given representation is likely directed to an automated assistant or is likely not directed to an automated assistant; receiving, from the classifier, an indication of whether the utterance corresponding to the received audio data is likely directed to the automated assistant or is likely not directed to the automated assistant; and selectively instructing the automated assistant based at least on the indication of whether the utterance corresponding to the received audio data is likely directed to the automated assistant or is likely not directed to the automated assistant.
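The abstract's pipeline (an audio representation and a transcription representation fed jointly to a classifier, whose output gates the assistant) can be sketched as follows. The frame-mean audio features, bag-of-words text features, hand-set logistic weights, and every function name here are hypothetical stand-ins; the abstract says the classifier is trained but does not describe its architecture, and training is omitted.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def audio_representation(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical audio representation: per-dimension mean of the acoustic frames."""
    n = len(frames)
    return [sum(frame[d] for frame in frames) / n for d in range(len(frames[0]))]


def transcription_representation(text: str, vocab: Dict[str, int]) -> Vector:
    """Hypothetical text representation: bag-of-words counts over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def classify(audio_vec: Vector, text_vec: Vector, weights: Vector, bias: float) -> float:
    """Logistic classifier over the concatenated representations.

    Returns the probability that the utterance is directed to the assistant.
    """
    features = audio_vec + text_vec
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def instruct_assistant(prob_directed: float, threshold: float = 0.5) -> str:
    """Selectively instruct the assistant based on the classifier output."""
    return "respond" if prob_directed >= threshold else "ignore"
```

For example, with `vocab = {"what": 0, "weather": 1, "lol": 2}`, frames `[[1.0], [3.0]]`, weights `[0.1, 1.0, 1.0, -2.0]` and bias `-1.0`, the text "What is the weather" scores above 0.5 ("respond") while "lol ok" scores below it ("ignore"). Concatenating the two representations is one simple way to let a single classifier weigh acoustic and lexical evidence together.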
-
Publication No.: US20190304459A1
Publication Date: 2019-10-03
Application No.: US16401349
Filing Date: 2019-05-02
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classification using neural networks. One method includes receiving audio data corresponding to an utterance; obtaining a transcription of the utterance; generating a representation of the audio data; generating a representation of the transcription of the utterance; providing (i) the representation of the audio data and (ii) the representation of the transcription of the utterance to a classifier that, based on a given representation of the audio data and a given representation of the transcription of the utterance, is trained to output an indication of whether the utterance associated with the given representation is likely directed to an automated assistant or is likely not directed to an automated assistant; receiving, from the classifier, an indication of whether the utterance corresponding to the received audio data is likely directed to the automated assistant or is likely not directed to the automated assistant; and selectively instructing the automated assistant based at least on the indication of whether the utterance corresponding to the received audio data is likely directed to the automated assistant or is likely not directed to the automated assistant.
-
Publication No.: US20240304181A1
Publication Date: 2024-09-12
Application No.: US18598523
Filing Date: 2024-03-07
Applicant: Google LLC
Inventors: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath
IPC Classification: G10L15/06
CPC Classification: G10L15/063
Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the multiple different domains.
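The re-labeling step in this abstract can be sketched as a data transformation over training samples. The `<type>...</type>` tag syntax, the speaker-type names ("primary", "other"), the `segmenter` callback, and the `TrainingSample` fields are all hypothetical; the abstract does not specify the tag format or how segments are obtained, and the multi-domain training itself is not shown.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, str]  # (speaker_type, text), in utterance order


@dataclass(frozen=True)
class TrainingSample:
    audio_ref: str       # placeholder for the audio data
    domain: str          # e.g. "voice_search", "dictation" (hypothetical domain names)
    transcription: str


def annotate(segments: Sequence[Segment]) -> str:
    """Wrap each segment in a hypothetical <speaker_type>...</speaker_type> tag."""
    return " ".join(f"<{spk}>{text}</{spk}>" for spk, text in segments)


def relabel(sample: TrainingSample, segments: Sequence[Segment]) -> TrainingSample:
    """Return a copy of the sample whose transcription carries speaker tags."""
    plain = " ".join(text for _, text in segments)
    if plain != sample.transcription:
        raise ValueError("segments must cover the original transcription exactly")
    return replace(sample, transcription=annotate(segments))


def relabel_all(
    samples: Sequence[TrainingSample],
    segmenter: Callable[[TrainingSample], List[Segment]],
) -> List[TrainingSample]:
    """Re-label every sample; samples from all domains end up in one tagged corpus."""
    return [relabel(s, segmenter(s)) for s in samples]
```

For instance, re-labeling "hey stop the music" with segments `[("other", "hey"), ("primary", "stop the music")]` yields `<other>hey</other> <primary>stop the music</primary>`. Keeping the domain field untouched lets the re-labeled samples from every domain be pooled for training a single model.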
-
Publication No.: US20220293101A1
Publication Date: 2022-09-15
Application No.: US17804657
Filing Date: 2022-05-31
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
Abstract: A method includes receiving a spoken utterance that includes a plurality of words, and generating, using a neural network-based utterance classifier comprising a stack of multiple Long Short-Term Memory (LSTM) layers, a respective textual representation for each word of the plurality of words of the spoken utterance. The neural network-based utterance classifier is trained on negative training examples of spoken utterances not directed toward an automated assistant server. The method further includes determining, using the respective textual representation generated for each word of the plurality of words of the spoken utterance, whether the spoken utterance is directed toward the automated assistant server or not directed toward the automated assistant server, and, when the spoken utterance is directed toward the automated assistant server, generating instructions that cause the automated assistant server to generate a response to the spoken utterance.
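A stacked-LSTM utterance classifier of the kind named in this abstract can be sketched in plain Python. This is an illustrative toy, not the patented model: the word embeddings (seeded from the word string), the layer sizes, the mean pooling over per-word outputs, the logistic output, and all names are hypothetical, and the weights are random and untrained (the training on negative examples is not shown).

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """Minimal LSTM layer with input (i), forget (f), candidate (c) and output (o) gates."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.hid_dim = hid_dim
        self.weights = {
            name: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hid_dim)] for _ in range(hid_dim)]
            for name in "ifco"
        }
        self.bias = {name: [0.0] * hid_dim for name in "ifco"}

    def _gate(self, name: str, z: Vector, act) -> Vector:
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.weights[name], self.bias[name])]

    def run(self, inputs: Sequence[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hid_dim, [0.0] * self.hid_dim
        outputs = []
        for x in inputs:
            z = list(x) + h  # concatenate current input with previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("c", z, math.tanh)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def embed(word: str, dim: int) -> Vector:
    """Hypothetical deterministic word embedding seeded by the word itself."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class UtteranceClassifier:
    def __init__(self, emb_dim: int = 4, hid_dim: int = 4, num_layers: int = 2, seed: int = 0):
        rng = random.Random(seed)
        dims = [emb_dim] + [hid_dim] * num_layers
        self.emb_dim = emb_dim
        self.layers = [LSTMLayer(dims[k], dims[k + 1], rng) for k in range(num_layers)]
        self.out_w = [rng.uniform(-0.5, 0.5) for _ in range(hid_dim)]

    def word_representations(self, words: Sequence[str]) -> List[Vector]:
        """One output vector per word from the top LSTM layer."""
        seq = [embed(w, self.emb_dim) for w in words]
        for layer in self.layers:
            seq = layer.run(seq)
        return seq

    def prob_directed(self, words: Sequence[str]) -> float:
        reps = self.word_representations(words)
        pooled = [sum(r[d] for r in reps) / len(reps) for d in range(len(reps[0]))]
        return sigmoid(sum(w * p for w, p in zip(self.out_w, pooled)))


def route(prob_directed: float, threshold: float = 0.5) -> str:
    """Generate a response only when the utterance looks assistant-directed."""
    return "generate_response" if prob_directed >= threshold else "ignore"
```

Usage: `clf = UtteranceClassifier()` then `route(clf.prob_directed("what is the weather".split()))`. Because the weights are untrained, the decision itself is arbitrary; the sketch only shows how per-word representations from a stacked LSTM can feed a single directed/not-directed decision.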
-
Publication No.: US10311872B2
Publication Date: 2019-06-04
Application No.: US15659016
Filing Date: 2017-07-25
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classification using neural networks. One method includes receiving audio data corresponding to an utterance; obtaining a transcription of the utterance; generating a representation of the audio data; generating a representation of the transcription of the utterance; providing (i) the representation of the audio data and (ii) the representation of the transcription of the utterance to a classifier that, based on a given representation of the audio data and a given representation of the transcription of the utterance, is trained to output an indication of whether the utterance associated with the given representation is likely directed to an automated assistant or is likely not directed to an automated assistant; receiving, from the classifier, an indication of whether the utterance corresponding to the received audio data is likely directed to the automated assistant or is likely not directed to the automated assistant; and selectively instructing the automated assistant based at least on the indication of whether the utterance corresponding to the received audio data is likely directed to the automated assistant or is likely not directed to the automated assistant.
-
Publication No.: US20240096326A1
Publication Date: 2024-03-21
Application No.: US18526991
Filing Date: 2023-12-01
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
Abstract: A method includes receiving a spoken utterance that includes a plurality of words, and generating, using a neural network-based utterance classifier comprising a stack of multiple Long Short-Term Memory (LSTM) layers, a respective textual representation for each word of the plurality of words of the spoken utterance. The neural network-based utterance classifier is trained on negative training examples of spoken utterances not directed toward an automated assistant server. The method further includes determining, using the respective textual representation generated for each word of the plurality of words of the spoken utterance, whether the spoken utterance is directed toward the automated assistant server or not directed toward the automated assistant server, and, when the spoken utterance is directed toward the automated assistant server, generating instructions that cause the automated assistant server to generate a response to the spoken utterance.
-
Publication No.: US11848018B2
Publication Date: 2023-12-19
Application No.: US17804657
Filing Date: 2022-05-31
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
CPC Classification: G10L15/22, G06F3/167, G10L15/16, G10L15/18, G10L15/30, G10L17/00, G10L2015/223, G10L2015/227
Abstract: A method includes receiving a spoken utterance that includes a plurality of words, and generating, using a neural network-based utterance classifier comprising a stack of multiple Long Short-Term Memory (LSTM) layers, a respective textual representation for each word of the plurality of words of the spoken utterance. The neural network-based utterance classifier is trained on negative training examples of spoken utterances not directed toward an automated assistant server. The method further includes determining, using the respective textual representation generated for each word of the plurality of words of the spoken utterance, whether the spoken utterance is directed toward the automated assistant server or not directed toward the automated assistant server, and, when the spoken utterance is directed toward the automated assistant server, generating instructions that cause the automated assistant server to generate a response to the spoken utterance.
-
Publication No.: US11361768B2
Publication Date: 2022-06-14
Application No.: US16935112
Filing Date: 2020-07-21
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
Abstract: A method includes receiving a spoken utterance that includes a plurality of words, and generating, using a neural network-based utterance classifier comprising a stack of multiple Long Short-Term Memory (LSTM) layers, a respective textual representation for each word of the plurality of words of the spoken utterance. The neural network-based utterance classifier is trained on negative training examples of spoken utterances not directed toward an automated assistant server. The method further includes determining, using the respective textual representation generated for each word of the plurality of words of the spoken utterance, whether the spoken utterance is directed toward the automated assistant server or not directed toward the automated assistant server, and, when the spoken utterance is directed toward the automated assistant server, generating instructions that cause the automated assistant server to generate a response to the spoken utterance.
-
Publication No.: US20200349946A1
Publication Date: 2020-11-05
Application No.: US16935112
Filing Date: 2020-07-21
Applicant: Google LLC
Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
Abstract: A method includes receiving a spoken utterance that includes a plurality of words, and generating, using a neural network-based utterance classifier comprising a stack of multiple Long Short-Term Memory (LSTM) layers, a respective textual representation for each word of the plurality of words of the spoken utterance. The neural network-based utterance classifier is trained on negative training examples of spoken utterances not directed toward an automated assistant server. The method further includes determining, using the respective textual representation generated for each word of the plurality of words of the spoken utterance, whether the spoken utterance is directed toward the automated assistant server or not directed toward the automated assistant server, and, when the spoken utterance is directed toward the automated assistant server, generating instructions that cause the automated assistant server to generate a response to the spoken utterance.
-