-
Publication number: US20240221750A1
Publication date: 2024-07-04
Application number: US18610233
Application date: 2024-03-19
Applicant: Google LLC
Inventor: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
CPC classification number: G10L15/22, G10L15/02, G10L15/063, G10L15/18, G10L19/00, G10L2015/025, G10L2015/088, G10L15/142, G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
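As a rough illustration of the abstract above, the following sketch computes an attention output over a series of encoder outputs and turns it into a key phrase score while frames are still arriving. It is not the patented method: the dot-product scoring, the logistic output layer, the sliding `window`, and all names (`attention_output`, `keyphrase_score`, `spot_stream`) are assumptions made for this example, and the encoder itself is left out.

```python
import math


def attention_output(encodings, query):
    """Softmax-weighted sum of encodings (dot-product scoring is an assumption)."""
    scores = [sum(q * e for q, e in zip(query, enc)) for enc in encodings]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(encodings[0])
    return [sum(w * enc[i] for w, enc in zip(weights, encodings)) for i in range(dim)]


def keyphrase_score(context, out_weights, bias):
    """Map the attention output to a score in (0, 1) with a logistic unit."""
    logit = sum(c * w for c, w in zip(context, out_weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def spot_stream(encoding_stream, query, out_weights, bias, window=4):
    """Yield one score per incoming encoding, attending over the latest `window`."""
    recent = []
    for enc in encoding_stream:
        recent = (recent + [enc])[-window:]
        context = attention_output(recent, query)
        yield keyphrase_score(context, out_weights, bias)
```

A caller would compare each yielded score against a threshold of its own choosing to decide whether the audio likely encodes the key phrase.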
-
Publication number: US11948062B2
Publication date: 2024-04-02
Application number: US17112966
Application date: 2020-12-04
Applicant: Google LLC
Inventor: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier
CPC classification number: G06N3/044, G06N3/049, G06N3/08, G06N20/00, G05B2219/33025, G05B2219/40326, G06F17/16, G06N3/04, G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of the recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
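The factorization described in this abstract can be illustrated with a toy example. The sketch below assumes that "defined by" means a matrix product, so the recurrent matrix is `Z_h P` and the inter-layer matrix is `Z_x P` with one shared projection `P`; the names, the sizes (hidden size 4, rank 1), and the values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def param_count(*matrices):
    """Total number of entries across the given matrices."""
    return sum(len(m) * len(m[0]) for m in matrices)


# Hypothetical compressed layer: hidden size 4, projection rank 1.
P = [[1.0, 0.0, 1.0, 0.0]]            # shared projection matrix, 1 x 4
Z_h = [[1.0], [2.0], [3.0], [4.0]]    # first compressed matrix (recurrent), 4 x 1
Z_x = [[0.5], [0.5], [0.5], [0.5]]    # second compressed matrix (inter-layer), 4 x 1

h = [1.0, 2.0, 3.0, 4.0]              # hidden state of the compressed layer

# The projection is computed once and reused for both products.
projected = matvec(P, h)
recurrent_term = matvec(Z_h, projected)    # feeds the same layer at the next step
inter_layer_term = matvec(Z_x, projected)  # feeds the next layer up

# The full matrices that the compressed pairs stand in for.
W_h = matmul(Z_h, P)
W_x = matmul(Z_x, P)
```

In this toy setting the three small matrices hold 12 entries against 32 for the two full 4 x 4 matrices, and both routes give identical products; how much is saved in practice depends on the rank chosen for the projection.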
-
Publication number: US20230169984A1
Publication date: 2023-06-01
Application number: US18103324
Application date: 2023-01-30
Applicant: Google LLC
Inventor: Rajeev Rikhye, Quan Wang, Yanzhang He, Qiao Liang, Ian C. McGraw
IPC: G10L17/24, G10L17/06, G10L21/028
CPC classification number: G10L17/24, G10L17/06, G10L21/028
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
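The control flow this abstract describes (separate the speech, verify the speaker, transcribe, match a keyphrase, act) can be sketched as below. The three models are passed in as plain callables because the abstract does not specify them; `handle_audio`, `contains_keyphrase`, and the whole-word matching rule are assumptions for illustration only.

```python
def contains_keyphrase(text, keyphrases):
    """Return the first keyphrase found as whole words in text, else None."""
    normalized = " " + " ".join(text.lower().split()) + " "
    for phrase in keyphrases:
        candidate = " " + " ".join(phrase.lower().split()) + " "
        if candidate in normalized:
            return phrase
    return None


def handle_audio(audio, separate, is_registered_speaker, transcribe, actions):
    """Run the pipeline; `actions` maps each keyphrase to a zero-argument callable."""
    separated = separate(audio)               # isolate the spoken utterance
    if not is_registered_speaker(separated):  # text-independent speaker check
        return None
    text = transcribe(separated)              # automatic speech recognition
    phrase = contains_keyphrase(text, actions)
    return actions[phrase]() if phrase is not None else None
```

Because the keyphrases live in the `actions` mapping rather than in any model, adding a new keyphrase in this sketch only requires adding an entry, which mirrors the abstract's point about customizing without retraining.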
-
Publication number: US20220335953A1
Publication date: 2022-10-20
Application number: US17233253
Application date: 2021-04-16
Applicant: Google LLC
Inventor: Rajeev Rikhye, Quan Wang, Yanzhang He, Qiao Liang, Ian C. McGraw
IPC: G10L17/24, G10L21/028, G10L17/06
Abstract: Techniques disclosed herein are directed towards streaming keyphrase detection which can be customized to detect one or more particular keyphrases, without requiring retraining of any model(s) for those particular keyphrase(s). Many implementations include processing audio data using a speaker separation model to generate separated audio data which isolates an utterance spoken by a human speaker from one or more additional sounds not spoken by the human speaker, and processing the separated audio data using a text independent speaker identification model to determine whether a verified and/or registered user spoke a spoken utterance captured in the audio data. Various implementations include processing the audio data and/or the separated audio data using an automatic speech recognition model to generate a text representation of the utterance. Additionally or alternatively, the text representation of the utterance can be processed to determine whether at least a portion of the text representation of the utterance captures a particular keyphrase. When the system determines the registered and/or verified user spoke the utterance and the system determines the text representation of the utterance captures the particular keyphrase, the system can cause a computing device to perform one or more actions corresponding to the particular keyphrase.
-
Publication number: US20210089916A1
Publication date: 2021-03-25
Application number: US17112966
Application date: 2020-12-04
Applicant: Google LLC
Inventor: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier
IPC: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of the recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
-