-
Publication number: US11682385B2
Publication date: 2023-06-20
Application number: US17348422
Application date: 2021-06-15
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Hyun Jin Park , Patrick Violette
CPC classification number: G10L15/16 , G10L15/02 , G10L15/063 , G10L15/22 , G10L2015/025 , G10L2015/088 , G10L2015/223
Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
-
Publication number: US10460735B2
Publication date: 2019-10-29
Application number: US16172221
Application date: 2018-10-26
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Othar Hansson
IPC: G10L17/00 , G10L17/24 , G10L15/08 , G10L17/22 , G10L19/00 , G06F21/32 , H04L29/06 , H04W12/06 , G10L15/18 , G10L17/20
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
-
Publication number: US10438593B2
Publication date: 2019-10-08
Application number: US14805753
Application date: 2015-07-22
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara
IPC: G10L15/00 , G10L17/04 , G10L15/06 , G10L17/08 , G10L17/18 , G10L17/24 , G10L15/02 , G10L15/18 , G10L17/06 , G10L15/07 , G10L15/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting notifications in an enterprise system. In one aspect, a method includes actions of obtaining enrollment acoustic data representing an enrollment utterance spoken by a user, obtaining a set of candidate acoustic data representing utterances spoken by other users, determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, selecting a subset of candidate acoustic data from the set of candidate acoustic data based at least on the similarity scores, generating a detection model based on the subset of candidate acoustic data, and providing the detection model for use in detecting an utterance spoken by the user.
-
Publication number: US20190074017A1
Publication date: 2019-03-07
Application number: US16172221
Application date: 2018-10-26
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Othar Hansson
IPC: G10L17/24 , G06F21/32 , G10L17/00 , G10L17/20 , G10L15/08 , H04W12/06 , H04L29/06 , G10L19/00 , G10L17/22 , G10L15/18
CPC classification number: G10L17/24 , G06F21/32 , G06F2221/2111 , G10L15/08 , G10L15/18 , G10L17/00 , G10L17/20 , G10L17/22 , G10L19/00 , G10L2015/088 , H04L63/0861 , H04W12/06
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
-
Publication number: US11967310B2
Publication date: 2024-04-23
Application number: US18322207
Application date: 2023-05-23
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Hyun Jin Park , Patrick Violette
CPC classification number: G10L15/16 , G10L15/02 , G10L15/063 , G10L2015/025 , G10L2015/088 , G10L2015/223
Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
-
Publication number: US20230162729A1
Publication date: 2023-05-25
Application number: US18151540
Application date: 2023-01-09
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Hyun Jin Park
CPC classification number: G10L15/16 , G10L15/02 , G10L15/063 , G10L15/22 , G10L2015/025 , G10L2015/088
Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.
-
Publication number: US20220319522A1
Publication date: 2022-10-06
Application number: US17221559
Application date: 2021-04-02
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Othar Hansson
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
-
Publication number: US20210312913A1
Publication date: 2021-10-07
Application number: US17348422
Application date: 2021-06-15
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Hyun Jin Park , Patrick Violette
Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
-
Publication number: US10147429B2
Publication date: 2018-12-04
Application number: US15697052
Application date: 2017-09-06
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Othar Hansson
IPC: G10L17/00 , G10L17/24 , G10L15/08 , G10L17/22 , G10L19/00 , G06F21/32 , H04L29/06 , H04W12/06 , G10L15/18 , G10L17/20
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
-
Publication number: US20240177708A1
Publication date: 2024-05-30
Application number: US18432282
Application date: 2024-02-05
Applicant: Google LLC
Inventor: Raziel Alvarez Guevara , Hyun Jin Park
CPC classification number: G10L15/16 , G10L15/02 , G10L15/063 , G10L15/22 , G10L2015/025 , G10L2015/088 , G10L2015/223
Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.