-
Publication No.: US10192327B1
Publication date: 2019-01-29
Application No.: US15424711
Filing date: 2017-02-03
Applicant: Google LLC
Inventor: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell
Abstract: Methods and systems, including computer programs encoded on computer storage media, for compressing data items with a variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.
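Illustrative note: the idea of emitting one binarized output per time step, so that more steps yield a finer encoding, can be sketched with a toy residual coder. This is not the patented network: the LSTM and non-LSTM layers are replaced here by a fixed halving gain, and the names `binarize`, `encode_progressive`, and `decode_progressive` are hypothetical. It assumes input values lie in [-1, 1].

```python
from typing import List


def binarize(values: List[float]) -> List[int]:
    """Map each value to -1 or +1 (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def encode_progressive(pixels: List[float], num_steps: int) -> List[List[int]]:
    """Emit one binary code per time step from the current residual.

    Toy stand-in for an encoder stack: each step sees an input derived
    from the original (the residual) and produces a binarized output.
    """
    residual = list(pixels)
    codes = []
    gain = 0.5  # assumed fixed schedule: 0.5, 0.25, 0.125, ...
    for _ in range(num_steps):
        bits = binarize(residual)
        codes.append(bits)
        residual = [r - gain * b for r, b in zip(residual, bits)]
        gain /= 2
    return codes


def decode_progressive(codes: List[List[int]]) -> List[float]:
    """Rebuild an approximation from however many codes were kept."""
    if not codes:
        return []
    recon = [0.0] * len(codes[0])
    gain = 0.5
    for bits in codes:
        recon = [x + gain * b for x, b in zip(recon, bits)]
        gain /= 2
    return recon


pixels = [0.3, -0.8, 0.0]
codes = encode_progressive(pixels, num_steps=4)
full = decode_progressive(codes)       # all four steps
coarse = decode_progressive(codes[:2])  # lower rate: first two steps only
```

Keeping fewer codes lowers the bit rate at the cost of accuracy, which is the variable-rate behaviour the abstract refers to; in this toy scheme the per-value error after k steps is at most 2^-k.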
-
Publication No.: US12198718B2
Publication date: 2025-01-14
Application No.: US18446623
Filing date: 2023-08-09
Applicant: Google LLC
Inventor: Joel Shor, Alanna Foster Slocum
Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.
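Illustrative note: the score-then-threshold flow in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the "shallow discriminator" is assumed to be a single logistic layer, per-portion scores are assumed to be mean-pooled, and "satisfies the threshold" is assumed to mean greater than or equal. The feature vectors would come from a trained self-supervised model, which is not modelled here; all function names are hypothetical.

```python
import math
from typing import Sequence


def shallow_discriminator_score(
    feature_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Average the logistic score of each per-portion feature vector."""
    if not feature_vectors:
        raise ValueError("at least one feature vector is required")
    scores = []
    for vec in feature_vectors:
        logit = sum(w * x for w, x in zip(weights, vec)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return sum(scores) / len(scores)


def is_synthetic(score: float, threshold: float = 0.5) -> bool:
    """Assumed rule: the score satisfies the threshold when score >= threshold."""
    return score >= threshold


# Hypothetical weights and features, chosen only to exercise the flow.
weights, bias = [1.0, -1.0], 0.0
high = shallow_discriminator_score([[2.0, 0.0], [2.0, 0.0]], weights, bias)
low = shallow_discriminator_score([[0.0, 2.0]], weights, bias)
```

Here `is_synthetic(high)` is true and `is_synthetic(low)` is false; in practice the threshold would be tuned on held-out data.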
-
Publication No.: US11862188B2
Publication date: 2024-01-02
Application No.: US17507461
Filing date: 2021-10-21
Applicant: Google LLC
Inventor: Jacob Garrison, Jacob Scott Peplinski, Joel Shor
IPC: G10L25/66, G10L15/02, G10L15/06, G10L15/04, A61B5/00, G16H40/67, A61B5/08, G10L25/78, G10L25/51, G10L25/30
CPC classification number: G10L25/66, A61B5/0823, A61B5/4803, A61B5/7267, A61B5/7282, G10L15/02, G10L15/04, G10L15/063, G10L25/30, G10L25/51, G10L25/78, G16H40/67
Abstract: A method of detecting a cough in an audio stream includes a step of performing one or more pre-processing steps on the audio stream to generate an input audio sequence comprising a plurality of time-separated audio segments. An embedding is generated by a self-supervised triplet loss embedding model for each of the segments of the input audio sequence using an audio feature set, the embedding model having been trained to learn the audio feature set in a self-supervised triplet loss manner from a plurality of speech audio clips from a speech dataset. The embedding for each of the segments is provided to a model performing cough detection inference. This model generates a probability that each of the segments of the input audio sequence includes a cough episode. The method includes generating cough metrics for each of the cough episodes detected in the input audio sequence.
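Illustrative note: the later stages of this pipeline (segmenting the audio, turning per-segment cough probabilities into episodes, and computing metrics) can be sketched as below. This is a sketch under assumptions, not the patented method: the embedding model and the cough inference model are not modelled, the probabilities are taken as given, an episode is assumed to be a run of consecutive segments at or above a threshold, and the metric names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def split_into_segments(samples: Sequence[float], segment_len: int) -> List[Sequence[float]]:
    """Cut audio into non-overlapping fixed-length segments (tail dropped)."""
    return [
        samples[i:i + segment_len]
        for i in range(0, len(samples) - segment_len + 1, segment_len)
    ]


def find_cough_episodes(probs: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive segments with prob >= threshold into (start, end) runs."""
    episodes = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            episodes.append((start, i - 1))
            start = None
    if start is not None:
        episodes.append((start, len(probs) - 1))
    return episodes


def cough_metrics(episodes: List[Tuple[int, int]], segment_seconds: float) -> Dict[str, float]:
    """Summarise episodes: count, total duration, and longest duration."""
    durations = [(end - start + 1) * segment_seconds for start, end in episodes]
    return {
        "count": len(episodes),
        "total_seconds": sum(durations),
        "max_seconds": max(durations, default=0.0),
    }


# Hypothetical per-segment probabilities from a cough inference model.
probs = [0.1, 0.9, 0.8, 0.2, 0.7]
episodes = find_cough_episodes(probs)
metrics = cough_metrics(episodes, segment_seconds=1.0)
```

With these made-up probabilities the sketch finds two episodes, covering segments 1 to 2 and segment 4.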
-
Publication No.: US10713818B1
Publication date: 2020-07-14
Application No.: US16259207
Filing date: 2019-01-28
Applicant: Google LLC
Inventor: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell
Abstract: Methods and systems, including computer programs encoded on computer storage media, for compressing data items with a variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.
-
Publication No.: US12249346B2
Publication date: 2025-03-11
Application No.: US18509722
Filing date: 2023-11-15
Applicant: Google LLC
Inventor: Jacob Garrison, Jacob Scott Peplinski, Joel Shor
IPC: G10L25/66, A61B5/00, A61B5/08, G10L15/02, G10L15/04, G10L15/06, G10L25/30, G10L25/51, G10L25/78, G16H40/67
Abstract: A method of detecting a cough in an audio stream includes a step of performing one or more pre-processing steps on the audio stream to generate an input audio sequence comprising a plurality of time-separated audio segments. An embedding is generated by a self-supervised triplet loss embedding model for each of the segments of the input audio sequence using an audio feature set, the embedding model having been trained to learn the audio feature set in a self-supervised triplet loss manner from a plurality of speech audio clips from a speech dataset. The embedding for each of the segments is provided to a model performing cough detection inference. This model generates a probability that each of the segments of the input audio sequence includes a cough episode. The method includes generating cough metrics for each of the cough episodes detected in the input audio sequence.
-
Publication No.: US20220172739A1
Publication date: 2022-06-02
Application No.: US17110278
Filing date: 2020-12-02
Applicant: Google LLC
Inventor: Joel Shor, Joshua Foster Slocum
Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.
-
Publication No.: US20220130415A1
Publication date: 2022-04-28
Application No.: US17507461
Filing date: 2021-10-21
Applicant: Google LLC
Inventor: Jacob Garrison, Jacob Scott Peplinski, Joel Shor
Abstract: A method of detecting a cough in an audio stream includes a step of performing one or more pre-processing steps on the audio stream to generate an input audio sequence comprising a plurality of time-separated audio segments. An embedding is generated by a self-supervised triplet loss embedding model for each of the segments of the input audio sequence using an audio feature set, the embedding model having been trained to learn the audio feature set in a self-supervised triplet loss manner from a plurality of speech audio clips from a speech dataset. The embedding for each of the segments is provided to a model performing cough detection inference. This model generates a probability that each of the segments of the input audio sequence includes a cough episode. The method includes generating cough metrics for each of the cough episodes detected in the input audio sequence.
-
Publication No.: US20220059117A1
Publication date: 2022-02-24
Application No.: US17000583
Filing date: 2020-08-24
Applicant: Google LLC
Inventor: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Felix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
-
Publication No.: US11996116B2
Publication date: 2024-05-28
Application No.: US17000583
Filing date: 2020-08-24
Applicant: Google LLC
Inventor: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Felix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
-
Publication No.: US20240161769A1
Publication date: 2024-05-16
Application No.: US18509722
Filing date: 2023-11-15
Applicant: Google LLC
Inventor: Jacob Garrison, Jacob Scott Peplinski, Joel Shor
IPC: G10L25/66, A61B5/00, A61B5/08, G10L15/02, G10L15/04, G10L15/06, G10L25/30, G10L25/51, G10L25/78, G16H40/67
CPC classification number: G10L25/66, A61B5/0823, A61B5/4803, A61B5/7267, A61B5/7282, G10L15/02, G10L15/04, G10L15/063, G10L25/30, G10L25/51, G10L25/78, G16H40/67
Abstract: A method of detecting a cough in an audio stream includes a step of performing one or more pre-processing steps on the audio stream to generate an input audio sequence comprising a plurality of time-separated audio segments. An embedding is generated by a self-supervised triplet loss embedding model for each of the segments of the input audio sequence using an audio feature set, the embedding model having been trained to learn the audio feature set in a self-supervised triplet loss manner from a plurality of speech audio clips from a speech dataset. The embedding for each of the segments is provided to a model performing cough detection inference. This model generates a probability that each of the segments of the input audio sequence includes a cough episode. The method includes generating cough metrics for each of the cough episodes detected in the input audio sequence.