Image compression with recurrent neural networks

    公开(公告)号:US10192327B1

    公开(公告)日:2019-01-29

    申请号:US15424711

    申请日:2017-02-03

    Applicant: Google LLC

    Abstract: Methods, and systems, including computer programs encoded on computer storage media for compressing data items with variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.

    Self-supervised speech representations for fake audio detection

    公开(公告)号:US12198718B2

    公开(公告)日:2025-01-14

    申请号:US18446623

    申请日:2023-08-09

    Applicant: Google LLC

    Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio features vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.

    Image compression with recurrent neural networks

    公开(公告)号:US10713818B1

    公开(公告)日:2020-07-14

    申请号:US16259207

    申请日:2019-01-28

    Applicant: Google LLC

    Abstract: Methods, and systems, including computer programs encoded on computer storage media for compressing data items with variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.

    Self-Supervised Speech Representations for Fake Audio Detection

    公开(公告)号:US20220172739A1

    公开(公告)日:2022-06-02

    申请号:US17110278

    申请日:2020-12-02

    Applicant: Google LLC

    Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio features vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.

    Method for Detecting and Classifying Coughs or Other Non-Semantic Sounds Using Audio Feature Set Learned from Speech

    公开(公告)号:US20220130415A1

    公开(公告)日:2022-04-28

    申请号:US17507461

    申请日:2021-10-21

    Applicant: Google LLC

    Abstract: A method of detecting a cough in an audio stream includes a step of performing one or more pre-processing steps on the audio stream to generate an input audio sequence comprising a plurality of time-separated audio segments. An embedding is generated by a self-supervised triplet loss embedding model for each of the segments of the input audio sequence using an audio feature set, the embedding model having been trained to learn the audio feature set in a self-supervised triplet loss manner from a plurality of speech audio clips from a speech dataset. The embedding for each of the segments is provided to a model performing cough detection inference. This model generates a probability that each of the segments of the input audio sequence includes a cough episode. The method includes generating cough metrics for each of the cough episodes detected in the input audio sequence.

Patent Agency Ranking