Patent search ap:("Google LLC") AND inv:"Alanna Foster Slocum" Page 1

1.

Granted invention patent
Self-supervised speech representations for fake audio detection (in force)

Publication (announcement) number: US12198718B2

Publication (announcement) date: 2025-01-14

Application number: US18446623

Filing date: 2023-08-09

Applicant: Google LLC

Inventor: Joel Shor, Alanna Foster Slocum

IPC: G10L25/69, G10L15/02, G10L15/06, G10L15/22

Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.
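The flow described in this abstract (feature vectors per audio portion, a shallow discriminator score, then a threshold check) can be illustrated with a minimal sketch. It is not the patented implementation: `embed_frames` is a hypothetical stand-in for the trained self-supervised model, the discriminator is assumed to be a logistic regression over mean-pooled features, and "satisfies the threshold" is assumed to mean score >= threshold.

```python
import math
from typing import List, Sequence


def embed_frames(samples: Sequence[float], frame_len: int = 4) -> List[List[float]]:
    """Hypothetical stand-in for the trained self-supervised model.

    Splits the audio into non-overlapping frames and returns one small
    feature vector (mean, energy) per frame.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean = sum(frame) / len(frame)
        energy = sum(x * x for x in frame) / len(frame)
        features.append([mean, energy])
    return features


def shallow_score(features: List[List[float]], weights: Sequence[float], bias: float) -> float:
    """Assumed shallow discriminator: logistic regression on mean-pooled features."""
    dim = len(weights)
    pooled = [sum(vec[d] for vec in features) / len(features) for d in range(dim)]
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def is_synthetic(samples: Sequence[float], weights: Sequence[float],
                 bias: float, threshold: float = 0.5) -> bool:
    """Return True when the score meets the (assumed >=) detection threshold."""
    features = embed_frames(samples)
    if not features:
        raise ValueError("audio is shorter than one frame")
    return shallow_score(features, weights, bias) >= threshold
```

With illustrative weights such as `[0.0, 1.0]` and a bias of `-0.5`, a high-energy input scores above 0.5 and is flagged, while silence scores below it. The weights, bias, and threshold here are made-up values, not parameters from the patent.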

2.

Published invention application
SELF-SUPERVISED SPEECH REPRESENTATIONS FOR FAKE AUDIO DETECTION (under examination, published)

Publication (announcement) number: US20230386506A1

Publication (announcement) date: 2023-11-30

Application number: US18446623

Filing date: 2023-08-09

Applicant: Google LLC

Inventor: Joel Shor, Alanna Foster Slocum

IPC: G10L25/69, G10L15/02, G10L15/06, G10L15/22

CPC classification number: G10L25/69, G10L15/02, G10L15/063, G10L15/22

Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.

3.

Granted invention patent
Self-supervised speech representations for fake audio detection (in force)

Publication (announcement) number: US11756572B2

Publication (announcement) date: 2023-09-12

Application number: US17110278

Filing date: 2020-12-02

Applicant: Google LLC

Inventor: Joel Shor, Alanna Foster Slocum

IPC: G10L25/69, G10L15/02, G10L15/06, G10L15/22

CPC classification number: G10L25/69, G10L15/02, G10L15/063, G10L15/22

Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.

4.

Published invention application
Speaker Verification with Multitask Speech Models (under examination, published)

Publication (announcement) number: US20230260521A1

Publication (announcement) date: 2023-08-17

Application number: US18167815

Filing date: 2023-02-10

Applicant: Google LLC

Inventor: Alanna Foster Slocum, Yiling Huang, Shelly Bensal, Quan Wang

IPC: G10L17/18, G10L17/04, G10L17/06

CPC classification number: G10L17/18, G10L17/04, G10L17/06

Abstract: A method includes obtaining a speaker identification (SID) model trained to predict speaker embeddings from utterances spoken by different speakers, the SID model including a trained audio encoder and a trained SID head. The method also includes receiving a plurality of synthetic speech detection (SSD) training utterances that include a set of human-originated speech samples and a set of synthetic speech samples. The method also includes training, using the trained audio encoder, an SSD head on the SSD training utterances to learn to detect the presence of synthetic speech in audio encodings encoded by the trained audio encoder. The method also includes providing, for execution on a computing device, a multitask neural network model for performing both SID tasks and SSD tasks on input audio data in parallel.
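The arrangement in this abstract (a shared, already-trained audio encoder feeding both an SID head and a newly trained SSD head) can be sketched as follows. This is an illustrative toy under stated assumptions, not the method claimed in the application: `frozen_encoder` and `sid_head` are hypothetical stand-ins, the SSD head is assumed to be a logistic regression fitted by stochastic gradient descent, and both heads simply reuse one encoding rather than running on parallel hardware.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def frozen_encoder(utterance: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the trained audio encoder; never updated here."""
    n = len(utterance)
    mean = sum(utterance) / n
    energy = sum(x * x for x in utterance) / n
    return [mean, energy]


def sid_head(encoding: Vector) -> Vector:
    """Hypothetical trained SID head: returns an L2-normalised speaker embedding."""
    norm = math.sqrt(sum(x * x for x in encoding)) or 1.0
    return [x / norm for x in encoding]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_ssd_head(utterances: Sequence[Sequence[float]], labels: Sequence[int],
                   epochs: int = 200, lr: float = 0.5) -> Tuple[Vector, float]:
    """Fit an assumed logistic SSD head on encodings from the frozen encoder.

    Labels: 0 = human-originated speech, 1 = synthetic speech.
    """
    encodings = [frozen_encoder(u) for u in utterances]
    weights = [0.0] * len(encodings[0])
    bias = 0.0
    for _ in range(epochs):
        for enc, label in zip(encodings, labels):
            pred = _sigmoid(sum(w * x for w, x in zip(weights, enc)) + bias)
            err = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * x for w, x in zip(weights, enc)]
            bias -= lr * err
    return weights, bias


def multitask_forward(utterance: Sequence[float],
                      ssd_params: Tuple[Vector, float]) -> Tuple[Vector, float]:
    """Encode once, then apply both heads to the shared encoding."""
    encoding = frozen_encoder(utterance)
    weights, bias = ssd_params
    ssd_score = _sigmoid(sum(w * x for w, x in zip(weights, encoding)) + bias)
    return sid_head(encoding), ssd_score
```

A call such as `multitask_forward(utterance, train_ssd_head(train_utterances, train_labels))` returns a speaker embedding and a synthetic-speech score from a single encoder pass, which is the main design point: the SSD head adds little cost on top of the existing SID model.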
