-
Publication number: US12198718B2
Publication date: 2025-01-14
Application number: US18446623
Filing date: 2023-08-09
Applicant: Google LLC
Inventor: Joel Shor, Alanna Foster Slocum
Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.
-
Publication number: US20230386506A1
Publication date: 2023-11-30
Application number: US18446623
Filing date: 2023-08-09
Applicant: Google LLC
Inventor: Joel Shor, Alanna Foster Slocum
CPC classification number: G10L25/69, G10L15/02, G10L15/063, G10L15/22
Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.
-
Publication number: US11756572B2
Publication date: 2023-09-12
Application number: US17110278
Filing date: 2020-12-02
Applicant: Google LLC
Inventor: Joel Shor, Alanna Foster Slocum
CPC classification number: G10L25/69, G10L15/02, G10L15/063, G10L15/22
Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.
-
Publication number: US20230260521A1
Publication date: 2023-08-17
Application number: US18167815
Filing date: 2023-02-10
Applicant: Google LLC
Inventor: Alanna Foster Slocum, Yiling Huang, Shelly Bensal, Quan Wang
Abstract: A method includes obtaining a speaker identification (SID) model trained to predict speaker embeddings from utterances spoken by different speakers, the SID model including a trained audio encoder and a trained SID head. The method also includes receiving a plurality of synthetic speech detection (SSD) training utterances that include a set of human-originated speech samples and a set of synthetic speech samples. The method also includes training, using the trained audio encoder, an SSD head on the SSD training utterances to learn to detect the presence of synthetic speech in audio encodings encoded by the trained audio encoder. The method also includes providing, for execution on a computing device, a multitask neural network model for performing both SID tasks and SSD tasks on input audio data in parallel.
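The arrangement described in this abstract (an already-trained encoder kept fixed, an SSD head trained on its encodings, and both heads then served from one model) could look roughly like the sketch below. It is a toy illustration under assumptions, not the method of the application: `toy_encoder` and `toy_sid_head` are hypothetical stand-ins for the trained audio encoder and SID head, the SSD head is assumed to be a logistic-regression layer fitted by plain stochastic gradient descent, and the training data is invented.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def toy_encoder(utterance: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the trained audio encoder (kept frozen)."""
    n = len(utterance)
    return [sum(utterance) / n, sum(abs(x) for x in utterance) / n]


def toy_sid_head(encoding: Vector) -> Vector:
    """Hypothetical stand-in for the trained SID head: L2-normalized embedding."""
    norm = math.sqrt(sum(x * x for x in encoding)) or 1.0
    return [x / norm for x in encoding]


def train_ssd_head(encoder: Callable[[Sequence[float]], Vector],
                   utterances: List[Sequence[float]],
                   labels: List[int],
                   epochs: int = 500,
                   lr: float = 0.5) -> Tuple[Vector, float]:
    """Fit the SSD head on encoder outputs; the encoder itself is not updated.

    labels: 1 for synthetic speech samples, 0 for human-originated samples.
    """
    encodings = [encoder(u) for u in utterances]
    w, b = [0.0] * len(encodings[0]), 0.0
    for _ in range(epochs):
        for enc, y in zip(encodings, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, enc)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, enc)]
            b -= lr * err
    return w, b


def make_multitask_model(encoder, sid_head, ssd_w: Vector, ssd_b: float):
    """Return a model that encodes once and feeds the same encoding to both heads."""
    def model(utterance: Sequence[float]) -> Tuple[Vector, float]:
        enc = encoder(utterance)
        speaker_embedding = sid_head(enc)
        ssd_score = sigmoid(sum(wi * xi for wi, xi in zip(ssd_w, enc)) + ssd_b)
        return speaker_embedding, ssd_score
    return model


# Invented toy data: two "human" and two "synthetic" utterances.
human = [[0.1, -0.2, 0.3], [0.2, -0.1, 0.2]]
synthetic = [[0.9, 0.8, 0.9], [0.8, 0.9, 1.0]]
w, b = train_ssd_head(toy_encoder, human + synthetic, [0, 0, 1, 1])
model = make_multitask_model(toy_encoder, toy_sid_head, w, b)
embedding, ssd_score = model(synthetic[0])
```

Because both heads read the same encoding, the encoder runs once per input. That is one plausible reading of performing SID and SSD "in parallel", though the abstract does not specify the execution details.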