Self-supervised speech representations for fake audio detection

    Publication number: US12198718B2

    Publication date: 2025-01-14

    Application number: US18446623

    Filing date: 2023-08-09

    Applicant: Google LLC

    Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.
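
The steps in the abstract above (portion-wise feature vectors, a shallow discriminator score, a threshold test) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: `toy_encoder` is a hypothetical stand-in for the trained self-supervised model, the discriminator is assumed to be a single logistic unit over mean-pooled features, and "satisfies the threshold" is assumed to mean `score >= threshold`. All function names, weights, and default values are invented for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def split_into_portions(samples: Sequence[float], portion_len: int) -> List[Sequence[float]]:
    """Split audio into fixed-length portions, dropping any short tail."""
    return [samples[i:i + portion_len]
            for i in range(0, len(samples) - portion_len + 1, portion_len)]


def toy_encoder(portion: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the trained self-supervised model.

    Returns a 2-dimensional feature vector: mean energy and zero-crossing rate.
    """
    energy = sum(x * x for x in portion) / len(portion)
    crossings = sum(1 for a, b in zip(portion, portion[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(portion) - 1)]


def shallow_discriminator(features: List[Vector], weights: Vector, bias: float) -> float:
    """Assumed shallow model: mean-pool the feature vectors, then one logistic unit."""
    pooled = [sum(f[d] for f in features) / len(features) for d in range(len(weights))]
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def detect_synthetic_speech(samples: Sequence[float],
                            encoder: Callable[[Sequence[float]], Vector],
                            weights: Vector,
                            bias: float,
                            threshold: float = 0.5,
                            portion_len: int = 160) -> Tuple[float, bool]:
    """Return (score, is_synthetic) for one piece of audio."""
    portions = split_into_portions(samples, portion_len)
    if not portions:
        raise ValueError("audio is shorter than one portion")
    features = [encoder(p) for p in portions]  # one feature vector per portion
    score = shallow_discriminator(features, weights, bias)
    return score, score >= threshold  # assumption: ">=" means the threshold is satisfied
```

In this sketch the encoder is passed in as a plain callable, so a real pretrained model could be substituted without touching the scoring or threshold logic.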

    Self-supervised speech representations for fake audio detection

    Publication number: US20230386506A1

    Publication date: 2023-11-30

    Application number: US18446623

    Filing date: 2023-08-09

    Applicant: Google LLC

    CPC classification number: G10L25/69 G10L15/02 G10L15/063 G10L15/22

    Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.

    Self-supervised speech representations for fake audio detection

    Publication number: US11756572B2

    Publication date: 2023-09-12

    Application number: US17110278

    Filing date: 2020-12-02

    Applicant: Google LLC

    CPC classification number: G10L25/69 G10L15/02 G10L15/063 G10L15/22

    Abstract: A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.

    Speaker Verification with Multitask Speech Models

    Publication number: US20230260521A1

    Publication date: 2023-08-17

    Application number: US18167815

    Filing date: 2023-02-10

    Applicant: Google LLC

    CPC classification number: G10L17/18 G10L17/04 G10L17/06

    Abstract: A method includes obtaining a speaker identification (SID) model trained to predict speaker embeddings from utterances spoken by different speakers, the SID model including a trained audio encoder and a trained SID head. The method also includes receiving a plurality of synthetic speech detection (SSD) training utterances that include a set of human-originated speech samples and a set of synthetic speech samples. The method also includes training, using the trained audio encoder, an SSD head on the SSD training utterances to learn to detect the presence of synthetic speech in audio encodings encoded by the trained audio encoder. The method also includes providing, for execution on a computing device, a multitask neural network model for performing both SID tasks and SSD tasks on input audio data in parallel.
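
The multitask arrangement in the abstract above (a trained encoder that is reused as-is, an existing SID head, and a newly trained SSD head) can be sketched as below. Everything here is a toy under stated assumptions rather than the method of the patent: `frozen_encoder` and `sid_head` are hypothetical stand-ins for the trained components, the SSD head is assumed to be a logistic-regression layer fitted by plain stochastic gradient descent, and "in parallel" is approximated by feeding one shared encoding to both heads.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def frozen_encoder(utterance: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the trained audio encoder; never updated here."""
    n = len(utterance)
    mean = sum(utterance) / n
    var = sum((x - mean) ** 2 for x in utterance) / n
    return [mean, var]


def sid_head(encoding: Vector) -> Vector:
    """Hypothetical stand-in for the trained SID head: a fixed linear projection."""
    return [encoding[0] + encoding[1], encoding[0] - encoding[1]]


def train_ssd_head(encodings: List[Vector], labels: List[int],
                   lr: float = 0.5, epochs: int = 500) -> Tuple[Vector, float]:
    """Fit only the SSD head on encoder outputs (label 1 = synthetic, 0 = human)."""
    w = [0.0] * len(encodings[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(encodings, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def multitask_model(utterance: Sequence[float],
                    ssd_params: Tuple[Vector, float]) -> Tuple[Vector, float]:
    """Run both heads on one shared encoding: (speaker embedding, SSD score)."""
    encoding = frozen_encoder(utterance)  # single encoder pass shared by both heads
    w, b = ssd_params
    ssd_score = sigmoid(sum(wi * xi for wi, xi in zip(w, encoding)) + b)
    return sid_head(encoding), ssd_score
```

Because only `train_ssd_head` updates parameters, the encoder and SID head behave exactly as before training, which is the property that lets one model serve both tasks in this sketch.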
