Speaker Embeddings for Improved Automatic Speech Recognition

    公开(公告)号:US20230360632A1

    公开(公告)日:2023-11-09

    申请号:US17661832

    申请日:2022-05-03

    Applicant: Google LLC

    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

    SPEAKER EMBEDDINGS FOR IMPROVED AUTOMATIC SPEECH RECOGNITION

    公开(公告)号:US20250037700A1

    公开(公告)日:2025-01-30

    申请号:US18919366

    申请日:2024-10-17

    Applicant: Google LLC

    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

    Speaker embeddings for improved automatic speech recognition

    公开(公告)号:US12136410B2

    公开(公告)日:2024-11-05

    申请号:US17661832

    申请日:2022-05-03

    Applicant: Google LLC

    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

Patent Agency Ranking