Inverted Projection for Robust Speech Translation

    Publication Number: US20230021824A1

    Publication Date: 2023-01-26

    Application Number: US17859146

    Application Date: 2022-07-07

    Applicant: Google LLC

    Abstract: The technology provides an approach to train translation models that are robust to transcription errors and punctuation errors. The approach includes introducing errors from actual automatic speech recognition and automatic punctuation systems into the source side of the machine translation training data. A method for training a machine translation model includes performing automatic speech recognition on input source audio to generate a system transcript. The method aligns a human transcript of the source audio to the system transcript, including projecting system segmentation onto the human transcript. Then the method performs segment robustness training of a machine translation model according to the aligned human and system transcripts, and performs system robustness training of the machine translation model, e.g., by injecting token errors into training data.
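
    The sketch below is a minimal, hypothetical illustration of the data-preparation idea described in this abstract: the ASR system's segmentation is projected onto the human transcript, and token-level errors are injected into the source side for robustness training. All function and field names (project_segmentation, inject_token_errors, build_examples, TrainingExample) are illustrative assumptions, not taken from the patent.

```python
import random
from dataclasses import dataclass

@dataclass
class TrainingExample:
    source: str   # (possibly noisy) source-language segment
    target: str   # reference translation

def project_segmentation(human_tokens, system_segments):
    """Split the human transcript into segments whose lengths mirror the
    ASR system's segmentation (a crude stand-in for a real alignment)."""
    segments, start = [], 0
    for seg in system_segments:
        length = len(seg.split())
        segments.append(" ".join(human_tokens[start:start + length]))
        start += length
    if start < len(human_tokens):                 # keep any leftover tokens
        segments.append(" ".join(human_tokens[start:]))
    return segments

def inject_token_errors(text, error_rate=0.1, filler_vocab=("the", "a", "uh")):
    """Randomly drop or substitute tokens to mimic ASR mistakes."""
    noisy = []
    for tok in text.split():
        r = random.random()
        if r < error_rate / 2:
            continue                              # deletion
        if r < error_rate:
            tok = random.choice(filler_vocab)     # substitution
        noisy.append(tok)
    return " ".join(noisy)

def build_examples(human_transcript, system_segments, translations):
    """Pair re-segmented (and optionally corrupted) source segments with
    their reference translations for MT robustness training."""
    human_segments = project_segmentation(human_transcript.split(),
                                          system_segments)
    return [TrainingExample(inject_token_errors(src), tgt)
            for src, tgt in zip(human_segments, translations)]
```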

    STABLE REAL-TIME TRANSLATIONS OF AUDIO STREAMS

    Publication Number: US20240265215A1

    Publication Date: 2024-08-08

    Application Number: US18617428

    Application Date: 2024-03-26

    Applicant: Google LLC

    CPC classification number: G06F40/58 G10L15/005 G10L15/063 G10L15/197 G10L15/22

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate generating stable real-time textual translations in a target language of an input audio data stream that is recorded in a source language. An audio stream that is recorded in a first language is obtained. A partial transcription of the audio can be generated at each time interval in a plurality of successive time intervals. Each partial transcription can be translated into a second language that is different from the first language. Each translated partial transcription can be input to a model that determines whether a portion of an input translated partial transcription is stable. Based on the input translated partial transcription, the model identifies a portion of the translated partial transcription that is predicted to be stable. This stable portion of the translated partial transcription is provided for display on a user device.
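
    Below is an illustrative sketch of the pipeline this abstract describes: at each time interval a partial transcription is translated and passed to a stability check, and only the portion judged stable is displayed. The patent describes a learned model for the stability decision; the word-level common-prefix heuristic here is merely a simple stand-in, and translate_fn is a placeholder for a real translation system.

```python
def stable_prefix(previous_translation: str, current_translation: str) -> str:
    """Return the longest common word-level prefix of two translations."""
    prefix = []
    for prev_tok, curr_tok in zip(previous_translation.split(),
                                  current_translation.split()):
        if prev_tok != curr_tok:
            break
        prefix.append(prev_tok)
    return " ".join(prefix)

def stream_stable_translations(partial_transcriptions, translate_fn):
    """Translate each successive partial transcription and yield only the
    prefix it shares with the previous hypothesis (the "stable" portion)."""
    previous = ""
    for partial in partial_transcriptions:
        translated = translate_fn(partial)
        yield stable_prefix(previous, translated)
        previous = translated

# Toy usage with an identity "translation" over growing partial transcripts;
# the displayed text lags the latest hypothesis until it stops changing.
for shown in stream_stable_translations(
        ["hello", "hello how", "hello how are you"], translate_fn=lambda s: s):
    print(repr(shown))   # '', 'hello', 'hello how'
```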

    Stable real-time translations of audio streams

    Publication Number: US11972226B2

    Publication Date: 2024-04-30

    Application Number: US17269800

    Application Date: 2020-03-23

    Applicant: Google LLC

    CPC classification number: G06F40/58 G10L15/005 G10L15/063 G10L15/197 G10L15/22

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate generating stable real-time textual translations in a target language of an input audio data stream that is recorded in a source language. An audio stream that is recorded in a first language is obtained. A partial transcription of the audio can be generated at each time interval in a plurality of successive time intervals. Each partial transcription can be translated into a second language that is different from the first language. Each translated partial transcription can be input to a model that determines whether a portion of an input translated partial transcription is stable. Based on the input translated partial transcription, the model identifies a portion of the translated partial transcription that is predicted to be stable. This stable portion of the translated partial transcription is provided for display on a user device.

    Speaker Embeddings for Improved Automatic Speech Recognition

    Publication Number: US20230360632A1

    Publication Date: 2023-11-09

    Application Number: US17661832

    Application Date: 2022-05-03

    Applicant: Google LLC

    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
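
    The following is a hedged sketch of the flow in this abstract: a speaker-embedding network turns reference audio into a fixed-size embedding, which then conditions (biases) a speech-conversion model that maps atypical speech to a canonical representation. The module names, layer choices, and use of PyTorch are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class SpeakerEmbeddingNetwork(nn.Module):
    """Maps a reference audio feature sequence to a fixed-size embedding."""
    def __init__(self, feat_dim=80, embed_dim=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, embed_dim, batch_first=True)

    def forward(self, reference_features):            # (B, T, feat_dim)
        _, hidden = self.encoder(reference_features)
        return hidden[-1]                              # (B, embed_dim)

class SpeechConversionModel(nn.Module):
    """Converts input speech features to a canonical representation,
    conditioned on the target speaker's embedding."""
    def __init__(self, feat_dim=80, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, input_features, speaker_embedding):
        # Broadcast the embedding across time and concatenate it with the
        # frame-level features so every frame is biased by the speaker.
        batch, frames, _ = input_features.shape
        expanded = speaker_embedding.unsqueeze(1).expand(batch, frames, -1)
        return self.net(torch.cat([input_features, expanded], dim=-1))

# Usage: embed the reference speech once, then condition each request.
embed_net, conversion_model = SpeakerEmbeddingNetwork(), SpeechConversionModel()
reference = torch.randn(1, 200, 80)    # reference utterance features
request = torch.randn(1, 120, 80)      # utterance to convert
speaker_embedding = embed_net(reference)
canonical = conversion_model(request, speaker_embedding)
```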

    SPEAKER EMBEDDINGS FOR IMPROVED AUTOMATIC SPEECH RECOGNITION

    Publication Number: US20250037700A1

    Publication Date: 2025-01-30

    Application Number: US18919366

    Application Date: 2024-10-17

    Applicant: Google LLC

    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

    Speaker embeddings for improved automatic speech recognition

    Publication Number: US12136410B2

    Publication Date: 2024-11-05

    Application Number: US17661832

    Application Date: 2022-05-03

    Applicant: Google LLC

    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.

    STABLE REAL-TIME TRANSLATIONS OF AUDIO STREAMS

    Publication Number: US20220121827A1

    Publication Date: 2022-04-21

    Application Number: US17269800

    Application Date: 2020-03-23

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that facilitate generating stable real-time textual translations in a target language of an input audio data stream that is recorded in a source language. An audio stream that is recorded in a first language is obtained. A partial transcription of the audio can be generated at each time interval in a plurality of successive time intervals. Each partial transcription can be translated into a second language that is different from the first language. Each translated partial transcription can be input to a model that determines whether a portion of an input translated partial transcription is stable. Based on the input translated partial transcription, the model identifies a portion of the translated partial transcription that is predicted to be stable. This stable portion of the translated partial transcription is provided for display on a user device.
