PROPER NOUN RECOGNITION IN END-TO-END SPEECH RECOGNITION

    Publication No.: US20230377564A1

    Publication Date: 2023-11-23

    Application No.: US18362273

    Filing Date: 2023-07-31

    Applicant: Google LLC

    Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criteria. The penalty criteria indicate that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.
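
    The penalty step can be illustrated with a minimal sketch. It assumes the standard N-best form of the minimum word error rate (MWER) loss (renormalized hypothesis probabilities weighting word errors relative to the N-best mean), and it assumes, hypothetically, that "applying a penalty" means adding a fixed amount to the error count of any hypothesis whose probability is at or above the threshold while missing the proper noun. The names `mwer_loss_with_penalty`, `prob_threshold`, and `penalty` are illustrative, not taken from the patent.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]   # decoded word sequence
    log_prob: float    # model log-probability of this hypothesis


def word_errors(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,             # hyp word is an insertion
                         cur[j - 1] + 1,          # ref word is a deletion
                         prev[j - 1] + (h != r))  # match or substitution
        prev = cur
    return prev[-1]


def contains_phrase(words: list[str], phrase: list[str]) -> bool:
    """True if `phrase` appears as a contiguous run inside `words`."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def mwer_loss_with_penalty(hyps: list[Hypothesis], reference: list[str],
                           proper_noun: list[str], prob_threshold: float = 0.3,
                           penalty: float = 1.0) -> float:
    # Renormalize probabilities over the N-best list (numerically stable softmax).
    max_lp = max(h.log_prob for h in hyps)
    weights = [math.exp(h.log_prob - max_lp) for h in hyps]
    total = sum(weights)
    probs = [w / total for w in weights]

    errors = []
    for hyp, p in zip(hyps, probs):
        err = float(word_errors(hyp.words, reference))
        # Assumed penalty criterion: probable enough AND proper noun missing.
        if p >= prob_threshold and not contains_phrase(hyp.words, proper_noun):
            err += penalty
        errors.append(err)

    # MWER: expected errors relative to the N-best mean (variance reduction).
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))


ref = "call kaitlyn now".split()
hyps = [
    Hypothesis("call kaitlyn now".split(), math.log(0.5)),
    Hypothesis("call caitlin now".split(), math.log(0.4)),    # penalized
    Hypothesis("call kate lynn now".split(), math.log(0.1)),  # below threshold
]
print(mwer_loss_with_penalty(hyps, ref, ["kaitlyn"]))               # ≈ -0.3333
print(mwer_loss_with_penalty(hyps, ref, ["kaitlyn"], penalty=0.0))  # ≈ -0.4
```

    The loss can be negative because of the mean-error baseline; only its gradient matters for training. Here the penalty raises the loss from about -0.4 to about -0.333, pushing probability mass away from the confident "caitlin" misrecognition.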

    Contextual Biasing for Speech Recognition
    Invention publication

    Publication No.: US20230274736A1

    Publication Date: 2023-08-31

    Application No.: US18311964

    Filing Date: 2023-05-04

    Applicant: Google LLC

    CPC classification number: G10L15/187 G06N20/10 G10L19/04 G10L2015/088

    Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
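
    Deriving grapheme and phoneme data from biasing phrases can be sketched as follows. This assumes characters serve as graphemes, a hypothetical toy pronunciation lexicon with CMUdict-style ARPAbet symbols supplies phonemes, and a hypothetical `_` token marks word boundaries; a real system would likely fall back to a grapheme-to-phoneme model for out-of-lexicon words rather than an `<unk>` token.

```python
from __future__ import annotations

# Hypothetical toy pronunciation lexicon (CMUdict-style ARPAbet symbols).
LEXICON: dict[str, list[str]] = {
    "call": ["K", "AO1", "L"],
    "mom": ["M", "AA1", "M"],
}
WORD_BOUNDARY = "_"      # hypothetical separator token between words
UNKNOWN_PHONE = "<unk>"  # placeholder for words missing from the lexicon


def phrase_to_units(phrase: str) -> tuple[list[str], list[str]]:
    """Split one biasing phrase into parallel grapheme and phoneme sequences."""
    graphemes: list[str] = []
    phonemes: list[str] = []
    for k, word in enumerate(phrase.lower().split()):
        if k > 0:
            graphemes.append(WORD_BOUNDARY)
            phonemes.append(WORD_BOUNDARY)
        graphemes.extend(word)  # one grapheme per character
        phonemes.extend(LEXICON.get(word, [UNKNOWN_PHONE]))
    return graphemes, phonemes


def biasing_inputs(phrases: list[str]) -> list[tuple[list[str], list[str]]]:
    """Grapheme/phoneme data for every phrase in the biasing set."""
    return [phrase_to_units(p) for p in phrases]


print(phrase_to_units("Call Mom"))
# (['c', 'a', 'l', 'l', '_', 'm', 'o', 'm'], ['K', 'AO1', 'L', '_', 'M', 'AA1', 'M'])
```

    Both sequences would then be embedded and fed to the speech recognition model alongside the acoustic features; how the model combines them is not specified here.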

    Enhancing audio using multiple recording devices

    Publication No.: US11443769B2

    Publication Date: 2022-09-13

    Application No.: US17194827

    Filing Date: 2021-03-08

    Applicant: Google LLC

    Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream and the third source of audio from the second audio stream.
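
    The combining step can be sketched under strong simplifying assumptions: each device's stream has already been separated into per-source signals, those signals are time-aligned across devices, and "diminishing" a source means scaling it by a hypothetical `attenuation` gain. Source separation, alignment, and conversation detection are out of scope here; `combine_streams` is an illustrative name.

```python
from typing import Dict, List, Set

# Each stream: source id -> time-aligned samples (separation assumed done).
Stream = Dict[str, List[float]]


def combine_streams(streams: List[Stream], conversation: Set[str],
                    attenuation: float = 0.1) -> List[float]:
    """Mix conversation sources from every stream; attenuate all other sources."""
    lengths = {len(s) for stream in streams for s in stream.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one signal, all of the same length")
    out = [0.0] * lengths.pop()
    for stream in streams:
        for source_id, samples in stream.items():
            # Full gain for conversation sources, reduced gain for the rest.
            gain = 1.0 if source_id in conversation else attenuation
            for t, x in enumerate(samples):
                out[t] += gain * x
    # Average over devices so the output level does not grow with device count.
    return [y / len(streams) for y in out]


phone_a = {"alice": [1.0, 1.0], "bob": [0.5, 0.0], "tv": [2.0, 2.0]}
phone_b = {"alice": [1.0, 0.0], "bob": [0.5, 1.0], "tv": [2.0, 2.0]}
print(combine_streams([phone_a, phone_b], {"alice", "bob"}, attenuation=0.0))
# [1.5, 1.0]
```

    In this toy example "alice" and "bob" play the first and second sources and "tv" the third; with `attenuation=0.0` the third source is removed entirely, while a value between 0 and 1 only reduces it.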

    CONTEXTUAL BIASING FOR SPEECH RECOGNITION

    Publication No.: US20220366897A1

    Publication Date: 2022-11-17

    Application No.: US17815049

    Filing Date: 2022-07-26

    Applicant: Google LLC

    Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
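
    The two attention modules feeding the decoder can be sketched with plain vectors. This is not the patent's implementation: it assumes unscaled dot-product attention with keys doubling as values, a hypothetical learned "no-bias" embedding so the model can ignore every bias phrase, and concatenation as the way the decoder consumes the two context vectors. All function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Tuple[List[float], Vector]:
    """Unscaled dot-product attention; keys double as values for brevity."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


def decoder_step_input(decoder_state: Vector, audio_encodings: List[Vector],
                       bias_encodings: List[Vector], no_bias: Vector) -> Vector:
    # First attention module: attend over the first encoder's acoustic outputs.
    _, audio_ctx = attend(decoder_state, audio_encodings)
    # Bias attention module: one embedding per bias phrase plus a no-bias entry.
    _, bias_ctx = attend(decoder_state, [no_bias] + bias_encodings)
    # The decoder consumes both context vectors (concatenated here).
    return audio_ctx + bias_ctx


weights, _ = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

    In a trained model the bias embeddings would come from the bias encoder and the decoder state from the decoder's recurrent or self-attention layers; here they are hand-written vectors.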

    Enhancing audio using multiple recording devices

    Publication No.: US10943619B2

    Publication Date: 2021-03-09

    Application No.: US16812760

    Filing Date: 2020-03-09

    Applicant: Google LLC

    Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream and the third source of audio from the second audio stream.
