PROPER NOUN RECOGNITION IN END-TO-END SPEECH RECOGNITION

    Publication No.: EP4375882A3

    Publication Date: 2024-07-17

    Application No.: EP24169022.1

    Filing Date: 2021-01-15

    Applicant: Google LLC

    Abstract: A method (400) for training a speech recognition model (200) with a minimum word error rate loss function includes receiving a training example (302) including a proper noun and generating a plurality of hypotheses (222) corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies penalty criteria. The penalty criteria indicate that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty (332) to the minimum word error rate loss function.
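The penalty step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the penalty enters the loss, so this example assumes it is added to the word-error count of any hypothesis that is both confident (renormalized probability at or above the threshold) and missing the proper noun. The function names, the substring check for the proper noun, and the default `threshold` and `penalty` values are all hypothetical choices made for illustration, not the patented implementation.

```python
import math


def word_errors(hyp, ref):
    """Word-level Levenshtein distance between a hypothesis and the reference."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        cur = [i]
        for j, rw in enumerate(r, start=1):
            cur.append(min(prev[j] + 1,                # extra word in hypothesis
                           cur[j - 1] + 1,             # missing reference word
                           prev[j - 1] + (hw != rw)))  # match or substitution
        prev = cur
    return prev[-1]


def mwer_loss_with_penalty(hyps, log_probs, ref, proper_noun,
                           threshold=0.3, penalty=2.0):
    """Expected relative word errors over an N-best list, with a penalty.

    Assumption: the penalty is added to the error count of each hypothesis
    whose renormalized probability is >= threshold and which does not
    contain the proper noun (checked by a simple substring test).
    """
    # Renormalize the hypothesis probabilities over the N-best list.
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    total = sum(weights)
    probs = [w / total for w in weights]

    errors = [float(word_errors(h, ref)) for h in hyps]
    for i, (h, p) in enumerate(zip(hyps, probs)):
        wrong_noun = proper_noun.lower() not in h.lower()
        if p >= threshold and wrong_noun:
            errors[i] += penalty  # hypothetical way of applying the penalty

    # Subtract the mean error as a baseline, as is common for MWER training.
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))
```

In a real training setup the probabilities would come from the model and the loss would be differentiated with respect to the model parameters; here plain floats are used so that only the effect of the penalty on the loss value is visible.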

LANGUAGE-AGNOSTIC MULTILINGUAL MODELING USING EFFECTIVE SCRIPT NORMALIZATION

    Publication No.: EP4361897A3

    Publication Date: 2024-07-17

    Application No.: EP24162746.2

    Filing Date: 2021-01-19

    Applicant: Google LLC

    Abstract: A method (600) includes obtaining a plurality of training data sets (202), each associated with a respective native language and including a plurality of respective training data samples (204). For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text (121) representing the respective native language of the corresponding audio in a target script, and associating the corresponding transliterated text in the target script with the corresponding audio (210) in the respective native language to generate a respective normalized training data sample (240). The method also includes training, using the normalized training data samples, a multilingual model (300) to predict speech recognition results (120) in the target script for corresponding speech utterances (106) spoken in any of the different native languages associated with the plurality of training data sets.
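The data normalization step described in the abstract can be sketched as follows. The abstract does not specify the transliteration mechanism, so this example assumes a per-character lookup into a Latin target script. The `Sample` type, the `TOY_TABLES` mappings, and the function names are hypothetical and cover only the few characters used in the example; a production system would use a full transliteration model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    audio: str          # stand-in for the audio, e.g. a file path
    transcription: str  # text paired with the audio


# Hypothetical toy tables: native-script character -> target (Latin) script.
TOY_TABLES = {
    "ru": {"д": "d", "а": "a", "н": "n", "е": "e", "т": "t"},
    "el": {"ν": "n", "α": "a", "ι": "i"},
}


def transliterate(text, language):
    """Map each character to the target script; unknown characters pass through."""
    table = TOY_TABLES[language]
    return "".join(table.get(ch, ch) for ch in text)


def normalize(datasets):
    """Build normalized samples from {language: [Sample, ...]} training sets.

    Each output sample keeps the original native-language audio and pairs it
    with the transcription rewritten in the target script.
    """
    normalized = []
    for language, samples in datasets.items():
        for s in samples:
            normalized.append(
                Sample(audio=s.audio,
                       transcription=transliterate(s.transcription, language)))
    return normalized
```

The pooled list returned by `normalize` carries no language tag, which reflects the language-agnostic setup in the abstract: a single multilingual model is trained on all normalized samples and always predicts text in the target script.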