TRAINING AND/OR USING A LANGUAGE SELECTION MODEL FOR AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE

    Publication number: US20220328035A1

    Publication date: 2022-10-13

    Application number: US17846287

    Filing date: 2022-06-22

    Applicant: Google LLC

    Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.
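
    The selection step and the contrast between the two losses can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not define the tuple loss, so the pairwise form below (averaging a two-way softmax loss between the correct language and each other language) is an assumption, and the function names and language codes are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_language(logits, languages):
    """Return the most probable language and the full probability list."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return languages[best], probs


def cross_entropy_loss(logits, target):
    """Standard N-way cross-entropy loss for the correct index `target`."""
    return -math.log(softmax(logits)[target])


def pairwise_tuple_loss(logits, target):
    """ASSUMED tuple loss with tuples of size 2.

    For each other language k, compute a two-way softmax over the scores
    of (target, k) and average the negative log-probabilities of target.
    """
    losses = []
    for k, z_k in enumerate(logits):
        if k == target:
            continue
        pair_probs = softmax([logits[target], z_k])
        losses.append(-math.log(pair_probs[0]))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    languages = ["en-US", "es-ES", "fr-FR"]  # hypothetical candidate set, N = 3
    logits = [2.0, 0.5, -1.0]                # hypothetical model outputs
    chosen, probs = select_language(logits, languages)
    print(chosen, [round(p, 3) for p in probs])
    print(cross_entropy_loss(logits, 0), pairwise_tuple_loss(logits, 0))
```

    Under this assumed form, each pairwise term compares the correct language against only one competitor, so the loss never exceeds the N-way cross-entropy for the same scores.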

    Training and/or using a language selection model for automatically determining language for speech recognition of spoken utterance

    Publication number: US11410641B2

    Publication date: 2022-08-09

    Application number: US16959037

    Filing date: 2019-11-27

    Applicant: Google LLC

    Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.

    On-Device Multilingual Speech Recognition
    Published application

    Publication number: US20240331700A1

    Publication date: 2024-10-03

    Application number: US18191711

    Filing date: 2023-03-28

    Applicant: Google LLC

    CPC classification number: G10L15/26 G10L15/32

    Abstract: A method includes receiving a sequence of input audio frames and processing each corresponding input audio frame to determine a language ID event that indicates a predicted language. The method also includes obtaining speech recognition events each including a respective speech recognition result determined by a first language pack. Based on determining that the utterance includes a language switch from the first language to a second language, the method also includes loading a second language pack onto the client device and rewinding the input audio data buffered by an audio buffer to a time of the corresponding input audio frame associated with the language ID event that first indicated the second language as the predicted language. The method also includes emitting a first transcription and processing, using the second language pack loaded onto the client device, the rewound buffered audio data to generate a second transcription.
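
    The buffer-and-rewind flow described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: strings stand in for audio frames, each language pack is modeled as a plain callable, and the first language ID event that differs from the first language is treated as the language switch. All names are hypothetical and none come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class AudioBuffer:
    """Keeps every input frame so recognition can be replayed from an earlier point."""
    frames: list = field(default_factory=list)

    def append(self, frame):
        self.frames.append(frame)

    def rewind_to(self, index):
        """Return the buffered frames from `index` to the end."""
        return self.frames[index:]


def transcribe_with_switch(frames, language_ids, packs, first_language):
    """Return a list of (language, transcription) pairs.

    frames       -- stand-ins for audio frames (strings here)
    language_ids -- one predicted language per frame (the language ID events)
    packs        -- dict mapping language -> callable(list of frames) -> str
    ASSUMPTION: the first frame whose predicted language differs from
    `first_language` marks the switch; a real detector would likely
    require more evidence before switching.
    """
    buffer = AudioBuffer()
    switch_index = None
    second_language = None
    for i, (frame, lang) in enumerate(zip(frames, language_ids)):
        buffer.append(frame)
        if switch_index is None and lang != first_language:
            switch_index = i
            second_language = lang

    if switch_index is None:
        # No switch detected: the first language pack handles everything.
        return [(first_language, packs[first_language](buffer.frames))]

    # Emit the first transcription for audio before the switch point.
    first = packs[first_language](buffer.frames[:switch_index])
    # "Load" the second pack, rewind to the switch point, and re-process.
    second = packs[second_language](buffer.rewind_to(switch_index))
    return [(first_language, first), (second_language, second)]


if __name__ == "__main__":
    packs = {
        "en": lambda fs: " ".join(fs),
        "es": lambda fs: " ".join(f.upper() for f in fs),
    }
    frames = ["hello", "there", "hola", "amigo"]
    ids = ["en", "en", "es", "es"]
    print(transcribe_with_switch(frames, ids, packs, "en"))
```

    Rewinding to the frame of the first language ID event that indicated the second language, rather than to the frame at which the switch was confirmed, is what lets the second language pack see the start of the switched speech.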

    Training and/or using a language selection model for automatically determining language for speech recognition of spoken utterance

    Publication number: US11646011B2

    Publication date: 2023-05-09

    Application number: US17846287

    Filing date: 2022-06-22

    Applicant: Google LLC

    CPC classification number: G10L15/005

    Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.

    TRAINING AND/OR USING A LANGUAGE SELECTION MODEL FOR AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE

    Publication number: US20200335083A1

    Publication date: 2020-10-22

    Application number: US16959037

    Filing date: 2019-11-27

    Applicant: Google LLC

    Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.
