Invention Publication
- Patent Title: LANGUAGE-AGNOSTIC MULTILINGUAL MODELING USING EFFECTIVE SCRIPT NORMALIZATION
- Application No.: EP24162746.2
- Application Date: 2021-01-19
- Publication No.: EP4361897A3
- Publication Date: 2024-07-17
- Inventor: DATTA, Arindrima; RAMABHADRAN, Bhuvana; EMOND, Jesse; ROAK, Brian
- Applicant: GOOGLE LLC
- Applicant Address: US Mountain View CA 94043 1600 Amphitheatre Parkway
- Assignee: GOOGLE LLC
- Current Assignee: GOOGLE LLC
- Current Assignee Address: US Mountain View CA 94043 1600 Amphitheatre Parkway
- Agency: Marks & Clerk GST
- Priority: US 2062966779P 2020.01.28
- Parent application of this divisional: 21705040.0 2021.01.19
- Main IPC: G10L15/06
- IPC: G10L15/06 ; G06N3/044 ; G06N3/084 ; G10L15/16 ; G06F40/129 ; G06F40/53
Abstract:
A method (600) includes obtaining a plurality of training data sets (202), each associated with a respective native language and including a plurality of respective training data samples (204). For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text (121) representing the respective native language of the corresponding audio in a target script, and associating the corresponding transliterated text in the target script with the corresponding audio (210) in the respective native language to generate a respective normalized training data sample (240). The method also includes training, using the normalized training data samples, a multilingual model (300) to predict speech recognition results (120) in the target script for corresponding speech utterances (106) spoken in any of the different native languages associated with the plurality of training data sets.
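The normalization step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the `normalize` function, the two-character lookup tables, and the choice of Latin as the target script are all assumptions made for illustration. A real system would use a full transliteration model, and the training step on the multilingual model is omitted here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TrainingSample:
    audio: bytes          # audio of the utterance in the native language
    transcription: str    # transcription written in the native script


@dataclass
class NormalizedSample:
    audio: bytes              # the same audio, left untouched
    transliterated_text: str  # transcription rewritten in the target script


# Hypothetical toy tables mapping native-script characters to Latin text.
# They cover only two characters per language, purely for illustration.
TOY_TABLES: Dict[str, Dict[str, str]] = {
    "hi": {"न": "na", "म": "ma"},  # Devanagari
    "ta": {"ந": "na", "ம": "ma"},  # Tamil
}


def toy_transliterate(text: str, language: str) -> str:
    """Map each character through the language's table; pass unknowns through."""
    table = TOY_TABLES[language]
    return "".join(table.get(ch, ch) for ch in text)


def normalize(
    data_sets: Dict[str, List[TrainingSample]],
    transliterate: Callable[[str, str], str] = toy_transliterate,
) -> List[NormalizedSample]:
    """Pair each sample's audio with its transcription in the target script."""
    normalized: List[NormalizedSample] = []
    for language, samples in data_sets.items():
        for sample in samples:
            text = transliterate(sample.transcription, language)
            normalized.append(NormalizedSample(sample.audio, text))
    return normalized
```

In this sketch, samples from different native languages end up sharing one output script, so a single model trained on the `NormalizedSample` list only ever has to emit text in that script.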
Public/Granted literature
- EP4361897A2 LANGUAGE-AGNOSTIC MULTILINGUAL MODELING USING EFFECTIVE SCRIPT NORMALIZATION Public/Granted day: 2024-05-01