Patent search: ap:("Dimitri Kanevsky" OR "James R. Kozloski" OR "Clifford A. Pickover" OR "Tara N. Sainath") AND inv:"Tara N. Sainath" (page 6)

51.

Invention application
Large-Scale Language Model Data Selection for Rare-Word Speech Recognition (in force)

Publication (announcement) number: US20230096821A1

Publication (announcement) date: 2023-03-30

Application number: US17643861

Filing date: 2021-12-13

Applicants: Ronny Huang, Tara N. Sainath

Inventors: Ronny Huang, Tara N. Sainath

IPC classification: G10L15/06, G10L15/197, G10L15/22, G10L15/16, G06N3/02

Abstract: A method of training a language model for rare-word speech recognition includes obtaining a set of training text samples, and obtaining a set of training utterances used for training a speech recognition model. Each training utterance in the plurality of training utterances includes audio data corresponding to an utterance and a corresponding transcription of the utterance. The method also includes applying rare word filtering on the set of training text samples to identify a subset of rare-word training text samples that include words that do not appear in the transcriptions from the set of training utterances or appear in the transcriptions from the set of training utterances less than a threshold number of times. The method further includes training the external language model on the transcriptions from the set of training utterances and the identified subset of rare-word training text samples.
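
The rare-word filtering step described in the abstract could be sketched roughly as follows. This is a minimal illustration only, not the method claimed in the application: the function names `tokenize`, `rare_word_filter`, and `build_lm_training_corpus`, the whitespace tokenization, and the default threshold of 2 are all assumptions made for the example.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the abstract does not
    # specify how words are segmented.
    return text.lower().split()


def rare_word_filter(text_samples, transcriptions, threshold=2):
    """Keep text samples containing at least one word that is absent from
    the transcriptions or appears in them fewer than `threshold` times."""
    counts = Counter(w for t in transcriptions for w in tokenize(t))
    # Counter returns 0 for unseen words, so absent words also count as rare.
    return [
        sample
        for sample in text_samples
        if any(counts[w] < threshold for w in tokenize(sample))
    ]


def build_lm_training_corpus(text_samples, transcriptions, threshold=2):
    # Per the abstract, the language model is trained on the transcriptions
    # plus the identified rare-word subset of the text samples.
    rare_subset = rare_word_filter(text_samples, transcriptions, threshold)
    return list(transcriptions) + rare_subset
```

With transcriptions such as "play some music", "play the music", and "call mom", a text sample like "play music" would be dropped (both words occur twice), while "play sarcoidosis podcast" would be kept because it contains words never seen in the transcriptions.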