Abstract:
This document describes, among other things, a computer-implemented method. The method can include obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms. The plurality of text samples can be classified into a plurality of groups of text samples. Each group of text samples can correspond to a different sub-class of terms. For each of the groups of text samples, a sub-class context model can be generated based on the text samples in the respective group of text samples. Particular ones of the sub-class context models that are determined to be similar can be merged to generate a hierarchical set of context models. Further, the method can include selecting particular ones of the context models and generating a class-based language model based on the selected context models.
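The grouping and merging steps described above can be illustrated with a minimal sketch. It is not the claimed method: it assumes each sub-class context model is a simple count of the words adjacent to a class placeholder, that similarity is cosine similarity, and that merging is greedy and pairwise. The placeholder `$DATE`, the function names, the threshold value, and the sample groups are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

CLASS_TOKEN = "$DATE"  # hypothetical placeholder marking where a class term occurs


def context_model(samples):
    """Count the words immediately left and right of the class token."""
    counts = Counter()
    for sample in samples:
        tokens = sample.split()
        for i, tok in enumerate(tokens):
            if tok != CLASS_TOKEN:
                continue
            if i > 0:
                counts["L:" + tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                counts["R:" + tokens[i + 1]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two count vectors (assumed similarity measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def merge_hierarchy(models, threshold=0.5):
    """Greedily merge the most similar pair of models until none is similar enough.

    Returns (nodes, roots): nodes maps a model name to (counts, children),
    and roots is the set of model names that were never merged into a parent.
    """
    nodes = {name: (counts, []) for name, counts in models.items()}
    active = set(models)
    while len(active) > 1:
        a, b = max(
            combinations(sorted(active), 2),
            key=lambda p: cosine(nodes[p[0]][0], nodes[p[1]][0]),
        )
        if cosine(nodes[a][0], nodes[b][0]) < threshold:
            break
        parent = f"({a}+{b})"
        nodes[parent] = (nodes[a][0] + nodes[b][0], [a, b])
        active -= {a, b}
        active.add(parent)
    return nodes, active


# Hypothetical groups of text samples, one group per sub-class of date terms.
groups = {
    "holiday": ["flights on $DATE please", "hotels on $DATE please"],
    "weekday": ["meeting on $DATE please"],
    "year": ["born in $DATE"],
}
models = {name: context_model(samples) for name, samples in groups.items()}
nodes, roots = merge_hierarchy(models)
print(sorted(roots))
```

In this toy data the "holiday" and "weekday" groups share the same contexts and are merged under one parent, while "year" stays separate. Selecting which models in the hierarchy feed the class-based language model is not shown here.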
Abstract:
Among other things, this document describes a computer-implemented method. The method can include obtaining a plurality of text samples. Each of one or more of the text samples can be annotated with one or more labels that indicate the respective classes to which one or more terms in the text sample are assigned. Annotating a text sample comprises determining that at least one term in the text sample corresponds to a first entity in a data structure of interconnected entities, and determining a classification of the first entity within that data structure. The method can include generating a class-based training set of text samples. A class-based language model can be trained using the class-based training set of text samples. A plurality of class-specific language models can also be trained.
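The annotation step can be sketched as follows, under stated assumptions. The "data structure of interconnected entities" is modeled here as a small dictionary-based graph whose `is_a` edge gives an entity's classification. The graph contents, the `$CLASS` label format, and the function names are hypothetical, and matching is limited to single lowercase tokens for brevity. No model training is shown, only the preparation of the training data.

```python
from collections import defaultdict

# Hypothetical toy graph: each entity has labeled edges to other entities.
ENTITY_GRAPH = {
    "paris": {"is_a": "city", "capital_of": "france"},
    "london": {"is_a": "city", "capital_of": "uk"},
    "inception": {"is_a": "movie"},
}


def classify_entity(term):
    """Return a class label such as '$CITY' if the term matches an entity, else None."""
    entity = ENTITY_GRAPH.get(term)
    if entity is None or "is_a" not in entity:
        return None
    return "$" + entity["is_a"].upper()


def annotate(sample):
    """Replace recognized terms with class labels; also return the (term, label) pairs."""
    labels = []
    out = []
    for tok in sample.lower().split():
        label = classify_entity(tok)
        if label is None:
            out.append(tok)
        else:
            labels.append((tok, label))
            out.append(label)
    return " ".join(out), labels


def build_training_sets(samples):
    """Build a class-based training set plus per-class term lists."""
    class_based = []
    class_specific = defaultdict(list)
    for sample in samples:
        text, labels = annotate(sample)
        class_based.append(text)
        for term, label in labels:
            class_specific[label].append(term)
    return class_based, dict(class_specific)


class_based, class_specific = build_training_sets(
    ["flights to Paris", "watch Inception in London"]
)
print(class_based)
print(class_specific)
```

In this sketch the class-based set would feed the class-based language model, and each per-class term list would feed one class-specific language model.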
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech in an utterance. The actions may include obtaining a candidate transcription that includes a sequence of words and generating morphological variants of one or more of the words in the candidate transcription. For each morphological variant, one or more additional candidate transcriptions that each include the morphological variant may be generated. Respective language model scores may then be generated for the candidate transcription and the one or more additional candidate transcriptions. Based on the language model scores, a particular transcription may be selected from among the candidate transcription and the one or more additional candidate transcriptions.
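The expand-and-rescore flow described above can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: morphological variants come from a hand-written lookup table, and the language model is a toy bigram table of log-probabilities with a fixed back-off value. All names and numbers are hypothetical.

```python
# Hypothetical morphological variant table.
VARIANTS = {
    "walk": ["walks", "walked"],
}

# Hypothetical bigram log-probabilities; unseen bigrams get DEFAULT_LOGPROB.
BIGRAM_LOGPROB = {
    ("<s>", "she"): -1.0,
    ("she", "walk"): -5.0,
    ("she", "walks"): -1.5,
    ("she", "walked"): -2.0,
    ("walk", "home"): -2.0,
    ("walks", "home"): -2.0,
    ("walked", "home"): -2.0,
}
DEFAULT_LOGPROB = -6.0


def expand(candidate):
    """Generate additional candidates, each swapping one word for a variant."""
    words = candidate.split()
    results = []
    for i, word in enumerate(words):
        for variant in VARIANTS.get(word, []):
            results.append(" ".join(words[:i] + [variant] + words[i + 1:]))
    return results


def lm_score(transcription):
    """Sum bigram log-probabilities over the transcription (higher is better)."""
    words = ["<s>"] + transcription.split()
    return sum(
        BIGRAM_LOGPROB.get(pair, DEFAULT_LOGPROB) for pair in zip(words, words[1:])
    )


def select_transcription(candidate):
    """Pick the highest-scoring transcription among the original and its variants."""
    candidates = [candidate] + expand(candidate)
    return max(candidates, key=lm_score)


best = select_transcription("she walk home")
print(best)
```

With these toy scores, the original candidate "she walk home" scores lower than its variant "she walks home", so the variant is selected.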