Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training recognition of canonical representations corresponding to named-entity phrases in a second natural language, based on translating a set of allowable expressions with canonical representations from a first natural language. The set of allowable expressions may be generated by expanding a context-free grammar for the allowable expressions in the first natural language.
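The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the grammar, the canonical action/argument structure, and the translation table (standing in for a real machine-translation step) are all hypothetical.

```python
from itertools import product

# Hypothetical toy grammar (names and rules are illustrative only):
# each nonterminal maps to alternative sequences of symbols.
GRAMMAR = {
    "EXPR": [["call", "CONTACT"], ["dial", "CONTACT"]],
    "CONTACT": [["mom"], ["the office"]],
}

def expand(symbol):
    """Expand a symbol into every terminal string it derives."""
    if symbol not in GRAMMAR:
        return [symbol]
    results = []
    for alternative in GRAMMAR[symbol]:
        for combo in product(*(expand(s) for s in alternative)):
            results.append(" ".join(combo))
    return results

def canonicalize(expression):
    """Attach a canonical representation (here, an action/argument
    structure) to an allowable expression."""
    _verb, _, rest = expression.partition(" ")
    return {"action": "CALL", "argument": rest}

# Stand-in for machine translation into the second language; the
# German renderings are illustrative, not output of a real MT system.
TRANSLATIONS = {
    "call mom": "ruf mama an",
    "call the office": "ruf das büro an",
    "dial mom": "wähle mama",
    "dial the office": "wähle das büro",
}

# Expand the grammar, then carry each canonical representation over
# to the translated expression in the second language.
allowable = expand("EXPR")
pairs = [(TRANSLATIONS[e], canonicalize(e)) for e in allowable]
```

The key point the sketch illustrates is that the canonical representation is attached before translation, so the second-language expression inherits it unchanged.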
Abstract:
The present disclosure provides methods operable by a computing device having one or more applications configured to perform functions based on a received verbal input. The method may comprise receiving a verbal input, obtaining one or more textual phrases corresponding to the received verbal input, and providing the one or more textual phrases to an appropriate application on the computing device. The method may further comprise accumulating data on the one or more textual phrases. The data comprises at least a count of a number of times a particular textual phrase is obtained based on a given received verbal input. Based on the count exceeding a threshold, the method may further comprise providing a query corresponding to the textual phrase, where the query is usable to search an advertisement database for one or more advertisements relating to the textual phrase.
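The accumulate-and-threshold logic can be sketched in a few lines. The class name, the threshold value, and the query shape are all hypothetical; the abstract does not specify them.

```python
from collections import Counter

class PhraseAccumulator:
    """Accumulates counts of textual phrases obtained from verbal input
    and emits a query once a phrase's count exceeds a threshold."""

    def __init__(self, threshold=3):  # illustrative threshold
        self.counts = Counter()
        self.threshold = threshold

    def record(self, phrase):
        """Record one occurrence of a phrase; return a query usable to
        search an advertisement database once the count exceeds the
        threshold, else None."""
        self.counts[phrase] += 1
        if self.counts[phrase] > self.threshold:
            return {"query": phrase}
        return None

acc = PhraseAccumulator()
results = [acc.record("pizza near me") for _ in range(5)]
```

With a threshold of 3, the first three recordings return `None` and the fourth and fifth return a query for the phrase.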
Abstract:
This document describes, among other things, a computer-implemented method. The method can include obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms. The plurality of text samples can be classified into a plurality of groups of text samples. Each group of text samples can correspond to a different sub-class of terms. For each of the groups of text samples, a sub-class context model can be generated based on the text samples in the respective group of text samples. Particular ones of the sub-class context models that are determined to be similar can be merged to generate a hierarchical set of context models. Further, the method can include selecting particular ones of the context models and generating a class-based language model based on the selected context models.
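The grouping-and-merging step can be sketched as below. Every concrete choice here is an assumption for illustration: context is reduced to the (left, right) word pair around a class token, similarity is Jaccard overlap of observed contexts, and merging is greedy.

```python
from collections import Counter

def context_model(samples, class_token="<CITY>"):
    """Sub-class context model: counts of (left, right) word pairs
    around the class token (a deliberately simplified notion of
    context)."""
    counts = Counter()
    for s in samples:
        tokens = s.split()
        for i, t in enumerate(tokens):
            if t == class_token:
                left = tokens[i - 1] if i > 0 else "<s>"
                right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                counts[(left, right)] += 1
    return counts

def similarity(m1, m2):
    """Jaccard similarity over the sets of observed contexts."""
    k1, k2 = set(m1), set(m2)
    return len(k1 & k2) / len(k1 | k2) if (k1 | k2) else 0.0

def merge_if_similar(models, threshold=0.5):
    """Greedily merge sub-class models whose contexts overlap enough."""
    merged = []
    for m in models:
        for existing in merged:
            if similarity(existing, m) >= threshold:
                existing.update(m)  # Counter.update adds counts
                break
        else:
            merged.append(Counter(m))
    return merged

# Three hypothetical sub-classes of a "<CITY>" class.
models = [context_model(g) for g in [
    ["fly to <CITY> tomorrow", "fly to <CITY> today"],  # e.g. US cities
    ["fly to <CITY> tomorrow"],                         # e.g. EU cities
    ["<CITY> won the game"],                            # e.g. team names
]]
merged = merge_if_similar(models)
```

In this toy data, the two travel-context sub-classes merge into one model while the sports-context sub-class stays separate, which mirrors the abstract's idea that merging similar sub-class models yields a smaller hierarchical set.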
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating language models. In some implementations, data is accessed that indicates a set of classes corresponding to a concept. A first language model is generated in which a first class represents the concept. A second language model is generated in which second classes represent the concept. Output of the first language model and the second language model is obtained, and the outputs are evaluated. A class from the set of classes is selected based on evaluating the output of the first language model and the output of the second language model. In some implementations, the first class and the second class are selected from a parse tree or other data that indicates relationships among the classes in the set of classes.
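The evaluate-and-select step can be made concrete with a toy experiment. The concept, class tokens, data, and evaluation metric (add-one-smoothed unigram perplexity) are all illustrative assumptions, not the disclosed method.

```python
import math
from collections import Counter

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday",
        "saturday", "sunday"]
# One coarse class vs. two finer classes for the concept "day of week".
SINGLE = {d: "<DAY>" for d in DAYS}
MULTI = {d: ("<WEEKEND>" if d in ("saturday", "sunday") else "<WEEKDAY>")
         for d in DAYS}

def rewrite(samples, term_to_class):
    """Replace concept terms with their class tokens."""
    return [" ".join(term_to_class.get(t, t) for t in s.split())
            for s in samples]

def unigram_perplexity(train, test):
    """Add-one-smoothed unigram perplexity of test given train."""
    counts = Counter(t for s in train for t in s.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    logp, n = 0.0, 0
    for s in test:
        for t in s.split():
            logp += math.log((counts[t] + 1) / (total + vocab))
            n += 1
    return math.exp(-logp / n)

train_raw = ["meet on monday", "meet on tuesday", "meet on saturday"]
test_raw = ["meet on sunday"]

single_ppl = unigram_perplexity(rewrite(train_raw, SINGLE),
                                rewrite(test_raw, SINGLE))
multi_ppl = unigram_perplexity(rewrite(train_raw, MULTI),
                               rewrite(test_raw, MULTI))
selected = "single-class" if single_ppl < multi_ppl else "multi-class"
```

On this tiny data the single-class rewrite scores better because the coarse token pools sparse day counts, illustrating why the choice of class granularity is made empirically rather than fixed in advance.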
Abstract:
A computer-implemented method can include identifying a first set of text samples that include a particular potentially offensive term. Labels can be obtained for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner. A classifier can be trained based at least on the first set of text samples and the labels, the classifier being configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample. The method can further include providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample.
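One way to make the train/label/classify flow concrete is a Naive Bayes classifier over context words, sketched below. This is an assumption-laden stand-in: the abstract's "signals" could be far richer than neighboring tokens, and the class name and training data are invented for illustration.

```python
import math
from collections import Counter

class OffensivenessClassifier:
    """Toy Naive Bayes: the 'signals' are simply the words surrounding
    the potentially offensive term (a stand-in for the richer signals
    the abstract alludes to)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.label_counts = Counter()

    def train(self, samples, labels, term):
        for text, label in zip(samples, labels):
            self.label_counts[label] += 1
            for w in text.split():
                if w != term:
                    self.word_counts[label][w] += 1

    def classify(self, text, term):
        """Return True if the term appears to be used offensively."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        scores = {}
        for label in (True, False):
            total = sum(self.word_counts[label].values())
            score = math.log(self.label_counts[label]
                             / sum(self.label_counts.values()))
            for w in text.split():
                if w != term:
                    # Add-one smoothing over the joint vocabulary.
                    score += math.log((self.word_counts[label][w] + 1)
                                      / (total + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)

clf = OffensivenessClassifier()
clf.train(
    ["you bloody idiot", "shut up you bloody fool",            # offensive use
     "a bloody nose injury", "the bloody battle of history"],  # benign use
    [True, True, False, False],
    "bloody",
)
label_offensive = clf.classify("bloody idiot go away", "bloody")
label_benign = clf.classify("bloody nose after the game", "bloody")
```

The same term gets opposite labels in the two test sentences, which is exactly the context-sensitivity the abstract's classifier is meant to capture.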
Abstract:
A method includes accessing data specifying a set of actions, each action defining a user device operation. For each action, the method includes: accessing a corresponding set of command sentences for the action; determining first n-grams in the set of command sentences that are semantically relevant for the action; determining second n-grams in the set of command sentences that are semantically irrelevant for the action; generating a training set of command sentences from the corresponding set of command sentences, including removing each second n-gram from each sentence in the corresponding set of command sentences for the action; and generating, from the training set of command sentences, a command model configured to generate an action score for the action for an input sentence based on: first n-grams for the action, and second n-grams for the action that are also second n-grams for all other actions.
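The training-set generation and scoring steps can be sketched as below. For simplicity the n-grams are unigrams, and the scoring rule (fraction of an action's relevant n-grams present) is an illustrative assumption; the abstract does not define the score formula.

```python
def strip_irrelevant(sentences, irrelevant):
    """Build the training set by removing semantically irrelevant
    n-grams (unigrams here, for simplicity) from each command sentence."""
    return [" ".join(w for w in s.split() if w not in irrelevant)
            for s in sentences]

def action_score(sentence, relevant, shared_irrelevant):
    """Score an input sentence for one action: the fraction of the
    action's relevant n-grams present, after dropping n-grams that are
    irrelevant for all actions."""
    tokens = [w for w in sentence.split() if w not in shared_irrelevant]
    hits = sum(1 for ng in relevant if ng in tokens)
    return hits / len(relevant) if relevant else 0.0

# Illustrative data for a hypothetical "play music" action.
play_sentences = ["please play some music now", "play music please"]
irrelevant = {"please", "now", "some"}
training_set = strip_irrelevant(play_sentences, irrelevant)
score = action_score("please play some music", {"play", "music"}, {"please"})
```

Stripping the irrelevant n-grams collapses both raw sentences onto the same canonical command, which is what lets the command model generalize across phrasings.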
Abstract:
A method iteratively processes data for a set of actions. For each action, the method includes: accessing a corresponding set of command sentences for the action; determining first n-grams that are semantically relevant for the action and second n-grams that are semantically irrelevant for the action; and identifying, from a log of command sentences that includes command sentences not included in the corresponding set, candidate command sentences that include one first n-gram and a third n-gram that has not yet been determined to be a first n-gram or a second n-gram. For each candidate command sentence, the method determines each third n-gram that is semantically relevant for an action to be a first n-gram and each third n-gram that is semantically irrelevant for an action to be a second n-gram, and adjusts the corresponding set of command sentences for each action based on the first n-grams and the second n-grams.
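One iteration of this bootstrapping loop can be sketched as follows, again with unigrams standing in for n-grams. The `judge` callable is a hypothetical stand-in for the semantic-relevance test, which the abstract leaves open.

```python
def bootstrap_ngrams(relevant, irrelevant, log, judge):
    """One iteration: scan the log for candidate sentences that contain
    a known relevant unigram plus an undetermined unigram, then
    classify each undetermined unigram with `judge` (a stand-in for
    the semantic-relevance test)."""
    new_relevant, new_irrelevant = set(relevant), set(irrelevant)
    candidates = []
    for sentence in log:
        tokens = set(sentence.split())
        unknown = tokens - new_relevant - new_irrelevant
        if tokens & new_relevant and unknown:
            candidates.append(sentence)
            for t in unknown:
                (new_relevant if judge(t) else new_irrelevant).add(t)
    return new_relevant, new_irrelevant, candidates

# Illustrative seed sets and log; `judge` here is a fixed lookup.
new_rel, new_irr, candidates = bootstrap_ngrams(
    {"play", "music"}, {"please"},
    ["play a song", "please stop"],
    judge=lambda t: t in {"song"},
)
```

The log sentence "play a song" qualifies as a candidate because it pairs the known relevant n-gram "play" with the undetermined n-grams "a" and "song"; the pass then grows both sets, and repeating the pass with the grown sets gives the iteration the abstract describes.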