Abstract:
A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. A rule is created for each of the labels employed in the classifier, and the created rules are applied to the given corpus to create a corpus of attachments by appending a weight of ηp(x), or 1−ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules, and also to create a corpus of non-attachments by appending a weight of 1−ηp(x), or ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules.
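The weighting scheme can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `weighted_corpora`, the representation of rules as predicates, and the callable `p` (standing in for p(x), which the abstract does not define) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Weighted = Tuple[str, str, float]  # (entry, label, weight)

def weighted_corpora(
    entries: List[str],
    rules: Dict[str, Callable[[str], bool]],  # hypothetical: one predicate per label
    eta: float,                               # the parameter written as η in the abstract
    p: Callable[[str], float],                # hypothetical stand-in for p(x)
) -> Tuple[List[Weighted], List[Weighted]]:
    """Build the attachment and non-attachment corpora described in the abstract."""
    attachments: List[Weighted] = []
    non_attachments: List[Weighted] = []
    for x in entries:
        w = eta * p(x)
        for label, rule in rules.items():
            if rule(x):
                # Entry meets the label's rule.
                attachments.append((x, label, w))
                non_attachments.append((x, label, 1 - w))
            else:
                # Entry fails to meet the label's rule.
                attachments.append((x, label, 1 - w))
                non_attachments.append((x, label, w))
    return attachments, non_attachments

# Toy usage with made-up entries and a single made-up rule.
entries = ["what is my balance", "hello"]
rules = {"billing": lambda x: "balance" in x}
attachments, non_attachments = weighted_corpora(entries, rules, 0.5, lambda x: 0.8)
```

With a single label, the original corpus together with the two weighted corpora gives three copies of each entry, which is one way to read "enlarged threefold".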
Abstract:
Embodiments of the present invention relate to a method and system for augmenting a training database of an automated language-understanding system. In one embodiment, a training example in a first language may be received from the training database. The first language-training example may be translated to a second language output. The second language output may be translated to a first variant of the first language-training example. An action pair including the first variant of the first language-training example and an action command associated with the first language-training example may be stored in an augmented training database.
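The round-trip translation step can be sketched in Python as follows. This is an illustrative sketch only: the function `augment` and the two translator callables are hypothetical, and the dictionary lookups stand in for whatever machine translation system an actual embodiment would use.

```python
from typing import Callable, List, Tuple

ActionPair = Tuple[str, str]  # (utterance in the first language, action command)

def augment(
    training_db: List[ActionPair],
    to_second: Callable[[str], str],  # hypothetical first-to-second language translator
    to_first: Callable[[str], str],   # hypothetical second-to-first language translator
) -> List[ActionPair]:
    """Create one variant per training example by translating it out and back."""
    augmented: List[ActionPair] = []
    for utterance, action in training_db:
        second_language_output = to_second(utterance)
        variant = to_first(second_language_output)
        # The variant keeps the action command of the example it came from.
        augmented.append((variant, action))
    return augmented

# Toy translators backed by lookup tables (made-up data).
en_to_fr = {"i want to check my balance": "je veux vérifier mon solde"}
fr_to_en = {"je veux vérifier mon solde": "i would like to verify my balance"}

augmented_db = augment(
    [("i want to check my balance", "CHECK_BALANCE")],
    lambda s: en_to_fr.get(s, s),
    lambda s: fr_to_en.get(s, s),
)
```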
Abstract:
Methods and systems for language translation are disclosed. The translator is based on finite state machines that can convert a pair of input symbol sequences to a pair of output symbol sequences. The translator includes a lexicon associating a finite state machine with a pair of head words with corresponding meanings in the source and target languages. The state machine for a source language head word w and a target language head word ν reads the dependent words of w to its left and right in a source sentence and proposes corresponding dependents to the left and right of ν in a target language sentence being constructed, taking account of the required word order for the target language. The state machines are used by a transduction search engine to generate a plurality of candidate translations via a recursive process wherein a source language head word is first translated as described above, then the heads of each of the dependent phrases are similarly translated, then their dependents, and so on. Only the state machines corresponding to the words in the source language string are activated and used by the search engine. The translator also includes a parameter table that provides costs for actions taken by each finite state machine in converting between the source language and the target language. The costs for machine transitions are indicative of the likelihood of co-occurrence of pairs of words in the source language, and between corresponding pairs of words in the target language. The transduction search engine provides a total cost, using the parameter table, for each of the candidate translations. The total cost of a translation is the sum of the cost for all actions taken by each machine involved in the translation.
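The cost computation in the last sentences can be sketched in Python. The action names, the cost values, and the choice of the lowest-cost candidate as the preferred translation are assumptions for illustration; the abstract only states that the total cost is the sum of the action costs taken from the parameter table.

```python
from typing import Dict, List

def total_cost(actions: List[str], parameter_table: Dict[str, float]) -> float:
    """Sum the parameter-table cost of every action taken for one candidate."""
    return sum(parameter_table[action] for action in actions)

# Made-up parameter table: action identifier -> cost.
parameter_table = {
    "head:book->réserver": 1.0,
    "left-dep:a->un": 0.5,
    "right-dep:flight->vol": 1.5,
    "right-dep:flight->fuite": 2.5,
}

# Made-up candidates, each listed with the actions its machines took.
candidates = {
    "réserver un vol": ["head:book->réserver", "left-dep:a->un", "right-dep:flight->vol"],
    "réserver un fuite": ["head:book->réserver", "left-dep:a->un", "right-dep:flight->fuite"],
}

costs = {text: total_cost(actions, parameter_table) for text, actions in candidates.items()}
# Assumption: a lower total cost indicates a more likely translation.
best = min(costs, key=costs.get)
```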
Abstract:
Embodiments of the present invention relate to a method and system for augmenting a training database of an automated language-understanding system. In one embodiment, a training example in a first language is received from the training database. The first language-training example is translated to a second language output. The second language output is translated to a first variant of the first language-training example. An action pair including the first variant of the first language-training example and an action command associated with the first language-training example is stored in an augmented training database.
Abstract:
Methods, systems, and apparatus, including computer program products, for suggesting alternative queries based on original query search results. In one aspect, a method includes receiving search results for a first query, where each search result refers to a respective resource and includes a snippet of content from the respective resource, receiving one or more suggested second queries, for each of the suggested second queries: selecting a set of words in one of the snippets to represent the suggested second query, associating the suggested second query with the set so that a user can interact with a word in the set to invoke the suggested second query, and marking the set so as to indicate that the user can interact with a word in the set to invoke the suggested second query, and transmitting the search results including each marked set to a client device for presentation to the user.
Abstract:
A method and apparatus for automatically constructing hierarchical transduction models for language translation is presented. The input to the construction process may be a database of examples each consisting of a transcribed speech utterance and its translation into another language. A translation pairing score is assigned (or computed) for translating a word in the source language into each of the possible translations it has in the target language. For each instance of the resulting training dataset, a head transducer may be constructed that translates the source string into the target string by splitting the source string into a source head word, the words preceding the source head word, and the words following the source head word. This process may be performed recursively to generate a set of transducer fragments. The transducer fragments may form a statistical head transducer model. The head transducer translation model may then be input into a transduction search module.
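The recursive split of a source string around its head word can be sketched in Python. This covers only the splitting step, not the construction of transducers or the pairing scores. The abstract does not say how the head word is chosen, so `pick_head` is a hypothetical callback, and the "longest word" heuristic in the usage lines is purely a placeholder.

```python
from typing import Callable, List, Optional

def split_on_head(
    words: List[str],
    pick_head: Callable[[List[str]], int],  # hypothetical: returns the head word's index
) -> Optional[dict]:
    """Recursively split a word sequence into head, preceding, and following words."""
    if not words:
        return None
    i = pick_head(words)
    return {
        "head": words[i],
        "left": split_on_head(words[:i], pick_head),       # words preceding the head
        "right": split_on_head(words[i + 1:], pick_head),  # words following the head
    }

# Placeholder head selection: the first longest word (an assumption, not the patent's method).
longest = lambda ws: max(range(len(ws)), key=lambda i: len(ws[i]))

tree = split_on_head(["show", "me", "flights"], longest)
```

Each nested dictionary corresponds to one fragment. In the described method, each such split would yield a transducer fragment for the statistical head transducer model.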
Abstract:
A method of searching for information is described. A sequence of terms, including one or more term segments and one or more identifiers corresponding to one or more missing terms, is received. The sequence of terms is converted into a corresponding search pattern, including a set of one or more query expressions and one or more ordering constraints. The search pattern is compared with a plurality of documents to identify a set of documents. Match scores for one or more matches between the search pattern and documents in the set of documents are determined. Content in the set of documents corresponding to the one or more missing terms in the search pattern is identified, and a ranked set of information items containing the identified content is provided in accordance with the match scores.
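A minimal Python sketch of this flow is given below. It makes several assumptions not stated in the abstract: the missing-term identifier is written `*`, each missing term is a single word, the search pattern is a regular expression whose token order serves as the ordering constraint, and the match score is simply how often an answer occurs.

```python
import re
from collections import Counter
from typing import List, Tuple

def to_pattern(query: str, blank: str = "*") -> "re.Pattern[str]":
    """Convert term segments and missing-term identifiers into an ordered regex."""
    pieces = []
    for token in query.split():
        # A missing term becomes a capture group; other tokens must match literally.
        pieces.append(r"(\w+)" if token == blank else re.escape(token))
    return re.compile(r"\s+".join(pieces), re.IGNORECASE)

def rank_answers(query: str, documents: List[str]) -> List[Tuple[str, int]]:
    """Collect the text matched by the blanks and rank it by frequency."""
    pattern = to_pattern(query)
    counts: Counter = Counter()
    for doc in documents:
        for match in pattern.finditer(doc):
            counts.update(match.groups())
    return counts.most_common()

# Made-up documents for illustration.
documents = [
    "The capital of France is Paris.",
    "Many say the capital of France is Paris, not Lyon.",
    "One page claims the capital of France is Lyon.",
]
ranked = rank_answers("the capital of France is *", documents)
```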
Abstract:
Embodiments of the present invention relate to a method and system for augmenting a training database of an automated language-understanding system. In one embodiment, a training example in a first language is received from the training database. The first language-training example is translated to a second language output. The second language output is translated to a first variant of the first language-training example. An action pair including the first variant of the first language-training example and an action command associated with the first language-training example is stored in an augmented training database.
Abstract:
Methods and apparatus for a language model and language recognition systems are disclosed. The method utilizes a plurality of probabilistic finite state machines having the ability to recognize a pair of sequences, one sequence scanned leftwards, the other scanned rightwards. Each word in the lexicon of the language model is associated with one or more such machines which model the semantic relations between the word and other words. Machine transitions create phrases from a set of word string hypotheses, and incrementally calculate costs related to the probability that such phrases represent the language to be recognized. The cascading lexical head machines utilized in the methods and apparatus capture the structural associations implicit in the hierarchical organization of a sentence, resulting in a language model and language recognition systems that combine the lexical sensitivity of N-gram models with the structural properties of dependency grammar.
Abstract:
A client system provides to a server system a fill-the-blank query comprising one or more term segments and one or more missing term identifiers signifying missing information sought by a user. The client system receives from the server system a response to the query, the response including at least one or more potential answers corresponding to the one or more missing term identifiers in the fill-the-blank query, and then displays the response to the query, including displaying the one or more potential answers. Optionally, the client system displays a ranked list of documents containing the one or more potential answers. Optionally, the response to the query further includes snippets of text from one or more documents containing the one or more potential answers. Optionally, the fill-the-blank query includes a respective missing term identifier located between two respective term segments.