Abstract:
The technology disclosed provides a quasi-recurrent neural network (QRNN) that alternates convolutional layers, which apply in parallel across timesteps, and minimalist recurrent pooling layers that apply in parallel across feature dimensions.
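For intuition, the alternation can be pictured as a parallel masked convolution followed by an elementwise recurrent pooling step. This is a minimal numerical sketch, not the patented implementation; the width-2 convolution, the "f-pooling" form, and all names and sizes are assumptions.

```python
import numpy as np

def qrnn_layer(x, W_z, W_f, k=2):
    """Illustrative QRNN-style layer: a convolution applied in parallel over
    timesteps, followed by a lightweight recurrent pooling step that is
    elementwise (parallel) across feature dimensions.

    x:   (T, d_in) input sequence
    W_z: (k * d_in, d_out) weights for candidate vectors (assumed shapes)
    W_f: (k * d_in, d_out) weights for forget gates
    """
    T, d_in = x.shape
    d_out = W_z.shape[1]

    # Masked convolution: each timestep sees only the current and k-1
    # previous inputs, so all timesteps can be computed in parallel.
    padded = np.vstack([np.zeros((k - 1, d_in)), x])
    windows = np.stack([padded[t:t + k].reshape(-1) for t in range(T)])
    z = np.tanh(windows @ W_z)                  # candidate vectors
    f = 1.0 / (1.0 + np.exp(-(windows @ W_f)))  # forget gates

    # Minimalist recurrent pooling: sequential over timesteps only,
    # elementwise across the d_out feature dimensions.
    h = np.zeros((T, d_out))
    prev = np.zeros(d_out)
    for t in range(T):
        prev = f[t] * prev + (1.0 - f[t]) * z[t]
        h[t] = prev
    return h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))
    W_z = rng.normal(size=(8, 3))
    W_f = rng.normal(size=(8, 3))
    print(qrnn_layer(x, W_z, W_f).shape)  # (5, 3)
```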
Abstract:
Automatically creating word breakers which segment words into morphemes is described, for example, to improve information retrieval, machine translation or speech systems. In embodiments a cross-lingual phrase table, comprising source language (such as Turkish) phrases and potential translations in a target language (such as English) with associated probabilities, is available. In various examples, blocks of source language phrases from the phrase table are created which have similar target language translations. In various examples, inference using the target language translations in a block enables stem and affix combinations to be found for source language words without the need for input from human judges or prior knowledge of source language linguistic rules or a source language lexicon.
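A rough illustration of the blocking-and-inference idea follows. This is a toy sketch: the stoplist, the longest-common-prefix heuristic, and all data are assumptions, not the disclosed method.

```python
from collections import defaultdict
from os.path import commonprefix

STOPWORDS = {"the", "a", "in", "from", "to", "of"}  # toy target-side stoplist

def propose_segmentations(phrase_table):
    """Toy sketch: source words whose target-language translations share a
    content word are grouped into a block, and each block's longest common
    prefix is treated as a stem, the remainder as an affix.
    `phrase_table` maps source word -> {target phrase: probability}.
    """
    blocks = defaultdict(set)
    for src, translations in phrase_table.items():
        for target in translations:
            for token in target.split():
                if token not in STOPWORDS:
                    blocks[token].add(src)

    segmentations = {}
    for words in blocks.values():
        if len(words) < 2:
            continue
        words = sorted(words)
        stem = commonprefix(words)
        if not stem:
            continue
        for w in words:
            segmentations[w] = (stem, w[len(stem):] or "-")
    return segmentations

if __name__ == "__main__":
    # Tiny Turkish-to-English phrase table (probabilities are made up).
    table = {
        "ev":    {"house": 0.9},
        "evde":  {"in the house": 0.7},
        "evden": {"from the house": 0.8},
    }
    for word, (stem, affix) in sorted(propose_segmentations(table).items()):
        print(f"{word} = {stem} + {affix}")
```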
Abstract:
A Statistical Machine Translation (SMT) model is trained using pairs of sentences that include content obtained from one or more content sources (e.g. feed(s)) with corresponding queries that have been used to access the content. A query click graph may be used to assist in determining candidate pairs for the SMT training data. All or a portion of the candidate pairs may be used to train the SMT model. After training the SMT model using the SMT training data, the SMT model is applied to content to determine predicted queries that may be used to search for the content. The predicted queries are used to train a language model, such as a query language model. The query language model may be interpolated with other language models, such as a background language model, as well as a feed language model trained using the content used in determining the predicted queries.
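The interpolation step can be pictured as a simple linear mixture of language models. This is a hedged sketch; the unigram form, the model names, and the weights are assumptions for illustration.

```python
def interpolate(models, weights):
    """Linear interpolation of language models, in the spirit of combining a
    query LM, a background LM, and a feed LM. Each model maps a word to
    P(word); the weights are assumed to sum to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    vocab = set().union(*models)
    return {
        w: sum(wt * m.get(w, 0.0) for m, wt in zip(models, weights))
        for w in vocab
    }

if __name__ == "__main__":
    # Toy distributions; all numbers are invented.
    query_lm      = {"weather": 0.4, "today": 0.3, "forecast": 0.3}
    background_lm = {"the": 0.5, "weather": 0.2, "today": 0.3}
    feed_lm       = {"forecast": 0.5, "rain": 0.5}
    mixed = interpolate([query_lm, background_lm, feed_lm], [0.5, 0.3, 0.2])
    print(sorted(mixed.items(), key=lambda kv: -kv[1])[:3])
```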
Abstract:
A machine translation system (1) comprises a language analysis module (3) which receives an unknown text (4) and analyses portions of the unknown text (4). The language analysis module (3) identifies language features in the unknown text (4) and provides a linguistic fingerprint to a translation configuration selection module (5). The translation configuration selection module (5) selects translation configurations (T-T9) from a memory (6) which correspond with the identified linguistic fingerprint and communicates the selected translation configurations (T-T9) to a machine translation module (7). The machine translation module (7) translates the unknown text (4) into a different language using the selected translation configurations (T-T9).
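One way to picture the fingerprint-matching step is shown below. It is purely illustrative: the character-trigram fingerprint and cosine matching are assumptions standing in for whatever language features the disclosed analysis module uses.

```python
from collections import Counter
import math

def fingerprint(text, n=3):
    """Toy 'linguistic fingerprint': normalized character trigram counts."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_configuration(unknown_text, stored_configs):
    """Pick the stored translation configuration whose reference fingerprint
    best matches the unknown text's fingerprint."""
    fp = fingerprint(unknown_text)
    return max(stored_configs, key=lambda name: cosine(fp, stored_configs[name]))

if __name__ == "__main__":
    # Hypothetical configurations keyed by a reference-text fingerprint.
    configs = {
        "formal-news":  fingerprint("the minister announced the new policy today"),
        "social-media": fingerprint("omg this is so cool lol cant wait"),
    }
    print(select_configuration("officials announced a policy change", configs))
```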
Abstract:
The present invention relates to an apparatus and method for recognizing an idiomatic expression using phrase alignment of a parallel corpus, and more particularly, to an apparatus and method for extracting candidate idiomatic expressions using phrase alignment information of a parallel corpus and measuring an idiomatic expression index for each candidate in order to recognize an idiomatic expression, thereby correcting errors in the measurement of translation entropy and in the extraction of a representative target word, as well as enhancing the accuracy of recognizing an idiomatic expression.
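The translation-entropy ingredient can be sketched as follows. This is a toy illustration with invented data; the abstract's full idiomaticity index and its error-correction steps are not reproduced here.

```python
import math
from collections import Counter

def translation_entropy(aligned_targets):
    """Entropy of the distribution over target-side translations aligned to
    one source expression: one ingredient of an idiomaticity index."""
    counts = Counter(aligned_targets)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def representative_target(aligned_targets):
    """Most frequent aligned translation, used as the representative target
    expression for the candidate."""
    return Counter(aligned_targets).most_common(1)[0][0]

if __name__ == "__main__":
    # Phrase-alignment output for one candidate expression (toy data).
    candidate = "kick the bucket"
    aligned = ["die", "die", "die", "pass away", "kick the pail"]
    print(candidate, translation_entropy(aligned), representative_target(aligned))
```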
Abstract:
A mining system applies queries to retrieve result items from an unstructured resource. The unstructured resource may correspond to a repository of network-accessible resource items. The result items that are retrieved may correspond to text segments (e.g., sentence fragments) associated with resource items. The mining system produces a structured training set by filtering the result items and establishing respective pairs of result items. A training system can use the training set to produce a statistical translation model. The translation model can be used in a monolingual context to translate between semantically-related phrases in a single language. The translation model can also be used in a bilingual context to translate between phrases expressed in two respective languages. Various applications of the translation model are also described.
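A simplified picture of how filtered result items could be paired into a structured training set is sketched below; the filtering rule (a minimum length) and the data are assumptions, not the disclosed pipeline.

```python
from itertools import combinations

def build_training_pairs(results_per_query, min_len=3):
    """Illustrative construction of a structured training set from query
    results: drop short segments, then pair segments retrieved by the same
    query as candidate paraphrases for translation-model training."""
    pairs = []
    for query, segments in results_per_query.items():
        kept = sorted(s for s in set(segments) if len(s.split()) >= min_len)
        for a, b in combinations(kept, 2):
            pairs.append((a, b))
    return pairs

if __name__ == "__main__":
    mined = {
        "symptoms of flu": [
            "common symptoms of influenza include fever",
            "signs that you may have the flu",
            "flu",  # too short; filtered out
        ],
    }
    for a, b in build_training_pairs(mined):
        print(a, "<->", b)
```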
Abstract:
Methods, systems, and apparatus, including computer program products, for language translation are disclosed. In one implementation, a method is provided. The method includes accessing a hypothesis space; performing decoding on the hypothesis space to obtain a translation hypothesis that minimizes an expected error in classification calculated relative to an evidence space; and providing the obtained translation hypothesis for use by a user as a suggested translation in a target translation.
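The decision rule reads as minimum-Bayes-risk selection over a hypothesis space against an evidence space. A small sketch follows, with an assumed unigram-overlap loss standing in for whatever loss an implementation would use.

```python
def expected_risk(hypothesis, evidence_space, loss):
    """Expected loss of one hypothesis against a weighted evidence space:
    sum over e of P(e) * loss(hypothesis, e)."""
    return sum(p * loss(hypothesis, e) for e, p in evidence_space)

def mbr_decode(hypothesis_space, evidence_space, loss):
    """Return the hypothesis that minimizes the expected loss."""
    return min(hypothesis_space,
               key=lambda h: expected_risk(h, evidence_space, loss))

def unigram_mismatch(a, b):
    """Toy loss: 1 - unigram Jaccard overlap (a BLEU-based loss is typical)."""
    wa, wb = set(a.split()), set(b.split())
    return 1.0 - len(wa & wb) / max(len(wa | wb), 1)

if __name__ == "__main__":
    hyps = ["the cat sat", "a cat sat down", "the dog sat"]
    evidence = [("the cat sat down", 0.6), ("a cat sat", 0.3), ("the dog ran", 0.1)]
    print(mbr_decode(hyps, evidence, unigram_mismatch))  # "the cat sat"
```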
Abstract:
Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
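A compact sketch of the backoff scoring scheme is given below, read in a stupid-backoff style: the backoff factor value and the conditional relative-frequency form are assumptions for illustration.

```python
from collections import Counter

def build_counts(tokens, max_order=3):
    """Count all n-grams up to max_order from a token list."""
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def backoff_score(ngram, counts, alpha=0.4):
    """Score an n-gram by its relative frequency if it was seen, otherwise by
    the backoff factor alpha times the score of its order n-1 backoff n-gram."""
    n = len(ngram)
    if n == 1:
        total = sum(counts[1].values())
        return counts[1][ngram] / total if total else 0.0
    context = ngram[:-1]
    if counts[n][ngram]:
        return counts[n][ngram] / counts[n - 1][context]
    return alpha * backoff_score(ngram[1:], counts, alpha)

if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ran".split()
    counts = build_counts(tokens)
    print(backoff_score(("the", "cat", "sat"), counts))   # seen trigram
    print(backoff_score(("sat", "the", "cat"), counts))   # backs off to ("the", "cat")
```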
Abstract:
Training using tree transducers is described. Given sample input/output pairs as training (100, 110), and given a set of tree transducer rules (120), the information is combined to yield locally optimal weights for those rules (140). This combination is carried out by building a weighted derivation forest for each input/output pair and applying counting methods to those forests (130).
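The counting step can be illustrated with a much-simplified EM loop over explicitly enumerated derivations. Real derivation forests are packed structures, and the rule set, states, and data here are invented for illustration only.

```python
from collections import defaultdict

def em_train(forests, rule_lhs, iterations=10):
    """Sketch of counting over derivation forests: each training pair
    contributes a 'forest', represented here as a list of derivations, each
    derivation being the list of rule ids it uses. Rule weights are
    re-estimated by EM: expected rule counts, normalized per left-hand side."""
    rules = set(rule_lhs)

    def normalize(counts):
        by_lhs = defaultdict(float)
        for r, c in counts.items():
            by_lhs[rule_lhs[r]] += c
        return {r: counts[r] / by_lhs[rule_lhs[r]] if by_lhs[rule_lhs[r]] else 0.0
                for r in counts}

    weights = normalize({r: 1.0 for r in rules})
    for _ in range(iterations):
        expected = {r: 1e-9 for r in rules}  # tiny smoothing
        for derivations in forests:
            scores = []
            for deriv in derivations:
                s = 1.0
                for r in deriv:
                    s *= weights[r]
                scores.append(s)
            total = sum(scores)
            if total == 0:
                continue
            # Posterior-weighted rule counts from this forest.
            for deriv, s in zip(derivations, scores):
                for r in deriv:
                    expected[r] += s / total
        weights = normalize(expected)
    return weights

if __name__ == "__main__":
    # Rules r1, r2 share a left-hand side, as do r3, r4 (hypothetical states).
    rule_lhs = {"r1": "q", "r2": "q", "r3": "p", "r4": "p"}
    # Two input/output pairs, each with the derivations that explain them.
    forests = [
        [["r1", "r3"], ["r2", "r4"]],
        [["r1", "r3"]],
    ]
    print(em_train(forests, rule_lhs))
```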