Abstract:
Methods, systems, and apparatus, including computer program products, for language translation are disclosed. In one implementation, a method is provided. The method includes determining, for a plurality of feature functions in a translation lattice, a corresponding plurality of error surfaces for each of one or more candidate translations represented in the translation lattice; adjusting weights for the feature functions by traversing a combination of the plurality of error surfaces for phrases in a training set; selecting weighting values that minimize error counts for the traversed combination; and applying the selected weighting values to convert a sample of text from a first language to a second language.
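The weight-selection step can be illustrated with a minimal Python sketch of a one-dimensional line search in the style of minimum error rate training. It assumes n-best lists of candidates rather than a full translation lattice. The names `line_search`, `breakpoints`, and `chosen_error`, and the (feature vector, error count) candidate format, are hypothetical. Along a search direction, each candidate's model score is a line in the step size gamma. The points where two lines cross bound the intervals over which the chosen candidate, and therefore its error count, stays constant. Summing the per-sentence error counts gives the combined error surface, and the step size with the lowest total is selected.

```python
from itertools import combinations


def line_params(features, weights, direction):
    """Model score along the search line: score(gamma) = slope * gamma + intercept."""
    intercept = sum(w * f for w, f in zip(weights, features))
    slope = sum(d * f for d, f in zip(direction, features))
    return slope, intercept


def breakpoints(candidates, weights, direction):
    """Step sizes at which two candidates of one sentence tie in model score."""
    lines = [line_params(feats, weights, direction) for feats, _ in candidates]
    points = set()
    for (s1, b1), (s2, b2) in combinations(lines, 2):
        if s1 != s2:  # parallel lines never cross
            points.add((b2 - b1) / (s1 - s2))
    return points


def chosen_error(candidates, weights, direction, gamma):
    """Error count of the highest-scoring candidate at step size gamma."""
    def score(cand):
        slope, intercept = line_params(cand[0], weights, direction)
        return slope * gamma + intercept
    return max(candidates, key=score)[1]


def line_search(corpus, weights, direction):
    """corpus: list of sentences, each a list of (feature_vector, error_count)."""
    points = sorted(set().union(*(breakpoints(c, weights, direction) for c in corpus)))
    # One probe inside each interval where every sentence's choice is fixed.
    if points:
        probes = ([points[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(points, points[1:])]
                  + [points[-1] + 1.0])
    else:
        probes = [0.0]

    def total_error(gamma):  # combined error surface evaluated at gamma
        return sum(chosen_error(c, weights, direction, gamma) for c in corpus)

    best = min(probes, key=total_error)
    return best, total_error(best)
```

The updated weights would then be `weights + gamma * direction`. A lattice-based variant would compute each sentence's upper envelope over the lattice instead of comparing every pair of enumerated candidates.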
Abstract:
Systems, methods, and apparatuses including computer program products are provided for training machine learning systems. In some implementations, a method is provided. The method includes receiving a collection of phrases, normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules, and generating a normalized phrase table including a plurality of key-value pairs, each key-value pair including a key corresponding to a normalized phrase and a value corresponding to one or more un-normalized phrases associated with the normalized key, each un-normalized phrase having one or more parameters.
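A minimal Python sketch of the normalized phrase table follows. It assumes hypothetical normalizing rules (lowercasing, collapsing whitespace, and trimming edge punctuation) in place of the unspecified lexicographic rules, and represents each phrase's parameters as a plain dict. The function names `normalize` and `build_normalized_table` are hypothetical.

```python
import re
from collections import defaultdict


def normalize(phrase):
    """Hypothetical normalizing rules standing in for the lexicographic ones."""
    phrase = re.sub(r"\s+", " ", phrase.lower()).strip()
    # Trim edge punctuation, then any whitespace it exposed.
    return phrase.strip(".,;:!?\"'").strip()


def build_normalized_table(phrases):
    """phrases: iterable of (un_normalized_phrase, parameters) pairs.

    Returns {normalized_key: [(un_normalized_phrase, parameters), ...]}.
    """
    table = defaultdict(list)
    for phrase, params in phrases:
        table[normalize(phrase)].append((phrase, params))
    return dict(table)
```

Looking up `normalize(query)` in the table returns every surface variant together with its own parameters, so variants that differ only in case or punctuation share one key.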
Abstract:
Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition, and other applications.
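One way to access a model too large for a single machine is to shard its entries across servers and route each lookup to the shard that holds it. The Python sketch below assumes an n-gram count table, simulates servers with in-process `Shard` objects, and uses the CRC32 of the n-gram as the shard key. All class and method names are hypothetical. Requests are grouped by shard so that each server is contacted once per batch.

```python
import zlib
from collections import defaultdict


class Shard:
    """In-process stand-in for a remote server holding part of the model."""

    def __init__(self):
        self.counts = {}

    def lookup_batch(self, ngrams):
        # Unknown n-grams get a count of 0.
        return {ng: self.counts.get(ng, 0) for ng in ngrams}


class DistributedModel:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, ngram):
        # Deterministic hash, so every client routes an n-gram to the same shard.
        key = " ".join(ngram).encode("utf-8")
        return zlib.crc32(key) % len(self.shards)

    def add(self, ngram, count):
        self.shards[self.shard_for(ngram)].counts[ngram] = count

    def lookup(self, ngrams):
        # Group requests by shard so each server receives one batched call.
        by_shard = defaultdict(list)
        for ng in ngrams:
            by_shard[self.shard_for(ng)].append(ng)
        results = {}
        for index, batch in by_shard.items():
            results.update(self.shards[index].lookup_batch(batch))
        return results
```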
Abstract:
Systems, methods, and apparatuses including computer program products for machine learning. A method is provided that includes determining model parameters for a plurality of feature functions for a linear machine learning model, ranking the plurality of feature functions according to a quality criterion, and selecting, using the ranking, a group of feature functions from the plurality of feature functions to update with the determined model parameters.
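A minimal Python sketch of the ranking-and-selection step follows. It assumes that a candidate weight and a scalar gain (for example, a reduction in training error) have already been computed for each feature function; the gain stands in for the unspecified quality criterion. The function name `select_updates` and the parameter `k` are hypothetical.

```python
def select_updates(weights, proposals, k):
    """Apply proposed weights only to the k best-ranked feature functions.

    weights: {feature_name: current_weight}
    proposals: {feature_name: (proposed_weight, gain)}, where gain is a
        hypothetical quality criterion (higher is better).
    """
    # Rank feature functions by the quality criterion, best first.
    ranked = sorted(proposals.items(), key=lambda item: item[1][1], reverse=True)
    updated = dict(weights)
    for name, (proposed, gain) in ranked[:k]:
        if gain > 0:  # skip updates that are not expected to help
            updated[name] = proposed
    return updated
```

Only the selected group changes in a given step; the remaining feature functions keep their current weights.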
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting alternative translations. In one aspect, a method includes receiving source language text; receiving translated text corresponding to the source language text from a machine translation system; receiving segmentation data for the translated text, wherein the segmentation data includes a first segmentation of the translated text, the first segmentation dividing the translated text into two or more segments; receiving one or more alternative translations for each of the two or more segments; presenting the source text and the translated text to a user in a user interface; and in response to a user selection of a first portion of the translated text, displaying, in the user interface, one or more alternative translations for a first segment to which the first portion of translated text corresponds according to the first segmentation.
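The lookup behind the user-interface step can be sketched as mapping the offset of the user's selection to the segment that contains it. This Python sketch assumes the first segmentation is given as contiguous character spans over the translated text and that alternatives are stored per segment index. The function names and data layout are hypothetical.

```python
from bisect import bisect_right


def segment_index(segments, offset):
    """Index of the segment whose [start, end) span contains `offset`."""
    starts = [start for start, _ in segments]
    index = bisect_right(starts, offset) - 1
    if index < 0 or offset >= segments[index][1]:
        raise ValueError("offset is not covered by the segmentation")
    return index


def alternatives_for_selection(translated, segments, alternatives, selection_start):
    """Return the selected segment's text and its alternative translations.

    segments: contiguous (start, end) character spans over `translated`.
    alternatives: alternatives[i] lists the alternatives for segment i.
    """
    i = segment_index(segments, selection_start)
    start, end = segments[i]
    return translated[start:end].strip(), alternatives[i]
```

For example, with the translated text "The cat sat on the mat" split into the spans (0, 8) and (8, 22), a selection starting at offset 12 ("on") falls in the second segment, so its alternatives are the ones displayed.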
Abstract:
Methods, systems, and apparatus, including computer program products, for language translation are disclosed. In one aspect, a method includes accessing a translation hypergraph that represents a plurality of candidate translations, the translation hypergraph including a plurality of paths including nodes connected by edges; calculating first posterior probabilities for each edge in the translation hypergraph; calculating second posterior probabilities for each n-gram represented in the translation hypergraph based on the first posterior probabilities; and performing decoding on the translation hypergraph using the second posterior probabilities to convert a sample text from a first language to a second language.
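The posterior computations can be illustrated with a simplified Python sketch. It assumes a word lattice (a hypergraph whose edges each have a single tail node) instead of a general hypergraph. It computes edge posteriors with forward and backward sums, derives posteriors for unigrams only, and decodes with a hypothetical linear gain whose weights `theta0` and `theta1` are illustrative. All function names are hypothetical. Handling higher-order n-grams would additionally require tracking the preceding words on each path.

```python
from collections import defaultdict

# Edges are (tail, head, word, probability); nodes are numbered so that
# tail < head for every edge (a topological order).


def edge_posteriors(edges, start, end, num_nodes):
    """Posterior of each edge via forward (inside) and backward (outside) sums."""
    alpha = [0.0] * num_nodes  # total probability of paths start -> node
    beta = [0.0] * num_nodes   # total probability of paths node -> end
    alpha[start] = 1.0
    beta[end] = 1.0
    for tail, head, _, p in sorted(edges, key=lambda e: e[0]):
        alpha[head] += alpha[tail] * p
    for tail, head, _, p in sorted(edges, key=lambda e: e[0], reverse=True):
        beta[tail] += p * beta[head]
    z = alpha[end]
    return [alpha[t] * p * beta[h] / z for t, h, _, p in edges]


def unigram_posteriors(edges, posteriors):
    """Sum edge posteriors per word; this equals the probability that the word
    appears in the translation if no path contains the word more than once."""
    result = defaultdict(float)
    for (_, _, word, _), q in zip(edges, posteriors):
        result[word] += q
    return dict(result)


def mbr_decode(edges, ngram_post, start, end, num_nodes, theta0=-0.1, theta1=1.0):
    """Best path under a linear gain theta0 + theta1 * posterior(word) per word."""
    best = [float("-inf")] * num_nodes
    back = [None] * num_nodes
    best[start] = 0.0
    for tail, head, word, _ in sorted(edges, key=lambda e: e[0]):
        if best[tail] == float("-inf"):
            continue
        score = best[tail] + theta0 + theta1 * ngram_post[word]
        if score > best[head]:
            best[head] = score
            back[head] = (tail, word)
    words, node = [], end
    while node != start:
        node, word = back[node]
        words.append(word)
    return words[::-1]
```

On a toy lattice with edges the (0.6) and a (0.4) into node 1, followed by two separate cat edges (0.3 each) and one dog edge (0.4) into node 2, the single most probable path is "the dog". Because two edges produce cat, its posterior is 0.6, so the posterior-based decoder outputs "the cat".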