Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for translating terms using numeric representations. One of the methods includes obtaining data that associates each term in a vocabulary of terms in a first language with a respective high-dimensional representation of the term; obtaining data that associates each term in a vocabulary of terms in a second language with a respective high-dimensional representation of the term; receiving a first language term; and determining a translation into the second language of the first language term from the high-dimensional representation of the first language term and the high-dimensional representations of terms in the vocabulary of terms in the second language.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating labeled images. One of the methods includes selecting a plurality of candidate videos from videos identified in a response to a search query derived from a label for an object category; selecting one or more initial frames from each of the candidate videos; detecting one or more initial images of objects in the object category in the initial frames; for each initial frame including an initial image of an object in the object category, tracking the object through surrounding frames to identify additional images of the object; and selecting one or more images from the one or more initial images and one or more additional images as database images of objects belonging to the object category.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of input sequences. One of the methods includes obtaining an input sequence, the input sequence comprising a plurality of inputs arranged according to an input order; processing the input sequence using a first long short term memory (LSTM) neural network to convert the input sequence into an alternative representation for the input sequence; and processing the alternative representation for the input sequence using a second LSTM neural network to generate a target sequence for the input sequence, the target sequence comprising a plurality of outputs arranged according to an output order.
Abstract:
Methods, systems, and apparatus including computer programs encoded on a computer storage medium, for augmenting search engine index that indexes resources from a collection of resources. In one aspect, a method of augmenting a first search engine index that indexes resources from a first collection of resources includes the actions of identifying a first resource, in the first collection of resources, that is indexed in the first search engine index for which a value of a search engine ranking signal is not available, wherein a search engine uses values of the search engine ranking signal in ranking resources in response to received search queries; processing text from the first resource using a machine learning model, the machine learning model being configured to: process the text to predict a value of the search engine ranking signal for the first resource; and updating the first search engine index by associating the predicted value of the search engine ranking signal with the first resource in the first search engine index.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating document vector representations. One of the methods includes obtaining a new document; and determining a vector representation for the new document using a trained neural network system, wherein the trained neural network system has been trained to receive an input document and a sequence of words from the input document and to generate a respective word score for each word in a set of words, wherein each of the respective word scores represents a predicted likelihood that the corresponding word follows a last word in the sequence in the input document, and wherein determining the vector representation for the new document using the trained neural network system comprises iteratively providing each of the plurality of sequences of words to the trained neural network system to determine the vector representation for the new document using gradient descent.
Abstract:
Systems and techniques are disclosed for labeling objects within an image. The objects may be labeled by selecting an option from a plurality of options such that each option is a potential label for the object. An option may have an option score associated with. Additionally, a relation score may be calculated for a first option and a second option corresponding to a second object in an image. The relation score may be based on a frequency, probability, or observance corresponding to the co-occurrence of text associated with the first option and the second option in a text corpus such as the World Wide Web. An option may be selected as a label for an object based on a global score calculated based at least on an option score and relation score associated with the option.
Abstract:
A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing machine learning tasks. One method includes receiving (i) a model input, and (ii) data identifying a first machine learning task to be performed on the model input to generate a first type of model output for the model input; augmenting the model input with an identifier for the first machine learning task to generate an augmented model input; and processing the augmented model input using a machine learning model, wherein the machine learning model has been trained on training data to perform a plurality of machine learning tasks including the first machine learning task, and wherein the machine learning model has been configured through training to process the augmented model input to generate a machine learning model output of the first type for the model input.