Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classification using a neural network. One of the methods processes an input through each of multiple layers of a neural network to generate an output, wherein each of the multiple layers includes a respective plurality of nodes. For a particular layer of the multiple layers, the method includes: receiving, by a classification system, an activation vector as input for the particular layer; selecting one or more nodes in the particular layer using the activation vector and a hash table that maps numeric values to nodes in the particular layer; and processing the activation vector using the selected nodes to generate an output for the particular layer.
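The node-selection step might be sketched as follows. The abstract does not say how the numeric hash values are computed, so this sketch assumes a sign-of-random-projection hash; the names `sign_hash` and `HashedLayer` and the parameters `num_bits` and `seed` are hypothetical and chosen only for illustration.

```python
import random


def sign_hash(vector, planes):
    """Map a vector to an integer code, one bit per random hyperplane."""
    code = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vector, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


class HashedLayer:
    """One layer whose nodes are indexed by a hash table (illustrative only)."""

    def __init__(self, weights, num_bits=4, seed=0):
        rng = random.Random(seed)
        dim = len(weights[0])
        self.weights = weights
        # Assumption: random hyperplanes define the hash; the abstract
        # only says the table maps numeric values to nodes.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_bits)]
        # Hash table: numeric code -> list of node indices in this layer.
        self.table = {}
        for node, w in enumerate(weights):
            self.table.setdefault(sign_hash(w, self.planes), []).append(node)

    def select_nodes(self, activation):
        """Look up the nodes stored under the activation vector's hash code."""
        return self.table.get(sign_hash(activation, self.planes), [])

    def forward(self, activation):
        """Compute outputs only for the selected nodes; others stay 0.0."""
        output = [0.0] * len(self.weights)
        for node in self.select_nodes(activation):
            output[node] = sum(a * w
                               for a, w in zip(activation, self.weights[node]))
        return output
```

In this sketch, nodes whose weight vectors share a hash code with the activation vector are the only ones evaluated, so the cost of a layer depends on the bucket size and not on the total node count.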
Abstract:
Techniques are disclosed for producing a collaborative recording of an audio event. An online server or service identifies participating mobile devices with recording capabilities that are available for recording at least a portion of the audio event. The online server or service determines the locations of the potential participating mobile devices, and identifies ranges of frequencies to be recorded by each of the participating mobile devices. The individual recordings are then compiled into a final collaborative recording.
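As a rough illustration of assigning frequency ranges to devices, the sketch below splits a band evenly among the participating devices. The abstract does not state how the ranges are chosen or how device location affects them, so the equal-width split, the function name `assign_frequency_bands`, and the default band limits are all assumptions.

```python
def assign_frequency_bands(device_ids, low_hz=20.0, high_hz=20000.0):
    """Give each device one contiguous, equal-width slice of [low_hz, high_hz].

    Assumption: an even linear split. A real service could weight the
    split by device location or microphone quality; the abstract does
    not specify this.
    """
    count = len(device_ids)
    if count == 0:
        return {}
    width = (high_hz - low_hz) / count
    return {
        device: (low_hz + i * width, low_hz + (i + 1) * width)
        for i, device in enumerate(device_ids)
    }
```

For example, `assign_frequency_bands(["a", "b"], 0.0, 100.0)` gives device `"a"` the range 0 to 50 Hz and device `"b"` the range 50 to 100 Hz.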
Abstract:
A volume identification system identifies a set of unlabeled spatio-temporal volumes within each of a set of videos, each volume representing a distinct object or action. The volume identification system further determines, for each of the videos, a set of volume-level features characterizing the volume as a whole. In one embodiment, the features are based on a codebook and describe the temporal and spatial relationships of different codebook entries of the volume. The volume identification system uses the volume-level features, in conjunction with existing labels assigned to the videos as a whole, to label with high confidence some subset of the identified volumes, e.g., by employing consistency learning or training and application of weak volume classifiers. The labeled volumes may be used for a number of applications, such as training strong volume classifiers, improving video search (including locating individual volumes), and creating composite videos based on identified volumes.
Abstract:
This disclosure relates to transformation invariant media matching. A fingerprinting component can generate a transformation invariant identifier for media content by adaptively encoding the relative ordering of signal markers in media content. The signal markers can be adaptively encoded via reference point geometry or ratio histograms. An identification component compares the identifier against a set of identifiers for known media content, and the media content can be matched or identified as a function of the comparison.
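To see why encoding relative ordering can give transformation invariance, consider the minimal sketch below. It uses a plain rank encoding as a stand-in and does not implement the reference point geometry or ratio histogram encodings named in the abstract. The function names and the idea of representing signal markers as a list of magnitudes are assumptions.

```python
def ordinal_fingerprint(markers):
    """Return the rank of each marker value (0 = smallest).

    Ties are broken by position. Only the relative ordering is kept,
    so any strictly increasing transformation of the values (for
    example a uniform gain change) leaves the result unchanged.
    """
    order = sorted(range(len(markers)), key=lambda i: (markers[i], i))
    ranks = [0] * len(markers)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return tuple(ranks)


def match(query_markers, known_media):
    """Names of known items whose fingerprint equals the query's.

    known_media is a hypothetical dict mapping a name to marker values.
    """
    query_fp = ordinal_fingerprint(query_markers)
    return [name for name, markers in known_media.items()
            if ordinal_fingerprint(markers) == query_fp]
```

Here a query whose marker values are a scaled copy of a known item's values produces the same fingerprint and therefore matches, while an item with a different ordering does not.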
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for clustering images. In one aspect, a system includes one or more computers configured to: for each of a plurality of digital images, associate extrinsic image-related information with each individual image, the extrinsic image-related information including text information and co-click data for the individual image; assign images from the plurality of images to one or more clusters of images based on the extrinsic information associated with each of the plurality of images; receive in a search system a user query from a user device; identify by operation of the search system one or more clusters of images that match the query; and provide one or more cluster results, where each cluster result provides information about an identified cluster.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating outputs from received inputs using deep vector table machine (VTM) systems. One of the methods includes receiving an input; processing the input through each of a plurality of VTM layers to generate an alternative representation of the input, wherein the plurality of VTM layers are arranged in a sequence from a lowest VTM layer to a highest VTM layer, and wherein each VTM layer is configured to: receive an input representation of the input, generate a sparse representation of the input representation in accordance with a set of sparse parameter vectors for the VTM layer, and generate an output representation from the sparse representation in accordance with a set of output parameter vectors for the VTM layer; and processing the alternative representation of the input through an output layer to generate an output for the input.
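The per-layer computation described above could be sketched as follows. The abstract does not define how the sparse representation is formed, so this sketch assumes a top-k selection over dot products with the sparse parameter vectors, followed by a weighted sum of the matching output parameter vectors. The names `vtm_layer` and `deep_vtm` and the parameter `k` are hypothetical, and the final output layer is omitted.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def vtm_layer(x, sparse_params, output_params, k=1):
    """One illustrative VTM layer: input -> sparse -> output representation.

    Assumption: the sparse representation keeps the k largest scores
    against the sparse parameter vectors and zeroes the rest.
    """
    scores = [dot(x, p) for p in sparse_params]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    sparse = [scores[i] if i in top else 0.0 for i in range(len(scores))]
    # Output representation: sum of the selected output parameter vectors,
    # each weighted by its sparse activation.
    out = [0.0] * len(output_params[0])
    for i in top:
        for j, value in enumerate(output_params[i]):
            out[j] += sparse[i] * value
    return sparse, out


def deep_vtm(x, layers, k=1):
    """Pass x through a sequence of layers, lowest first.

    layers is a list of (sparse_params, output_params) pairs.
    """
    representation = x
    for sparse_params, output_params in layers:
        _, representation = vtm_layer(representation, sparse_params,
                                      output_params, k)
    return representation
```

Because only the k selected output parameter vectors contribute, each layer in this sketch behaves like a table lookup followed by a small weighted sum.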
Abstract:
A plurality of videos is analyzed (in real time or after the videos are generated) to identify interesting portions of the videos. The interesting portions are identified based on one or more of the people depicted in the videos, the objects depicted in the videos, the motion of objects and/or people in the videos, and the locations where people depicted in the videos are looking. The interesting portions are combined to generate a content item.
Abstract:
A segmentation annotation technique for media items is disclosed herein. Given a weakly labeled media item, spatiotemporal masks may be generated for each of the concepts with which it is labeled. Segments may be ranked by the likelihood that they correspond to a given concept. The ranked concept segments may be utilized to train a classifier that, in turn, may be used to classify untagged or new media items.
Abstract:
A method includes identifying a named entity, retrieving images associated with the named entity, and using a face detection algorithm to perform face detection on the retrieved images to detect faces in the retrieved images. At least one representative face image from the retrieved images is identified, and the representative face image is used to identify one or more additional images representing the at least one named entity.