Abstract:
Techniques for increasing robustness of a convolutional neural network based on training that uses multiple datasets and multiple tasks are described. For example, a computer system trains the convolutional neural network across multiple datasets and multiple tasks. The convolutional neural network is configured for learning features from images and accordingly generating feature vectors. By using multiple datasets and multiple tasks, the robustness of the convolutional neural network is increased. A feature vector of an image is used to apply an image-related operation to the image. For example, the image is classified or indexed, or objects in the image are tagged, based on the feature vector. Because the robustness is increased, the accuracy of the generated feature vectors is also increased. Hence, the overall quality of an image service that relies on the image-related operation is enhanced.
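The multi-dataset, multi-task training described above can be illustrated as a shared feature extractor feeding several task heads. The following is a minimal sketch under assumed toy dimensions and random data, not the patented implementation: a joint loss over both task outputs trains the shared backbone on both tasks at once.

```python
import numpy as np

rng = np.random.default_rng(0)

W_shared = rng.normal(size=(8, 4)) * 0.1   # shared "backbone" weights
W_task_a = rng.normal(size=(4, 3)) * 0.1   # head for hypothetical task A (e.g. classification)
W_task_b = rng.normal(size=(4, 2)) * 0.1   # head for hypothetical task B (e.g. tagging)

def forward(x):
    feat = np.maximum(x @ W_shared, 0.0)   # shared feature vector (ReLU)
    return feat, feat @ W_task_a, feat @ W_task_b

x = rng.normal(size=(5, 8))                # a batch of 5 toy "images"
feat, out_a, out_b = forward(x)

# Both task outputs are computed from the same feature vector, so a joint
# loss (sum of per-task losses) trains the shared backbone on both
# datasets/tasks simultaneously.
loss = np.mean(out_a ** 2) + np.mean(out_b ** 2)
print(feat.shape, out_a.shape, out_b.shape)
```

Because every task's gradient flows through `W_shared`, the learned feature vector must serve all tasks, which is the mechanism the abstract credits for increased robustness.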
Abstract:
Semantic class localization techniques and systems are described. In one or more implementations, a technique is employed to communicate relevancies of aggregations back through layers of a neural network. Through use of these relevancies, activation relevancy maps are created that describe the relevancy of portions of the image to the classification of the image as corresponding to a semantic class. In this way, the semantic class is localized to portions of the image. This may be performed through communication of positive but not negative relevancies, through use of contrastive attention maps to differentiate between semantic classes, and even within a same semantic class through use of a self-contrastive technique.
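Back-communicating relevancy through a layer can be sketched in the style of layer-wise relevance propagation. The example below is an assumed illustration for a single fully connected layer, keeping only positive contributions as the abstract describes; it is not the patented procedure.

```python
import numpy as np

x = np.array([0.5, 1.0, 0.2])              # layer input activations
W = np.array([[ 0.8, -0.3],
              [ 0.1,  0.6],
              [-0.4,  0.9]])               # weights to 2 output units
R_out = np.array([1.0, 0.0])               # relevance assigned to the target class

Z = np.maximum(x[:, None] * W, 0.0)        # positive contributions only
denom = Z.sum(axis=0) + 1e-9               # normalize per output unit
R_in = (Z / denom) @ R_out                 # redistribute relevance to the inputs

print(R_in, R_in.sum())                    # total relevance is conserved
```

Repeating this redistribution layer by layer down to the input pixels yields an activation relevancy map that localizes the semantic class within the image.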
Abstract:
Image zooming is described. In one or more implementations, zoomed croppings of an image are scored. The scores calculated for the zoomed croppings are indicative of a zoomed cropping's inclusion of content that is captured in the image. For example, the scores are indicative of a degree to which a zoomed cropping includes salient content of the image, a degree to which the salient content included in the zoomed cropping is centered in the image, and a degree to which the zoomed cropping preserves specified regions-to-keep and excludes specified regions-to-remove. Based on the scores, at least one zoomed cropping may be chosen to effectuate a zooming of the image. Accordingly, the image may be zoomed according to the zoomed cropping such that an amount the image is zoomed corresponds to a scale of the zoomed cropping.
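The scoring of zoomed croppings can be sketched as a weighted combination of the criteria the abstract lists: saliency coverage, centeredness of the salient content, and preservation of regions-to-keep versus exclusion of regions-to-remove. The score terms, weights, and crop representation below are assumptions for illustration.

```python
import numpy as np

saliency = np.zeros((10, 10))
saliency[4:7, 4:7] = 1.0                       # salient object in the middle
keep = np.zeros((10, 10)); keep[5, 5] = 1      # user-specified region-to-keep
remove = np.zeros((10, 10)); remove[0, 0] = 1  # user-specified region-to-remove

def score(crop):                               # crop = (top, left, height, width)
    t, l, h, w = crop
    window = saliency[t:t + h, l:l + w]
    coverage = window.sum() / saliency.sum()   # salient content included
    cy, cx = t + h / 2, l + w / 2              # crop center vs. saliency centroid
    ys, xs = np.nonzero(saliency)
    centered = 1.0 / (1.0 + np.hypot(cy - ys.mean(), cx - xs.mean()))
    kept = keep[t:t + h, l:l + w].sum() / keep.sum()
    removed = remove[t:t + h, l:l + w].sum()
    return coverage + centered + kept - removed

crops = [(0, 0, 6, 6), (3, 3, 5, 5), (2, 2, 8, 8)]
best = max(crops, key=score)                   # chosen cropping effectuates the zoom
print(best)
```

The scale of the chosen cropping relative to the full image then determines the amount of zoom applied.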
Abstract:
Missing region prediction techniques are described. In implementations, an image pair is obtained that includes first and second images. The first image is corrupted by removing a region of content, resulting in a corrupted image having a missing region. The corrupted image and the second image of the image pair are then used to generate a training-image pair. Then, based on a plurality of training-image pairs including the generated training-image pair, a model is trained using machine learning. The model can subsequently be used to predict pixel values of pixels within a subsequent missing region of a subsequent image that is not used as part of the training.
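The construction of one training-image pair can be sketched directly: remove a region of content from the first image to produce a corrupted image with a missing region, and keep the second image as the target. Array shapes and the marking of missing pixels with NaN are assumptions for illustration.

```python
import numpy as np

first = np.arange(64, dtype=float).reshape(8, 8)   # "first image" of the pair
second = first.copy()                              # paired "second image" (target)

corrupted = first.copy()
corrupted[2:5, 3:6] = np.nan                       # remove a region -> missing region

training_pair = (corrupted, second)                # (model input, prediction target)
mask = np.isnan(corrupted)
print(mask.sum())                                  # number of missing pixels
```

A model trained on many such pairs learns to predict pixel values inside missing regions of images it has never seen.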
Abstract:
In embodiments of image color and tone style transfer, a computing device implements an image style transfer algorithm to generate a modified image from an input image based on a color style and a tone style of a style image. A user can select the input image that includes color features, as well as the style image that includes an example of the color style and the tone style to transfer to the input image. A chrominance transfer function can then be applied to transfer the color style to the input image, utilizing a covariance of the input image color to control modification of that color. A luminance transfer function can also be applied to transfer the tone style to the input image, utilizing a tone mapping curve whose luminance parameters are estimated by a non-linear optimization.
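A covariance-based color transfer can be sketched with a standard whitening/recoloring transform that matches the mean and covariance of the input image's colors to those of the style image. This is an assumed, generic statistics transfer for illustration; the patented chrominance and luminance transfer functions may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
inp = rng.normal([0.2, 0.5, 0.3], 0.05, size=(1000, 3))   # input-image pixels
sty = rng.normal([0.6, 0.3, 0.4], 0.15, size=(1000, 3))   # style-image pixels

def sqrtm(C):
    # Symmetric matrix square root via eigendecomposition.
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T

Ci, Cs = np.cov(inp.T), np.cov(sty.T)
A = sqrtm(Cs) @ np.linalg.inv(sqrtm(Ci))   # maps input covariance onto style covariance
out = (inp - inp.mean(0)) @ A.T + sty.mean(0)

print(np.allclose(out.mean(0), sty.mean(0)))
```

After the transform, the modified pixels carry the style image's color statistics while the covariance term `A` controls how strongly each input color channel is modified.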
Abstract:
A convolutional neural network is trained to analyze input data in various different manners. The convolutional neural network includes multiple layers, one of which is a convolution layer that performs a convolution, for each of one or more filters in the convolution layer, of the filter over the input data. The convolution includes generation of an inner product based on the filter and the input data. Both the filter of the convolution layer and the input data are binarized, allowing the inner product to be computed using bitwise operations that are typically faster than multiplication of floating point values. The possible results for the convolution layer can optionally be pre-computed and stored in a look-up table. Thus, during operation of the convolutional neural network, rather than performing the convolution on the input data, the pre-computed result can be obtained from the look-up table.
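The binarized inner product and the look-up table can be sketched as follows. With filter and input entries restricted to {-1, +1} and packed into bits, the dot product reduces to an XNOR followed by a bit count, and all possible results for a given filter can be precomputed once. This is an illustrative sketch, not the patented kernel.

```python
n = 8
filt_bits = 0b10110010                      # bit 1 encodes +1, bit 0 encodes -1

def binary_dot(a_bits, b_bits, n):
    # XNOR marks positions where the two {-1,+1} vectors agree.
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    matches = bin(xnor).count("1")          # popcount
    return 2 * matches - n                  # inner product over {-1,+1} entries

# Precompute every possible result for this filter once...
lut = [binary_dot(filt_bits, x, n) for x in range(1 << n)]

# ...so that at run time a table lookup replaces the convolution arithmetic.
inp_bits = 0b10110010
print(lut[inp_bits])                        # identical input -> dot product = n = 8
```

For an 8-bit binarized patch the table has only 256 entries per filter, which is why precomputing the convolution results is feasible.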
Abstract:
This disclosure relates to training a classifier algorithm that can be used for automatically selecting tags to be applied to a received image. For example, a computing device can group training images together based on the training images having similar tags. The computing device trains a classifier algorithm to identify the training images as semantically similar to one another based on the training images being grouped together. The trained classifier algorithm is used to determine that an input image is semantically similar to an example tagged image. A tag is generated for the input image using tag content from the example tagged image based on determining that the input image is semantically similar to the tagged image.
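The grouping step can be sketched by clustering training images whose tag sets overlap strongly. The Jaccard-similarity criterion and the threshold below are assumptions for illustration, not the disclosed grouping rule.

```python
def jaccard(a, b):
    # Overlap of two tag sets: |intersection| / |union|.
    return len(a & b) / len(a | b)

images = {
    "img1": {"beach", "sunset", "ocean"},
    "img2": {"beach", "ocean", "sand"},
    "img3": {"city", "night", "skyline"},
}

groups, threshold = [], 0.4
for name, tags in images.items():
    for group in groups:
        if jaccard(tags, group["tags"]) >= threshold:
            group["members"].append(name)   # similar tags -> same group
            group["tags"] |= tags
            break
    else:
        groups.append({"members": [name], "tags": set(tags)})

print([g["members"] for g in groups])
```

A classifier trained to treat members of a group as semantically similar can then match a new input image to an example tagged image and borrow its tag content.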
Abstract:
Feature interpolation techniques are described. In a training stage, features are extracted from a collection of training images and quantized into visual words. Spatial configurations of the visual words in the training images are determined and stored in a spatial configuration database. In an object detection stage, a portion of the features of an image is extracted from the image and quantized into visual words. Then, a remaining portion of the features of the image is interpolated using the visual words and the spatial configurations of visual words stored in the spatial configuration database.
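The quantization step can be sketched as nearest-centroid assignment: each extracted feature is mapped to its closest visual word, and the words' image positions form the spatial configuration that is stored. Centroids, features, and positions below are toy values assumed for illustration.

```python
import numpy as np

# Visual-word centroids learned in the training stage (assumed toy values).
words = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

def quantize(feature):
    # Assign a feature to the index of its nearest visual word.
    return int(np.argmin(np.linalg.norm(words - feature, axis=1)))

# Features extracted at image positions (x, y) during detection.
features = {(10, 20): np.array([0.1, 0.1]),
            (40, 25): np.array([0.9, 0.8])}

spatial_config = {pos: quantize(f) for pos, f in features.items()}
print(spatial_config)
```

Matching such partial configurations against the spatial configuration database is what allows the remaining, unextracted features to be interpolated.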
Abstract:
A system and method for distributed similarity learning for high-dimensional image features are described. A set of data features is accessed. Subspaces from a space formed by the set of data features are determined using a set of projection matrices. Each subspace has a dimension lower than a dimension of the set of data features. Similarity functions are computed for the subspaces. Each similarity function is based on the dimension of the corresponding subspace. A linear combination of the similarity functions is performed to determine a similarity function for the set of data features.
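The subspace decomposition can be sketched as follows: random projection matrices map the high-dimensional features into low-dimensional subspaces, a similarity function is evaluated in each subspace, and the results are combined linearly. The random projections, cosine similarity, and uniform weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, sub_dim, n_sub = 64, 8, 4
# One projection matrix per subspace; each subspace dimension (8) is
# lower than the feature dimension (64).
projections = [rng.normal(size=(sub_dim, dim)) / np.sqrt(sub_dim)
               for _ in range(n_sub)]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity(x, y, weights=None):
    # Linear combination of the per-subspace similarity functions.
    weights = weights or [1.0 / n_sub] * n_sub
    return sum(w * cosine(P @ x, P @ y)
               for w, P in zip(weights, projections))

x = rng.normal(size=dim)
print(round(similarity(x, x), 6))          # self-similarity is 1.0
```

Because each subspace similarity can be computed independently, the evaluation distributes naturally across workers, which motivates the "distributed" framing of the abstract.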