Abstract:
In embodiments of optical flow accounting for image haze, digital images may include objects that are at least partially obscured by a haze that is visible in the digital images, and an estimate of the light that is contributed by the haze in the digital images can be determined. The haze can be cleared from the digital images based on the estimate of the light that is contributed by the haze, and clearer digital images can be generated. An optical flow between the clearer digital images can then be computed, and the clearer digital images are refined based on the optical flow to further clear the haze from the images in an iterative process that improves visibility of the objects in the digital images.
Abstract:
Methods and apparatus for constraining solution space in image processing techniques may use the metadata for a set of images to constrain an image processing solution to a smaller solution space. In one embodiment, a process may require N parameters for processing an image. A determination may be made from metadata that multiple images were captured with the same camera/lens and with the same settings. A set of values may be estimated for the N parameters from data in one or more of the images. The process may then be applied to each of the images using the set of values. In one embodiment, a value for a parameter of a process may be estimated for an image. If the estimated value deviates substantially from a value for the parameter in the metadata, the metadata value is used in the process instead of the estimated value.
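The grouping step and the metadata fallback rule can be sketched as follows. This is a hedged illustration, not the claimed method: the metadata keys (`camera`, `lens`, `focal_length`, `aperture`), the 20% relative tolerance that stands in for "deviates substantially", and all function names are hypothetical.

```python
from collections import defaultdict


def group_by_capture_settings(images):
    """Group images whose metadata reports the same camera, lens and settings."""
    groups = defaultdict(list)
    for img in images:
        meta = img["metadata"]
        key = (meta["camera"], meta["lens"], meta["focal_length"], meta["aperture"])
        groups[key].append(img)
    return dict(groups)


def process_groups(images, estimate_params, apply_process):
    """Estimate the N parameters once per group, then reuse them for every image."""
    results = {}
    for group in group_by_capture_settings(images).values():
        params = estimate_params(group)  # one estimate shared by the whole group
        for img in group:
            results[img["name"]] = apply_process(img, params)
    return results


def choose_parameter(estimated, metadata_value, rel_tolerance=0.2):
    """Use the estimate unless it deviates substantially from the metadata value."""
    if metadata_value is None:
        return estimated
    if metadata_value == 0:
        deviates = abs(estimated) > rel_tolerance
    else:
        deviates = abs(estimated - metadata_value) / abs(metadata_value) > rel_tolerance
    return metadata_value if deviates else estimated
```

With these defaults, an estimated focal length of 50 against a metadata value of 35 is replaced by 35, while an estimate of 36 is kept.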
Abstract:
A feature tracking technique for detecting and tracking feature points with primary colors. An energy value may be computed for each color channel of a feature. If the energy of all the channels is above a threshold, then the feature may be tracked according to a feature tracking method using all channels. Otherwise, if the energy of all of the channels is below the threshold, then the feature is not tracked. If the energy of at least one (but not all) of the channels is below the threshold, then the feature is considered to have primary color, and the feature may be tracked according to the feature tracking method using only the one or more channels with energy above the threshold. The feature tracking techniques may, for example, be used to establish point trajectories in an image sequence for various Structure from Motion (SFM) techniques.
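The channel-selection rule above can be sketched as follows. The abstract does not define the energy measure, so this sketch assumes it is the sum of squared horizontal and vertical intensity differences over the feature's patch (a common texture measure), with each color channel supplied as a separate 2-D patch; `channel_energy` and `select_tracking_channels` are hypothetical names.

```python
def channel_energy(patch):
    """Sum of squared horizontal and vertical differences over a 2-D patch."""
    energy = 0.0
    rows, cols = len(patch), len(patch[0])
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                energy += (patch[y][x + 1] - patch[y][x]) ** 2
            if y + 1 < rows:
                energy += (patch[y + 1][x] - patch[y][x]) ** 2
    return energy


def select_tracking_channels(feature, threshold):
    """Return the channels to track with; an empty list means 'do not track'.

    feature: dict mapping channel name -> 2-D patch of intensities.
    Every channel above the threshold -> track with all channels.
    No channel above the threshold -> feature is not tracked.
    Only some channels above the threshold -> a "primary color" feature,
    tracked using only those channels.
    """
    return [name for name, patch in feature.items()
            if channel_energy(patch) > threshold]
```

For a feature whose red channel is textured but whose green and blue channels are flat, only `"red"` is returned, so the tracker would run on that single channel.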
Abstract:
The present disclosure includes methods and systems for generating captions for digital images. In particular, the disclosed systems and methods can train an image encoder neural network and a sentence decoder neural network to generate a caption from an input digital image. For instance, in one or more embodiments, the disclosed systems and methods train an image encoder neural network (e.g., a character-level convolutional neural network) utilizing a semantic similarity constraint, training images, and training captions. Moreover, the disclosed systems and methods can train a sentence decoder neural network (e.g., a character-level recurrent neural network) utilizing training sentences and an adversarial classifier.
Abstract:
Font recognition and similarity determination techniques and systems are described. In a first example, localization techniques are described that train a model through machine learning (e.g., a convolutional neural network) on training images. The model is then used to localize text in a subsequently received image, and may do so automatically and without user intervention, e.g., without specifying any of the edges of a bounding box. In a second example, a deep neural network is directly learned as an embedding function of a model that is usable to determine font similarity. In a third example, techniques are described that leverage attributes described in metadata associated with fonts as part of font recognition and similarity determinations.
Abstract:
Font replacement based on visual similarity is described. In one or more embodiments, a font descriptor includes multiple font features derived from a visual appearance of a font by a font visual similarity model. The font visual similarity model can be trained using a machine learning system that recognizes similarity between the visual appearances of two different fonts. A source computing device embeds a font descriptor in a document, which is transmitted to a destination computing device. The destination computing device compares the embedded font descriptor to font descriptors corresponding to local fonts. Based on distances between the embedded and the local font descriptors, at least one matching font descriptor is determined. The local font corresponding to the matching font descriptor is deemed similar to the original font. The destination computing device controls presentation of the document using the similar local font. Computation of font descriptors can be outsourced to a remote location.
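The matching step on the destination device can be sketched as a nearest-neighbor search over descriptors. The abstract only says the match is based on distances, so the choice of Euclidean distance, the optional `max_distance` cutoff and the function name are assumptions; the descriptors are treated as plain numeric vectors.

```python
import math


def find_similar_local_font(embedded, local_fonts, max_distance=None):
    """Return the local font whose descriptor is closest to the embedded one.

    embedded: descriptor vector carried in the document
    local_fonts: dict mapping local font name -> descriptor vector
    max_distance: hypothetical cutoff; if the best match is farther, return None
    """
    best_name, best_dist = None, math.inf
    for name, descriptor in local_fonts.items():
        d = math.dist(embedded, descriptor)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_name, best_dist = name, d
    if max_distance is not None and best_dist > max_distance:
        return None
    return best_name
```

Returning `None` when nothing is close enough lets the caller fall back to a default font instead of substituting a poor match.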
Abstract:
The present disclosure is directed towards systems and methods for generating a new aligned image from a plurality of burst images. The systems and methods subdivide a reference image into a plurality of local regions and a subsequent image into a plurality of corresponding local regions. Additionally, the systems and methods detect a plurality of feature points in each of the reference image and the subsequent image and determine matching feature point pairs between the reference image and the subsequent image. Based on the matching feature point pairs, the systems and methods determine at least one homography of the reference image to the subsequent image. Based on the homography, the systems and methods generate a new aligned image that is pixel-wise aligned to the reference image. Furthermore, the systems and methods refine boundaries between local regions of the new aligned image.
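Determining a homography from matching feature point pairs can be illustrated with the standard direct linear transform, fixing h33 = 1 and solving the resulting 8×8 system. This is a minimal sketch under stated assumptions: it uses exactly four non-degenerate pairs for one local region, whereas a practical system would fit many noisy pairs robustly (for example with RANSAC); the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography_from_pairs(pairs):
    """Estimate H (with h33 = 1) mapping src -> dst from four ((x, y), (u, v)) pairs."""
    a, b = [], []
    for (x, y), (u, v) in pairs:
        # u * (h7*x + h8*y + 1) = h1*x + h2*y + h3, rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map a point through H using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For instance, four corners of a unit square mapped by a scale of 2 plus a shift of (3, 5) yield H ≈ [[2, 0, 3], [0, 2, 5], [0, 0, 1]], and the square's center then maps to approximately (4, 6).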
Abstract:
Embodiments of the present invention relate to learning image representation by distilling from multi-task networks. In implementation, multiple single-task networks are trained with heterogeneous labels. In some embodiments, each of the single-task networks is transformed into a Siamese structure with three branches of sub-networks so that a common triplet ranking loss can be applied to each branch. A distilling network is trained that approximates the single-task networks on a common ranking task. In some embodiments, the distilling network is a Siamese network whose ranking function is optimized to approximate an ensemble ranking of each of the single-task networks. The distilling network can be utilized to predict tags to associate with a test image or to identify images similar to the test image.
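A triplet ranking loss of the kind mentioned above is commonly written as a hinge on embedding distances: the anchor should be closer to the positive example than to the negative one by at least a margin. The sketch below assumes Euclidean distance and a margin of 1.0; neither is specified in the abstract, and the embeddings are plain vectors rather than outputs of the actual sub-networks.

```python
import math


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss max(0, margin + d(anchor, positive) - d(anchor, negative))."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def mean_triplet_loss(triplets, margin=1.0):
    """Average the loss over (anchor, positive, negative) embedding triplets."""
    return sum(triplet_ranking_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that are still ranked incorrectly (or too narrowly) contribute gradient.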
Abstract:
A first set of attributes (e.g., style) is generated through pre-trained single-column neural networks and leveraged to regularize the training process of a regularized double-column convolutional neural network (RDCNN). Parameters of the first column (e.g., style) of the RDCNN are fixed during RDCNN training. Parameters of the second column (e.g., aesthetics) are fine-tuned while training the RDCNN, and the learning process is supervised by the label identified by the second column (e.g., aesthetics). Thus, features of the images may be leveraged to boost the classification accuracy of other features by learning an RDCNN.
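Keeping the first column fixed while fine-tuning the second amounts to skipping the update for the frozen column's parameters. The sketch below shows this with a plain gradient-descent step over a dictionary of per-column weights; the column names and the update rule are illustrative assumptions. In a framework such as PyTorch the same effect is usually obtained by setting `requires_grad = False` on the frozen column's parameters.

```python
def sgd_step(params, grads, lr, frozen=frozenset()):
    """Apply one gradient-descent step, leaving frozen columns unchanged.

    params: dict mapping column name -> list of weights
    grads: dict mapping column name -> list of gradients (same shapes)
    frozen: names of columns whose weights stay fixed (e.g. the style column)
    """
    updated = {}
    for column, weights in params.items():
        if column in frozen:
            updated[column] = list(weights)  # pre-trained column: copied as-is
        else:
            updated[column] = [w - lr * g for w, g in zip(weights, grads[column])]
    return updated
```

Only the aesthetics column moves in a call such as `sgd_step(params, grads, 0.1, frozen={"style"})`, while the style weights keep their pre-trained values.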