Abstract:
Example systems and methods for classifying visual patterns into a plurality of classes are presented. Using reference visual patterns of known classification, at least one image or visual pattern classifier is generated, which is then employed to classify a plurality of candidate visual patterns of unknown classification. The classification scheme employed may be hierarchical or nonhierarchical. The visual patterns may be fonts, human faces, or any other type of visual pattern or image subject to classification.
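The abstract does not prescribe a particular classifier, so the following is a minimal Python sketch of the described workflow using scikit-learn; the feature dimensions, class count, and choice of LinearSVC are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Reference visual patterns of known classification: one feature vector
# per pattern, with class labels (e.g., 0 = font A, 1 = font B).
reference_features = np.random.rand(200, 64)     # placeholder features
reference_labels = np.random.randint(0, 2, 200)  # placeholder labels

# Generate a visual pattern classifier from the reference patterns.
classifier = LinearSVC().fit(reference_features, reference_labels)

# Employ it to classify candidate visual patterns of unknown classification.
candidate_features = np.random.rand(10, 64)
predicted_classes = classifier.predict(candidate_features)
print(predicted_classes)
```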
Abstract:
In techniques for fast dense patch search and quantization, partition center patches are determined for partitions of example image patches. Patch groups of an image each include similar image patches and a reference image patch that represents a respective patch group. A partition center patch of the partitions is determined as a nearest neighbor to the reference image patch of a patch group. The partition center patch can be determined based on a single-nearest neighbor (1-NN) distance determination, and the determined partition center patch is allocated as the nearest neighbor to the similar image patches in the patch group. Alternatively, a group of nearby partition center patches are determined as the nearest neighbors to the reference image patch based on a k-nearest neighbor (k-NN) distance determination, and the nearest neighbor to each of the similar image patches in the patch group is determined from the nearby partition center patches.
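A minimal Python sketch of both variants, assuming (as the abstract does not specify) that the partition center patches come from k-means over example patches, that a patch group's reference patch is its mean, and that scikit-learn supplies the nearest-neighbor search:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

# Partition center patches: cluster centers of example image patches,
# computed offline (7x7 patches flattened to 49-dim vectors here).
example_patches = np.random.rand(1000, 49)
centers = KMeans(n_clusters=64, n_init=10).fit(example_patches).cluster_centers_

# A patch group: similar image patches plus a reference patch that
# represents the group (the group mean, as one plausible choice).
group_patches = np.random.rand(20, 49)
reference_patch = group_patches.mean(axis=0)

# 1-NN variant: the center nearest to the reference patch is allocated
# as the nearest neighbor to every patch in the group.
nn = NearestNeighbors(n_neighbors=1).fit(centers)
_, idx1 = nn.kneighbors(reference_patch[None, :])
group_assignment = np.full(len(group_patches), idx1[0, 0])

# k-NN variant: shortlist k nearby centers for the reference patch, then
# pick each patch's nearest neighbor only from that shortlist.
k = 5
_, idxk = nn.kneighbors(reference_patch[None, :], n_neighbors=k)
shortlist = centers[idxk[0]]
nn_local = NearestNeighbors(n_neighbors=1).fit(shortlist)
_, local = nn_local.kneighbors(group_patches)
per_patch_assignment = idxk[0][local[:, 0]]
```

The k-NN variant trades the single shared assignment for a per-patch choice restricted to a small shortlist, which is still far cheaper than searching all partition centers for every patch.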
Abstract:
A hierarchy machine may be configured as a clustering machine that utilizes local feature embedding to organize visual patterns into nodes that each represent one or more visual patterns. These nodes may be arranged as a hierarchy in which a node may have a parent-child relationship with one or more other nodes. The hierarchy machine may implement a node splitting and tree-learning algorithm that includes hard-splitting of nodes and soft-assignment of nodes to perform error-bounded splitting of nodes into clusters. This may enable the hierarchy machine, which may form all or part of a visual pattern recognition system, to perform large-scale visual pattern recognition, such as font recognition or facial recognition, based on a learned error-bounded tree of visual patterns.
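A condensed Python sketch of the splitting step, with 2-means as the hard split and a distance-margin rule standing in for the abstract's soft-assignment criterion (the margin test and all parameter values are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

class Node:
    def __init__(self, patterns):
        self.patterns = patterns          # feature vectors at this node
        self.children = []

def split(node, min_size=10, soft_margin=0.1):
    # Hard-split with 2-means, then soft-assign samples near the split
    # boundary to both children, bounding the cost of a wrong hard split.
    X = node.patterns
    if len(X) < min_size:
        return
    km = KMeans(n_clusters=2, n_init=10).fit(X)
    dist = km.transform(X)                          # distance to each center
    ambiguous = np.abs(dist[:, 0] - dist[:, 1]) < soft_margin
    for c in (0, 1):
        member = (km.labels_ == c) | ambiguous      # hard + soft members
        child = Node(X[member])
        node.children.append(child)
        if len(child.patterns) < len(X):            # guard against no progress
            split(child, min_size, soft_margin)

root = Node(np.random.rand(200, 32))
split(root)
```

Soft-assigning boundary samples to both children means a query descending either branch can still reach the ambiguous patterns, which is what keeps the splitting error bounded.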
Abstract:
Image upscaling techniques are described. These techniques may include use of iterative and adjustment upscaling techniques to upscale an input image. A variety of functionality may be incorporated as part of these techniques, examples of which include content-adaptive patch finding techniques that may be employed to give preference to an in-place patch to minimize structure distortion. In another example, content metric techniques may be employed to assign weights for combining patches. In a further example, algorithm parameters may be adapted with respect to algorithm iterations, which may be performed to increase efficiency of computing device resource utilization and speed of performance. For instance, algorithm parameters may be adapted to enforce a minimum and/or maximum number of iterations, cease iterations for image sizes over a threshold amount, set sampling step sizes for patches, employ techniques based on color channels (which may include independent and joint processing techniques), and so on.
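A schematic Python sketch of the iteration control described above; the parameter names, default values, and use of SciPy's cubic-spline zoom per step are illustrative assumptions, and the per-iteration patch refinement is left as a comment:

```python
import numpy as np
from scipy.ndimage import zoom

def upscale(image, target_scale, step=1.25, max_iters=8, min_iters=1,
            size_cutoff=4096):
    """Iterative upscaling sketch: enlarge by a small factor per iteration,
    adapting parameters (min/max iterations, size cutoff) as described."""
    scale, iters = 1.0, 0
    while (scale < target_scale and iters < max_iters) or iters < min_iters:
        if max(image.shape[:2]) > size_cutoff:    # cease for large images
            break
        factor = min(step, target_scale / scale)
        image = zoom(image, (factor, factor), order=3)  # cubic interpolation
        # A full implementation would refine `image` here with
        # content-adaptive patch search favoring the in-place patch.
        scale *= factor
        iters += 1
    return image

small = np.random.rand(64, 64)
big = upscale(small, target_scale=4.0)
print(big.shape)
```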
Abstract:
Techniques are disclosed for image feature representation. The techniques exhibit discriminative power that can be used in any number of classification tasks, and are particularly effective with respect to fine-grained image classification tasks. In an embodiment, a given image to be classified is divided into image patches. A vector is generated for each image patch. Each image patch vector is compared to the Gaussian mixture components (each mixture component is also a vector) of a Gaussian Mixture Model (GMM). Each such comparison generates a similarity score for each image patch vector. For each Gaussian mixture component, the image patch vectors whose similarity scores are too low are eliminated, and the remaining vectors are pooled into a single vector for that component. The selectively pooled vectors from all the Gaussian mixture components are then concatenated to form the final image feature vector, which can be provided to a classifier so the given input image can be properly categorized.
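A minimal Python sketch of the pipeline, assuming scikit-learn's GaussianMixture, posterior probabilities as the similarity scores, and max-pooling with an arbitrary threshold (none of which the abstract fixes):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Per-patch descriptors for one image, plus a GMM trained offline on
# patches drawn from many images (both placeholders here).
patch_vectors = np.random.rand(500, 32)
gmm = GaussianMixture(n_components=8).fit(np.random.rand(5000, 32))

similarity = gmm.predict_proba(patch_vectors)  # per-patch, per-component
threshold = 0.2                                # assumed cutoff
pooled = []
for k in range(gmm.n_components):
    keep = patch_vectors[similarity[:, k] >= threshold]
    if len(keep) == 0:                      # no patch matched this component
        pooled.append(np.zeros(patch_vectors.shape[1]))
    else:
        pooled.append(keep.max(axis=0))     # max-pool the surviving vectors
image_feature = np.concatenate(pooled)      # final image feature vector
print(image_feature.shape)                  # (8 * 32,)
```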
Abstract:
Certain embodiments involve learning features of content items (e.g., images) based on web data and user behavior data. For example, a system determines latent factors from the content items based on data including a user's text query or keyword query for a content item and the user's interaction with the content items resulting from the query (e.g., a click on a content item returned by a search using the text query). The system uses the latent factors to learn features of the content items. The system then uses a previously learned feature as input for iterating the learning process, which improves the accuracy with which additional features of the content items are learned.
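One plausible reading of the latent-factor step, sketched in Python with a truncated SVD of a hypothetical query-by-item click matrix (the factorization method and rank are assumptions not stated in the abstract):

```python
import numpy as np

# Hypothetical click matrix: rows are text queries, columns are content
# items; entry (q, i) counts clicks on item i for query q.
clicks = np.random.poisson(0.1, size=(300, 100)).astype(float)

# Latent factors via truncated SVD (one plausible factorization choice).
U, S, Vt = np.linalg.svd(clicks, full_matrices=False)
rank = 16
item_factors = Vt[:rank].T * S[:rank]   # one latent vector per content item

# The item factors can then serve as regression targets for a feature
# learner (e.g., a network over the item images); a previously learned
# feature can be fed back in to learn additional, complementary features.
print(item_factors.shape)               # (100, 16)
```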
Abstract:
Techniques for facial expression capture for character animation are described. In one or more implementations, facial key points are identified in a series of images. Each image in the series is normalized based on the identified facial key points. Facial features are determined from each of the normalized images. Then a facial expression is classified, based on the determined facial features, for each of the normalized images. In additional implementations, a series of images is captured that includes performances of one or more facial expressions. The facial expressions in each image of the series are classified by a facial expression classifier. The facial expression classifications are then used by a character animator system to produce a series of animated images of an animated character, each with an animated facial expression associated with the facial expression classification of the corresponding captured image.
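A compact Python sketch of the first pipeline (key points, normalization, features, classification), assuming the common 68-point key point layout, normalization by inter-ocular distance, and a scikit-learn SVC; all data are placeholders:

```python
import numpy as np
from sklearn.svm import SVC

def normalize_keypoints(points, left_eye=36, right_eye=45):
    """Normalize a face by its key points: translate to the midpoint of
    the eyes and scale by the inter-ocular distance (eye indices follow
    the common 68-point layout, an assumption here)."""
    mid = (points[left_eye] + points[right_eye]) / 2.0
    scale = np.linalg.norm(points[right_eye] - points[left_eye])
    return (points - mid) / scale

# Placeholder training data: 68 (x, y) key points per image, with
# expression labels (0 = neutral, 1 = smile, 2 = surprise, ...).
keypoint_sets = np.random.rand(100, 68, 2)
labels = np.random.randint(0, 3, 100)

# Features here are just the flattened normalized key points; a real
# system would add appearance descriptors around each point.
features = np.array([normalize_keypoints(p).ravel() for p in keypoint_sets])
expression_classifier = SVC().fit(features, labels)

# Classify the facial expression in each image of a new series.
new_frames = np.array([normalize_keypoints(p).ravel()
                       for p in np.random.rand(10, 68, 2)])
per_frame_expression = expression_classifier.predict(new_frames)
```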
Abstract:
A convolutional neural network (CNN) is trained for font recognition and font similarity learning. In a training phase, text images with font labels are synthesized by introducing variances that minimize the gap between the training images and real-world text images. The training images are generated and input into the CNN, and the output is fed into an N-way softmax function, where N is the number of fonts the CNN is being trained on, producing a distribution of classified text images over N class labels. In a testing phase, each test image is normalized in height and squeezed in aspect ratio, resulting in a plurality of test patches. The CNN averages the probabilities of each test patch belonging to a set of fonts to obtain a classification. Feature representations may be extracted and utilized to define font similarity between fonts, which may be utilized in font suggestion, font browsing, or font recognition applications.
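A test-phase sketch in Python using PyTorch; the stand-in network, patch width, and stride are assumptions, since the abstract does not give the architecture or patch geometry:

```python
import torch
import torch.nn as nn

N_FONTS = 100      # N-way softmax over the fonts being trained on

# A minimal stand-in CNN; the actual architecture is not given.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, N_FONTS),
)

def classify_text_image(image, patch_w=105, stride=50):
    """Test phase: the image is already normalized in height and squeezed
    in aspect ratio; crop horizontal patches and average their softmax
    probabilities to obtain a classification."""
    patches = [image[:, :, i:i + patch_w]
               for i in range(0, image.shape[2] - patch_w + 1, stride)]
    batch = torch.stack(patches)                  # (P, 1, H, patch_w)
    probs = torch.softmax(cnn(batch), dim=1)      # per-patch distribution
    return probs.mean(dim=0).argmax().item()      # averaged classification

text_image = torch.rand(1, 105, 400)    # (channels, height, width)
print(classify_text_image(text_image))
```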
Abstract:
In techniques for video denoising using optical flow, image frames of video content include noise that corrupts the video content. A reference frame is selected, and matching patches to an image patch in the reference frame are determined from within the reference frame. A noise estimate is computed for previous and subsequent image frames relative to the reference frame. The noise estimate for an image frame is computed based on optical flow, and is usable to determine a contribution of similar motion patches to denoise the image patch in the reference frame. The similar motion patches from the previous and subsequent image frames that correspond to the image patch in the reference frame are determined based on the optical flow computations. The image patch is denoised based on an average of the matching patches from the reference frame and the similar motion patches determined from the previous and subsequent image frames.
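A simplified Python sketch of the motion-compensated averaging, using OpenCV's Farneback optical flow; the patch size is an assumption, and the within-frame patch matching and per-frame noise weighting that the abstract also describes are omitted for brevity:

```python
import numpy as np
import cv2

def denoise_patch(frames, t, y, x, size=8):
    """Denoise one patch of the reference frame frames[t] by averaging it
    with motion-compensated patches from the previous and subsequent
    frames (assumes 0 < t < len(frames) - 1; grayscale uint8 frames)."""
    ref = frames[t]
    patches = [ref[y:y + size, x:x + size].astype(np.float32)]
    for dt in (-1, 1):                    # previous and subsequent frames
        other = frames[t + dt]
        flow = cv2.calcOpticalFlowFarneback(
            ref, other, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        cy, cx = y + size // 2, x + size // 2
        dx, dy = flow[cy, cx, 0], flow[cy, cx, 1]  # flow at patch center
        yy, xx = int(round(y + dy)), int(round(x + dx))
        h, w = other.shape
        if 0 <= yy <= h - size and 0 <= xx <= w - size:
            patches.append(other[yy:yy + size, xx:xx + size].astype(np.float32))
    return np.mean(patches, axis=0)       # average of the gathered patches

frames = [np.random.randint(0, 255, (120, 160), np.uint8) for _ in range(3)]
print(denoise_patch(frames, t=1, y=40, x=60).shape)
```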
Abstract:
Multi-feature image haze removal is described. In one or more implementations, feature maps are extracted from a hazy image of a scene. The feature maps convey information about visual characteristics of the scene captured in the hazy image. Based on the feature maps, the portions of light that are not scattered by the atmosphere and that are captured to produce the hazy image are computed. Additionally, the airlight of the hazy image is ascertained based on at least one of the feature maps; the ascertained airlight represents the constant light of the scene. Using the computed portions of light and the ascertained airlight, a dehazed image is generated from the hazy image.
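The recovery step follows the standard haze formation model; a minimal Python sketch, taking the computed light portions (the transmission) and the airlight as given, since their estimation from multiple feature maps is the abstract's contribution:

```python
import numpy as np

def dehaze(hazy, transmission, airlight, t_min=0.1):
    """Recover a dehazed image from the standard haze model
    I = J * t + A * (1 - t), where t is the per-pixel portion of light
    not scattered by the atmosphere and A is the constant airlight."""
    t = np.clip(transmission, t_min, 1.0)[..., None]  # avoid division blow-up
    return np.clip((hazy - airlight) / t + airlight, 0.0, 1.0)

hazy = np.random.rand(100, 100, 3)        # placeholder hazy image in [0, 1]
transmission = np.random.rand(100, 100)   # e.g., computed from feature maps
airlight = np.array([0.9, 0.9, 0.92])     # constant light of the scene
clear = dehaze(hazy, transmission, airlight)
print(clear.shape)
```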