Abstract:
Techniques and systems for image depth inference from semantic labels are described. In one or more implementations, a digital medium environment includes one or more computing devices to control a determination of depth within an image. Regions of the image are semantically labeled by the one or more computing devices. At least one of the semantically labeled regions is decomposed into a plurality of segments formed as planes generally perpendicular to a ground plane of the image. Depth of one or more of the plurality of segments is then inferred based on relationships of respective segments with respective locations of the ground plane of the image. A depth map is formed that describes depth for the at least one semantically labeled region based at least in part on the inferred depths for the one or more of the plurality of segments.
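As a sketch of the segment-wise inference step, the following assumes a precomputed per-row ground-plane depth ramp and a binary mask for one semantically labeled region; the names (infer_region_depth, ground_depth) are illustrative, not from the described system.

```python
import numpy as np

def infer_region_depth(region_mask, ground_depth):
    """Treat each vertical strip of the region as a plane perpendicular
    to the ground and give it the ground depth at its contact row."""
    h, w = region_mask.shape
    depth = np.zeros((h, w))
    for col in range(w):
        rows = np.nonzero(region_mask[:, col])[0]
        if rows.size == 0:
            continue                      # strip misses the region entirely
        contact_row = rows.max()          # lowest pixel approximates ground contact
        depth[rows, col] = ground_depth[contact_row]
    return depth

# Toy usage: ground depth shrinks from far (top rows) to near (bottom rows).
h, w = 120, 160
ground_depth = np.linspace(50.0, 1.0, h)  # hypothetical per-row ground-plane depth
mask = np.zeros((h, w), dtype=bool)
mask[40:100, 60:90] = True                # one semantically labeled region
depth_map = infer_region_depth(mask, ground_depth)
```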
Abstract:
In embodiments of removing noise from an image via efficient patch distance computations, weights are computed for patches of pixels in a digital image, and the computed weights are multiplied by respective offset values of offset images that are pixelwise shifted images of the entire digital image. The weights can be applied to the pixels in the digital image on a patch-by-patch basis to restore values of the pixels. Additionally, the digital image can be pixelwise shifted to generate the offset images of the digital image, and the digital image is compared to the offset images. Lookup tables of pixel values can be generated based on the comparisons of the digital image to the offset images, and integral images generated from the lookup tables. Distances to the patches of pixels in the digital image are computed from the integral images, and the computed weights are based on the computed distances.
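The following is a minimal non-local-means-style sketch of this offset-image pipeline: each shift produces an offset image, squared differences form a per-pixel lookup table, an integral image turns the table into patch distances, and the resulting weights multiply the offset image. np.roll stands in for the pixelwise shift (it wraps at borders), and the parameter names are illustrative.

```python
import numpy as np

def box_sum(a, k):
    """Sum of a over every k-by-k window, via an integral image."""
    s = np.pad(a.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]

def nlm_denoise(img, search=5, patch=3, h=0.1):
    img = img.astype(float)
    half = patch // 2
    H, W = img.shape
    acc = np.zeros_like(img)
    wsum = np.zeros_like(img)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            offset = np.roll(np.roll(img, dy, 0), dx, 1)  # pixelwise-shifted offset image
            diff2 = (img - offset) ** 2                   # lookup table of squared differences
            dist = np.zeros_like(img)                     # borders fall back to plain averaging
            dist[half:H-half, half:W-half] = box_sum(diff2, patch)  # patch distances
            w = np.exp(-dist / (h * h * patch * patch))   # weight from patch distance
            acc += w * offset                             # weight multiplied by the offset image
            wsum += w
    return acc / wsum

denoised = nlm_denoise(np.random.rand(64, 64))
```

The integral image makes the cost of a patch distance independent of the patch size, which is the efficiency the approach is after.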
Abstract:
Deep convolutional neural networks receive local and global representations of images as inputs and learn the best representation for a particular feature through multiple convolutional and fully connected layers. A double-column neural network structure receives each of the local and global representations as two heterogeneous parallel inputs to the two columns. After some layers of transformations, the two columns are merged to form the final classifier. Additionally, features may be learned in one of the fully connected layers. The features of the images may be leveraged to boost classification accuracy of other features by learning a regularized double-column neural network.
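A minimal structural sketch of such a double-column network, written in PyTorch as an assumed framework (the abstract names none); the layer sizes and the two 3-channel inputs are illustrative.

```python
import torch
import torch.nn as nn

class DoubleColumnNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        def column():
            # One column: a few convolutional layers, then flatten.
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.global_col = column()  # receives the global representation
        self.local_col = column()   # receives the local representation
        # The two columns merge into shared fully connected layers; the
        # hidden layer below is where features could be learned.
        self.fc = nn.Sequential(
            nn.Linear(2 * 32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, num_classes))

    def forward(self, global_view, local_view):
        g = self.global_col(global_view)
        c = self.local_col(local_view)
        return self.fc(torch.cat([g, c], dim=1))  # merge into the final classifier

net = DoubleColumnNet()
logits = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```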
Abstract:
Cropping boundary simplicity techniques are described. In one or more implementations, multiple candidate croppings of a scene are generated. For each of the candidate croppings, a score is calculated that is indicative of a boundary simplicity for the candidate cropping. To calculate the boundary simplicity, complexity of the scene along a boundary of a respective candidate cropping is measured. The complexity is measured, for instance, using an average gradient, an image edge map, or entropy along the boundary. Values indicative of the complexity may be derived from the measuring. The candidate croppings may then be ranked according to those values. Based on the scores calculated to indicate the boundary simplicity, one or more of the candidate croppings may be chosen, e.g., to present the chosen croppings to a user for selection.
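A sketch of the average-gradient variant of the boundary measure, assuming grayscale images and (top, left, height, width) croppings; all names are illustrative.

```python
import numpy as np

def boundary_simplicity(img, crop):
    gy, gx = np.gradient(img.astype(float))
    grad = np.hypot(gx, gy)                      # edge-strength map
    t, l, h, w = crop
    border = np.concatenate([
        grad[t, l:l+w], grad[t+h-1, l:l+w],      # top and bottom edges
        grad[t:t+h, l], grad[t:t+h, l+w-1]])     # left and right edges
    complexity = border.mean()                   # average gradient on the boundary
    return -complexity                           # simpler boundary -> higher score

# Rank candidate croppings so the simplest boundary comes first.
img = np.random.rand(200, 300)
candidates = [(10, 20, 150, 200), (5, 5, 180, 250)]
ranked = sorted(candidates, key=lambda c: boundary_simplicity(img, c), reverse=True)
```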
Abstract:
In techniques for video denoising using optical flow, image frames of video content include noise that corrupts the video content. A reference frame is selected, and matching patches to an image patch in the reference frame are determined from within the reference frame. A noise estimate is computed for previous and subsequent image frames relative to the reference frame. The noise estimate for an image frame is computed based on optical flow, and is usable to determine a contribution of similar motion patches to denoise the image patch in the reference frame. The similar motion patches from the previous and subsequent image frames that correspond to the image patch in the reference frame are determined based on the optical flow computations. The image patch is denoised based on an average of the matching patches from the reference frame and the similar motion patches determined from the previous and subsequent image frames.
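A simplified sketch of the flow-guided averaging, assuming optical flow toward the reference frame has already been computed; flow estimation, noise estimation, and the within-frame patch matching are omitted, and all names are illustrative.

```python
import numpy as np

def denoise_patch(frames, ref_idx, y, x, flows, patch=8):
    """Average the reference patch with the motion-compensated patches
    found in previous and subsequent frames."""
    ref = frames[ref_idx]
    stack = [ref[y:y+patch, x:x+patch]]
    for i, flow in flows.items():                # i indexes a neighboring frame
        dy, dx = flow[y, x]                      # displacement of this pixel
        yy, xx = int(round(y + dy)), int(round(x + dx))
        h, w = frames[i].shape
        if 0 <= yy <= h - patch and 0 <= xx <= w - patch:
            stack.append(frames[i][yy:yy+patch, xx:xx+patch])  # similar motion patch
    return np.mean(stack, axis=0)                # average of the gathered patches

# Toy usage with three noisy frames and zero flow (a static scene).
frames = [np.random.rand(64, 64) for _ in range(3)]
flows = {0: np.zeros((64, 64, 2)), 2: np.zeros((64, 64, 2))}
clean_patch = denoise_patch(frames, 1, 16, 16, flows)
```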
Abstract:
Different candidate windows in an image are identified, such as by sliding a rectangular or other geometric shape of different sizes over an image to identify portions of the image (groups of pixels in the image). The candidate windows are analyzed by a set of convolutional neural networks, which are cascaded so that the input of one convolutional neural network layer is based on the output of another convolutional neural network layer. Each convolutional neural network layer drops or rejects one or more candidate windows that the convolutional neural network layer determines do not include an object (e.g., a face). The candidate windows that are identified as including an object (e.g., a face) are analyzed by another one of the convolutional neural network layers. The candidate windows identified by the last of the convolutional neural network layers are the indications of the objects (e.g., faces) in the image.
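A structural sketch of the sliding-window cascade; the per-stage classifiers are stand-ins for the trained convolutional neural networks, and the window sizes and thresholds are illustrative.

```python
import numpy as np

def sliding_windows(img, sizes=(24, 48, 96), stride=12):
    """Slide square windows of several sizes over the image."""
    h, w = img.shape[:2]
    for s in sizes:
        for y in range(0, h - s + 1, stride):
            for x in range(0, w - s + 1, stride):
                yield (y, x, s)

def detect_faces(img, cascade, thresholds):
    """Run each stage over the surviving windows; a window is kept only
    if every stage scores it above that stage's threshold."""
    windows = list(sliding_windows(img))
    for stage, thr in zip(cascade, thresholds):
        windows = [wdw for wdw in windows
                   if stage(img[wdw[0]:wdw[0]+wdw[2],
                                wdw[1]:wdw[1]+wdw[2]]) >= thr]
    return windows  # windows surviving the last stage are the detections

# Toy usage: three dummy "networks" of increasing strictness.
img = np.random.rand(128, 128)
cascade = [lambda p: p.mean(), lambda p: p.mean(), lambda p: p.mean()]
faces = detect_faces(img, cascade, thresholds=[0.3, 0.4, 0.5])
```

Because each stage only sees the windows its predecessor kept, the cheap early stages do most of the rejection and the expensive later stages run on few candidates.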
Abstract:
Image cropping suggestion using multiple saliency maps is described. In one or more implementations, component scores, indicative of visual characteristics established for visually-pleasing croppings, are computed for candidate image croppings using multiple different saliency maps. The visual characteristics on which a candidate image cropping is scored may be indicative of its composition quality, an extent to which it preserves content appearing in the scene, and a simplicity of its boundary. Based on the component scores, the croppings may be ranked with regard to each of the visual characteristics. The rankings may be used to cluster the candidate croppings into groups of similar croppings, such that croppings in a group are different by less than a threshold amount and croppings in different groups are different by at least the threshold amount. Based on the clustering, croppings may then be chosen, e.g., to present them to a user for selection.
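A high-level sketch of the score-rank-cluster flow; the component scorers are placeholders for the saliency-map-based scores, and using intersection-over-union as the "different by less than a threshold" measure is an assumption.

```python
def iou(a, b):
    """Intersection over union of two (top, left, h, w) croppings."""
    ay, ax, ah, aw = a
    by, bx, bh, bw = b
    y0, x0 = max(ay, by), max(ax, bx)
    y1, x1 = min(ay + ah, by + bh), min(ax + aw, bx + bw)
    inter = max(0, y1 - y0) * max(0, x1 - x0)
    return inter / float(ah * aw + bh * bw - inter)

def suggest_crops(candidates, scorers, sim_thresh=0.8, k=3):
    # Total score = sum of component scores from the different saliency maps.
    scored = sorted(candidates,
                    key=lambda c: sum(s(c) for s in scorers), reverse=True)
    chosen = []
    for c in scored:                     # greedy clustering: keep a cropping only
        if all(iou(c, p) < sim_thresh for p in chosen):  # if unlike all kept ones
            chosen.append(c)
        if len(chosen) == k:
            break
    return chosen

candidates = [(0, 0, 100, 150), (5, 5, 100, 150), (40, 60, 120, 160)]
scorers = [lambda c: c[2] * c[3] / 1e5]  # placeholder composition score
print(suggest_crops(candidates, scorers))
```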
Abstract:
Techniques are disclosed for indexing and searching high-dimensional data using inverted file structures and product quantization encoding. An image descriptor is quantized using a form of product quantization to determine in which of several inverted lists the image descriptor is to be stored. The image descriptor is appended to the corresponding inverted list with a compact coding using a product quantization encoding scheme. When processing a query, a shortlist is computed that includes a set of candidate search results. The shortlist is based on the near-orthogonality of random vectors in high-dimensional spaces. The inverted lists are traversed in the order of the distance between the query and the centroid of a coarse quantizer corresponding to each inverted list. The shortlist is ranked according to the distance estimated by a form of product quantization, and the top images referred to by the ranked shortlist are reported as the search results.
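A condensed sketch of the inverted-file indexing and product-quantization search path; codebook training and the orthogonality-based shortlist sizing are omitted, the codebooks here are random stand-ins, and all sizes are toy.

```python
import numpy as np

D, M, K = 8, 2, 16                       # dim, PQ subspaces, codewords per subspace
rng = np.random.default_rng(0)
coarse = rng.normal(size=(4, D))         # coarse quantizer centroids (one list each)
books = rng.normal(size=(M, K, D // M))  # PQ codebooks, one per subspace
lists = {i: [] for i in range(len(coarse))}

def pq_encode(x):
    """Compact code: the nearest codeword index in each subspace."""
    sub = x.reshape(M, D // M)
    return np.array([np.argmin(((books[m] - sub[m]) ** 2).sum(1)) for m in range(M)])

def add(idx, x):
    c = np.argmin(((coarse - x) ** 2).sum(1))  # nearest coarse centroid
    lists[c].append((idx, pq_encode(x)))       # append code to its inverted list

def search(q, nprobe=2, shortlist=5):
    order = np.argsort(((coarse - q) ** 2).sum(1))  # traverse lists by centroid distance
    qsub = q.reshape(M, D // M)
    table = np.stack([((books[m] - qsub[m]) ** 2).sum(1) for m in range(M)])
    cands = [(table[np.arange(M), code].sum(), idx)  # PQ-estimated distance
             for c in order[:nprobe] for idx, code in lists[c]]
    return [idx for _, idx in sorted(cands)[:shortlist]]

for i in range(50):
    add(i, rng.normal(size=D))
print(search(rng.normal(size=D)))
```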
Abstract:
Feature interpolation techniques are described. In a training stage, features are extracted from a collection of training images and quantized into visual words. Spatial configurations of the visual words in the training images are determined and stored in a spatial configuration database. In an object detection stage, a portion of features of an image are extracted from the image and quantized into visual words. Then, a remaining portion of the features of the image are interpolated using the visual words and the spatial configurations of visual words stored in the spatial configuration database.
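A toy sketch of the interpolation idea: pairwise spatial offsets between visual words are accumulated at training time and used to predict the locations of unobserved words; the database layout and the averaging rule here are assumptions, not the described system.

```python
import numpy as np
from collections import defaultdict

spatial_db = defaultdict(list)  # (word_a, word_b) -> relative offsets seen in training

def train(images):
    """images: list of quantized feature sets, each [(word_id, (y, x)), ...]."""
    for feats in images:
        for wa, pa in feats:
            for wb, pb in feats:
                if wa != wb:
                    spatial_db[(wa, wb)].append(np.subtract(pb, pa))

def interpolate(observed, vocabulary):
    """Predict locations of unobserved words from the observed subset."""
    predictions = {}
    seen = {w for w, _ in observed}
    for wb in vocabulary - seen:
        guesses = [np.add(pa, np.mean(spatial_db[(wa, wb)], axis=0))
                   for wa, pa in observed if spatial_db[(wa, wb)]]
        if guesses:
            predictions[wb] = np.mean(guesses, axis=0)  # averaged predicted location
    return predictions

# Train on one image, then interpolate the two unseen words in a new image.
train([[(0, (10, 10)), (1, (10, 40)), (2, (30, 25))]])
print(interpolate([(0, (50, 50))], vocabulary={0, 1, 2}))
```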