Abstract:
Techniques for sharpening an image using local spatial adaptation and/or patch-based image processing. An image can be sharpened by creating a high-frequency image and then combining that high-frequency image with the input image. This process can be applied iteratively by using the output of one iteration, i.e., the sharpened image, as the input to the next iteration. Using local spatial adaptation and/or patch-based techniques can provide various advantages. The change in intensity at a given position in the image can be calculated from more than just information about that same position in the input image and the blurred image. By using information about neighboring positions, an improved high-frequency image can be determined that, when combined with the input image, reduces ringing and halo artifacts, suppresses noise boosting, and/or generates results with sharper and cleaner edges and details.
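As a rough illustration of the idea, the sketch below implements iterative unsharp masking with a locally adaptive gain. The gain rule based on neighborhood high-frequency energy, the function names, and all parameters are illustrative assumptions, not the patented method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sharpen(image, sigma=2.0, strength=0.8, iterations=2, window=5):
    """Minimal sketch of iterative, locally adaptive unsharp masking.

    Assumes an 8-bit grayscale image as a 2-D array. The locally
    adaptive gain (driven by neighborhood high-frequency energy) is an
    illustrative choice, not the patented adaptation rule.
    """
    out = image.astype(np.float64)
    for _ in range(iterations):
        blurred = gaussian_filter(out, sigma)
        high_freq = out - blurred
        # Local adaptation: weight the high-frequency image by a gain
        # derived from neighboring positions, not just the pixel itself.
        local_energy = gaussian_filter(high_freq ** 2, window)
        gain = strength * local_energy / (local_energy + np.var(high_freq) + 1e-8)
        # The sharpened output of this iteration feeds the next one.
        out = out + gain * high_freq
    return np.clip(out, 0, 255)
```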
Abstract:
Methods and systems for identifying one or more patches in three or more frames of a video are provided. A region in a reference frame of the video may be detected. A set of regions in a prior frame and a subsequent frame that are similar to the region in the reference frame may then be identified. Temporal consistency between the region in the reference frame and two or more regions in the set of regions in the prior and subsequent frames may then be calculated. Patches of regions in the prior, reference, and subsequent frames may be identified based at least in part on the calculated temporal consistencies, with each patch identifying a region in the reference frame that can be mapped to a similar region in the prior and subsequent frames.
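A minimal sketch of the matching step follows, assuming grayscale frames as NumPy arrays. The exhaustive SSD search and the motion-symmetry term are illustrative stand-ins for the patented similarity and temporal-consistency measures.

```python
import numpy as np

def best_match(region, frame, stride=4):
    """Exhaustive SSD scan of `frame` for the window most similar to
    `region`. Illustrative only; a real system would search locally."""
    h, w = region.shape
    best_cost, best_pos = np.inf, (0, 0)
    for y in range(0, frame.shape[0] - h + 1, stride):
        for x in range(0, frame.shape[1] - w + 1, stride):
            cost = np.sum((frame[y:y+h, x:x+w] - region) ** 2)
            if cost < best_cost:
                best_cost, best_pos = cost, (y, x)
    return best_pos, best_cost

def temporal_consistency(prior, reference, subsequent, ref_pos, size):
    """Score a reference region against its best matches in the prior and
    subsequent frames. The motion-symmetry term (forward and backward
    displacements should roughly cancel) is an illustrative consistency
    measure, not the patented one."""
    y, x = ref_pos
    h, w = size
    region = reference[y:y+h, x:x+w].astype(float)
    (py, px), c_prior = best_match(region, prior.astype(float))
    (sy, sx), c_subs = best_match(region, subsequent.astype(float))
    symmetry = np.hypot((py - y) + (sy - y), (px - x) + (sx - x))
    return c_prior + c_subs + symmetry, (py, px), (sy, sx)
```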
Abstract:
In embodiments of spatially coherent nearest neighbor fields, initial matching patches of a nearest neighbor field can be determined at image grid locations of a first digital image and a second digital image. Spatial coherency can be enforced for each matching patch in the second digital image with reference to respective matching patches in the first digital image based on motion data of neighboring matching patches. A multi-resolution iterative process can then update each spatially coherent matching patch based on overlapping grid regions of the matching patches that are evaluated for matching regions of the first and second digital images. An optimal, spatially coherent matching patch can be selected for each of the image grid locations of the first and second digital images based on the iterative interaction that enforces the spatial coherency of each matching patch and the multi-resolution iterative process that updates each spatially coherent matching patch.
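The sketch below shows one way such spatial coherency can be enforced, in the spirit of PatchMatch-style propagation of neighboring offsets. It omits the multi-resolution update, and the names and parameters are assumptions for illustration.

```python
import numpy as np

def ssd(pa, pb):
    """Sum of squared differences between two patches."""
    return float(np.sum((pa.astype(float) - pb.astype(float)) ** 2))

def coherent_nnf(a, b, p=7, iters=2, seed=0):
    """Sketch of a spatially coherent nearest neighbor field: offsets of
    neighboring matches are propagated, one way to enforce spatial
    coherency from motion data of neighboring matching patches."""
    rng = np.random.default_rng(seed)
    H, W = a.shape[0] - p + 1, a.shape[1] - p + 1
    # Random initial matching patches at every image grid location.
    nnf = np.stack([rng.integers(0, H, (H, W)),
                    rng.integers(0, W, (H, W))], axis=-1)
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                best = nnf[y, x]
                cost = ssd(a[y:y+p, x:x+p],
                           b[best[0]:best[0]+p, best[1]:best[1]+p])
                # Propagation: try the shifted matches of the left and
                # upper neighbors -- neighboring patches tend to move together.
                for dy, dx in ((0, -1), (-1, 0)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny and 0 <= nx:
                        cand = nnf[ny, nx] - np.array([dy, dx])
                        cy, cx = np.clip(cand, 0, [H - 1, W - 1])
                        c = ssd(a[y:y+p, x:x+p], b[cy:cy+p, cx:cx+p])
                        if c < cost:
                            cost, best = c, np.array([cy, cx])
                nnf[y, x] = best
    return nnf
```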
Abstract:
In embodiments of statistics of nearest neighbor fields, matching patches of a nearest neighbor field can be determined at image grid locations of a first digital image and a second digital image. A motion field can then be determined based on motion data of the matching patches. Predominant motion components of the motion field can be determined based on statistics of the motion data to generate a final motion field. The predominant motion components correspond to a motion of objects as represented by a displacement between the first and second digital images. One of the predominant motion components can then be assigned to each of the matching patches to optimize the final motion field of the matching patches.
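A compact sketch of the statistics step follows, assuming the nearest neighbor field is given as an (H, W, 2) array of offsets; quantize-and-count is an illustrative choice of statistic for finding the predominant motion components.

```python
import numpy as np

def dominant_motion(nnf_offsets, k=3):
    """Sketch: treat the k most frequent (quantized) offsets as the
    predominant motion components, then snap every patch's motion vector
    to the nearest component to produce the final motion field. The
    quantize-and-count statistic is an illustrative stand-in."""
    vecs = nnf_offsets.reshape(-1, 2).astype(float)
    quantized = np.round(vecs).astype(int)
    # Count how often each quantized motion vector occurs.
    uniq, counts = np.unique(quantized, axis=0, return_counts=True)
    components = uniq[np.argsort(counts)[::-1][:k]].astype(float)
    # Assign each matching patch to its nearest predominant component.
    d = np.linalg.norm(vecs[:, None, :] - components[None, :, :], axis=2)
    labels = np.argmin(d, axis=1)
    final_field = components[labels].reshape(nnf_offsets.shape)
    return final_field, components
```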
Abstract:
Systems and methods are provided for learned, piece-wise patch regression for image enhancement. In one embodiment, an image manipulation application generates training patch pairs that include training input patches and training output patches. Each training patch pair includes a respective training input patch from a training input image and a respective training output patch from a training output image. The training input image and the training output image include at least some of the same image content. The image manipulation application determines patch-pair functions from at least some of the training patch pairs. Each patch-pair function corresponds to a modification to a respective training input patch to generate a respective training output patch. The image manipulation application receives an input image and generates an output image from the input image by applying at least some of the patch-pair functions based on at least some input patches of the input image.
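One plausible realization is to cluster the training input patches and fit one regressor per cluster as the patch-pair function. The sketch below illustrates that reading; the scikit-learn learners and the flattened-patches-as-rows layout are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def train_piecewise(train_in, train_out, n_pieces=8):
    """Sketch: one ridge regression per cluster of training input patches.
    `train_in` / `train_out` are (N, patch_dim) arrays of flattened
    training patch pairs with matching rows."""
    km = KMeans(n_clusters=n_pieces, n_init=10).fit(train_in)
    models = {}
    for c in range(n_pieces):
        mask = km.labels_ == c
        # Each fitted model plays the role of one patch-pair function.
        models[c] = Ridge(alpha=1.0).fit(train_in[mask], train_out[mask])
    return km, models

def enhance(patches, km, models):
    """Apply each cluster's patch-pair function to its input patches."""
    out = np.empty_like(patches, dtype=float)
    for i, c in enumerate(km.predict(patches)):
        out[i] = models[c].predict(patches[i:i+1])[0]
    return out
```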
Abstract:
Various embodiments of methods and apparatus for feature point localization are disclosed. An object in an input image may be detected. A profile model may be applied to determine feature point locations for each object component of the detected object. Applying the profile model may include globally optimizing the feature points for each object component to find a global energy minimum. A component-based shape model may be applied to update the respective feature point locations for each object component.
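As a sketch of globally optimizing candidate feature point locations, the code below runs dynamic programming along a chain of points. The chain topology, the unary profile-response costs, and the pairwise smoothness term are all assumptions for illustration.

```python
import numpy as np

def localize_component(candidates, unary_cost, pairwise_weight=1.0):
    """Sketch of finding a global energy minimum over feature points.

    `candidates` is (n_pts, n_cand, 2) candidate positions for the points
    of one object component; `unary_cost[i, j]` scores candidate j for
    point i (e.g., a profile-model response). The pairwise term prefers
    nearby consecutive candidates. Chain structure is an assumption."""
    n_pts, n_cand, _ = candidates.shape
    cost = unary_cost[0].copy()
    back = np.zeros((n_pts, n_cand), dtype=int)
    for i in range(1, n_pts):
        # Pairwise cost: squared distance between consecutive candidates.
        d = np.linalg.norm(candidates[i][None, :, :]
                           - candidates[i - 1][:, None, :], axis=2) ** 2
        total = cost[:, None] + pairwise_weight * d + unary_cost[i][None, :]
        back[i] = np.argmin(total, axis=0)
        cost = np.min(total, axis=0)
    # Backtrack the globally optimal assignment of candidates to points.
    idx = np.empty(n_pts, dtype=int)
    idx[-1] = int(np.argmin(cost))
    for i in range(n_pts - 1, 0, -1):
        idx[i - 1] = back[i, idx[i]]
    return idx
```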
Abstract:
Various embodiments describe view switching of video on a computing device. In an example, a video processing application executed on the computing device receives a stream of video data. The video processing application renders a major view on a display of the computing device. The major view presents a video from the stream of video data. The video processing application inputs the stream of video data to a deep learning system and, while the video is presented in the major view, receives back information that identifies a cropped video from the video based on a composition score of the cropped video. The composition score is generated by the deep learning system. The video processing application renders a sub-view on the display of the computing device, the sub-view presenting the cropped video. The video processing application renders the cropped video in the major view based on a user interaction with the sub-view.
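The sketch below shows how a composition score from a hypothetical scoring function could drive crop selection; the sliding-window candidate grid, the portrait aspect ratio, and the `score_fn` interface are assumptions, not the described deep learning system.

```python
import numpy as np

def best_crop(frame, score_fn, aspect=(9, 16), steps=5):
    """Sketch: slide a full-height window over the frame and keep the
    crop whose composition score (from a hypothetical model wrapped in
    `score_fn`) is highest. Returns the crop box and its score."""
    H, W = frame.shape[:2]
    ch = H
    cw = min(int(H * aspect[0] / aspect[1]), W)
    best_score, best_box = -np.inf, None
    for x in np.linspace(0, W - cw, steps).astype(int):
        crop = frame[:, x:x + cw]
        s = score_fn(crop)  # composition score for this candidate crop
        if s > best_score:
            best_score, best_box = s, (0, x, ch, cw)
    return best_box, best_score
```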
Abstract:
A framework is provided for associating images with topics utilizing embedding learning. The framework is trained utilizing images, each having multiple visual characteristics and multiple keyword tags associated therewith. Visual features are computed from the visual characteristics utilizing a convolutional neural network and an image feature vector is generated therefrom. The keyword tags are utilized to generate a weighted word vector (or “soft topic feature vector”) for each image by calculating a weighted average of word vector representations that represent the keyword tags associated with the image. The image feature vector and the soft topic feature vector are aligned in a common embedding space and a relevancy score is computed for each of the keyword tags. Once trained, the framework can automatically tag images and a text-based search engine can rank image relevance with respect to queried keywords based upon predicted relevancy scores.
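A minimal sketch of the soft topic feature vector and a cosine-similarity relevancy score follows; the dictionary-of-word-vectors interface and the optional per-tag weights are illustrative assumptions.

```python
import numpy as np

def soft_topic_vector(tags, word_vecs, weights=None):
    """Sketch: build the weighted word vector ('soft topic feature
    vector') as a weighted average of the tags' word-vector
    representations. IDF-style weights are an assumption; uniform
    weights are used by default."""
    vecs = np.stack([word_vecs[t] for t in tags])
    w = (np.ones(len(tags)) if weights is None
         else np.array([weights[t] for t in tags]))
    topic = (w[:, None] * vecs).sum(axis=0) / w.sum()
    return topic / (np.linalg.norm(topic) + 1e-8)

def relevancy(image_embedding, tag, word_vecs):
    """Cosine similarity in the common embedding space, used here as an
    illustrative relevancy score for ranking against a queried keyword."""
    t = word_vecs[tag] / (np.linalg.norm(word_vecs[tag]) + 1e-8)
    v = image_embedding / (np.linalg.norm(image_embedding) + 1e-8)
    return float(v @ t)
```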
Abstract:
The invention is directed towards segmenting images based on natural language phrases. An image and an n-gram, including a sequence of tokens, are received. An encoding of image features and a sequence of token vectors are generated. A fully convolutional neural network identifies and encodes the image features. A word embedding model generates the token vectors. A recurrent neural network (RNN) iteratively updates a segmentation map based on combinations of the image feature encoding and the token vectors. The segmentation map identifies which pixels are included in an image region referenced by the n-gram. A segmented image is generated based on the segmentation map. The RNN may be a convolutional multimodal RNN. A separate RNN, such as a long short-term memory network, may iteratively update an encoding of semantic features based on the order of tokens. The first RNN may update the segmentation map based on the semantic feature encoding.
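The sketch below mimics the iterative update loop with NumPy only: each token vector is tiled over the spatial grid, fused with the image feature encoding and the running segmentation map, and reduced to a new map. The random projection stands in for the learned convolutional multimodal RNN weights.

```python
import numpy as np

def update_segmentation(feature_map, token_vecs, seg_map=None):
    """Sketch of the iterative segmentation update.

    `feature_map` is an (H, W, C) image feature encoding (as from a
    fully convolutional network); `token_vecs` is a (T, D) sequence of
    token vectors (as from a word embedding model). The random weight
    vector below is a hypothetical stand-in for learned RNN weights."""
    H, W, C = feature_map.shape
    D = token_vecs.shape[1]
    rng = np.random.default_rng(0)
    weights = rng.standard_normal(C + D + 1) * 0.01  # hypothetical weights
    seg = np.zeros((H, W)) if seg_map is None else seg_map
    for vec in token_vecs:  # one update per token, in sequence order
        tiled = np.broadcast_to(vec, (H, W, D))
        stacked = np.concatenate([feature_map, tiled, seg[..., None]], axis=-1)
        seg = np.tanh(stacked @ weights)  # 1x1-conv-like multimodal fusion
    return 1.0 / (1.0 + np.exp(-seg))  # per-pixel probability of the region
```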
Abstract:
Techniques and systems are described to determine personalized digital image aesthetics in a digital medium environment. In one example, a personalized offset is generated to adapt a generic model for digital image aesthetics. The generic model, once trained, is used to generate training aesthetics scores from a personal training data set that corresponds to an entity, e.g., a particular user, a group of users, and so on. The image aesthetics system then generates residual scores (e.g., offsets) as the differences between the training aesthetics scores and the personal aesthetics scores for the personal training digital images. The image aesthetics system then employs machine learning to train a personalized model to predict the residual scores as a personalized offset using the residual scores and personal training digital images.
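A minimal sketch of the residual training follows, assuming precomputed image features and a generic model with a scikit-learn-style `predict` method; ridge regression is an illustrative choice for the personalized model.

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_personalized_offset(generic_model, personal_features, personal_scores):
    """Sketch: the personalized model learns the residual between the
    generic model's training aesthetics scores and the entity's own
    scores. `personal_features` is an (N, feat_dim) matrix of features
    for the personal training digital images."""
    generic_scores = generic_model.predict(personal_features)
    residuals = personal_scores - generic_scores  # per-image offsets
    return Ridge(alpha=1.0).fit(personal_features, residuals)

def personalized_score(generic_model, offset_model, image_features):
    """Final aesthetics score = generic prediction + personalized offset."""
    return (generic_model.predict(image_features)
            + offset_model.predict(image_features))
```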