摘要:
A method, device and computer-readable medium for generating a super-resolution version of a compressed video stream. By leveraging the motion information and residual information in compressed video streams, described examples are able to skip the time-consuming motion-estimation step for most frames and make the most use of the SR results of key frames. A key frame SR module generates SR versions of I-frames and other key frames of a compressed video stream using techniques similar to existing multi-frame approaches to VSR. A non-key frame SR module generates SR version of the non-key inter frames between these key frames by making use of motion information and residual information used to encode the inter frames in the compressed video stream.
摘要:
Systems and methods for multi-frame video frame interpolation. Higher-order motion modeling, such as cubic motion modeling, achieves predictions of intermediate optical flow between multiple interpolated frames, assisted by relaxation of the constraints imposed by the loss function used in initial optical flow estimation. A temporal pyramidal optical flow refinement module performs coarse-to-fine refinement of the optical flow maps used to generate the intermediate frames, focusing a proportionally greater amount of refinement attention to the optical flow maps for the high-error middle frames. A temporal pyramidal pixel refinement module performs coarse-to-fine refinement of the generated intermediate frames, focusing a proportionally greater amount of refinement attention to the high-error middle frames. A generative adversarial network (GAN) module calculates a loss function for training the neural networks used in the optical flow estimation module, temporal pyramidal optical flow refinement module, and/or temporal pyramidal pixel refinement module.
摘要:
Systems and methods for generating a visual vocabulary build a plurality of visual words via unsupervised learning on set of features of a given type; decompose one or more visual words to a collection of lower-dimensional buckets; generate labeled image representations based on the collection of lower dimensional buckets and labeled images, wherein labels associated with an image are associated with a respective representation of the image; and iteratively select a sub-collection of buckets from the collection of lower-dimensional buckets based on the labeled image representations, wherein bucket selection during any iteration after an initial iteration is based at least in part on feedback from previously selected buckets.
摘要:
The present invention is directed to a computer automated method of selectively identifying a user specified behavior of a crowd. The method comprises receiving video data but can also include audio data and sensor data. The video data contains images a crowd. The video data is processed to extract hierarchical human and crowd features. The detected crowd features are processed to detect a selectable crowd behavior. The selected crowd behavior detected is specified by a configurable behavior rule. Human detection is provided by a hybrid human detector algorithm which can include Adaboost or convolutional neural network. Crowd features are detected using textual analysis techniques. The configurable crowd behavior for detection can be defined by crowd behavioral language.
摘要:
Converting a digital image from color to gray-scale. In one example embodiment, a method for converting a digital image from color to gray-scale is disclosed. First, an unconverted pixel having red, green, and blue color channels is selected from the color digital image. Next, the red color channel of the pixel is multiplied by α. Then, the green color channel of the pixel is multiplied by β. Next, the blue color channel of the pixel is multiplied by γ. Then, the results of the three multiplication operations are added together to arrive at a gray-scale value for the pixel. Finally, these acts are repeated for each remaining unconverted pixel of the color digital image to arrive at a gray-scale digital image. In this example method, α+β+≈1 and α>β.
摘要:
Systems and methods for generating a visual vocabulary build a plurality of visual words via unsupervised learning on set of features of a given type; decompose one or more visual words to a collection of lower-dimensional buckets; generate labeled image representations based on the collection of lower dimensional buckets and labeled images, wherein labels associated with an image are associated with a respective representation of the image; and iteratively select a sub-collection of buckets from the collection of lower-dimensional buckets based on the labeled image representations, wherein bucket selection during any iteration after an initial iteration is based at least in part on feedback from previously selected buckets.
摘要:
Systems and methods for clustering descriptors in a space of visual descriptors to generate augmented visual descriptors in an augmented space that includes semantic information, wherein the augmented space of the augmented descriptors includes both visual descriptor-to-descriptor dissimilarities and semantic label-to-label dissimilarities; and cluster the augmented visual descriptors in the augmented space based at least in part on a dissimilarity measure between augmented visual descriptors in the augmented descriptor space.
摘要:
Systems and methods for summarizing a video assign frames in a video to at least one of two or more groups based on a topic, generate a respective first similitude measurement for the frames in a group relative to the other frames in the group based on a feature, rank the frames in a group relative to one or more other frames in the group based on the respective first similitude measurement of the respective frames, and select a frame from each group as a most-representative frame based on the respective rank of the frames in a group relative to the other frames in the group.
摘要:
A method and system for scanning a digital image for detecting the representation of an object, such as a face, and for reducing memory requirements of the computer system performing the image scan. One example method includes identifying an original image and downsamples the original image in an x-dimension and in a y-dimension to obtain a downsampled image that requires less storage space than the original digital image. A first scan is performed of the downsampled image to detect the representation of an object within the downsampled image. Then, the original digital image is divided into at least two image blocks, where each image block contains a portion of the original digital image. A second scan is then performed of each of the image blocks to detect the representation of the object within the image blocks.
摘要:
Methods for estimating a point spread function of a blurred digital image. One example method includes capturing gyro data during an image exposure time, deriving gyro samples from the gyro data at predetermined gyro sampling times, calculating a motion vector field of the image at each gyro sampling time, approximating an overall image scene motion path by averaging motion paths of selected pixels in the image, and estimating the point spread function from the approximated overall image scene motion path.