Abstract:
A method and associated apparatus for using a trajectory-based technique to detect a moving object in a video sequence that incorporates human interaction through a user interface. The method comprises steps of identifying and evaluating sets of connected components in a video frame, filtering the list of connected components by comparing features of the connected components to predetermined criteria, identifying candidate trajectories across multiple frames, evaluating the candidate trajectories to determine a selected trajectory, eliminating incorrect trajectories through use of the user interface, and processing images in said video sequence responsive to the evaluating and eliminating steps.
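A minimal sketch of the connected-component identification and filtering step, assuming a binary foreground mask per frame; the area and aspect-ratio criteria and the use of scipy.ndimage are illustrative assumptions, not the claimed predetermined criteria.

import numpy as np
from scipy import ndimage

# Illustrative thresholds standing in for the "predetermined criteria".
MIN_AREA, MAX_AREA, MAX_ASPECT = 20, 5000, 4.0

def candidate_components(foreground_mask):
    """Label connected components in one frame and keep those whose area and
    aspect ratio fall inside the assumed ranges."""
    labels, num = ndimage.label(foreground_mask)
    candidates = []
    for idx, sl in enumerate(ndimage.find_objects(labels), start=1):
        area = int(np.sum(labels[sl] == idx))
        height = sl[0].stop - sl[0].start
        width = sl[1].stop - sl[1].start
        aspect = max(height, width) / max(1, min(height, width))
        if MIN_AREA <= area <= MAX_AREA and aspect <= MAX_ASPECT:
            candidates.append({"label": idx, "bbox": sl, "area": area})
    return candidates

Candidate trajectories could then be formed by linking the centroids of the surviving components across frames, with incorrect trajectories removed through the user interface as described above.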
Abstract:
One or more implementations access a digital image containing one or more bands. Adjacent bands of the one or more bands have a difference in color resulting in a contour between the adjacent bands. The one or more implementations apply an algorithm to at least a portion of the digital image for reducing visibility of a contour. The algorithm is based on a value representing the fraction of pixels in a region of the digital image having a particular color value.
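A minimal sketch of the "fraction of pixels with a particular color value" measure, assuming 8-bit single-channel regions; the dither-based response and the 0.6 threshold are illustrative assumptions, not the claimed algorithm.

import numpy as np

def color_fraction(region, color_value):
    """Fraction of pixels in the region equal to a particular color value."""
    return float(np.mean(region == color_value))

def deband_region(region, flat_threshold=0.6):
    """If one color value dominates the region (a likely band), add light
    dither noise to soften the contour with the adjacent band."""
    dominant = int(np.bincount(region.ravel()).argmax())
    if color_fraction(region, dominant) >= flat_threshold:
        noise = np.random.uniform(-0.5, 0.5, region.shape)
        return np.clip(region.astype(float) + noise, 0, 255).astype(region.dtype)
    return region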
Abstract:
In an implementation, a pixel is selected from a target digital image. Multiple candidate pixels, from one or more digital images, are evaluated based on values of the multiple candidate pixels. For the selected pixel, a corresponding set of pixels is determined from the multiple candidate pixels based on the evaluations of the multiple candidate pixels and on whether a predetermined threshold number of pixels have been included in the corresponding set. Further for the selected pixel, a substitute value is determined based on the values of the pixels in the corresponding set of pixels. Various implementations described provide adaptive pixel-based spatio-temporal filtering of images or video to reduce film grain or noise. Implementations may achieve an “even” amount of noise reduction at each pixel while preserving as much picture detail as possible by, for example, averaging each pixel with a constant number, N, of temporally and/or spatially correlated pixels.
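A minimal per-pixel sketch of averaging with a constant number N of spatio-temporally correlated pixels; the window radius, the value-similarity test, and N=8 are assumptions standing in for the predetermined threshold number described above.

import numpy as np

def filter_pixel(frames, t, y, x, n_avg=8, radius=1, max_diff=10):
    """Average the pixel at (t, y, x) with up to n_avg neighbouring pixels
    (previous, current and next frame) whose values are close to its own."""
    center = float(frames[t, y, x])
    candidates = []
    for dt in (-1, 0, 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                tt, yy, xx = t + dt, y + dy, x + dx
                if (0 <= tt < frames.shape[0] and 0 <= yy < frames.shape[1]
                        and 0 <= xx < frames.shape[2]):
                    value = float(frames[tt, yy, xx])
                    if abs(value - center) <= max_diff:
                        candidates.append(value)
    # Keep the candidates closest in value to the centre pixel, up to n_avg.
    candidates.sort(key=lambda v: abs(v - center))
    chosen = candidates[:n_avg]
    return sum(chosen) / len(chosen)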
Abstract:
A method of object-aware video coding is provided that comprises the steps of: receiving a video sequence having a plurality of frames; selecting at least two frames; determining the total area of at least one object of interest in each of the at least two frames; comparing the total area to a threshold area; classifying each of the at least two frames as being a low object weighted frame or a high object weighted frame, low object weighted frames being frames having the total area exceeding the threshold area and high object weighted frames being frames having the total area not exceeding the threshold area; and encoding each low object weighted frame according to one encoding mode and encoding each high object weighted frame according to a different encoding mode.
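A minimal sketch of the frame-classification step, assuming one binary object-of-interest mask per frame; the low/high mapping follows the wording above, and the mask representation is an assumption.

import numpy as np

def classify_frames(object_masks, threshold_area):
    """Label each frame as low or high object weighted by comparing the total
    area of the objects of interest with the threshold area."""
    classes = []
    for mask in object_masks:
        total_area = int(np.count_nonzero(mask))
        classes.append("low" if total_area > threshold_area else "high")
    return classes

Each class of frame would then be routed to its own encoding mode, as described above.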
Abstract:
Several implementations relate to view synthesis with heuristic view merging for 3D Video (3DV) applications. According to one aspect, a first candidate pixel from a first warped reference view and a second candidate pixel from a second warped reference view are assessed based on at least one of a backward synthesis process that assesses a quality of the first and second candidate pixels, a hole distribution around the first and second candidate pixels, or an amount of energy above a specified frequency around the first and second candidate pixels. The assessing occurs as part of merging at least the first and second warped reference views into a single synthesized view. Based on the assessing, a result is determined for a given target pixel in the single synthesized view. The result may be a value for the given target pixel, or a marking of the given target pixel as a hole.
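A minimal sketch of one of the merging heuristics (the hole distribution around the candidate pixels); the hole marker, window radius, and tie-breaking rule are illustrative assumptions.

import numpy as np

HOLE = -1  # assumed marker for unfilled pixels in a warped reference view

def merge_pixel(view1, view2, y, x, radius=2):
    """Pick between the candidate pixels from two warped reference views using
    the number of holes around each candidate, or mark the target as a hole."""
    def hole_count(view):
        region = view[max(0, y - radius):y + radius + 1,
                      max(0, x - radius):x + radius + 1]
        return int(np.count_nonzero(region == HOLE))

    c1, c2 = view1[y, x], view2[y, x]
    if c1 == HOLE and c2 == HOLE:
        return HOLE            # the target pixel stays a hole
    if c1 == HOLE:
        return c2
    if c2 == HOLE:
        return c1
    return c1 if hole_count(view1) <= hole_count(view2) else c2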
Abstract:
The visibility of an object in a digital picture is enhanced by comparing an input video of the digital picture with stored information representative of the nature and characteristics of the object to develop object localization information that identifies and locates the object. The visibility of the object and of the region in which the object is located is enhanced by image processing, and the enhanced input video is encoded.
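A minimal sketch of the localization-and-enhancement idea, in which exhaustive template matching stands in for "comparing with stored information" and a local contrast stretch stands in for the enhancement; both choices are illustrative assumptions, not the claimed processing.

import numpy as np

def localize(frame, template):
    """Return the top-left corner of the best match of the stored template."""
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_cost = (0, 0), np.inf
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            cost = np.abs(frame[y:y + th, x:x + tw].astype(float) - template).sum()
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best

def enhance_region(frame, top_left, size):
    """Stretch the contrast of the located region before encoding."""
    y, x = top_left
    h, w = size
    region = frame[y:y + h, x:x + w].astype(float)
    lo, hi = region.min(), region.max()
    if hi > lo:
        frame[y:y + h, x:x + w] = ((region - lo) / (hi - lo) * 255).astype(frame.dtype)
    return frame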
Abstract:
Methods and apparatus are provided for sampling-based super resolution video encoding and decoding. The encoding method receives high resolution pictures and generates low resolution pictures and metadata therefrom, the metadata for guiding post-decoding post-processing of the low resolution pictures; and then encodes the low resolution pictures and the metadata using at least one encoder. The corresponding decoding method receives a bitstream and decodes low resolution pictures and metadata therefrom using a decoder; and then reconstructs high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata.
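A minimal encoder/decoder sketch, assuming even-sized pictures, a downsampling factor of two, and metadata consisting only of a 2x2 sampling phase; the phase-selection rule (maximum variance) and the nearest-neighbour fill are illustrative assumptions.

import numpy as np

def downsample_with_metadata(hi_res):
    """Pick the 2x2 sampling phase that preserves the most variance and return
    the low resolution picture together with that phase as metadata."""
    best_phase, best_lo, best_energy = None, None, -1.0
    for py in (0, 1):
        for px in (0, 1):
            lo = hi_res[py::2, px::2]
            energy = float(np.var(lo))
            if energy > best_energy:
                best_phase, best_lo, best_energy = (py, px), lo, energy
    return best_lo, {"phase": best_phase}

def reconstruct(lo_res, metadata):
    """Decoder-side post-processing: nearest-neighbour upsampling with the
    decoded samples restored at the phase signalled in the metadata."""
    py, px = metadata["phase"]
    hi = np.repeat(np.repeat(lo_res, 2, axis=0), 2, axis=1)
    hi[py::2, px::2] = lo_res
    return hi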
Abstract:
Methods and apparatus are provided for reducing vector quantization error through patch shifting. A method generates, from an input video sequence, one or more high resolution replacement patches for replacing one or more low resolution patches during a reconstruction of the input video sequence. This generating step generates the one or more high resolution replacement patches using data corresponding to a patch spatial shifting process, the patch spatial shifting process reducing jittery artifacts caused by a motion-induced vector quantization error in the one or more high resolution replacement patches. The data is for at least deriving a patch size of the one or more high resolution replacement patches such that the one or more high resolution replacement patches are generated to have a patch size greater than a patch size of the one or more low resolution patches, in order to be suitable for use in the patch spatial shifting process.
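A minimal sketch of why the replacement patch is generated larger than the patch it replaces: the extra border lets reconstruction choose the small spatial shift that best fits the decoded neighbourhood. The sum-of-absolute-differences cost and the shift range are assumptions.

import numpy as np

def best_shift_crop(replacement, target, max_shift=2):
    """Crop the oversized replacement patch (expected to be 2 * max_shift larger
    in each dimension than the target) at the shift with the lowest SAD cost."""
    h, w = target.shape
    best, best_cost = None, np.inf
    for dy in range(2 * max_shift + 1):
        for dx in range(2 * max_shift + 1):
            crop = replacement[dy:dy + h, dx:dx + w]
            cost = float(np.abs(crop.astype(float) - target.astype(float)).sum())
            if cost < best_cost:
                best, best_cost = crop, cost
    return best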
Abstract:
One or more implementations access a digital image and determine whether at least one portion of the digital image includes one or more bands having a difference in color. The determination is based on at least two candidate scales. One or more implementations access a digital image and assess at least a portion of the digital image for the existence of one or more bands having a difference in color. The assessing includes determining a fraction of pixels in the portion having a color value offset by an offset value from a color value of a particular pixel in the portion.
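A minimal sketch of the assessment step, computing at two candidate scales the fraction of pixels in a window whose value is offset by a fixed amount from that of a particular pixel; the scales, offset, and decision threshold are illustrative assumptions.

import numpy as np

def band_score(image, y, x, scale, offset=1):
    """Fraction of pixels in a (2*scale+1)-sized window around (y, x) whose
    value differs from the centre pixel's value by exactly the offset."""
    window = image[max(0, y - scale):y + scale + 1,
                   max(0, x - scale):x + scale + 1].astype(int)
    centre = int(image[y, x])
    return float(np.mean(np.abs(window - centre) == offset))

def has_band(image, y, x, scales=(4, 16), threshold=0.3):
    """Declare a band only when the score is high at both candidate scales."""
    return all(band_score(image, y, x, s) >= threshold for s in scales)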
Abstract:
A method is disclosed for detecting and locating players and the ball in soccer video frames, using a shape analysis-based approach that avoids errors caused by artifacts. The method identifies the players and the ball from roughly extracted foregrounds obtained by color segmentation and connected component analysis, performs a Euclidean distance transform to extract skeletons for every foreground blob, performs a shape analysis to remove false alarms (non-players and non-ball), and then performs skeleton pruning and a reverse Euclidean distance transform to cut off the artifacts primarily caused by playing field lines.
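A minimal sketch of the distance-transform, skeleton-pruning, and reverse-EDT steps for a single foreground blob; the use of scikit-image's skeletonize and the radius-based pruning rule are illustrative simplifications of the shape analysis described above.

import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def prune_and_reconstruct(blob_mask, min_radius=2.0):
    """Skeletonise a foreground blob, drop thin skeleton pixels (e.g. attached
    playing field lines), then rebuild the blob by a reverse Euclidean distance
    transform: painting a disc of the stored radius around each kept pixel."""
    dist = ndimage.distance_transform_edt(blob_mask)
    skeleton = skeletonize(blob_mask)
    kept = skeleton & (dist >= min_radius)

    rebuilt = np.zeros_like(blob_mask, dtype=bool)
    yy, xx = np.mgrid[:blob_mask.shape[0], :blob_mask.shape[1]]
    for y, x in zip(*np.nonzero(kept)):
        rebuilt |= (yy - y) ** 2 + (xx - x) ** 2 <= dist[y, x] ** 2
    return rebuilt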