Abstract:
A method and system for identifying content relevance comprises acquiring video data, mapping the acquired video data to a feature space to obtain a feature representation of the video data, assigning the acquired video data to at least one action class based on the feature representation of the video data, and determining a relevance of the acquired video data.
Abstract:
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a hand gesture are disclosed. For example, the method acquires an image containing the hand gesture from an ego-centric video, detects pixels that correspond to one or more hands in the image using a hand segmentation algorithm, identifies a hand enclosure in the detected pixels, localizes a region of interest based on the hand enclosure, and performs an action based on an object in the region of interest.
Abstract:
Methods and systems are disclosed for automatically synchronizing videos acquired via two or more cameras with overlapping views in a multi-camera network. Reference lines within an overlapping field of view of the two (or more) cameras in the multi-camera network can be determined, wherein the reference lines connect two or more pairs of corresponding points. Spatiotemporal maps of the reference lines can then be obtained. An optimal alignment between video segments obtained from the cameras is then determined based on the registration of the spatiotemporal maps.
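The alignment step can be illustrated with a minimal sketch. It assumes each spatiotemporal map is a list of rows, one row per frame, holding the intensities sampled along a reference line, and that the two maps are already spatially registered so that only a frame offset remains. The function name `best_frame_offset`, the `max_shift` bound, and the mean squared difference cost are illustrative assumptions, not details taken from the abstract.

```python
def best_frame_offset(map_a, map_b, max_shift):
    """Return the shift s for which map_b[t + s] best matches map_a[t].

    map_a, map_b: lists of rows (one per frame); each row holds the pixel
    intensities sampled along the same reference line in each camera.
    """
    best_shift, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for t, row_a in enumerate(map_a):
            u = t + shift
            if 0 <= u < len(map_b):
                # Squared difference between the two line profiles.
                total += sum((a - b) ** 2 for a, b in zip(row_a, map_b[u]))
                count += 1
        if count == 0:
            continue  # no temporal overlap at this shift
        cost = total / count  # normalize by the number of overlapping frames
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


# Hypothetical data: camera A started recording two frames after camera B.
signal = [0, 5, 1, 8, 2, 9, 4, 7, 3, 6, 0, 8, 1, 9, 2, 7]
rows = [[v, 10 - v] for v in signal]
map_b = rows[0:12]
map_a = rows[2:14]
offset = best_frame_offset(map_a, map_b, max_shift=4)
```

Normalizing the cost by the number of overlapping frames keeps large shifts, which compare fewer frames, from winning only because they accumulate less error.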
Abstract:
A mobile electronic device processes a sequence of images to identify and re-identify an object of interest in the sequence. An image sensor of the device receives a sequence of images. The device detects an object in a first image as well as positional parameters of the device that correspond to the object in the first image. The device determines a range of positional parameters within which the object may appear in a field of view of the device. When the device detects that the object of interest has exited the field of view and subsequently uses motion sensor data to determine that the object of interest has likely re-entered the field of view, it analyzes the current frame to confirm that the object of interest has re-entered the field of view.
Abstract:
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a dynamic hand gesture are disclosed. For example, the method captures an ego-centric video containing the dynamic hand gesture, analyzes a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm, temporally analyzes one or more frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture, localizes the region of interest based on the path of the fingertip in the dynamic hand gesture, and performs an action based on an object in the region of interest.
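As a rough illustration of the localization step, the sketch below turns a fingertip path into a rectangular region of interest. It assumes the path is already available as a list of `(x, y)` pixel coordinates, one per analyzed frame, and that the region is simply the bounding box of the path. The function name `roi_from_path` and the `margin` parameter are hypothetical, and the abstract does not say that a bounding box is the rule actually used.

```python
def roi_from_path(path, margin=0):
    """Return (x_min, y_min, x_max, y_max) enclosing a fingertip path.

    path: list of (x, y) fingertip positions, one per analyzed frame.
    margin: optional padding in pixels added on every side.
    """
    if not path:
        raise ValueError("fingertip path is empty")
    xs = [x for x, _ in path]
    ys = [y for _, y in path]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


# Hypothetical path traced roughly around an object.
path = [(10, 20), (30, 18), (32, 40), (12, 42)]
roi = roi_from_path(path)
padded = roi_from_path(path, margin=5)
```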
Abstract:
Block-based motion estimation for video compression estimates the direction and magnitude of motion of objects in the scene in a computationally efficient manner and accurately predicts the optimal search direction/neighborhood location for motion vectors. A system can include a motion detection module that detects apparent motion in the scene, a motion direction and magnitude prediction module that estimates the direction and magnitude of motion of the objects detected to be in motion by the motion detection module, and a block-based motion estimation module that performs searches in reduced neighborhoods of the target block according to the motion estimated by the motion direction and magnitude prediction module, and only for the blocks determined to be in motion by the motion detection module. The invention is particularly well suited for stationary traffic cameras that monitor roads and highways for traffic law enforcement purposes.
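The reduced-neighborhood search can be sketched as follows. This is a minimal illustration, assuming grayscale frames stored as nested lists, a per-block `moving` flag supplied by some motion detector, and a `predicted` displacement supplied by some motion predictor. The function names, the `radius` parameter, and the sum-of-absolute-differences cost are assumptions made for the example, not details stated in the abstract.

```python
def sad(ref, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n target block at
    (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def estimate_block(ref, cur, bx, by, n, moving, predicted, radius=1):
    """Return a motion vector (dx, dy) for one target block.

    Static blocks skip the search. Moving blocks are searched only in a
    small window centered on the predicted displacement.
    """
    if not moving:
        return (0, 0)
    height, width = len(ref), len(ref[0])
    px, py = predicted
    best, best_cost = (0, 0), sad(ref, cur, bx, by, 0, 0, n)
    for dy in range(py - radius, py + radius + 1):
        for dx in range(px - radius, px + radius + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def frame_with_patch(x0, y0, size=8, n=2):
    """Build a black frame with an n-by-n bright patch at (x0, y0)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(n):
        for i in range(n):
            frame[y0 + j][x0 + i] = 255
    return frame


# Hypothetical scene: a patch moves 3 pixels to the right between frames.
ref = frame_with_patch(1, 3)
cur = frame_with_patch(4, 3)
mv = estimate_block(ref, cur, bx=4, by=3, n=2, moving=True, predicted=(-2, 0))
```

With `radius=1` the search evaluates at most 9 candidates per moving block instead of a full window, and blocks flagged as static cost nothing.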
Abstract:
A method for removing false foreground image content in a foreground detection process performed on a video sequence includes, for each current frame, comparing a feature value of each current pixel against a feature value of a corresponding pixel in a background model. Each current pixel is classified as belonging to one of a candidate foreground image and a background based on the comparing. A first classification image representing the candidate foreground image is generated using the current pixels classified as belonging to the candidate foreground image. Each pixel in the first classification image is classified as belonging to one of a foreground image and a false foreground image using a previously trained classifier. A modified classification image representing the foreground image is generated using the pixels classified as belonging to the foreground image, while the pixels classified as belonging to the false foreground image are removed.
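The two-stage classification can be sketched as below. The example assumes grayscale intensity as the feature value and a fixed difference threshold for the first stage. The "previously trained classifier" is represented by a callable passed in by the caller, and the `shadow_rule` function is a hand-written stand-in (it treats moderately darkened background as shadow), not a trained model and not the classifier the abstract refers to.

```python
def candidate_mask(frame, background, threshold):
    """Stage 1: mark pixels whose feature differs from the background model."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def remove_false_foreground(frame, background, threshold, is_true_foreground):
    """Stage 2: keep only candidate pixels the classifier accepts.

    is_true_foreground(pixel, bg_pixel) -> bool stands in for the
    previously trained classifier.
    """
    candidates = candidate_mask(frame, background, threshold)
    return [
        [c and is_true_foreground(p, b)
         for c, p, b in zip(cand_row, frame_row, bg_row)]
        for cand_row, frame_row, bg_row in zip(candidates, frame, background)
    ]


def shadow_rule(pixel, bg_pixel):
    """Hypothetical stand-in classifier: reject shadow-like pixels, i.e.
    pixels that are a moderately darkened copy of the background."""
    if bg_pixel == 0:
        return True
    ratio = pixel / bg_pixel
    return not (0.5 <= ratio <= 0.9)


background = [[100, 100, 100], [100, 100, 100]]
frame = [[100, 70, 200], [102, 20, 100]]
first = candidate_mask(frame, background, threshold=15)
final = remove_false_foreground(frame, background, 15, shadow_rule)
```

The pixel with value 70 passes the first stage but is dropped by the second, which is the false foreground case the method targets.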
Abstract:
Methods and systems obtain data representative of a scene across spectral bands using a compressive-sensing-based hyperspectral imaging system comprising optical elements. These methods and systems sample two modes of a three-dimensional tensor corresponding to a hyperspectral representation of the scene using sampling matrices, one for each of the two modes, to generate a modified three-dimensional tensor. After sampling the two modes, such methods and systems sample a third mode of the modified three-dimensional tensor using a third sampling matrix to generate a further modified three-dimensional tensor. Then, the methods and systems reconstruct hyperspectral data from the further modified three-dimensional tensor using the sampling matrices and the third sampling matrix.
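The sampling order described above corresponds to a sequence of mode products between a three-dimensional tensor and three sampling matrices. The sketch below shows only that forward sampling, using nested lists, and does not attempt the reconstruction step. Which two modes are sampled first, the function names, and the tiny matrices are assumptions chosen for illustration.

```python
def mode_product(x, m, mode):
    """Multiply a 3-D tensor x (nested lists) by matrix m along one mode.

    The size of the chosen mode changes from len(m[0]) to len(m).
    """
    shape = [len(x), len(x[0]), len(x[0][0])]
    out_shape = list(shape)
    out_shape[mode] = len(m)
    out = [[[0.0] * out_shape[2] for _ in range(out_shape[1])]
           for _ in range(out_shape[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                src = [i, j, k]
                for r, row in enumerate(m):
                    dst = list(src)
                    dst[mode] = r
                    out[dst[0]][dst[1]][dst[2]] += row[src[mode]] * x[i][j][k]
    return out


def compressive_sample(x, a, b, c):
    """Sample two modes first (assumed here to be modes 0 and 1), then the third."""
    modified = mode_product(mode_product(x, a, 0), b, 1)
    return mode_product(modified, c, 2)


# Hypothetical 2 x 2 x 2 tensor and 1 x 2 sampling matrices.
x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
a = [[1, 1]]  # sums over mode 0
b = [[1, 0]]  # keeps index 0 of mode 1
c = [[0, 1]]  # keeps index 1 of mode 2
z = compressive_sample(x, a, b, c)
```

Each sampling matrix has fewer rows than columns, so every mode product shrinks one dimension of the tensor, which is where the compression comes from.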
Abstract:
A camera outputs video as a sequence of video frames having pixel values in a first (e.g., relatively low dimensional) color space, where the first color space has a first number of channels. An image-processing device maps the video frames to a second (e.g., relatively higher dimensional) color representation of video frames. The mapping causes the second color representation of video frames to have a greater number of channels relative to the first number of channels. The image-processing device extracts a second color representation of a background frame of the scene. The image-processing device can then detect foreground objects in a current frame of the second color representation of video frames by comparing the current frame with the second color representation of a background frame. The image-processing device then outputs an identification of the foreground objects in the current frame of the video.
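A minimal sketch of this pipeline is shown below. The abstract does not specify the higher-dimensional representation, so the example assumes one for illustration: each RGB pixel is lifted to six channels by appending its three chromaticity values, scaled to the 0 to 255 range. The names `lift_pixel` and `detect_foreground`, the Euclidean distance test, and the threshold value are likewise assumptions.

```python
import math


def lift_pixel(rgb):
    """Map a 3-channel RGB pixel to an assumed 6-channel representation:
    the original channels plus scaled chromaticities."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        chroma = (0.0, 0.0, 0.0)
    else:
        chroma = tuple(255.0 * channel / total for channel in rgb)
    return (float(r), float(g), float(b)) + chroma


def detect_foreground(frame, background, threshold):
    """Compare lifted current pixels with lifted background pixels and
    return a boolean foreground mask."""
    mask = []
    for frame_row, bg_row in zip(frame, background):
        mask.append([
            math.dist(lift_pixel(p), lift_pixel(q)) > threshold
            for p, q in zip(frame_row, bg_row)
        ])
    return mask


# Hypothetical one-row frame: the second pixel changes from gray to red.
background = [[(100, 100, 100), (100, 100, 100)]]
frame = [[(100, 100, 100), (200, 50, 50)]]
mask = detect_foreground(frame, background, threshold=30)
```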