Abstract:
A method and system for identifying content relevance comprises acquiring video data, mapping the acquired video data to a feature space to obtain a feature representation of the video data, assigning the acquired video data to at least one action class based on the feature representation of the video data, and determining a relevance of the acquired video data.
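The pipeline described above can be sketched as follows; this is a minimal illustration under stated assumptions, not the claimed implementation. Mean-pooling of per-frame feature vectors, nearest-centroid class assignment, and cosine similarity as the relevance measure are all illustrative choices not specified in the abstract.

```python
import math

# Hypothetical sketch: map video data to a feature space, assign an
# action class, and return a relevance score. The class centroids and
# cosine-similarity relevance measure are illustrative assumptions.

def video_features(frames):
    """Mean-pool per-frame feature vectors into one representation."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(frames, centroids):
    """Assign the feature representation to the nearest action class."""
    feat = video_features(frames)
    scores = {label: cosine(feat, c) for label, c in centroids.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]  # assigned class and its relevance score
```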
Abstract:
A method, non-transitory computer readable medium, and apparatus for localizing a region of interest using a hand gesture are disclosed. For example, the method acquires an image containing the hand gesture from the ego-centric video, detects pixels that correspond to one or more hands in the image using a hand segmentation algorithm, identifies a hand enclosure in the pixels that are detected within the image, localizes a region of interest based on the hand enclosure, and performs an action based on an object in the region of interest.
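One way the enclosure-based localization step could work is sketched below. The binary hand mask stands in for the output of a hand segmentation algorithm, and the flood-fill hole detection is an assumption; the abstract does not specify how the enclosure is found.

```python
from collections import deque

# Illustrative sketch: locate a region of interest enclosed by detected
# hand pixels (1s) in a binary mask, by flood-filling background from
# the image border and treating unreached background pixels as enclosed.

def localize_roi(mask):
    """Return the bounding box (r0, c0, r1, c1) of background pixels
    enclosed by hand pixels in a binary mask, or None if no enclosure."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    q = deque()
    # Seed the flood fill from every border background pixel.
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) and mask[r][c] == 0:
                outside[r][c] = True
                q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] == 0 and not outside[nr][nc]):
                outside[nr][nc] = True
                q.append((nr, nc))
    # Background pixels not reachable from the border are enclosed.
    holes = [(r, c) for r in range(rows) for c in range(cols)
             if mask[r][c] == 0 and not outside[r][c]]
    if not holes:
        return None
    rs = [r for r, _ in holes]
    cs = [c for _, c in holes]
    return min(rs), min(cs), max(rs), max(cs)
```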
Abstract:
A mobile electronic device processes a sequence of images to identify and re-identify an object of interest in the sequence. An image sensor of the device receives a sequence of images. The device detects an object in a first image, along with positional parameters of the device that correspond to the object in the first image. The device determines a range of positional parameters within which the object may appear in a field of view of the device. When the device detects that the object of interest has exited the field of view and subsequently uses motion sensor data to determine that the object of interest has likely re-entered the field of view, it analyzes the current frame to confirm that the object of interest has re-entered the field of view.
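The gating logic described above can be sketched as follows; this is a hedged illustration, not the device's implementation. Reducing the positional parameters to a single yaw angle and the fixed tolerance margin are both assumptions.

```python
# Illustrative sketch of the re-identification gate: remember the range
# of positional parameters (here a single yaw angle, an assumption)
# observed while the object was in view, then use motion-sensor readings
# to decide when a full frame analysis is worth running again.

class ReIdGate:
    def __init__(self, margin=5.0):
        self.lo = self.hi = None
        self.margin = margin  # tolerance in degrees (illustrative value)

    def observe(self, yaw):
        """Record a positional parameter while the object is detected."""
        self.lo = yaw if self.lo is None else min(self.lo, yaw)
        self.hi = yaw if self.hi is None else max(self.hi, yaw)

    def likely_in_view(self, yaw):
        """True when motion data suggests the object may have re-entered
        the field of view, so the current frame should be analyzed."""
        if self.lo is None:
            return False
        return self.lo - self.margin <= yaw <= self.hi + self.margin
```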
Abstract:
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a dynamic hand gesture are disclosed. For example, the method captures the ego-centric video containing the dynamic hand gesture, analyzes a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm, analyzes temporally one or more frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture, localizes the region of interest based on the path of the fingertip in the dynamic hand gesture, and performs an action based on an object in the region of interest.
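A minimal sketch of the path-to-region step, assuming the per-frame fingertip positions have already been detected: the positions are accumulated into a path, and the region of interest is taken as the path's bounding box. The bounding-box choice is an assumption; the abstract does not specify how the path maps to a region.

```python
# Hypothetical sketch: derive a region of interest from the fingertip
# path traced across frames of a dynamic hand gesture.

def roi_from_path(fingertips):
    """fingertips: list of (x, y) fingertip positions, one per frame.
    Returns the path's bounding box as (x0, y0, x1, y1)."""
    xs = [p[0] for p in fingertips]
    ys = [p[1] for p in fingertips]
    return min(xs), min(ys), max(xs), max(ys)
```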
Abstract:
A system captures or otherwise receives a video and uses the video to create an electronic file corresponding to a multi-faceted printed artifact, such as a multi-page document. When the system receives the video, it selects a set of some or all of the video's image frames, determines a frame quality for each frame in the set, and identifies a subset of the frames such that the frame quality of each frame in the subset satisfies one or more image quality criteria. The subset will include at least one frame for each facet of the multi-faceted printed artifact, such as a page of the document. The processor then automatically combines the subset of frames into a single electronic file.
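The selection step can be sketched as follows, under stated assumptions: each candidate frame arrives with a page (facet) label and a precomputed quality score, and "satisfies the criteria" is modeled as a single threshold. How frames are scored and mapped to pages is not specified in the abstract.

```python
# Illustrative sketch: from scored video frames, keep the best-quality
# frame per page of the printed artifact, subject to a quality threshold.

def select_frames(frames, threshold):
    """frames: list of (frame_id, page, quality) tuples.
    Returns {page: frame_id} with one best frame per page whose
    quality satisfies the threshold."""
    best = {}
    for fid, page, q in frames:
        if q >= threshold and (page not in best or q > best[page][1]):
            best[page] = (fid, q)
    return {page: fid for page, (fid, _) in sorted(best.items())}
```

The resulting per-page frames would then be combined, in page order, into the single electronic file.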
Abstract:
A method, computer readable medium and apparatus for verifying an identity of an individual based upon facial expressions as exhibited in a query video of the individual are disclosed. The method includes receiving a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders describing one or more facial expressions of each individual is extracted from at least one frame of the reference video; receiving the query video; and calculating a similarity score for the reference video of each individual based on an analysis that compares the plurality of facial gesture encoders extracted from the at least one frame of that individual's reference video to a plurality of facial gesture encoders extracted from at least one frame of the query video.
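The scoring step might look like the sketch below; it is an illustration under stated assumptions, not the disclosed method. The facial gesture encoder vectors are assumed inputs, and scoring each individual by the best cross-frame cosine similarity is an illustrative choice.

```python
import math

# Hedged sketch: each reference video contributes a set of facial-gesture
# encoder vectors; the query video is scored against each individual by
# the best frame-to-frame cosine similarity between encoder vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similarity_scores(references, query):
    """references: {individual: [encoder vectors]}, query: [encoder
    vectors]. Returns {individual: best cross-frame similarity}."""
    return {person: max(cosine(r, q) for r in encs for q in query)
            for person, encs in references.items()}
```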
Abstract:
The embodiments include systems and methods for guiding a user to capture two flash images of a document page and selectively fusing the images to produce a high-quality binary image without loss of content. Each individual image may have a flash-spot region (FSR) where the content is degraded or lost due to the flash light. The user is first guided to take two images such that the flash spots do not overlap in the document regions. The flash spots in both images are detected and assessed for quality and extent of degradation. The image with lower degradation is chosen as the primary image and the other as the secondary image, to minimize fusing artifacts. The region in the secondary image corresponding to the FSR in the primary image is aligned to the primary region using a multiscale alignment technique. The primary image and the aligned FSR are binarized and fused in the vicinity of the primary image's flash spot using an intelligent technique that minimizes fusion boundary artifacts such as cutting of characters and words.
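The primary/secondary selection and patching idea can be sketched as follows; this is a simplification under stated assumptions. Using the masked flash-spot area as the degradation measure is illustrative, and the secondary image is assumed to be pre-aligned, with none of the multiscale alignment or boundary-aware fusion from the description.

```python
# Simplified sketch: given two images and their detected flash-spot
# masks, pick the less-degraded image as primary, then patch its
# flash-spot region with the corresponding (assumed pre-aligned)
# pixels from the secondary image.

def fuse_flash_pair(img_a, mask_a, img_b, mask_b):
    """Images and masks are same-shape 2-D lists; mask value 1 marks a
    flash-spot pixel. Returns the fused image."""
    deg_a = sum(map(sum, mask_a))  # degraded-area proxy for image A
    deg_b = sum(map(sum, mask_b))  # degraded-area proxy for image B
    if deg_a <= deg_b:
        primary, mask, secondary = img_a, mask_a, img_b
    else:
        primary, mask, secondary = img_b, mask_b, img_a
    return [[secondary[r][c] if mask[r][c] else primary[r][c]
             for c in range(len(primary[0]))]
            for r in range(len(primary))]
```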
Abstract:
A method, non-transitory computer readable medium, and apparatus for training hand detection in an ego-centric video are disclosed. For example, the method prompts a user to provide a hand gesture, captures the ego-centric video containing the hand gesture, analyzes the hand gesture in a frame of the ego-centric video to identify a set of pixels in the frame corresponding to a hand region, generates a training set of features from the set of pixels that correspond to the hand region, and trains a head-mounted video device to detect the hand in subsequently captured ego-centric video images based on the training set of features.
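The training loop can be sketched as below, under stated assumptions: the prompted gesture yields a set of hand-region pixels, their color statistics form the training set, and detection is a simple nearest-mean color test. A real detector would use richer features and a learned classifier; this only illustrates the train-then-detect flow.

```python
import math

# Hypothetical sketch: build a hand-color model from pixels of the
# prompted gesture, then classify new pixels against that model.

def train(hand_pixels):
    """hand_pixels: list of (r, g, b) tuples from the hand region.
    Returns a (mean color, radius) model covering the training set."""
    n = len(hand_pixels)
    mean = tuple(sum(p[i] for p in hand_pixels) / n for i in range(3))
    radius = max(math.dist(p, mean) for p in hand_pixels)
    return mean, radius

def is_hand(pixel, model):
    """True if the pixel's color falls within the trained model."""
    mean, radius = model
    return math.dist(pixel, mean) <= radius
```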