Abstract:
A mobile electronic device processes a sequence of images to identify and re-identify an object of interest in the sequence. An image sensor of the device receives a sequence of images. The device detects an object in a first image, along with positional parameters of the device that correspond to the object in the first image. The device determines a range of positional parameters within which the object may appear in a field of view of the device. When the device detects that the object of interest has exited the field of view, it subsequently uses motion sensor data to determine that the object of interest has likely re-entered the field of view, and it then analyzes the current frame to confirm that the object of interest has re-entered the field of view.
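A minimal sketch of the gating logic described above; the detector callback, the yaw/pitch sensor stream, and the tolerance margin are hypothetical stand-ins, not parameters from the abstract:

```python
# Sketch: orientation-gated detection/re-identification (all names hypothetical).
from dataclasses import dataclass

@dataclass
class YawPitchRange:
    yaw_min: float
    yaw_max: float
    pitch_min: float
    pitch_max: float

    def contains(self, yaw: float, pitch: float) -> bool:
        return (self.yaw_min <= yaw <= self.yaw_max
                and self.pitch_min <= pitch <= self.pitch_max)

def track_object(frames, orientations, detect, margin=5.0):
    """frames/orientations are parallel sequences; detect(frame) returns a
    bounding box or None. Detection is skipped while the motion sensors say
    the object cannot be in the field of view; once the sensors say it may
    have re-entered, the current frame is analyzed to confirm."""
    visible_range = None
    boxes = []
    for frame, (yaw, pitch) in zip(frames, orientations):
        if visible_range is None or visible_range.contains(yaw, pitch):
            box = detect(frame)  # confirm with image analysis
            if box is not None and visible_range is None:
                # Record the positional parameters under which the object was
                # first seen, widened by a tolerance margin (an assumption).
                visible_range = YawPitchRange(yaw - margin, yaw + margin,
                                              pitch - margin, pitch + margin)
        else:
            box = None  # sensors say the object is outside the field of view
        boxes.append(box)
    return boxes
```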
Abstract:
A method and system for reconstructing an image of a scene comprise configuring a digital light modulator according to a spatially varying pattern. Light energy associated with the scene and incident on the spatially varying pattern is collected and optically focused onto at least two photodetectors. Data indicative of the intensity of the focused light energy from each of the at least two photodetectors is collected. The data from the photodetectors is then combined to reconstruct an image of the scene.
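A toy simulation of the idea, assuming binary modulator patterns whose "on" mirrors route light to one detector and whose complement routes light to the other; the scene size, pattern count, and least-squares reconstruction are illustrative assumptions, not the claimed method:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16 * 16                      # 16x16 scene, flattened
scene = rng.random(n)            # stand-in for the true scene

# Spatially varying binary patterns on the light modulator; light passing the
# "on" elements reaches detector 1, the complement reaches detector 2.
m = n                            # number of patterns (square system, for lstsq)
patterns = rng.integers(0, 2, size=(m, n)).astype(float)

y1 = patterns @ scene            # intensity measured at photodetector 1
y2 = (1.0 - patterns) @ scene    # intensity measured at photodetector 2

# Combining both detectors yields measurements of (2A - 1) x, doubling the
# signal relative to a single detector and cancelling common-mode light.
A = 2.0 * patterns - 1.0
y = y1 - y2
recon, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(recon, scene, atol=1e-6))  # True: scene recovered
```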
Abstract:
A method, computer readable medium, and apparatus for verifying an identity of an individual based upon facial expressions as exhibited in a query video of the individual are disclosed. The method includes receiving a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each individual; receiving the query video; and calculating a similarity score for the reference video of each individual based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video to a plurality of facial gesture encoders extracted from at least one frame of the query video.
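One plausible reading of the similarity computation, assuming each facial gesture encoder is a fixed-length vector per frame and using cosine similarity as the comparison metric (the abstract does not specify the metric, so both choices are assumptions):

```python
import numpy as np

def similarity_score(ref_encodings, query_encodings):
    """Each argument is an (n_frames, d) array of facial-gesture encodings
    extracted from a video. The score is the best cosine similarity over all
    reference/query frame pairs (one simple aggregation choice)."""
    a = ref_encodings / np.linalg.norm(ref_encodings, axis=1, keepdims=True)
    b = query_encodings / np.linalg.norm(query_encodings, axis=1, keepdims=True)
    return float((a @ b.T).max())

def verify(reference_videos, query_encodings, threshold=0.8):
    """reference_videos maps person id -> encoding array for that person's
    reference video; returns the best-matching identity if its similarity
    score clears the (assumed) acceptance threshold."""
    scores = {pid: similarity_score(enc, query_encodings)
              for pid, enc in reference_videos.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```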
Abstract:
Methods and systems for automatically synchronizing videos acquired via two or more cameras with overlapping views in a multi-camera network. Reference lines connecting two or more pairs of corresponding points can be determined within the overlapping field of view of the two (or more) cameras in the multi-camera network. Spatiotemporal maps of the reference lines can then be obtained, and an optimal alignment between video segments obtained from the cameras is determined based on the registration of the spatiotemporal maps.
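A rough sketch of the alignment step: sampling each camera's frames along the shared reference line builds a spatiotemporal map, and cross-correlating frame-to-frame activity on the two maps yields a temporal offset. This is one simple registration strategy, not necessarily the claimed one:

```python
import numpy as np

def line_profile(frame, p0, p1, n=100):
    """Sample grayscale intensities along the reference line p0 -> p1."""
    xs = np.linspace(p0[0], p1[0], n).round().astype(int)
    ys = np.linspace(p0[1], p1[1], n).round().astype(int)
    return frame[ys, xs].astype(float)

def spatiotemporal_map(frames, p0, p1):
    """Stack per-frame line profiles into a (time, position) map."""
    return np.stack([line_profile(f, p0, p1) for f in frames])

def best_offset(map_a, map_b):
    """Frame offset of camera B relative to camera A that best aligns
    activity (temporal change) along the shared reference line."""
    act_a = np.abs(np.diff(map_a, axis=0)).mean(axis=1)
    act_b = np.abs(np.diff(map_b, axis=0)).mean(axis=1)
    act_a = act_a - act_a.mean()
    act_b = act_b - act_b.mean()
    corr = np.correlate(act_a, act_b, mode="full")
    return int(np.argmax(corr) - (len(act_b) - 1))
```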
Abstract:
A method for training a vehicle detection system used in street occupancy estimation of stationary vehicles. The method includes defining first and second areas on an image plane of an image capture device associated with monitoring for detection of vehicles, and receiving video data from a sequence of frames captured by the image capture device. Candidate frames that include objects relevant to a classification task in the second area are determined. The objects are extracted from the candidate frames, features of each extracted object are extracted, and labels are assigned to each extracted object. At least one classifier is trained using the labels and extracted features, and the at least one trained classifier is then used to classify a stationary vehicle detected in the first area.
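An illustrative train-then-classify loop using scikit-learn; the coarse intensity histogram and the LinearSVC classifier are stand-ins for whatever features and classifier the method actually employs:

```python
import numpy as np
from sklearn.svm import LinearSVC

def extract_features(patch):
    """Toy stand-in for a real descriptor (e.g., HOG): a normalized
    intensity histogram of an extracted object patch."""
    hist, _ = np.histogram(patch, bins=16, range=(0, 255))
    return hist / max(hist.sum(), 1)

def train_vehicle_classifier(candidate_patches, labels):
    """candidate_patches: object patches cropped from candidate frames in the
    second (training) area; labels: e.g., 1 = vehicle, 0 = other."""
    X = np.array([extract_features(p) for p in candidate_patches])
    clf = LinearSVC()
    return clf.fit(X, np.asarray(labels))

def classify_stationary_vehicle(clf, patch):
    """Apply the trained classifier to an object detected in the first
    (monitored) area."""
    return int(clf.predict(extract_features(patch)[None, :])[0])
```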
Abstract:
A method, non-transitory computer readable medium, and apparatus for training hand detection in an ego-centric video are disclosed. For example, the method prompts a user to provide a hand gesture, captures the ego-centric video containing the hand gesture, analyzes the hand gesture in a frame of the ego-centric video to identify a set of pixels in the frame corresponding to a hand region, generates a training set of features from the set of pixels that correspond to the hand region, and trains a head-mounted video device to detect the hand in subsequently captured ego-centric video images based on the training set of features.
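A sketch of one way the prompted gesture could seed a detector, using an OpenCV hue/saturation histogram over the hand pixels and back-projection onto later frames; the color-histogram model is an assumption, not the patent's stated feature set:

```python
import cv2
import numpy as np

def train_hand_model(frame_bgr, hand_mask):
    """Build a hue/saturation histogram from the pixels that the prompted
    gesture identified as hand (hand_mask is a binary uint8 image)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], hand_mask, [30, 32],
                        [0, 180, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def detect_hand(frame_bgr, hist, thresh=50):
    """Back-project the trained histogram onto a subsequently captured
    ego-centric frame and threshold to obtain a candidate hand region."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0, 1], hist,
                                   [0, 180, 0, 256], scale=1)
    _, mask = cv2.threshold(backproj, thresh, 255, cv2.THRESH_BINARY)
    return mask
```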
Abstract:
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a dynamic hand gesture are disclosed. For example, the method captures the ego-centric video containing the dynamic hand gesture, analyzes a frame of the ego-centric video to detect pixels that correspond to a fingertip using a hand segmentation algorithm, temporally analyzes one or more frames of the ego-centric video to compute a path of the fingertip in the dynamic hand gesture, localizes the region of interest based on the path of the fingertip, and performs an action based on an object in the region of interest.
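A minimal sketch of the localization step, taking the region of interest as the bounding box of the computed fingertip path; this is one plausible path-to-region mapping, since the abstract leaves it unspecified:

```python
import numpy as np

def localize_roi(fingertip_path):
    """fingertip_path: list of (x, y) fingertip positions computed across
    frames of the ego-centric video. The ROI is the bounding box of the
    traced path (an assumed mapping)."""
    pts = np.asarray(fingertip_path)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return int(x0), int(y0), int(x1), int(y1)

# e.g., a roughly circular "select" gesture traced around an object:
path = [(100, 80), (140, 60), (180, 80), (190, 130), (150, 160), (105, 135)]
print(localize_roi(path))   # -> (100, 60, 190, 160)
```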
Abstract:
A method and system for video-based object tracking include detecting an initial instance of an object of interest in video captured of a scene being monitored and establishing a representation of a target object from the initial instance of the object. The dominant motion trajectory characteristics of the target object are then determined, and a frame-by-frame location of the target object can be collected in order to track the target object in the video.
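An illustrative tracker: normalized template matching supplies the frame-by-frame locations, and the median per-frame displacement stands in for the dominant motion trajectory characteristics. Both choices are assumptions for the sketch:

```python
import cv2
import numpy as np

def track(frames_gray, init_box):
    """frames_gray: grayscale frames; init_box: (x, y, w, h) of the initial
    instance. Returns per-frame top-left locations of the target object."""
    x, y, w, h = init_box
    template = frames_gray[0][y:y + h, x:x + w]  # target representation
    locations = []
    for frame in frames_gray:
        res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, top_left = cv2.minMaxLoc(res)   # best match location
        locations.append(top_left)
    return locations

def dominant_motion(locations):
    """Median frame-to-frame displacement, a simple stand-in for the
    dominant motion trajectory characteristics."""
    deltas = np.diff(np.asarray(locations, dtype=float), axis=0)
    return np.median(deltas, axis=0)   # (dx, dy) per frame
```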
Abstract:
A system and method of monitoring a customer space include obtaining visual data comprising image frames of the customer space over a period of time; defining a region of interest within the customer space, the region of interest corresponding to a portion of the customer space in which customers relocate objects; monitoring the region of interest for at least one predefined clutter condition; and generating a notification when the at least one predefined clutter condition is detected.
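A toy version of the clutter monitor, assuming a reference image of the uncluttered space and taking "fraction of changed pixels held over consecutive frames" as the predefined clutter condition (both assumptions for illustration):

```python
import cv2
import numpy as np

def clutter_detected(frame_gray, reference_gray, roi,
                     frac_thresh=0.15, diff_thresh=30):
    """Compare the region of interest against a reference image of the
    uncluttered customer space; 'clutter' here is a user-chosen fraction of
    pixels that changed (e.g., trays or condiments left on a counter)."""
    x, y, w, h = roi
    diff = cv2.absdiff(frame_gray[y:y + h, x:x + w],
                       reference_gray[y:y + h, x:x + w])
    return (diff > diff_thresh).mean() > frac_thresh

def monitor(frames_gray, reference_gray, roi, persist=10):
    """Generate a notification once the clutter condition has held for
    `persist` consecutive frames."""
    run = 0
    for i, frame in enumerate(frames_gray):
        run = run + 1 if clutter_detected(frame, reference_gray, roi) else 0
        if run == persist:
            print(f"clutter notification at frame {i}")
            run = 0
```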
Abstract:
A computer vision system (100) operates to monitor an environment (e.g., a restaurant, store, or other retail establishment) including a resource located therein (e.g., a restroom; a dining table; a drink, condiment, or supply dispenser; a trash receptacle; or a tray collection rack). The system includes: an image source or camera (104) that supplies image data (130) representative of at least a portion of the environment monitored by the system, the portion including the resource therein; and an event detection device (102) including a data processor (112) and operative to detect an event involving the resource. Suitably, the event detection device is arranged to: (i) be selectively configurable by a user to define the event involving the resource; (ii) receive the image data supplied by the image source; (iii) analyze the received image data to detect the defined event; and (iv) output a notification in response to detecting the defined event.
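A schematic of the configurable event pipeline; representing a user-defined event as a named region of interest plus a predicate over its pixels is an assumption made only for illustration:

```python
import numpy as np

class EventDetector:
    """Sketch of a user-configurable event detector: each event is a named
    region of interest plus a predicate over the pixels in that region."""

    def __init__(self):
        self.events = {}

    def configure(self, name, roi, predicate):
        # (i) the user defines the event involving the resource
        self.events[name] = (roi, predicate)

    def process(self, image):
        # (ii) receive image data, (iii) analyze it, (iv) output notification
        for name, ((x, y, w, h), predicate) in self.events.items():
            if predicate(image[y:y + h, x:x + w]):
                print(f"event detected: {name}")

# Example: flag a trash receptacle region as "full" when it is mostly dark.
detector = EventDetector()
detector.configure("trash_full", roi=(0, 0, 8, 8),
                   predicate=lambda patch: patch.mean() < 40)
detector.process(np.zeros((16, 16), dtype=np.uint8))
```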