Abstract:
A system and method are provided. The system includes an image capture device configured to capture a video sequence formed from a set of input image frames and including a set of objects. The system further includes a processor configured to detect the objects to form object detections and track the object detections over the input frames to form tracked detections. The processor is also configured to generate for a current input frame, responsive to conditions, a set of sparse object proposals for a current location of an object based on: (i) the tracked detections of the object from an immediately previous input frame; and (ii) detection proposals for the object derived from the current frame. The processor is additionally configured to provide a user perceptible indication of the current location of the object, based on the set of sparse object proposals.
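The abstract does not say how the two proposal sources are combined. As a minimal sketch, assume boxes are `(x1, y1, x2, y2)` tuples and that a current-frame detection proposal is kept only if it overlaps the previous tracked box. The function name `sparse_proposals` and the `min_iou` threshold are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sparse_proposals(prev_tracked: Box,
                     detections: List[Box],
                     min_iou: float = 0.3) -> List[Box]:
    """Build a small proposal set for one object in the current frame.

    Source (i): the tracked box from the immediately previous frame.
    Source (ii): current-frame detection proposals, gated by overlap
    with that box (the IoU gate is an assumption for illustration).
    """
    nearby = [d for d in detections if iou(prev_tracked, d) >= min_iou]
    return [prev_tracked] + nearby
```

With this gating, a detection far from the previous location is dropped, so the proposal set stays small compared with evaluating every detection in the frame.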
Abstract:
A computer-implemented method, system, and computer program product are provided for pose-invariant facial recognition. The method includes generating, by a processor using a recognition neural network, a rich feature embedding for identity information and non-identity information for each of one or more images. The method also includes generating, by the processor using a Siamese reconstruction network, one or more pose-invariant features by employing the rich feature embedding for identity information and non-identity information. The method additionally includes identifying, by the processor, a user by employing the one or more pose-invariant features. The method further includes controlling an operation of a processor-based machine to change a state of the processor-based machine, responsive to the identified user in the one or more images.
Abstract:
A login access control system is provided. The login access control system includes a camera configured to capture an input image of a subject purported to be a person and attempting to log in to a system to access secure data. The login access control system further includes a memory storing a deep learning model configured to perform multi-task learning for a pair of tasks including a liveness detection task and a face recognition task. The login access control system also includes a processor configured to apply the deep learning model to the input image to recognize an identity of the subject in the input image regarding being authorized for access to the secure data and a liveness of the subject. The liveness detection task is configured to evaluate a plurality of different distractor modalities corresponding to different physical spoofing materials to prevent face spoofing for the face recognition task.
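The way the two task outputs could feed an access decision can be sketched as follows. This is an illustration under stated assumptions, not the patented model: the score dictionaries stand in for the outputs of the multi-task network, and the distractor labels (`"paper"`, `"screen"`, `"mask"`), the function name `grant_access`, and the `min_identity_score` threshold are all hypothetical.

```python
from typing import Dict, Set


def grant_access(identity_scores: Dict[str, float],
                 liveness_scores: Dict[str, float],
                 authorized: Set[str],
                 min_identity_score: float = 0.8) -> bool:
    """Allow access only for a live subject with an authorized identity.

    liveness_scores maps "live" and each spoofing-material class
    (hypothetical labels) to a score; identity_scores maps enrolled
    identities to a score. Both are assumed to come from one model
    with two task heads.
    """
    # Liveness head: reject if any spoofing class outscores "live".
    liveness_label = max(liveness_scores, key=liveness_scores.get)
    if liveness_label != "live":
        return False

    # Recognition head: the best match must be authorized and confident.
    best_id = max(identity_scores, key=identity_scores.get)
    return (best_id in authorized
            and identity_scores[best_id] >= min_identity_score)
```

Checking liveness first means a spoofed image of an authorized person is rejected before its identity score is considered.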
Abstract:
A face recognition system and corresponding method are provided. The face recognition system includes a camera configured to capture an input image of a subject purported to be a person. The face recognition system further includes a memory storing a deep learning model configured to perform multi-task learning for a pair of tasks including a liveness detection task and a face recognition task. The face recognition system also includes a processor configured to apply the deep learning model to the input image to recognize an identity of the subject in the input image and a liveness of the subject. The liveness detection task is configured to evaluate a plurality of different distractor modalities corresponding to different physical spoofing materials to prevent face spoofing for the face recognition task.
Abstract:
A machine access control system and corresponding method are provided. The machine access control system includes a camera configured to capture an input image of a subject purported to be a person associated with operating a particular workplace machine. The machine access control system further includes a memory storing a deep learning model configured to perform multi-task learning for a pair of tasks including a liveness detection task and a face recognition task. The machine access control system also includes a processor configured to apply the deep learning model to the input image to recognize an identity of the subject in the input image regarding being authorized to use the particular workplace machine and a liveness of the subject. The liveness detection task is configured to evaluate a plurality of different distractor modalities corresponding to different physical spoofing materials to prevent face spoofing for the face recognition task.
Abstract:
A method to perform hierarchical video segmentation includes: defining voxels over a spatio-temporal video; grouping into segments contiguous voxels that display similar characteristics including similar appearance or motion; determining a trajectory-based feature that complements color and optical flow cues, wherein trajectory cues are probabilistically meaningful histograms combinable for use in a graph-based framework; and applying a max-margin module for cue combination that learns a supervised distance metric for region dissimilarity that combines color, flow and trajectory features.
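The cue-combination step can be illustrated with a small sketch. It assumes each region carries one normalized histogram per cue and that the learned metric reduces to a weighted sum of per-cue chi-square distances. The chi-square choice, the fixed weights, and the names `chi_square` and `region_dissimilarity` are assumptions for illustration; in the method described, the weights would come from max-margin training rather than being set by hand.

```python
from typing import Dict, Sequence


def chi_square(p: Sequence[float], q: Sequence[float]) -> float:
    """Chi-square distance between two histograms of equal length."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(p, q) if a + b > 0)


def region_dissimilarity(r1: Dict[str, Sequence[float]],
                         r2: Dict[str, Sequence[float]],
                         weights: Dict[str, float]) -> float:
    """Weighted combination of per-cue histogram distances.

    `weights` is a stand-in for the learned metric (hypothetical values);
    keys are cue names such as "color", "flow" and "trajectory".
    """
    return sum(w * chi_square(r1[cue], r2[cue])
               for cue, w in weights.items())
```

In a graph-based framework, a value like this would serve as the edge weight between neighboring regions, with low-weight edges merged first to build the hierarchy.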
Abstract:
Systems and methods are disclosed to provide an Advanced Warning System (AWS) for a driver of a vehicle, by capturing traffic scene types from a single camera video; generating real-time monocular SFM and 2D object detection from the single camera video; detecting a ground plane from the real-time monocular SFM and the 2D object detection; performing dense 3D estimation from the real-time monocular SFM and the 2D object detection; generating a joint 3D object localization from the ground plane and dense 3D estimation; and communicating a situation that requires caution to the driver.
Abstract:
Systems and methods are disclosed for determining three-dimensional (3D) shape by capturing with a camera a plurality of images of an object in differential motion; deriving a general relation that relates spatial and temporal image derivatives to BRDF derivatives; exploiting rank deficiency to eliminate BRDF terms and recover depth or normal for directional lighting; and using a depth-normal-BRDF relation to recover depth or normal for unknown arbitrary lighting.
Abstract:
A method for performing three-dimensional (3D) localization requiring only a single camera includes: capturing images from only one camera; generating a cue combination from sparse features, dense stereo and object bounding boxes; correcting for scale in monocular structure from motion (SFM) using the cue combination for estimating a ground plane; and performing localization by combining SFM, ground plane and object bounding boxes to produce a 3D object localization.
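The last two steps can be sketched geometrically. The sketch assumes a pinhole camera with known intrinsics, camera coordinates with x right, y down and z forward, and a known true camera height above the road, which the abstract does not state. The function names and all numeric values are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def scale_correction(true_height: float, estimated_height: float) -> float:
    """Factor that rescales an up-to-scale SFM reconstruction to metric units.

    estimated_height is the camera-to-ground distance measured in the
    unscaled reconstruction; true_height is assumed to be known.
    """
    return true_height / estimated_height


def localize_on_ground(u: float, v: float,
                       f: float, cx: float, cy: float,
                       normal: Vec3, height: float) -> Vec3:
    """Back-project a pixel (e.g. a bounding-box bottom center) onto the ground.

    The ground plane is n . X = height in camera coordinates. The viewing
    ray through pixel (u, v) is intersected with that plane.
    """
    ray = ((u - cx) / f, (v - cy) / f, 1.0)
    n_dot_ray = sum(n * r for n, r in zip(normal, ray))
    if n_dot_ray <= 0:
        raise ValueError("ray does not hit the ground plane in front of the camera")
    t = height / n_dot_ray
    return (t * ray[0], t * ray[1], t * ray[2])
```

For a level camera the plane normal is `(0.0, 1.0, 0.0)`, so a pixel further below the principal point maps to a closer point on the road. An error in the estimated ground plane therefore translates directly into an error in object distance, which is why the plane estimate matters for both scale and localization.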
Abstract:
A method to perform hierarchical video segmentation includes: defining voxels over a spatio-temporal video; grouping into segments contiguous voxels that display similar characteristics including similar appearance or motion; determining a trajectory-based feature that complements color and optical flow cues, wherein trajectory cues are probabilistically meaningful histograms combinable for use in a graph-based framework; and applying a max-margin module for cue combination that learns a supervised distance metric for region dissimilarity that combines color, flow and trajectory features.