Abstract:
Disclosed is a method and system for efficiently and accurately tracking three-dimensional (3D) human motion from a two-dimensional (2D) video sequence, even when self-occlusion, motion blur, and large limb movements occur. In an offline learning stage, 3D motion capture data is acquired and a prediction model is generated based on the learned motions. A mixture of factor analyzers acts as a set of local dimensionality reducers. Clusters of factor analyzers formed within a globally coordinated low-dimensional space make it possible to perform multiple-hypothesis tracking based on the distribution modes. In the online tracking stage, 3D tracking is performed without requiring any special equipment, clothing, or markers. Instead, motion is tracked in the dimensionality-reduced state based on a monocular video sequence.
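The idea of local dimensionality reducers coordinated in a shared low-dimensional space can be illustrated with a minimal sketch. The code below is not the patented learning procedure: it substitutes k-means clustering plus per-cluster PCA for the mixture of factor analyzers, and all names (`fit_local_reducers`, `project`) are illustrative.

```python
import numpy as np

def fit_local_reducers(X, n_clusters=2, latent_dim=2, n_iter=20):
    """Toy stand-in for a mixture of factor analyzers: k-means clustering
    plus a per-cluster PCA that serves as a local linear dimensionality
    reducer (illustrative only, not the patented learning procedure)."""
    centers = X[np.linspace(0, len(X) - 1, n_clusters).astype(int)].copy()
    for _ in range(n_iter):                       # plain k-means assign/update
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(0) for k in range(n_clusters)])
    models = []
    for k in range(n_clusters):                   # local PCA per cluster
        Xc = X[labels == k] - centers[k]
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        models.append((centers[k], Vt[:latent_dim]))  # local mean + basis
    return labels, models

def project(x, model):
    """Map a high-dimensional point into its local low-dimensional coordinate."""
    mean, basis = model
    return basis @ (x - mean)

# two well-separated 3-D clouds, each lying close to its own 2-D plane
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 3)) * [1.0, 1.0, 0.01]
B = rng.normal(size=(100, 3)) * [1.0, 0.01, 1.0] + [10.0, 0.0, 0.0]
X = np.vstack([A, B])
labels, models = fit_local_reducers(X)
z = project(X[0], models[labels[0]])
print(z.shape)   # each 3-D point gets a 2-D local coordinate
```

In the full method, the per-cluster coordinates are additionally aligned into one globally coordinated space, which is what lets multiple tracking hypotheses be placed at the distribution modes.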
Abstract:
The present invention meets these needs by providing temporal coherency to recognition systems. One embodiment of the present invention comprises a manifold recognition module that uses a sequence of images for recognition. A manifold training module receives a plurality of training image sequences (e.g., from a video camera), each training image sequence including an individual in a plurality of poses, and establishes relationships between the images of a training image sequence. A probabilistic identity module receives a sequence of recognition images including a target individual for recognition, and identifies the target individual based on the relationships of the training images corresponding to the recognition images. An occlusion module masks occluded portions of an individual's face to prevent distorted identifications.
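The core of a probabilistic identity module of this kind can be sketched as a recursive Bayes update over the frames of a recognition sequence; this is a hedged illustration of the temporal-coherency idea, not the patented module, and `sequence_posterior` is a name invented here.

```python
import numpy as np

def sequence_posterior(frame_likelihoods, prior=None):
    """Accumulate per-frame likelihoods p(frame | identity) across a
    recognition sequence with Bayes' rule, so temporally coherent evidence
    sharpens the identity estimate (a sketch, not the patented module)."""
    L = np.asarray(frame_likelihoods, dtype=float)  # (frames, identities)
    post = (np.full(L.shape[1], 1.0 / L.shape[1]) if prior is None
            else np.asarray(prior, dtype=float))
    for frame in L:                                 # recursive Bayes update
        post = post * frame
        post = post / post.sum()
    return post

# three candidate identities; identity 1 is weakly favored in every frame
likelihoods = [[0.30, 0.40, 0.30],
               [0.25, 0.45, 0.30],
               [0.30, 0.50, 0.20]]
post = sequence_posterior(likelihoods)
print(post.argmax())   # -> 1: weak per-frame evidence compounds over time
```

Occlusion handling fits naturally into this scheme: masked frames can simply contribute flat likelihoods so they neither help nor distort the identification.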
Abstract:
Methods and systems are described for three-dimensional pose estimation. A training module determines a mapping function between a training image sequence and pose representations of a subject in the training image sequence. The training image sequence is represented by a set of appearance and motion patches. A set of filters is applied to the appearance and motion patches to extract features of the training images. Based on the extracted features, the training module learns a multidimensional mapping function that maps the motion and appearance patches to the pose representations of the subject. A testing module outputs a fast human pose estimation by applying the learned mapping function to a test image sequence.
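A simple way to see why such a learned mapping yields fast estimation is to stand in ridge regression for the multidimensional mapping function: training is a one-time solve, and each test frame then costs a single matrix product. The abstract does not specify the regression form, so this is an assumption; `learn_mapping` and `estimate_pose` are illustrative names.

```python
import numpy as np

def learn_mapping(features, poses, reg=1e-3):
    """Learn a multidimensional linear map from patch features to pose
    vectors via ridge regression (a hedged stand-in for the learned
    mapping function; the patent does not fix the regression form)."""
    X = np.asarray(features, dtype=float)    # (n_frames, n_features)
    Y = np.asarray(poses, dtype=float)       # (n_frames, pose_dim)
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)

def estimate_pose(W, feature_vec):
    """Fast pose estimation: one matrix-vector product per test frame."""
    return np.asarray(feature_vec, dtype=float) @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))               # features from appearance/motion patches
true_W = rng.normal(size=(16, 3))            # hidden feature-to-pose relation
Y = X @ true_W                               # 3-D pose representation per frame
W = learn_mapping(X, Y)
err = np.abs(estimate_pose(W, X[0]) - Y[0]).max()
print(err < 1e-2)                            # the mapping is recovered closely
```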
Abstract:
A system and method recognizes and tracks human motion from different motion classes. In a learning stage, a discriminative model is learned to project motion data from a high dimensional space to a low dimensional space while enforcing discriminance between motions of different motion classes in the low dimensional space. Additionally, low dimensional data may be clustered into motion segments and motion dynamics learned for each motion segment. In a tracking stage, a representation of human motion is received comprising at least one class of motion. The tracker recognizes and tracks the motion based on the learned discriminative model and the learned dynamics.
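Classical Fisher discriminant analysis gives a concrete picture of projecting motion data to a low-dimensional space while enforcing separation between motion classes. It is used here only as a simple stand-in for the learned discriminative model, and `fisher_projection` is a name invented for this sketch.

```python
import numpy as np

def fisher_projection(X, y, out_dim=1):
    """Project high-dimensional motion data to a low-dimensional space while
    enforcing discriminance between classes (classical Fisher LDA, a
    stand-in for the patent's learned discriminative model)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(0)
    Sw = np.zeros((X.shape[1], X.shape[1]))   # within-class scatter
    Sb = np.zeros_like(Sw)                    # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:out_dim]]     # columns = projection directions

rng = np.random.default_rng(0)
walk = rng.normal(size=(80, 5)) + [3, 0, 0, 0, 0]   # one motion class
run  = rng.normal(size=(80, 5)) - [3, 0, 0, 0, 0]   # another motion class
X = np.vstack([walk, run])
y = np.array([0] * 80 + [1] * 80)
P = fisher_projection(X, y)
z = X @ P                                    # 1-D discriminative coordinates
gap = abs(z[:80].mean() - z[80:].mean())
print(gap > 1.0)                             # classes stay separated in low-D
```

In the full system, the low-dimensional data would then be clustered into motion segments and per-segment dynamics learned for tracking.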
Abstract:
Taking a set of unlabeled images of a collection of objects acquired under different imaging conditions and decomposing the set into disjoint subsets corresponding to individual objects requires clustering. Appearance-based methods for clustering a set of images of 3-D objects acquired under varying illumination conditions can be based on the concept of illumination cones. The clustering problem is then equivalent to finding convex polyhedral cones in the high-dimensional image space. To efficiently determine the conic structures hidden in the image data, the concept of conic affinity can be used, which measures the likelihood of a pair of images belonging to the same underlying polyhedral cone. Other algorithms can operate directly on the image gradients, using an affinity measure that compares the magnitudes and orientations of the gradients.
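The gradient-based affinity is the easier of the two measures to sketch: a normalized inner product of gradient fields is large when two images agree in both gradient magnitude and orientation, and is insensitive to a global illumination scaling. This is a simplified illustration, not the full conic-affinity algorithm.

```python
import numpy as np

def gradient_affinity(img_a, img_b):
    """Affinity between two images computed directly from image gradients,
    comparing magnitudes and orientations via a normalized inner product
    (a simplified sketch, not the conic-affinity algorithm)."""
    ga = np.stack(np.gradient(img_a.astype(float)))   # (2, H, W): d/dy, d/dx
    gb = np.stack(np.gradient(img_b.astype(float)))
    num = (ga * gb).sum()                             # large when magnitudes
    den = np.sqrt((ga ** 2).sum()) * np.sqrt((gb ** 2).sum()) + 1e-12
    return num / den                                  # and orientations agree

# two images of the same "object" under different illumination (a global
# brightness scaling), and one image of a different object
base = np.zeros((8, 8)); base[2:6, 2:6] = 1.0
bright = 3.0 * base                                   # illumination change only
other = np.zeros((8, 8)); other[0:2, :] = 1.0
print(gradient_affinity(base, bright) > gradient_affinity(base, other))
```

A full clustering pipeline would assemble these pairwise affinities into a matrix and partition it, e.g. with spectral methods.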
Abstract:
A system and a method are disclosed for an adaptive discriminative generative model with a probabilistic interpretation. As applied to visual tracking, the discriminative generative model separates the target object from the background more accurately and efficiently than conventional methods. A computationally efficient algorithm constantly updates the discriminative model over time. The discriminative generative model adapts to accommodate dynamic appearance variations of the target and background. Experiments show that the discriminative generative model effectively tracks target objects undergoing large pose and lighting changes.
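The flavor of constantly updating an appearance model over time can be shown with an online Gaussian model under exponential forgetting. This is a deliberately cheap sketch of adaptation, not the patented discriminative generative formulation; the class name and forgetting factor are assumptions of this example.

```python
import numpy as np

class AdaptiveAppearanceModel:
    """Online Gaussian appearance model with exponential forgetting: a
    computationally cheap sketch of updating an appearance model over time
    (not the patented discriminative generative formulation)."""
    def __init__(self, dim, forget=0.9):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.forget = forget

    def update(self, observation):
        # exponentially forget old appearance, absorb the new observation
        f = self.forget
        self.mean = f * self.mean + (1 - f) * observation
        self.var = f * self.var + (1 - f) * (observation - self.mean) ** 2

    def log_likelihood(self, observation):
        # diagonal-Gaussian score of a candidate target patch
        return float(-0.5 * (((observation - self.mean) ** 2 / self.var)
                             + np.log(2 * np.pi * self.var)).sum())

model = AdaptiveAppearanceModel(dim=4)
target = np.array([1.0, 2.0, 3.0, 4.0])
for _ in range(50):                 # the model adapts to the observed target
    model.update(target)
on_target = model.log_likelihood(target)
off_target = model.log_likelihood(target + 5.0)
print(on_target > off_target)       # adapted model prefers the true target
```

The discriminative part of the patented model goes further, scoring candidates against a background model as well so that target and background stay separable.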
Abstract:
An advantage of the present invention is that it detects objects appropriately. The object detection apparatus of the present invention has a plurality of cameras to determine the distances to objects; a distance determination unit that determines those distances; a histogram generation unit that records the frequency of pixels at each distance; an object distance determination unit that determines the most likely distance; a probability mapping unit that assigns each pixel a probability based on the difference between its distance and the most likely distance; a kernel detection unit that determines a kernel region as a group of pixels; a periphery detection unit that determines a peripheral region as a group of pixels selected from those close to the kernel region; and an object specifying unit that specifies the object region where the object is present with a predetermined probability.
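The histogram-to-probability pipeline described above can be sketched in a few lines: bin the per-pixel distances, take the most frequent bin as the most likely object distance, and give each pixel a probability that falls off with its deviation from that distance. The bin count, Gaussian falloff, and threshold are assumptions of this sketch, not values from the claims.

```python
import numpy as np

def detect_object_distance(distance_map, n_bins=16, max_dist=16.0, sigma=1.0):
    """Sketch of the histogram-based pipeline: bin per-pixel distances,
    take the most frequent bin as the most likely object distance, then
    map each pixel to a probability based on its difference from that
    distance (illustrative, not the claimed apparatus)."""
    d = np.asarray(distance_map, dtype=float)
    hist, edges = np.histogram(d, bins=n_bins, range=(0.0, max_dist))
    peak = hist.argmax()
    object_dist = 0.5 * (edges[peak] + edges[peak + 1])     # most likely distance
    prob = np.exp(-0.5 * ((d - object_dist) / sigma) ** 2)  # per-pixel probability
    kernel = prob > 0.5                                     # high-probability core
    return object_dist, prob, kernel

# synthetic per-pixel distances: a near object at 4 m fills most of the
# frame, with background at 12 m
dist = np.full((10, 10), 12.0)
dist[1:9, 1:9] = 4.0
object_dist, prob, kernel = detect_object_distance(dist)
print(object_dist, kernel.sum())   # -> 4.5 64
```

The kernel region here corresponds to the group of high-probability pixels; a peripheral region would then be grown from pixels adjacent to it.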
Abstract:
A system and a method are disclosed for clustering images of objects seen from different viewpoints. That is, given an unlabeled set of images of n objects, an unsupervised algorithm groups the images into n disjoint subsets such that each subset contains images of only a single object. The clustering method makes use of a broad geometric framework that exploits the interplay between the geometry of appearance manifolds and the symmetry of the 2D affine group.
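A toy version of exploiting group symmetry for clustering: measure the distance between two images minimized over translations (a subgroup of the 2D affine group), then group images whose best-aligned distance is small. This greedy scheme and its threshold are assumptions of the sketch, not the patent's geometric framework.

```python
import numpy as np

def shift_invariant_distance(img_a, img_b, max_shift=3):
    """Distance between two images minimized over a small set of 2-D
    translations (translations form a subgroup of the 2D affine group;
    a toy stand-in for the full affine-symmetry framework)."""
    best = np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(img_b, dy, axis=0), dx, axis=1)
            best = min(best, float(((img_a - shifted) ** 2).sum()))
    return best

def cluster_images(images, threshold=1.0):
    """Group images into disjoint subsets: each image joins the first
    cluster whose representative it matches under some translation."""
    clusters = []
    for img in images:
        for members in clusters:
            if shift_invariant_distance(members[0], img) < threshold:
                members.append(img)
                break
        else:
            clusters.append([img])
    return clusters

square = np.zeros((8, 8)); square[2:5, 2:5] = 1.0
square_moved = np.roll(square, 2, axis=1)       # same object, new viewpoint
bar = np.zeros((8, 8)); bar[1, :] = 1.0         # a different object
clusters = cluster_images([square, square_moved, bar])
print(len(clusters))   # -> 2: views of the square merge, the bar stays apart
```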