Abstract:
An online sparse matrix Gaussian process (OSMGP) uses online updates to provide accurate and efficient regression for applications such as pose estimation and object tracking. A regression calculation module computes a regression on a sequence of input images to generate output predictions based on a learned regression model. The regression model is efficiently updated by representing a covariance matrix of the regression model using a sparse matrix factor (e.g., a Cholesky factor). The sparse matrix factor is maintained and updated in real time based on the output predictions. Hyperparameter optimization, variable reordering, and matrix downdating techniques can also be applied to further improve the accuracy and/or efficiency of the regression process.
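The idea of maintaining a Cholesky factor online, rather than refactoring the covariance matrix for every new observation, can be sketched as follows. This is a minimal dense NumPy illustration, not the sparse implementation the abstract describes: the RBF kernel, the noise term, and the names `rbf_kernel`, `cholesky_append`, and `incremental_factor` are assumptions introduced for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between rows of a and rows of b (assumed kernel)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def cholesky_append(L, k_new, kappa):
    """Extend the lower Cholesky factor L of K to the factor of [[K, k], [k^T, kappa]]."""
    n = L.shape[0]
    # New off-diagonal row solves L @ l = k_new; an empty factor has no such row.
    l = np.linalg.solve(L, k_new) if n else np.zeros(0)
    d = np.sqrt(kappa - l @ l)  # new diagonal entry
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l
    L_new[n, n] = d
    return L_new

def incremental_factor(X, noise):
    """Build the factor of K(X, X) + noise * I one observation at a time."""
    L = np.zeros((0, 0))
    for i in range(X.shape[0]):
        k_new = rbf_kernel(X[:i], X[i:i + 1])[:, 0]
        kappa = rbf_kernel(X[i:i + 1], X[i:i + 1])[0, 0] + noise
        L = cholesky_append(L, k_new, kappa)
    return L
```

Each append costs one triangular solve against the existing factor instead of a full factorization. A sparse variant would additionally rely on a kernel that yields a sparse covariance matrix and on variable reordering to keep the factor sparse, neither of which this dense sketch attempts.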
Abstract:
The face detection system and method attempt to classify a test image before performing all of the kernel evaluations. Many subimages are not faces and should be relatively easy to identify as such. Thus, the SVM classifier tries to discard non-face images with as few kernel evaluations as possible by using a cascade SVM classification. In the first stage, a score is computed from the first two support vectors, and the score is compared to a threshold. If the score is below the threshold value, the subimage is classified as not a face. If the score is above the threshold value, the cascade SVM classification function continues to apply more complicated decision rules, each time doubling the number of kernel evaluations, and classifies the image as a non-face (thus terminating the process) as soon as the test image fails to satisfy one of the decision rules. Finally, if the subimage has satisfied all intermediate decision rules and has reached the point at which all support vectors must be considered, the original decision function is applied. Satisfying this final rule, and all intermediate rules, is the only way for a test image to receive a positive (face) classification.
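The early-rejection logic described above can be sketched as follows. This is an illustrative sketch only: the Gaussian kernel, the function names `rbf` and `cascade_classify`, and the way the per-stage thresholds are supplied are assumptions, since the abstract does not say how the thresholds are obtained.

```python
import numpy as np

def rbf(x, s, gamma=0.5):
    """Gaussian kernel between a test vector x and one support vector s (assumed kernel)."""
    return float(np.exp(-gamma * np.sum((x - s) ** 2)))

def cascade_classify(x, support_vectors, coeffs, bias, stage_thresholds, kernel=rbf):
    """Return (is_face, kernel_evaluations_used).

    Stage 0 uses the first 2 support vectors, stage 1 the first 4, and so on,
    doubling each time. stage_thresholds[i] is the hypothetical rejection
    threshold for stage i. The full decision function is applied only when
    every intermediate stage has been passed.
    """
    n = len(support_vectors)
    score, used, stage, m = 0.0, 0, 0, 2
    while m < n:
        # Reuse the partial sum: only the newly added support vectors are evaluated.
        for i in range(used, m):
            score += coeffs[i] * kernel(x, support_vectors[i])
        used = m
        if score < stage_thresholds[stage]:
            return False, used  # rejected early as a non-face
        m *= 2
        stage += 1
    # Final stage: all remaining support vectors, then the original decision rule.
    for i in range(used, n):
        score += coeffs[i] * kernel(x, support_vectors[i])
    return bool(score + bias > 0.0), n
```

With 8 support vectors the cascade has two intermediate stages (2 and 4 evaluations) before the full decision function, so `stage_thresholds` needs two entries. A subimage rejected at the first stage costs 2 kernel evaluations instead of 8.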
Abstract:
A system and a method model the motion of a non-rigid object using a thin plate spline (TPS) transform. A first image of a video sequence is received, and a region of interest, referred to as a template, is chosen manually or automatically. A set of arbitrarily-chosen fixed reference points is positioned on the template. A target image of the video sequence is chosen for motion estimation relative to the template. A set of pixels in the target image corresponding to the pixels of the template is determined, and this set of pixels is back-warped to match the template using a thin-plate-spline-based technique. The error between the template and the back-warped image is determined and iteratively minimized using a gradient descent technique. The TPS parameters can then be used to estimate the relative motion between the template and the corresponding region of the target image. According to one embodiment, a stiff-to-flexible approach mitigates instability that can arise when reference points lie in textureless regions, or when the initial TPS parameters are not close to the desired ones. The value of a regularization parameter is varied from a larger to a smaller value, varying the nature of the warp from stiff to flexible, so as to progressively emphasize local non-rigid deformations.
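The effect of the regularization parameter on the stiffness of a thin plate spline can be illustrated with a simplified sketch. It swaps in a closed-form regularized TPS fit to known point correspondences, not the image-based gradient descent the abstract describes. The names `tps_radial`, `fit_tps`, `warp`, and `stiff_to_flexible`, and the schedule of regularization values, are assumptions made for the example.

```python
import numpy as np

def tps_radial(r2):
    """TPS radial basis U(r) = r^2 log r, evaluated from squared distances, with U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = 0.5 * r2[mask] * np.log(r2[mask])
    return out

def fit_tps(src, dst, lam):
    """Solve [[K + lam*I, P], [P^T, 0]] [w; a] = [dst; 0] for 2-D control points."""
    n = src.shape[0]
    r2 = np.sum((src[:, None] - src[None]) ** 2, axis=-1)
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_radial(r2) + lam * np.eye(n)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]  # non-rigid weights w, affine part a

def warp(points, src, w, a):
    """Apply the fitted TPS to arbitrary 2-D points."""
    r2 = np.sum((points[:, None] - src[None]) ** 2, axis=-1)
    P = np.hstack([np.ones((len(points), 1)), points])
    return tps_radial(r2) @ w + P @ a

def stiff_to_flexible(src, dst, lams=(1e3, 1e1, 1e-1, 0.0)):
    """Refit with decreasing regularization and record the control-point residual."""
    residuals = []
    for lam in lams:
        w, a = fit_tps(src, dst, lam)
        residuals.append(float(np.linalg.norm(warp(src, src, w, a) - dst)))
    return residuals
```

For a large regularization value the non-rigid weights `w` shrink toward zero and the warp is nearly affine (stiff); as the value decreases, the warp is free to follow local deformations (flexible). In an iterative image-based setting, each stage would presumably start from the parameters found at the previous, stiffer stage; the closed-form fit here needs no such initialization.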
Abstract:
A statistical formulation estimates two-dimensional human pose from single images. The formulation is based on a Markov network and infers pose parameters from cues such as appearance, shape, edge, and color. A data-driven belief propagation Monte Carlo algorithm performs efficient Bayesian inference within a rigorous statistical framework. Experimental results demonstrate the effectiveness of the method in estimating human pose from single images.
Abstract:
Visual tracking over a sequence of images is formulated by defining an object class and one or more background classes. The most discriminant features available in the images are then used to select a portion of each image as belonging to the object class. Fisher's linear discriminant method is used to project high-dimensional image data onto a lower-dimensional space, e.g., a line, and perform classification in the lower-dimensional space. The projection function is incrementally updated.
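The projection-and-classify step can be sketched with a batch two-class Fisher linear discriminant. This is a minimal sketch under stated assumptions: it handles a single background class, adds a small ridge term for numerical stability, places the decision threshold midway between the projected class means, and omits the incremental update. The names `fit_fld` and `is_object` are hypothetical.

```python
import numpy as np

def fit_fld(obj, bg, ridge=1e-6):
    """Fit a Fisher discriminant direction from object and background samples (rows)."""
    m_obj, m_bg = obj.mean(axis=0), bg.mean(axis=0)
    # Within-class scatter, summed over both classes.
    Sw = (obj - m_obj).T @ (obj - m_obj) + (bg - m_bg).T @ (bg - m_bg)
    Sw += ridge * np.eye(Sw.shape[0])  # assumed regularizer, keeps Sw invertible
    w = np.linalg.solve(Sw, m_obj - m_bg)  # projection onto a line
    threshold = 0.5 * (w @ m_obj + w @ m_bg)  # midpoint of projected means (assumption)
    return w, threshold

def is_object(samples, w, threshold):
    """Classify samples in the one-dimensional projected space."""
    return samples @ w > threshold
```

Because `w` points from the background mean toward the object mean in the whitened space, object samples project above the threshold and background samples below it. An incremental variant would update the class means and scatter matrix as new frames arrive instead of recomputing them from all samples.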
Abstract:
Simultaneous localization and mapping (SLAM) utilizes multiple view feature descriptors to robustly determine location despite appearance changes that would defeat conventional systems. A SLAM algorithm generates a feature descriptor for a scene from different perspectives using kernel principal component analysis (KPCA). When the SLAM module subsequently receives a recognition image after a wide baseline change, it can refer to correspondences from the feature descriptor to continue map building and/or determine location. Appearance variations can result from, for example, a change in illumination, partial occlusion, a change in scale, a change in orientation, a change in distance, warping, and the like. After an appearance variation, a structure-from-motion module uses feature descriptors to reorient itself and continue map building using an extended Kalman filter. Through the use of a database of comprehensive feature descriptors, the SLAM module is also able to refine a position estimate despite appearance variations.
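The KPCA step can be sketched as follows, assuming each row of the input is a vectorized image patch of the same feature seen from a different viewpoint. The Gaussian kernel, the parameter values, and the names `rbf_matrix`, `kpca_fit`, and `kpca_transform` are assumptions for illustration; the abstract does not specify how the descriptor is built from the KPCA output.

```python
import numpy as np

def rbf_matrix(A, B, gamma):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2)

def kpca_fit(X, n_components, gamma):
    """Fit kernel PCA on the rows of X (e.g., patches of one feature from several views)."""
    n = X.shape[0]
    K = rbf_matrix(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]  # largest eigenvalues first
    vals, vecs = vals[idx], vecs[:, idx]
    alphas = vecs / np.sqrt(vals)  # unit-norm principal axes in feature space
    return {"X": X, "K": K, "alphas": alphas, "eigenvalues": vals, "gamma": gamma}

def kpca_transform(model, Y):
    """Project new samples (rows of Y) onto the learned kernel principal components."""
    X, K = model["X"], model["K"]
    n, m = X.shape[0], Y.shape[0]
    Ky = rbf_matrix(Y, X, model["gamma"])
    one_n = np.full((n, n), 1.0 / n)
    one_m = np.full((m, n), 1.0 / n)
    Kyc = Ky - one_m @ K - Ky @ one_n + one_m @ K @ one_n
    return Kyc @ model["alphas"]
```

A patch from a later recognition image could then be projected with `kpca_transform` and compared against stored projections to look for a correspondence. The comparison metric and matching threshold are left open here because the abstract does not state them.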