摘要:
The present invention combines a conventional audio microphone with an additional speech sensor that provides a speech sensor signal based on an input. The speech sensor signal is generated based on an action undertaken by a speaker during speech, such as facial movement, bone vibration, throat vibration, throat impedance changes, etc. A speech detector component receives an input from the speech sensor and outputs a speech detection signal indicative of whether a user is speaking. The speech detector generates the speech detection signal based on the microphone signal and the speech sensor signal.
摘要:
The present invention is embodied in a system and method for estimating epipolar geometry, in terms of a fundamental matrix, between multiple images of an object for stereo vision processing. The fundamental matrix embodies the epipolar geometry between the images. In general, the present invention includes a method for estimating epipolar geometry between multiple images of an original space given an initial estimate of the fundamental matrix found using a standard linear estimation technique. Namely, the system and method of the present invention estimates the fundamental matrix by transforming image points of multiple images into projective space. After this transformation is performed, nonlinear optimization is used with one parameterization of the fundamental matrix. The images are then inverse transformed back to the original space with a final estimate of the fundamental matrix. In addition, the original noise information is preserved during the optimization in the projective space.
摘要:
The present invention is embodied in a curve matching system and method that is guided by a set of matched corners. The corner guided curve matching produces a geometrical representation of the scene from the images, which can be used for any suitable application, such as computer and stereo vision applications. In general, first, multiple images depicting a scene are digitally received by the system. The images are graphical images digitally received and processed. For example, the images can be two dimensional image data, such as bitmap or raster image data. Curves of the images are then matched to correlate the two images of the scene for creating three dimensional (3D) curve information, such as 3D vector or mathematical information, of the scene. This 3D vector information can then be used in any suitable manner, for example, to digitally reconstruct the scene for stereo vision applications.
摘要:
The present invention is embodied in systems and methods for determining structure and motion of a three-dimensional (3D) object using two-dimensional (2D) images of the object obtained from multiple sets of views with different projection models, such as from a full perspective view and a weak perspective views. A novel fundamental matrix is derived that embodies the epipolar geometry between a full perspective view and a weak perspective view. The systems and methods of the present invention preferably uses the derived fundamental matrix together with the 2D image information of the full and weak perspective views to digitally reconstruct the 3D object and produce results with multi-resolution processing techniques. These techniques include recovering and refining motion parameters and recovering and refining structure parameters of the fundamental matrix. The results can include, for example, 3D positions of points, camera position between different views, texture maps, and the like.
摘要:
A depth construction module is described that receives depth images provided by two or more depth capture units. Each depth capture unit generates its depth image using a structured light technique, that is, by projecting a pattern onto an object and receiving a captured image in response thereto. The depth construction module then identifies at least one deficient portion in at least one depth image that has been received, which may be attributed to overlapping projected patterns that impinge the object. The depth construction module then uses a multi-view reconstruction technique, such as a plane sweeping technique, to supply depth information for the deficient portion. In another mode, a multi-view reconstruction technique can be used to produce an entire depth scene based on captured images received from the depth capture units, that is, without first identifying deficient portions in the depth images.
摘要:
A system described herein includes a receiver component that receives a first digital image from a color camera, wherein the first digital image comprises a planar object, and a second digital image from a depth sensor, wherein the second digital image comprises the planar object. The system also includes a calibrator component that jointly calibrates the color camera and the depth sensor based at least in part upon the first digital image and the second digital image.
摘要:
A three-dimensional shape parameter computation system and method for computing three-dimensional human head shape parameters from two-dimensional facial feature points. A series of images containing a user's face is captured. Embodiments of the system and method deduce the 3D parameters of the user's head by examining a series of captured images of the user over time and in a variety of head poses and facial expressions, and then computing an average. An energy function is constructed over a batch of frames containing 2D face feature points obtained from the captured images, and the energy function is minimized to solve for the head shape parameters valid for the batch of frames. Head pose parameters and facial expression and animation parameters can vary over each captured image in the batch of frames. In some embodiments this minimization is performed using a modified Gauss-Newton minimization technique using a single iteration.
摘要:
Described is a hierarchical filtered motion field technology such as for use in recognizing actions in videos with crowded backgrounds. Interest points are detected, e.g., as 2D Harris corners with recent motion, e.g. locations with high intensities in a motion history image (MHI). A global spatial motion smoothing filter is applied to the gradients of MHI to eliminate low intensity corners that are likely isolated, unreliable or noisy motions. At each remaining interest point, a local motion field filter is applied to the smoothed gradients by computing a structure proximity between sets of pixels in the local region and the interest point. The motion at a pixel/pixel set is enhanced or weakened based on its structure proximity with the interest point (nearer pixels are enhanced).
摘要:
A three-dimensional shape parameter computation system and method for computing three-dimensional human head shape parameters from two-dimensional facial feature points. A series of images containing a user's face is captured. Embodiments of the system and method deduce the 3D parameters of the user's head by examining a series of captured images of the user over time and in a variety of head poses and facial expressions, and then computing an average. An energy function is constructed over a batch of frames containing 2D face feature points obtained from the captured images, and the energy function is minimized to solve for the head shape parameters valid for the batch of frames. Head pose parameters and facial expression and animation parameters can vary over each captured image in the batch of frames. In some embodiments this minimization is performed using a modified Gauss-Newton minimization technique using a single iteration.
摘要:
A system that facilitates automatically determining an action of an animate object is described herein. The system includes a receiver component that receives video data that includes images of an animate object. The system additionally includes a determiner component that accesses a data store that includes an action graph and automatically determines an action undertaken by the animate object in the received video data based at least in part upon the action graph. The action graph comprises a plurality of nodes that are representative of multiple possible postures of the animate object. At least one node in the action graph is shared amongst multiple actions represented in the action graph.