Abstract:
Multi-modal, multi-lingual devices can be employed to consolidate numerous items including, but not limited to, keys, remote controls, image capture devices, audio recorders, cellular telephone functionalities, location/direction detectors, health monitors, calendars, gaming devices, smart home inputs, pens, optical pointing devices or the like. For example, a corner of a cellular telephone can be used as an electronic pen. Moreover, the device can be used to snap multiple pictures and stitch them together to create a panoramic image. A device can automate ignition of an automobile, initiate appliances, etc. based upon relative distance. The device can provide near-to-eye capabilities for enhanced image viewing. Multiple cameras/sensors can be provided on a single device to provide stereoscopic capabilities. By consolidating services, the device can also assist blind users, support privacy, etc.
Abstract:
A system facilitates managing one or more devices utilized for communicating data within a telepresence session. A telepresence session can be initiated within a communication framework that includes a first user and one or more second users. In response to determining a temporary absence of the first user from the telepresence session, a recordation of the telepresence session is initialized to enable a playback of a portion or a summary of the telepresence session that the first user has missed.
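The absence-triggered recording described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SessionRecorder`, its method names, and the representation of session content as timestamped text events are all hypothetical, and how absence is detected is left outside the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SessionRecorder:
    """Hypothetical helper: buffers session events only while the user is absent."""
    recording: bool = False
    missed: List[Tuple[float, str]] = field(default_factory=list)

    def on_presence_change(self, present: bool) -> None:
        # Start recording when the user leaves; stop when the user returns.
        self.recording = not present

    def on_event(self, timestamp: float, text: str) -> None:
        # Keep only what happens during the absence.
        if self.recording:
            self.missed.append((timestamp, text))

    def playback(self) -> List[Tuple[float, str]]:
        # Hand back the missed portion and clear the buffer.
        out, self.missed = self.missed, []
        return out
```

A summary, as opposed to a full playback, would need an additional step over the buffered events that is not shown here.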
Abstract:
Techniques and technologies are described for tracking a face with a plurality of cameras wherein the geometry between the cameras is initially unknown. One disclosed method includes detecting a head with two of the cameras and registering a head model with the image of the head (as detected by one of the cameras). The method also includes back-projecting the other detected face image to the head model and determining a head pose from the back-projected head image. The camera geometry thus determined is then used to track the face with at least one of the cameras.
Abstract:
The computer-readable media provide improved procedures to estimate head motion between two images of a face. Locations of a number of distinct facial features are determined in two images. The locations are converted into a set of physical face parameters based on the symmetry of the identified distinct facial features. An estimation objective function is determined by: (a) estimating each of the set of physical parameters, (b) estimating a first head pose transform corresponding to the first image, and (c) estimating a second head pose transform corresponding to the second image. The motion is estimated between the two images based on the set of physical face parameters by multiplying each term of the estimation objective function by a weighted contribution factor based on the confidence of data corresponding to the estimation objective function.
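One way to write a confidence-weighted objective of the kind described is sketched below. The symbols are illustrative assumptions, not taken from the abstract: $\mathbf{p}$ stands for the physical face parameters, $T_1$ and $T_2$ for the two head pose transforms, $e_i$ for the $i$-th error term, and $w_i$ for its weighted contribution factor.

```latex
\[
  E(\mathbf{p}, T_1, T_2) \;=\; \sum_{i} w_i \, e_i(\mathbf{p}, T_1, T_2)^2 ,
  \qquad w_i \ge 0 ,
\]
% A larger w_i expresses higher confidence in the data behind term i,
% so that term pulls the estimate more strongly.
```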
Abstract:
A method and apparatus determine a channel response for an alternative sensor using an alternative sensor signal and an air conduction microphone signal. The channel response is then used to estimate a clean speech value using at least a portion of the alternative sensor signal.
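As a rough illustration only, the two steps might look like the sketch below. It assumes a simple per-frequency-bin model in which the alternative sensor spectrum equals a channel response `H[k]` times the speech spectrum, and it uses the air conduction microphone spectrum as a stand-in for speech when fitting `H[k]`. The function names, the least-squares fit, and the regularised inverse are assumptions for this sketch, not the method of the abstract.

```python
from typing import List, Sequence

Frame = Sequence[complex]  # one spectrum: a complex value per frequency bin


def estimate_channel(alt: Sequence[Frame], air: Sequence[Frame],
                     eps: float = 1e-12) -> List[complex]:
    """Per-bin least squares: H[k] minimises sum_t |alt[t][k] - H[k]*air[t][k]|^2."""
    n_bins = len(alt[0])
    h = []
    for k in range(n_bins):
        num = sum(b[k] * y[k].conjugate() for b, y in zip(alt, air))
        den = sum(abs(y[k]) ** 2 for y in air)
        h.append(num / (den + eps))  # eps guards against empty bins
    return h


def estimate_clean(alt_frame: Frame, h: Sequence[complex],
                   eps: float = 1e-12) -> List[complex]:
    """Regularised inverse of the channel applied to one alternative sensor frame."""
    return [hk.conjugate() * bk / (abs(hk) ** 2 + eps)
            for bk, hk in zip(alt_frame, h)]
```

The sketch ignores noise in both sensors, which a practical estimator would have to model.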
Abstract:
A real-time, approximately 360-degree image correction system and method alleviate distortion and perception problems in images captured by omni-directional cameras. In general, the real-time panoramic image correction method generates a warp table from pixel coordinates of a panoramic image and applies the warp table to the panoramic image to create a corrected panoramic image. The corrections are performed using a parametric class of warping functions that includes Spatially Varying Uniform (SVU) scaling functions. The SVU scaling functions and scaling factors are used to perform vertical scaling and horizontal scaling on the panoramic image pixel coordinates. A horizontal distortion correction is performed using the SVU scaling functions with at least two different scaling factors. This processing generates a warp table that can be applied to the panoramic image to yield the corrected panoramic image. In one embodiment, the warp table is concatenated with a stitching table used to create the panoramic image.
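The warp-table idea can be illustrated with a deliberately simplified sketch that handles vertical scaling only. The column-dependent scale in `scale_factor` is a hypothetical choice made for this example (it is not the SVU function of the abstract), and horizontal scaling, horizontal distortion correction, and concatenation with a stitching table are all omitted.

```python
import math
from typing import List, Tuple

WarpTable = List[List[Tuple[int, int]]]  # table[y][x] -> (src_x, src_y)


def scale_factor(x: int, width: int, s_min: float = 1.0, s_max: float = 1.5) -> float:
    """Hypothetical scale: s_min at the left/right borders, s_max at the centre column."""
    t = x / (width - 1) if width > 1 else 0.0
    return s_min + (s_max - s_min) * math.sin(math.pi * t)


def build_warp_table(width: int, height: int) -> WarpTable:
    """For every output pixel, precompute which source pixel to read."""
    cy = (height - 1) / 2.0  # scale about the vertical centre of the image
    table: WarpTable = []
    for y in range(height):
        row = []
        for x in range(width):
            s = scale_factor(x, width)  # same factor for the whole column
            src_y = int(round(cy + (y - cy) / s))
            row.append((x, min(height - 1, max(0, src_y))))
        table.append(row)
    return table


def apply_warp(image: List[List[int]], table: WarpTable) -> List[List[int]]:
    """Nearest-neighbour lookup through the precomputed table."""
    return [[image[sy][sx] for (sx, sy) in row] for row in table]
```

Because the table depends only on pixel coordinates, it can be built once and reused for every frame, which is what makes a table-driven correction suitable for real-time use.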
Abstract:
Described herein is a technique for creating a 3D face model using images obtained from an inexpensive camera associated with a general-purpose computer. Two still images and two video sequences of the user are captured. The user is asked to identify five facial features, which are used to calculate a mask and to perform fitting operations. Based on a comparison of the still images, deformation vectors are applied to a neutral face model to create the 3D model. The video sequences are used to create a texture map. The process of creating the texture map references the previously obtained 3D model to determine poses of the sequential video images.
Abstract:
A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.
Abstract:
The system provides improved procedures to estimate head motion between two images of a face. Locations of a number of distinct facial features are identified in two images. The identified locations can correspond to the eye corners, mouth corners and nose tip. The locations are converted into a set of physical face parameters based on the symmetry of the identified distinct facial features. The set of physical parameters reduces the number of unknowns as compared to the number of equations used to determine the unknowns. An initial head motion estimate is determined by: (a) estimating each of the set of physical parameters, (b) estimating a first head pose transform corresponding to the first image, and (c) estimating a second head pose transform corresponding to the second image. The head motion estimate can be incorporated into a feature matching algorithm to refine the head motion estimation and the physical facial parameters. In one implementation, an inequality constraint is placed on a particular physical parameter, such as a nose tip, so that the parameter is constrained between a predetermined minimum and maximum value. The inequality constraint is converted to an equality constraint by using a penalty function. The inequality constraint is then used during the initial head motion estimation to make the motion estimation more robust.
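A common way to fold a bound of this kind into an objective is a quadratic penalty, sketched below. The form and the symbols are assumptions for illustration: $x$ is the constrained parameter, $x_{\min}$ and $x_{\max}$ are its predetermined bounds, $F$ is the unconstrained objective, and $\lambda > 0$ is a penalty weight. The abstract does not specify which penalty function is used.

```latex
\[
  x_{\min} \le x \le x_{\max}
  \quad\longrightarrow\quad
  \rho(x) = \max(0,\; x_{\min} - x)^2 + \max(0,\; x - x_{\max})^2 ,
\]
\[
  \min_{\mathbf{p}} \; F(\mathbf{p}) + \lambda \, \rho(x) .
\]
% rho(x) = 0 exactly when x lies inside the bounds, so the inequality
% becomes the equality rho(x) = 0, enforced softly through lambda.
```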