Abstract:
A system and method are disclosed for calibrating a depth camera in a natural user interface. In general, the system obtains an objective measurement of the true distance between a capture device and one or more objects in a scene. It then compares this true depth with the depth reported by the depth camera at one or more points and determines an error function describing the error in the depth camera's measurements. The depth camera may then be recalibrated to correct for that error. The objective measurement of distance to one or more objects in the scene may be obtained by a variety of systems and methods.
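The abstract does not say what form the error function takes. As a minimal sketch, assume the error is linear in the measured depth, so that true depth ≈ gain × measured + offset. The two coefficients can then be fit by ordinary least squares over the paired points and used to correct new readings. The function names `fit_linear_depth_error` and `correct_depth` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_depth_error(measured: Sequence[float],
                           true: Sequence[float]) -> Tuple[float, float]:
    """Fit true ~= gain * measured + offset by ordinary least squares.

    Assumption (not from the source): the depth error is linear in the
    measured depth, i.e. error(d) = (gain - 1) * d + offset.
    """
    n = len(measured)
    if n != len(true) or n < 2:
        raise ValueError("need at least two paired depth measurements")
    mean_m = sum(measured) / n
    mean_t = sum(true) / n
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_m == 0:
        raise ValueError("measured depths must not all be equal")
    cov = sum((m - mean_m) * (t - mean_t) for m, t in zip(measured, true))
    gain = cov / var_m
    offset = mean_t - gain * mean_m
    return gain, offset


def correct_depth(raw: float, gain: float, offset: float) -> float:
    """Recalibrate a raw depth reading using the fitted linear model."""
    return gain * raw + offset
```

For example, if the camera reports 1.0, 2.0 and 3.0 m where the true distances are 1.1, 2.2 and 3.3 m, the fit gives a gain of about 1.1 and an offset of about 0. A real depth sensor's error may be nonlinear or vary per pixel, and a richer model would be needed in that case.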
Abstract:
Gender recognition is performed using two or more modalities. For example, depth image data and one or more types of data other than depth image data are received, all pertaining to a person. The different types of data are fused together to automatically determine the person's gender. A computing system can then interact with the person based on that determination.
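The abstract does not specify how the modalities are fused. One common option is late (score-level) fusion: each modality's classifier outputs a probability for one label, and those probabilities are combined as a weighted average. The sketch below assumes that approach. The modality names, the weights and the 0.5 decision threshold are all hypothetical. Modalities that are unavailable for a given person are skipped, and the remaining weights are renormalized.

```python
from typing import Dict, Tuple


def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-modality probabilities for one label.

    Modalities absent from `scores` are ignored and the remaining weights
    are renormalized. Late fusion is an assumed technique, not the source's.
    """
    total_w = 0.0
    acc = 0.0
    for modality, p in scores.items():
        w = weights.get(modality, 0.0)
        acc += w * p
        total_w += w
    if total_w == 0:
        raise ValueError("no weighted modality available")
    return acc / total_w


def decide(scores: Dict[str, float], weights: Dict[str, float],
           labels: Tuple[str, str] = ("female", "male"),
           threshold: float = 0.5) -> str:
    """Return labels[0] if the fused probability reaches the threshold."""
    return labels[0] if fuse_scores(scores, weights) >= threshold else labels[1]
```

For example, `decide({"depth": 0.8, "voice": 0.6}, {"depth": 0.5, "voice": 0.5})` fuses the two scores to about 0.7 and returns the first label.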
Abstract:
The subject disclosure is directed towards an immersive conference, in which participants in separate locations are brought together into a common virtual environment (scene), such that they appear to each other to be in a common space, with geometry, appearance, and real-time natural interaction (e.g., gestures) preserved. In one aspect, depth data and video data are processed to place remote participants in the common scene from the first person point of view of a local participant. Sound data may be spatially controlled, and parallax computed to provide a realistic experience. The scene may be augmented with various data, videos and other effects/animations.
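The abstract says only that sound may be spatially controlled. One simple way to do this is constant-power stereo panning, where each remote participant's voice is panned according to their direction in the shared scene relative to the local listener. The sketch below assumes that technique. It also assumes a hypothetical 2D coordinate convention (x to the right, z forward). The helper names are illustrative.

```python
import math
from typing import Tuple


def azimuth_of(listener_xz: Tuple[float, float],
               source_xz: Tuple[float, float]) -> float:
    """Azimuth in degrees of a source relative to a listener facing +z.

    Hypothetical convention: x points right, z points forward, so 0 is
    straight ahead, negative is to the left, positive is to the right.
    """
    dx = source_xz[0] - listener_xz[0]
    dz = source_xz[1] - listener_xz[1]
    return math.degrees(math.atan2(dx, dz))


def constant_power_pan(azimuth_deg: float) -> Tuple[float, float]:
    """Left/right gains with left**2 + right**2 == 1.

    Azimuths outside [-90, 90] (sources behind the listener) are clamped,
    since plain stereo panning cannot place sounds behind the listener.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)  # map [-90, 90] to [0, pi/2]
    return math.cos(theta), math.sin(theta)
```

Keeping the squared gains summing to one holds perceived loudness roughly steady as a participant moves across the scene. A full system would more likely use HRTF-based rendering.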
Abstract:
Dynamic texture mapping is used to create a photorealistic three-dimensional animation of an individual whose facial features are synchronized with desired speech. Audiovisual data of the individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors, which are used to train a statistical model. An input audio feature vector is provided that corresponds to the desired speech with which the animation will be synchronized. The statistical model is used to generate a trajectory of visual feature vectors corresponding to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence in the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is then applied to the three-dimensional model.
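The matching step can be illustrated with a plain nearest-neighbor search. Each visual feature vector in the generated trajectory is mapped to the library image whose stored feature vector is closest in Euclidean distance. This is a minimal stand-in under that assumption. The source does not say how matching is done, and a practical system would likely also penalize discontinuities between consecutive frames. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_frame(target: Vector, library: Sequence[Vector]) -> int:
    """Index of the library frame whose feature vector is closest (Euclidean)."""
    best_idx, best_dist = -1, math.inf
    for idx, feat in enumerate(library):
        d = math.dist(target, feat)  # requires Python 3.8+
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx


def match_sequence(trajectory: Sequence[Vector],
                   library: Sequence[Vector]) -> List[int]:
    """Map each predicted visual feature vector to a library image index.

    The returned indices give the order in which library images would be
    concatenated into the output image sequence.
    """
    if not library:
        raise ValueError("image library is empty")
    return [nearest_frame(v, library) for v in trajectory]
```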
Abstract:
A system is disclosed that facilitates managing resources (e.g., functionality, services) based at least in part upon an established context. More particularly, a context determination component can be employed to establish a context by processing sensor inputs or by learning/inferring a user action or preference. Once the context is established by the context determination component, a power/mode management component can be employed to activate and/or mask resources in accordance with that context. Such power and mode management can extend the life of the device's power source (e.g., a battery) and mask functionality in accordance with the user and/or device state.
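The power/mode management step can be pictured as a policy lookup. Each established context maps to the set of resources allowed to stay active, and every other resource is masked. The sketch below assumes the context has already been determined. The context names, resource names and policy table are entirely hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical policy: which resources stay active in each context.
POLICY: Dict[str, FrozenSet[str]] = {
    "driving": frozenset({"gps", "voice"}),
    "meeting": frozenset({"calendar", "messaging"}),
    "low_battery": frozenset({"voice"}),
}


def apply_context(context: str, resources: Set[str],
                  policy: Optional[Dict[str, FrozenSet[str]]] = None) -> Dict[str, bool]:
    """Return an on/off state for every resource given the established context.

    Resources not allowed by the context are masked (False). An unknown
    context leaves every resource active.
    """
    table = POLICY if policy is None else policy
    allowed = table.get(context)
    if allowed is None:
        return {r: True for r in resources}
    return {r: (r in allowed) for r in resources}
```

For example, in the hypothetical `"driving"` context the camera is masked while GPS and voice stay on. Masking unused hardware is what would save battery.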
Abstract:
Image enhancement techniques are described that enhance an image in accordance with a set of training images. In one implementation, an image color tone map is generated for a facial region included in an image. The image color tone map may be normalized to the color tone map of a set of training images so that it matches the training images' map. The normalized color tone map may then be applied to the image to enhance it. In further implementations, the procedure may be updated when the average color intensity in non-facial regions differs from an accumulated mean by a threshold amount.
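The abstract does not define the color tone map. As a minimal sketch, assume it is summarized per color channel by the mean and standard deviation of the facial pixels. Under that assumption, normalization shifts and scales each channel so its statistics match those of the training set, and the update test compares the mean non-facial intensity with an accumulated mean. All function names and the 0-255 clipping range are hypothetical.

```python
import statistics
from typing import List, Sequence


def normalize_channel(pixels: Sequence[float], target_mean: float,
                      target_std: float) -> List[float]:
    """Shift and scale one channel so its mean/std match the training statistics.

    Assumption: the tone map is represented by per-channel mean and standard
    deviation. Results are clipped to the 0-255 range.
    """
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    scale = target_std / std if std > 0 else 1.0  # flat channel: shift only
    return [min(255.0, max(0.0, (p - mean) * scale + target_mean)) for p in pixels]


def needs_update(non_face_mean: float, accumulated_mean: float,
                 threshold: float) -> bool:
    """True when the non-facial average intensity drifts past the threshold."""
    return abs(non_face_mean - accumulated_mean) > threshold
```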
Abstract:
A text-dependent speaker verification technique uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustic model of that speaker-independent recognizer as a background model. Most speaker verification systems apply a likelihood ratio test (LRT) only at the utterance level (e.g., the sentence level). The present technique instead uses a weighted sum of likelihood ratios at the sub-unit level (word, triphone, or phone) as well as at the utterance level.
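The scoring rule can be sketched as follows. Each sub-unit and the whole utterance get a log-likelihood ratio (speaker model versus the speaker-independent background model). The verification score is a weighted sum of these ratios, and the claim is accepted when the score reaches a threshold. Working in the log domain, how the weights are chosen and the threshold value are all assumptions here. The function names are hypothetical.

```python
from typing import Sequence


def llr(log_lik_speaker: float, log_lik_background: float) -> float:
    """Log-likelihood ratio of the speaker model vs. the background model."""
    return log_lik_speaker - log_lik_background


def verification_score(unit_llrs: Sequence[float], unit_weights: Sequence[float],
                       utterance_llr: float, utterance_weight: float) -> float:
    """Weighted sum of sub-unit LLRs plus a weighted utterance-level LLR."""
    if len(unit_llrs) != len(unit_weights):
        raise ValueError("exactly one weight per sub-unit is required")
    unit_term = sum(w * r for w, r in zip(unit_weights, unit_llrs))
    return utterance_weight * utterance_llr + unit_term


def accept(score: float, threshold: float) -> bool:
    """Accept the identity claim when the score reaches the threshold."""
    return score >= threshold
```

For example, with sub-unit LLRs `[1.0, -0.5, 2.0]`, weights `[0.5, 0.25, 0.25]`, an utterance LLR of 1.0 and an utterance weight of 0.5, the score is 1.375. A single poorly matching phone then lowers the score only in proportion to its weight.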
Abstract:
Described is a hierarchical filtered motion field technology, for example for use in recognizing actions in videos with crowded backgrounds. Interest points are detected, e.g., as 2D Harris corners with recent motion, such as locations with high intensities in a motion history image (MHI). A global spatial motion smoothing filter is applied to the gradients of the MHI to eliminate low-intensity corners that are likely to be isolated, unreliable, or noisy motions. At each remaining interest point, a local motion field filter is applied to the smoothed gradients by computing a structure proximity between sets of pixels in the local region and the interest point. The motion at a pixel or pixel set is enhanced or weakened based on its structure proximity to the interest point (nearer pixels are enhanced).
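The motion history image itself follows a simple update rule. A pixel that moves in the current frame is set to a maximum value tau, and every other pixel decays toward zero, so high values mark recent motion. The sketch below shows this standard MHI update together with selecting high-intensity pixels as candidate locations. It assumes a binary motion mask is already available (e.g., from frame differencing). The Harris corner detection and the two filtering stages described above are not shown. The names `update_mhi` and `recent_motion_pixels` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def update_mhi(mhi: Grid, motion_mask: List[List[bool]], tau: float,
               decay: float = 1.0) -> Grid:
    """One motion-history step: moving pixels are set to tau, others decay to 0."""
    return [
        [tau if moving else max(0.0, h - decay) for h, moving in zip(row_h, row_m)]
        for row_h, row_m in zip(mhi, motion_mask)
    ]


def recent_motion_pixels(mhi: Grid, min_value: float) -> List[Tuple[int, int]]:
    """(row, col) positions whose MHI value is at least min_value."""
    return [(r, c) for r, row in enumerate(mhi)
            for c, h in enumerate(row) if h >= min_value]
```

For example, updating `[[0, 3], [2, 0]]` with motion only at the top-left pixel and `tau=5` gives `[[5, 2], [1, 0]]`. With `min_value=2`, the recent-motion pixels are `(0, 0)` and `(0, 1)`.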
Abstract:
A multimodal system is described that employs a plurality of sensing modalities, which can be processed concurrently to increase confidence in authentication. The multimodal system and/or a set of various devices can provide several points of information entry for authentication. Authentication can be improved, for example, by combining face recognition, biometrics, speech recognition, handwriting recognition, gait recognition, retina scan, thumb/hand prints, or subsets thereof. Additionally, portable multimodal devices (e.g., a smartphone) can be used as credit cards, and authenticating such use can mitigate unauthorized transactions.
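The abstract does not specify how confidence from several modalities is combined. One simple rule treats each available modality's confidence as independent evidence and combines them as 1 − ∏(1 − cᵢ). Under this rule, each additional modality can only raise the combined confidence, and any subset of modalities can be used. The sketch below assumes that rule, which is known as noisy-OR. The independence assumption is strong, since real modalities are often correlated. The modality names and the threshold are hypothetical.

```python
from typing import Dict


def combined_confidence(confidences: Dict[str, float]) -> float:
    """Noisy-OR fusion: 1 - product of (1 - c) over the available modalities.

    Assumes modality errors are independent. An empty dict yields 0.0.
    """
    miss = 1.0
    for name, c in confidences.items():
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence for {name!r} out of range: {c}")
        miss *= 1.0 - c
    return 1.0 - miss


def authorize(confidences: Dict[str, float], threshold: float = 0.95) -> bool:
    """Approve (e.g., a transaction) only if the fused confidence is high enough."""
    return combined_confidence(confidences) >= threshold
```

For example, face and voice confidences of 0.9 and 0.8 combine to about 0.98. That clears a hypothetical 0.95 threshold that neither modality reaches alone.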
Abstract:
A subregion-based image parameter recovery system and method for recovering image parameters from a single image containing a face taken under sub-optimal illumination conditions. The recovered image parameters (including albedo, illumination, and face geometry) can be used to generate face images under a new lighting environment. The method includes dividing the face in the image into numerous smaller regions, generating an albedo morphable model for each region, and using a Markov Random Fields (MRF)-based framework to model the spatial dependence between neighboring regions. Different types of regions are defined, including saturated, shadow, regular, and occluded regions. Each pixel in the image is classified and assigned to a region based on intensity, and then weighted based on its classification. The method decouples the texture from the geometry and illumination models, and then generates an objective function that is iteratively solved using an energy minimization technique to recover the image parameters.
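The pixel classification and weighting step can be sketched with simple intensity thresholds. Very bright pixels are labeled saturated, very dark pixels are labeled shadow, and everything else is labeled regular. Each class then gets a weight that would scale that pixel's term in the objective function. The thresholds and weights below are hypothetical. Occlusion usually cannot be detected from intensity alone, so this sketch assumes an occlusion mask is supplied from elsewhere.

```python
from enum import Enum
from typing import List, Optional, Sequence


class PixelClass(Enum):
    SATURATED = "saturated"
    SHADOW = "shadow"
    REGULAR = "regular"
    OCCLUDED = "occluded"


# Hypothetical thresholds for 0-255 intensities and per-class weights.
HIGH, LOW = 240.0, 20.0
WEIGHTS = {PixelClass.REGULAR: 1.0, PixelClass.SHADOW: 0.5,
           PixelClass.SATURATED: 0.2, PixelClass.OCCLUDED: 0.0}


def classify(intensity: float, occluded: bool = False) -> PixelClass:
    """Assign a pixel to a region type; the occlusion flag is an external input."""
    if occluded:
        return PixelClass.OCCLUDED
    if intensity >= HIGH:
        return PixelClass.SATURATED
    if intensity <= LOW:
        return PixelClass.SHADOW
    return PixelClass.REGULAR


def pixel_weights(intensities: Sequence[float],
                  occlusion: Optional[Sequence[bool]] = None) -> List[float]:
    """Per-pixel weights derived from each pixel's classification."""
    mask = occlusion if occlusion is not None else [False] * len(intensities)
    return [WEIGHTS[classify(i, o)] for i, o in zip(intensities, mask)]
```

For example, intensities `[250, 10, 128, 128]` with only the last pixel occluded yield weights `[0.2, 0.5, 1.0, 0.0]`. Unreliable pixels therefore pull less on the energy minimization.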