摘要:
Image enhancement techniques are described to enhance an image in accordance with a set of training images. In an implementation, an image color tone map is generated for a facial region included in an image. The image color tone map may be normalized to a color tone map for a set of training images so that the image color tone map matches the map for the training images. The normalized color tone map may be applied to the image to enhance the in-question image. In further implementations, the procedure may be updated when the average color intensity in non-facial regions differs from an accumulated mean by a threshold amount.
摘要:
A system that facilitates managing resources (e.g., functionality, services) based at least in part upon an established context. More particularly, a context determination component can be employed to establish a context by processing sensor inputs or learning/inferring a user action/preference. Once the context is established via context determination component, a power/mode management component can be employed to activate and/or mask resources in accordance with the established context. The power and mode management of the device can extend life of a power source (e.g., battery) and mask functionality in accordance with a user and/or device state.
摘要:
A method and apparatus determine a channel response for an alternative sensor using an alternative sensor signal, an air conduction microphone signal. The channel response and a prior probability distribution for clean speech values are then used to estimate a clean speech value.
摘要:
A first set of signals from an array of one or more microphones, and a second signal from a reference microphone are used to calibrate a set of filter parameters such that the filter parameters minimize a difference between the second signal and a beamformer output signal that is based on the first set of signals. Once calibrated, the filter parameters are used to form a beamformer output signal that is filtered using a non-linear adaptive filter that is adapted based on portions of a signal that do not contain speech, as determined by a speech detection sensor.
摘要:
A system and method for facilitating low bandwidth video image transmission in video conferencing systems. A target is acquired (video image of a person's head) and processed to identify one or more sub-regions (e.g., background, eyes, mouth and head). The invention incorporates a fast feature matching methodology to match a current sub-region with previously stored sub-regions. If a match is found, an instruction is sent to the receiving computer to generate the next frame of video data from the previously stored blocks utilizing a texture synthesis technique. The invention is applicable for video conferencing in low bandwidth environments.
摘要:
A multimodal, multilanguage mobile device which can be employed to enhance note taking and/or annotation of a document, and gaming. Input data types such as optical character recognition (OCR), speech, handwriting, and visual information (e.g., image and/or video), etc., can be fused to generate rich documents with a multidimensional level of data to provide an increased level of context over conventional documents. Such architecture can be utilized by students for homework management, as well as entertainment (e.g., gaming).
摘要:
A multimodal system that employs a plurality of sensing modalities which can be processed concurrently to increase confidence in connection with authentication. The multimodal system and/or set of various devices can provide several points of information entry in connection with authentication. Authentication can be improved, for example, by combining face recognition, biometrics, speech recognition, handwriting recognition, gait recognition, retina scan, thumb/hand prints, or subsets thereof. Additionally, portable multimodal devices (e.g., a smartphone) can be used as credit cards, and authentication in connection with such use can mitigate unauthorized transactions.
摘要:
Described herein is a technique for creating a 3D face model using images obtained from an inexpensive camera associated with a general-purpose computer. Two still images of the user are captured, and two video sequences. The user is asked to identify five facial features, which are used to calculate a mask and to perform fitting operations. Based on a comparison of the still images, deformation vectors are applied to a neutral face model to create the 3D model. The video sequences are used to create a texture map. The process of creating the texture map references the previously obtained 3D model to determine poses of the sequential video images.
摘要:
A method and apparatus determine a channel response for an alternative sensor using an alternative sensor signal, an air conduction microphone signal. The channel response and a prior probability distribution for clean speech values are then used to estimate a clean speech value.
摘要:
The present invention includes a real-time wide-angle image correction system and a method for alleviating distortion and perception problems in images captured by wide-angle cameras. In general, the real-time wide-angle image correction method generates warp table from pixel coordinates of a wide-angle image and applies the warp table to the wide-angle image to create a corrected wide-angle image. The corrections are performed using a parametric class of warping functions that include Spatially Varying Uniform (SVU) scaling functions. The SVU scaling functions and scaling factors are used to perform vertical scaling and horizontal scaling on the wide-angle image pixel coordinates. A horizontal distortion correction is performed using the SVU scaling functions at and at least two different scaling factors. This processing generates a warp table that can be applied to the wide-angle image to yield the corrected wide-angle image.