Abstract:
Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
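The segment-selection step can be illustrated with a minimal Python sketch. It assumes the time alignment has already been done and is given as a list of (original word, decoded word) pairs, with `None` marking a gap or inserted silence. The function name `select_segments` and the pair representation are hypothetical and do not come from the abstract.

```python
from typing import List, Optional, Tuple

# One aligned position: (word from original transcription, word from decoded transcription).
# None stands for a gap or an inserted silence on that side (an assumption of this sketch).
AlignedPair = Tuple[Optional[str], Optional[str]]


def select_segments(aligned: List[AlignedPair], q: int) -> List[Tuple[int, int]]:
    """Return half-open index ranges [start, end) of runs of at least q matching words."""
    segments: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, (orig, decoded) in enumerate(aligned):
        if orig is not None and orig == decoded:
            # Extend the current run, or start a new one.
            if run_start is None:
                run_start = i
        else:
            # A mismatch closes the current run; keep it only if it is long enough.
            if run_start is not None and i - run_start >= q:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the utterance.
    if run_start is not None and len(aligned) - run_start >= q:
        segments.append((run_start, len(aligned)))
    return segments


aligned = [
    ("the", "the"), ("cat", "cat"), ("sat", "sat"),
    ("on", "in"),  # likely transcription or decoding error
    ("the", "the"), ("mat", "mat"),
]
print(select_segments(aligned, q=3))  # [(0, 3)]
```

With Q = 3 only the first three words are kept for training; the trailing two-word match is too short and is discarded along with the mismatched word.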
Abstract:
Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping.
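As a rough illustration of the unscented transformation itself, the Python sketch below propagates a Gaussian with diagonal covariance through a non-linear function using sigma points. The abstract does not give the distortion model or the sigma-point scheme; the log-domain mapping `y = x + log(1 + exp(n - x))` (channel term omitted), the diagonal covariance, the `kappa` parameter, and all function names are assumptions of this sketch.

```python
import math
from typing import Callable, List, Sequence, Tuple


def unscented_transform(
    mean: Sequence[float],
    var: Sequence[float],
    f: Callable[[List[float]], float],
    kappa: float = 1.0,
) -> Tuple[float, float]:
    """Approximate the mean and variance of f(z) for z ~ N(mean, diag(var))."""
    d = len(mean)
    scale = d + kappa
    # Central sigma point plus one symmetric pair per dimension.
    points = [list(mean)]
    weights = [kappa / scale]
    for i in range(d):
        offset = math.sqrt(scale * var[i])
        for sign in (1.0, -1.0):
            p = list(mean)
            p[i] += sign * offset
            points.append(p)
            weights.append(1.0 / (2.0 * scale))
    # Push every sigma point through the non-linear mapping.
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def distorted_log_speech(z: List[float]) -> float:
    """Assumed log-domain mapping from clean speech x and noise n to distorted speech."""
    x, n = z
    return x + math.log1p(math.exp(n - x))


# Clean speech mean 2.0 and noise mean 0.0, each with unit variance (made-up numbers).
y_mean, y_var = unscented_transform([2.0, 0.0], [1.0, 1.0], distorted_log_speech)
print(y_mean, y_var)
```

For a linear mapping this scheme reproduces the exact mean and variance, which gives a quick sanity check of the weights.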
Abstract:
Emulating legacy hardware using IEEE 754 compliant hardware is disclosed herein. In some aspects, the emulation includes locating an instruction that includes NaN (not a number) as at least one of an operand or a resultant. The emulation adjusts the resultant of the instruction, via additional code, to produce a final resultant of non-compliant (legacy) hardware. Legacy software, which was written in anticipation of processing by legacy hardware, may then be processed using compliant hardware.
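The fix-up idea can be sketched in Python. The abstract does not say how the legacy hardware treats NaN, so the rule used here (a multiply with a zero operand yields 0.0 even where IEEE 754 yields NaN) is purely a hypothetical legacy behavior chosen for illustration, as is the name `legacy_mul`.

```python
import math


def legacy_mul(a: float, b: float) -> float:
    """Multiply on compliant hardware, then adjust the result to mimic a hypothetical legacy rule."""
    result = a * b  # IEEE 754 compliant multiply
    # Additional fix-up code: the assumed legacy unit returns 0.0 whenever an
    # operand is zero, where IEEE 754 would produce NaN (0 * NaN, 0 * inf).
    if math.isnan(result) and (a == 0.0 or b == 0.0):
        return 0.0
    return result


print(legacy_mul(0.0, float("nan")))  # 0.0 under the assumed legacy rule
print(legacy_mul(2.0, 3.0))           # 6.0, unaffected by the fix-up
```

Only instructions whose operands or result involve NaN take the extra branch, so ordinary arithmetic follows the compliant path unchanged.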
Abstract:
A method for biometric identification for use with a computing device is provided herein. The method includes capturing a temporal sequence of images of the face of a user at different locations within a three-dimensional interaction space. The method further includes extracting one or more face descriptors from the images and generating a biometric template compiling the face descriptors.
Abstract:
A video game system (or other data processing system) can visually identify a person entering a field of view of the system and determine whether the person has been previously interacting with the system. In one embodiment, the system establishes thresholds, enrolls players, performs the video game (or other application) including interacting with a subset of the players based on the enrolling, determines that a person has become detectable in the field of view of the system, automatically determines whether the person is one of the enrolled players, maps the person to an enrolled player and interacts with the person based on the mapping if it is determined that the person is one of the enrolled players, and assigns a new identification to the person and interacts with the person based on the new identification if it is determined that the person is not one of the enrolled players.
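The enroll-then-map flow can be sketched as follows. The abstract does not specify how a person is represented or compared, so this sketch assumes each person is reduced to a numeric feature vector and matched by Euclidean distance against a single threshold. `PlayerRegistry`, `enroll`, and `identify` are hypothetical names.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


class PlayerRegistry:
    """Keeps enrolled players and maps newly detected people onto them."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # maximum distance that still counts as a match
        self.enrolled: Dict[int, Tuple[float, ...]] = {}
        self._next_id = 1

    def enroll(self, features: Sequence[float]) -> int:
        """Store a feature vector under a new identification and return it."""
        player_id = self._next_id
        self._next_id += 1
        self.enrolled[player_id] = tuple(features)
        return player_id

    def identify(self, features: Sequence[float]) -> Tuple[int, bool]:
        """Return (player id, was_already_enrolled) for a person entering the field of view."""
        best_id: Optional[int] = None
        best_dist = math.inf
        for player_id, reference in self.enrolled.items():
            dist = math.dist(reference, features)
            if dist < best_dist:
                best_id, best_dist = player_id, dist
        if best_id is not None and best_dist <= self.threshold:
            return best_id, True  # map the person to the enrolled player
        return self.enroll(features), False  # assign a new identification


registry = PlayerRegistry(threshold=0.5)
registry.enroll((0.0, 0.0))
registry.enroll((1.0, 1.0))
print(registry.identify((0.1, 0.0)))  # (1, True)
print(registry.identify((5.0, 5.0)))  # (3, False)
```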
Abstract:
A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
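A one-dimensional Python sketch of this kind of model-parameter update is shown below. The abstract gives no formulas, so the sketch assumes the common log-domain distortion model `y = x + h + log(1 + exp(n - x - h))` and a first-order linearization around the current means. The scalar `gain` plays the role of the Gaussian-dependent matrix, and all function names are hypothetical.

```python
import math


def compensate_mean(mu_x: float, mu_n: float, mu_h: float) -> float:
    """Distorted-speech mean from clean mean mu_x, noise mean mu_n, channel mean mu_h."""
    return mu_x + mu_h + math.log1p(math.exp(mu_n - mu_x - mu_h))


def gain(mu_x: float, mu_n: float, mu_h: float) -> float:
    """Derivative of the distorted mean with respect to mu_x (scalar analogue of G)."""
    return 1.0 / (1.0 + math.exp(mu_n - mu_x - mu_h))


def compensate_variance(var_x: float, var_n: float, g: float) -> float:
    """First-order variance update: speech scaled by g, noise by (1 - g)."""
    return g * g * var_x + (1.0 - g) * (1.0 - g) * var_n


# Made-up values: noise at the same level as the clean speech, no channel offset.
g = gain(0.0, 0.0, 0.0)
print(compensate_mean(0.0, 0.0, 0.0), g, compensate_variance(1.0, 1.0, g))
```

In an adaptation loop of the kind the abstract describes, the noise and channel means would be re-estimated after a first decoding pass and these updates recomputed for each Gaussian before the final decode.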
Abstract:
A system and method are disclosed for tracking image and audio data over time to automatically identify a person based on a correlation of their voice with their body in a multi-user game or multimedia setting.
Abstract:
An exemplary method for emulating a graphics processing unit (GPU) includes executing a graphics application on a host computing system to generate commands for a target GPU wherein the host computing system includes host system memory and a different, host GPU; converting the generated commands into intermediate commands; based on one or more generated commands that call for one or more shaders, caching one or more corresponding shaders in a shader cache in the host system memory; based on one or more generated commands that call for one or more resources, caching one or more corresponding resources in a resource cache in the host system memory; based on the intermediate commands, outputting commands for the host GPU; and based on the output commands for the host GPU, rendering graphics using the host GPU where output commands that call for one or more shaders access the one or more corresponding shaders in the shader cache and where output commands that call for one or more resources access the one or more corresponding resources in the resource cache. Other methods, devices and systems are also disclosed.
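The command-translation and caching flow can be sketched in Python. The command format (dictionaries with `op`, `shader`, and `resource` keys), the `GpuEmulator` class, and the injected `compile_shader` and `load_resource` callables are all hypothetical; the abstract does not describe concrete data structures.

```python
from typing import Callable, Dict, List, Optional


class GpuEmulator:
    """Translates target-GPU commands into host-GPU commands, caching shaders and resources."""

    def __init__(
        self,
        compile_shader: Callable[[str], str],
        load_resource: Callable[[str], bytes],
    ) -> None:
        self.compile_shader = compile_shader
        self.load_resource = load_resource
        self.shader_cache: Dict[str, str] = {}      # lives in host system memory
        self.resource_cache: Dict[str, bytes] = {}  # lives in host system memory

    def translate(self, target_cmd: dict) -> dict:
        # Step 1: convert the target command into an intermediate command.
        intermediate = {
            "op": target_cmd["op"].lower(),
            "shader": target_cmd.get("shader"),
            "resource": target_cmd.get("resource"),
        }
        shader: Optional[str] = intermediate["shader"]
        resource: Optional[str] = intermediate["resource"]
        # Step 2: fill the caches the first time a shader or resource is called for.
        if shader is not None and shader not in self.shader_cache:
            self.shader_cache[shader] = self.compile_shader(shader)
        if resource is not None and resource not in self.resource_cache:
            self.resource_cache[resource] = self.load_resource(resource)
        # Step 3: output a host command that reads from the caches.
        return {
            "host_op": "host_" + intermediate["op"],
            "shader": self.shader_cache.get(shader) if shader is not None else None,
            "resource": self.resource_cache.get(resource) if resource is not None else None,
        }


compiled: List[str] = []


def fake_compile(name: str) -> str:
    compiled.append(name)  # record each real compilation
    return "host::" + name


emulator = GpuEmulator(fake_compile, lambda name: name.encode())
commands = [
    {"op": "DRAW", "shader": "vs_main", "resource": "tex0"},
    {"op": "DRAW", "shader": "vs_main", "resource": "tex1"},
]
host_commands = [emulator.translate(c) for c in commands]
print(compiled)  # ['vs_main']
```

The second draw reuses the cached shader, so the (potentially expensive) translation to a host shader happens once per distinct shader rather than once per command.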