Abstract:
An augmented reality environment allows interaction between virtual and real objects. Multiple microphone arrays of different physical sizes are used to acquire signals for spatial tracking of one or more sound sources within the environment. A first array with a larger size may be used to track an object beyond a threshold distance, while a second array having a size smaller than the first may be used to track the object up to the threshold distance. By selecting different sized arrays, accuracy of the spatial location is improved.
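The selection rule described above can be sketched in a few lines. This is only an illustration: the function name `select_array`, the labels `"small"` and `"large"`, and the 2.0 m default threshold are assumptions, not values from the abstract.

```python
def select_array(distance_m, threshold_m=2.0):
    """Pick which microphone array to use for a source at `distance_m`.

    Assumed rule, following the abstract: the smaller array tracks the
    object up to the threshold distance, the larger array beyond it.
    The 2.0 m default is a placeholder, not a value from the source.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return "small" if distance_m <= threshold_m else "large"
```

In practice the distance estimate would itself come from an earlier tracking pass, so a real system would probably add hysteresis around the threshold to avoid switching arrays on every frame.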
Abstract:
An automatic speech recognition engine receives an acoustic-echo processed signal from an acoustic-echo processing (AEP) module, where said echo processed signal contains mainly the speech from the near-end talker. The automatic speech recognition engine analyzes the content of the acoustic-echo processed signal to determine whether words or keywords are present. Based upon the results of this analysis, the automatic speech recognition engine produces a value reflecting the likelihood that some words or keywords are detected. Said value is provided to the AEP module. Based upon the value, the AEP module determines if there is double talk and processes the incoming signals accordingly to enhance its performance.
Abstract:
Techniques are described for recognizing an audio double tap or other tapped audio sequences generated by a user. Amplitudes of an audio signal are processed to generate an energy function or curve. The energy curve is analyzed to detect audio pulses. Detected pulses are validated and double tap events are detected based on features such as duration, power, and/or symmetry, plus additional rules related to the structure of the audio event.
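A minimal sketch of the pipeline outlined above (energy curve, pulse detection, validation, structural rule) might look as follows. All function names, the frame-based energy measure, and the thresholds are assumptions chosen for illustration. Only pulse duration and the gap between pulses are checked here; the power and symmetry features mentioned in the abstract are left out.

```python
def energy_curve(samples, frame_len):
    """Sum of squared amplitudes per non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_pulses(energy, threshold):
    """Return (start, end) frame indices of runs where energy > threshold."""
    pulses = []
    start = None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i                      # a pulse begins
        elif e <= threshold and start is not None:
            pulses.append((start, i))      # a pulse ends (end is exclusive)
            start = None
    if start is not None:
        pulses.append((start, len(energy)))
    return pulses


def is_double_tap(pulses, max_pulse_frames, min_gap, max_gap):
    """Assumed structural rule: exactly two short pulses, suitably spaced."""
    if len(pulses) != 2:
        return False
    if any(end - start > max_pulse_frames for start, end in pulses):
        return False                       # a pulse is too long to be a tap
    gap = pulses[1][0] - pulses[0][1]      # silent frames between the taps
    return min_gap <= gap <= max_gap
```

For example, two 10-sample bursts separated by 50 samples of silence, analyzed with 10-sample frames, yield two one-frame pulses with a gap of five frames, which the rule accepts for `min_gap=2, max_gap=8`.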
Abstract:
Primary and alternate optimization procedures are used to improve the ITU-T G.723.1 speech coding standard (the “Standard”) by replacing the Hamming window of the Standard with an optimized window, with two windows, or with two windows and an additional performance of an autocorrelation method. When two windows replace the Hamming window, at least one of which is an optimized window, generally the first is used to determine optimized unquantized LP coefficients which are used to define an optimized perceptual weighting filter, and the second is used to determine optimized unquantized LP coefficients which are used to determine optimized synthesis coefficients. Optimized windows created using the primary and alternate optimization procedures and used in the Standard yield improvements in the objective and subjective quality of synthesized speech produced by the Standard. The improved Standard, methods, and window can all be implemented as computer readable software code.
Abstract:
A sound source locator efficiently employs a distributed physical or logical microphone array to determine a location of a source of a sound. In some instances, the sound source locator is deployed in an augmented reality environment. The sound source locator detects sound at a plurality of microphones, generates a signal corresponding to the sound, and causes attributes of the signal as generated at the plurality of microphones to be stored in association with the corresponding microphone. The sound source locator uses these stored attributes to identify multiple groups of the plurality of microphones from which delays between the times the signal is generated can be used to compute the location of the source of the sound.
Abstract:
Compression and decompression of image data, including a first image of an object. The first image may be divided into portions. For each portion, it may be determined whether the portion includes a part of the object. The image data may be compressed based on said determining. If a threshold ratio of portions that do not include a part of the object is reached, portions including a part of the object may be compressed according to a first compression method and portions not including a part of the object may not be compressed, where background information is stored for the portions not including a part of the object. If the threshold ratio of portions that do not include a part of the object is not reached, each portion of the object may be compressed according to the first compression method. The compressed data may be decompressed in a reverse fashion.
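The threshold-ratio decision described above can be illustrated with a short sketch. Several things here are assumptions rather than details from the abstract: `zlib` stands in for the unspecified "first compression method", the stored background information is reduced to a portion length plus one uniform fill byte, and the names `compress_portions` and `decompress_portions` are hypothetical.

```python
import zlib


def compress_portions(portions, has_object, threshold_ratio=0.5):
    """Encode image portions according to the ratio of object-free portions.

    portions:   list of bytes objects, one per image portion
    has_object: list of bools, True if the portion contains part of the object
    """
    if not portions or len(portions) != len(has_object):
        raise ValueError("portions and has_object must be equal-length, non-empty")
    empty_ratio = has_object.count(False) / len(has_object)
    if empty_ratio >= threshold_ratio:
        # Threshold reached: compress object portions only, and store
        # background information (here just the length) for the rest.
        return [
            ("z", zlib.compress(data)) if obj else ("bg", len(data))
            for data, obj in zip(portions, has_object)
        ]
    # Threshold not reached: compress every portion the same way.
    return [("z", zlib.compress(data)) for data in portions]


def decompress_portions(encoded, background=b"\x00"):
    """Reverse the encoding; background portions are refilled uniformly."""
    return [
        zlib.decompress(payload) if kind == "z" else background * payload
        for kind, payload in encoded
    ]
```

The round trip is lossless only if the object-free portions really are a uniform background, which is the simplifying assumption made here.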
Abstract:
Techniques for enhancing an acoustic echo canceller based on visual cues are described herein. The techniques include changing adaptation of a filter of the acoustic echo canceller, calibrating the filter, or reducing background noise from an audio signal processed by the acoustic echo canceller. The changing, calibrating, and reducing are responsive to visual cues that describe acoustic characteristics of a location of a device that includes the acoustic echo canceller. Such visual cues may indicate that no human being is present at the location, that some subject(s) are engaged in speaking or sound generating activities, or that motion associated with an echo path change has occurred at the location.
Abstract:
Accurate and computationally efficient estimation of time delay of arrival data for localization of a sound source is described herein. A number of independent time delays are retained and validated through comparison with a set of dependent time delays. The method is robust against detrimental effects in the environment such as noise and reverberation. The resulting delays may then be used in sound source localization or other signal processing applications.
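One plausible reading of the validation step is a consistency check: delays measured against a reference microphone are treated as independent, and the delay measured between any other pair should equal the difference of the two corresponding independent delays. The sketch below assumes that reading. The function name, the dictionary layout, and the tolerance are hypothetical, and the abstract does not state that this is the exact comparison used.

```python
def validate_delays(independent, dependent, tol=1e-4):
    """Check measured pairwise delays against the independent delays.

    independent: {i: delay in seconds of mic i relative to reference mic 0}
    dependent:   {(i, j): measured delay in seconds of mic j relative to mic i}
    Returns {(i, j): True if the measurement matches the prediction within tol}.
    """
    results = {}
    for (i, j), measured in dependent.items():
        predicted = independent[j] - independent[i]   # closure relation
        results[(i, j)] = abs(predicted - measured) <= tol
    return results
```

For instance, with independent delays of 1.0 ms and 2.5 ms for microphones 1 and 2, a measured delay of 1.5 ms between them passes the check, while 3.0 ms fails it and suggests that at least one estimate was corrupted by noise or reverberation.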
Abstract:
Techniques for utilizing blind source separation as a front-end to an acoustic echo canceller are described herein. The techniques include removing a first portion of an acoustic echo from an audio signal using blind source separation and a reference signal. The techniques then further remove a second portion of the acoustic echo using an acoustic echo canceller and the reference signal. Further, output of the blind source separation may be used to improve double-talk detection.
Abstract:
The shape of windows used during linear predictive analysis can be optimized through the use of gradient-descent based window optimization procedures. Window optimization may be achieved fairly precisely through the use of a primary optimization procedure, or less precisely through the use of an alternate optimization procedure. Both optimization procedures use the principle of gradient-descent to find a window sequence that will either minimize the prediction error energy or maximize the segmental prediction gain. However, the primary optimization procedure uses a Levinson-Durbin based algorithm to determine the gradient while the alternate optimization procedure uses an estimate of the gradient based on the basic definition of a derivative. These optimization procedures can be implemented as computer readable software code. Additionally, the optimization procedures may be implemented in a window optimization device which generally includes a window optimization unit and may also include an interface unit.
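The alternate procedure, which estimates the gradient from the basic definition of a derivative, can be sketched as a forward-difference gradient descent. The names, step size, and iteration count below are assumptions, and `toy_objective` is only a stand-in for the prediction error energy, which in a real system would be computed from the LP analysis of windowed speech.

```python
def estimate_gradient(objective, w, delta=1e-6):
    """Forward-difference estimate: dJ/dw[n] ~ (J(w + delta*e_n) - J(w)) / delta."""
    base = objective(w)
    grad = []
    for n in range(len(w)):
        bumped = list(w)
        bumped[n] += delta                 # perturb one window sample
        grad.append((objective(bumped) - base) / delta)
    return grad


def optimize_window(objective, w0, step=0.1, iterations=200):
    """Plain gradient descent on the window samples."""
    w = list(w0)
    for _ in range(iterations):
        g = estimate_gradient(objective, w)
        w = [wn - step * gn for wn, gn in zip(w, g)]
    return w


def toy_objective(w):
    """Stand-in cost (NOT prediction error energy): distance to a fixed shape."""
    target = [0.2, 0.8, 0.8, 0.2]
    return sum((a - b) ** 2 for a, b in zip(w, target))
```

Each gradient estimate costs one extra objective evaluation per window sample, which is why the abstract contrasts this approach with the Levinson-Durbin based gradient computation of the primary procedure.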