Abstract:
A real-time passive acoustic source localization system for video camera steering advantageously determines the relative delay between the direct paths of two estimated channel impulse responses. The illustrative system employs an approach referred to herein as the “adaptive eigenvalue decomposition algorithm” (AEDA) to make such a determination, and then advantageously employs a “one-step least-squares algorithm” (OSLS) for purposes of acoustic source localization, providing the desired features of robustness, portability, and accuracy in a reverberant environment. The AEDA technique directly estimates the (direct path) impulse response from the sound source to each of a pair of microphones, and then uses these estimated impulse responses to determine the time delay of arrival (TDOA) between the two microphones by measuring the distance between the first peaks thereof (i.e., the first significant taps of the corresponding transfer functions). In one embodiment, the system minimizes an error function (i.e., a difference) which is computed with the use of two adaptive filters, each such filter being applied to a corresponding one of the two signals received from the given pair of microphones. The filtered signals are then subtracted from one another to produce the error signal, which is minimized by a conventional adaptive filtering algorithm such as, for example, an LMS (Least Mean Squares) technique. Then, the TDOA is estimated by measuring the “distance” (i.e., the time) between the first significant taps of the two resultant adaptive filter transfer functions.
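To make the described two-filter estimation concrete, the following is a minimal sketch, not taken from the patent: it stacks the two adaptive filters into a single unit-norm vector, drives the difference of the two filtered microphone signals toward zero with a normalized-LMS-style update, and reads the TDOA off as the spacing between the first significant taps of the two resulting filters. The function name, filter length, step size, peak threshold, and sign convention are all illustrative assumptions.

```python
import numpy as np

def estimate_tdoa_aeda(x1, x2, filter_len=64, mu=0.01, peak_thresh=0.3):
    """Illustrative sketch of the two-filter TDOA estimation; not the
    patented implementation.

    x1, x2      : signals from the microphone pair (1-D arrays, same length)
    filter_len  : number of taps per adaptive filter (assumed value)
    mu          : LMS step size (assumed value)
    peak_thresh : fraction of the largest tap treated as "significant"
    """
    L = filter_len
    # One adaptive filter per microphone, stacked into a single vector that
    # is kept at unit norm so the trivial all-zero solution is excluded.
    u = np.zeros(2 * L)
    u[L // 2] = 1.0          # rough initialization of filter 1
    u[L + L // 2] = -1.0     # rough initialization of filter 2 (negated)
    u /= np.linalg.norm(u)

    for n in range(L, len(x1)):
        # Most recent L samples of each channel (newest first).
        seg = np.concatenate((x1[n - L + 1:n + 1][::-1],
                              x2[n - L + 1:n + 1][::-1]))
        # Error signal = difference of the two filtered microphone signals.
        e = np.dot(u, seg)
        # Normalized LMS update, re-projected onto the unit sphere.
        u -= mu * e * seg / (np.dot(seg, seg) + 1e-8)
        u /= np.linalg.norm(u)

    # The two halves of u play the role of the two adaptive filter impulse
    # responses; the TDOA (in samples) is the spacing between their first
    # significant taps.
    g1, g2 = np.abs(u[:L]), np.abs(u[L:])

    def first_tap(g):
        return int(np.argmax(g > peak_thresh * g.max()))

    return first_tap(g2) - first_tap(g1)
```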
Abstract:
A system and method may receive a single-channel speech input captured via a microphone. For each current frame of speech input, the system and method may (a) perform a time-frequency transformation on the input signal over L (L>1) frames including the current frame to obtain an extended observation vector of the current frame, data elements in the extended observation vector representing the coefficients of the time-frequency transformation of the L frames of the speech input, (b) compute second-order statistics of the extended observation vector and of noise, and (c) construct a noise reduction filter for the current frame of the speech input based on the second-order statistics of the extended observation vector and the second-order statistics of noise.
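As an illustration of steps (a) through (c), the sketch below processes a single frequency bin of a short-time Fourier transform: each extended observation vector stacks the coefficients of the L most recent frames, its second-order statistics are tracked with a recursive average, and a Wiener-type filter built from the signal and noise correlation matrices is applied to recover the current frame. The Wiener form h = Φ_y⁻¹(Φ_y − Φ_v)i, the forgetting factor, the diagonal loading, and the function name are assumptions made for illustration; the abstract itself does not fix a particular filter design.

```python
import numpy as np

def multiframe_noise_reduction(stft_bin, Phi_v, L=4, alpha=0.9, loading=1e-6):
    """Hedged per-bin sketch of steps (a)-(c); names and defaults are
    illustrative, not taken from the source.

    stft_bin : 1-D array of complex STFT coefficients of one frequency
               bin, one entry per frame of the speech input.
    Phi_v    : L x L noise correlation matrix for this bin (second-order
               statistics of noise, e.g. estimated during speech pauses).
    alpha    : forgetting factor of the recursive correlation estimate.
    Returns the noise-reduced coefficient of each frame.
    """
    T = len(stft_bin)
    Phi_y = np.zeros((L, L), dtype=complex)
    i = np.zeros(L)
    i[-1] = 1.0                               # selects the current frame
    out = np.array(stft_bin, dtype=complex)   # first L-1 frames passed through

    for t in range(L - 1, T):
        # (a) extended observation vector: coefficients of the L most
        # recent frames, with the current frame last.
        y = np.asarray(stft_bin[t - L + 1:t + 1], dtype=complex)
        # (b) recursive estimate of its second-order statistics.
        Phi_y = alpha * Phi_y + (1 - alpha) * np.outer(y, y.conj())
        # (c) Wiener-type noise reduction filter built from the observation
        # and noise correlation matrices, applied to the current frame.
        h = np.linalg.solve(Phi_y + loading * np.eye(L), (Phi_y - Phi_v) @ i)
        out[t] = np.vdot(h, y)                # h^H y, filtered coefficient
    return out
```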
Abstract:
An apparatus for mixing audio signals in a voice-over-IP teleconferencing environment comprises a preprocessor, a mixing controller, and a mixing processor. The preprocessor is divided into a media parameter estimator and a media preprocessor. The media parameter estimator estimates signal parameters such as signal-to-noise ratios, energy levels, and voice activity (i.e., the presence or absence of voice in the signal), which are used to control how different channels are mixed. The media preprocessor employs signal processing algorithms such as silence suppression, automatic gain control, and noise reduction, so that the quality of the incoming voice streams is optimized. Based on a function of the estimated signal parameters, the mixing controller specifies a particular mixing strategy and the mixing processor mixes the preprocessed voice streams according to the strategy provided by the controller.
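As a rough illustration of how the three stages interact, the sketch below processes one frame per participant: the parameter estimator computes energy, a signal-to-noise estimate, and a voice-activity decision; the preprocessor applies silence suppression and simple automatic gain control (noise reduction is omitted for brevity); the controller selects a strategy of summing the strongest active channels; and the mixing processor produces the output. All thresholds, the "strongest active speakers" policy, and the function name are illustrative assumptions, not details from the abstract.

```python
import numpy as np

def mix_frame(channel_frames, noise_floor=1e-4, max_active=3, target_rms=0.1):
    """Hedged sketch of one mixing cycle for a VoIP conference bridge.

    channel_frames : list of 1-D float arrays, one audio frame per
                     participant (same length and sample rate).
    """
    estimates = []
    for frame in channel_frames:
        # --- media parameter estimator ---
        energy = float(np.mean(frame ** 2))
        snr = 10.0 * np.log10(energy / noise_floor + 1e-12)
        voice_active = snr > 10.0                 # crude voice activity decision

        # --- media preprocessor: silence suppression + simple AGC ---
        if not voice_active:
            frame = np.zeros_like(frame)          # silence suppression
        else:
            rms = np.sqrt(energy)
            frame = frame * (target_rms / (rms + 1e-12))  # automatic gain control

        estimates.append((snr if voice_active else -np.inf, frame))

    # --- mixing controller: choose a strategy from the estimated parameters,
    # here "sum the max_active channels with the highest SNR" ---
    selected = sorted(estimates, key=lambda e: e[0], reverse=True)[:max_active]

    # --- mixing processor: mix the preprocessed streams ---
    mixed = np.sum([f for _, f in selected], axis=0)
    return np.clip(mixed, -1.0, 1.0)              # guard against overflow
```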