Abstract:
Provided are methods and systems for enhancing speech corrupted by transient noise (e.g., keyboard typing noise). The methods and systems use a reference microphone signal that captures the transient noise in the signal restoration process applied to the voice part of the signal. A robust Bayesian statistical model regresses the voice microphone signal on the reference microphone signal. This allows direct inference about the desired voice signal while marginalizing the unwanted power spectral values of the voice and transient noise. Also provided is a straightforward and efficient expectation-maximization (EM) procedure for fast enhancement of the corrupted signal. The methods and systems are designed to run easily in real time on standard hardware, with latency low enough that there is no irritating delay in the speaker's response.
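The regression-plus-EM idea can be illustrated in a single frequency bin. The following is a minimal sketch, not the patented model. It assumes one complex transfer coefficient h from the reference microphone to the voice microphone, and it uses a complex Student-t residual as one concrete heavy-tailed choice. Frames dominated by voice therefore act as outliers and are down-weighted. The function name `robust_regress_em` and its parameters `nu` and `n_iter` are hypothetical, and the full marginalization over voice and transient power spectral values is not reproduced.

```python
import numpy as np


def robust_regress_em(y, x, nu=4.0, n_iter=20):
    """Fit y ~ h * x in one STFT frequency bin with a complex Student-t residual.

    y: voice-mic STFT coefficients over frames (complex array)
    x: reference-mic STFT coefficients over the same frames (complex array)
    Returns (h, sigma2, voice_est) where voice_est = y - h * x.
    Hypothetical helper: a simplified stand-in, not the source's exact model.
    """
    y = np.asarray(y, dtype=complex)
    x = np.asarray(x, dtype=complex)
    # Initialise with ordinary least squares: h = sum(conj(x) * y) / sum(|x|^2).
    h = np.vdot(x, y) / np.vdot(x, x)
    r = y - h * x
    sigma2 = np.mean(np.abs(r) ** 2)
    for _ in range(n_iter):
        # E-step: posterior mean of each frame's precision scale under the
        # Gaussian scale-mixture form of the complex Student-t. Frames with a
        # large residual (e.g. voice-dominated frames) get small weights.
        w = (nu + 2.0) / (nu + 2.0 * np.abs(r) ** 2 / sigma2)
        # M-step: weighted least squares for h, then the residual scale.
        h = np.sum(w * np.conj(x) * y) / np.sum(w * np.abs(x) ** 2)
        r = y - h * x
        sigma2 = np.mean(w * np.abs(r) ** 2)
    return h, sigma2, r
```

Each EM iteration costs O(frames) per bin, which is in keeping with the real-time goal. In practice the fit would run per bin over a short sliding window of frames.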
Abstract:
Provided are methods and systems for generating Direct-to-Reverberant Ratio (DRR) estimates. The methods and systems use a null-steered beamformer to produce accurate DRR estimates across a variety of room sizes, reverberation times, and source-receiver distances. The DRR estimation algorithm uses spatial selectivity to separate direct and reverberant energy, and it accounts for noise separately. The formulation considers both the response of the beamformer to reverberant sound and the effect of noise. The DRR estimation algorithm is more robust to background noise than existing approaches. It is applicable wherever a signal is recorded with two or more microphones, such as with mobile communications devices, laptop computers, and the like.
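The power bookkeeping behind a null-steered DRR estimate can be sketched for the simplest case. The sketch assumes two omnidirectional microphones and a broadside source, so the direct sound arrives in phase and at equal level and cancels in x1 - x2. It also assumes an ideal spherically isotropic reverberant field and noise that is uncorrelated between microphones. The null output then contains only reverberation, scaled by 2(1 - Γ), and noise, scaled by 2. The function names and arguments are hypothetical, and the source's actual beamformer and formulation may differ.

```python
import numpy as np


def diffuse_coherence(freq_hz, spacing_m, c=343.0):
    """Coherence of a spherically isotropic (diffuse) field between two omni mics."""
    # np.sinc(u) = sin(pi*u)/(pi*u), so this equals sin(k*d)/(k*d) with k = 2*pi*f/c.
    return np.sinc(2.0 * freq_hz * spacing_m / c)


def drr_null_beamformer(p_mic, p_null, p_noise, freq_hz, spacing_m):
    """DRR estimate in dB at one frequency for a two-mic subtractive null beamformer.

    p_mic:   mean power per microphone (direct + reverberant + noise)
    p_null:  power of x1 - x2; with a broadside source the direct sound cancels
    p_noise: noise power per microphone, assumed uncorrelated between mics
    """
    gamma = diffuse_coherence(freq_hz, spacing_m)
    g_rev = 2.0 * (1.0 - gamma)  # null-beamformer power gain for diffuse reverb
    g_noise = 2.0                # power gain for noise uncorrelated across mics
    p_rev = (p_null - g_noise * p_noise) / g_rev
    p_dir = p_mic - p_noise - p_rev
    eps = 1e-12                  # floor keeps the log finite for noisy estimates
    return 10.0 * np.log10(np.maximum(p_dir, eps) / np.maximum(p_rev, eps))
```

At low frequencies Γ approaches 1, so g_rev approaches 0 and the estimate becomes ill-conditioned. Only bins where the microphone spacing gives a usefully small coherence should contribute.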
Abstract:
Provided are methods, systems, and apparatus for hierarchical decorrelation of multichannel audio. A hierarchical decorrelation algorithm is designed to adapt to possibly changing characteristics of an input signal and to preserve the energy of the original signal. The algorithm is invertible: the original signal can be retrieved if needed. Furthermore, the algorithm decomposes the decorrelation process into multiple low-complexity steps. Because the contribution of these steps generally decreases from one step to the next, the complexity of the algorithm can be scaled.
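One way to obtain all three stated properties is to use a sequence of 2x2 orthogonal (Givens) rotations, each applied to the channel pair with the largest remaining covariance. This is the classical Jacobi approach, swapped in here for illustration; the source does not say it uses Jacobi rotations. Orthogonal rotations preserve energy. The inverse is simply the transposed rotations applied in reverse. Capping the number of rotations trades decorrelation quality for complexity. Re-running the procedure per block of samples would let it follow changing signal statistics. Function names are hypothetical.

```python
import numpy as np


def hierarchical_decorrelate(x, n_steps):
    """Decorrelate x (channels x samples) with at most n_steps Givens rotations.

    Each step rotates the channel pair with the largest remaining |covariance|,
    so early steps tend to remove the most correlation. Returns (y, steps).
    """
    y = np.array(x, dtype=float, copy=True)
    steps = []
    for _ in range(n_steps):
        cov = y @ y.T
        off = np.abs(np.triu(cov, k=1))
        i, j = np.unravel_index(np.argmax(off), off.shape)
        if off[i, j] == 0.0:
            break  # already fully decorrelated
        # Angle that zeroes cov[i, j] after rotating channels i and j.
        theta = 0.5 * np.arctan2(2.0 * cov[i, j], cov[i, i] - cov[j, j])
        cs, sn = np.cos(theta), np.sin(theta)
        yi, yj = y[i].copy(), y[j].copy()
        y[i] = cs * yi + sn * yj
        y[j] = -sn * yi + cs * yj
        steps.append((int(i), int(j), float(theta)))
    return y, steps


def hierarchical_restore(y, steps):
    """Invert hierarchical_decorrelate by undoing the rotations in reverse order."""
    x = np.array(y, dtype=float, copy=True)
    for i, j, theta in reversed(steps):
        cs, sn = np.cos(theta), np.sin(theta)
        xi, xj = x[i].copy(), x[j].copy()
        x[i] = cs * xi - sn * xj
        x[j] = sn * xi + cs * xj
    return x
```

Only the list of (i, j, theta) triples needs to be stored or transmitted to invert the transform. Each step touches just two channels, which keeps the per-step cost low.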
Abstract:
Methods and systems are provided for detecting chop in an audio signal. A time-frequency representation, such as a spectrogram, is created for the audio signal and used to calculate the gradient of mean power per frame. Positive and negative gradients are defined from this gradient of mean power. A maximum overlap offset between the positive and negative gradients is then determined as the value that maximizes their cross-correlation. The negative gradient values may be combined (e.g., summed) with the overlap offset, and the combined values are then compared with a threshold to estimate the amount of chop present in the audio signal. The chop detection model is low-complexity and is applicable to narrowband, wideband, and superwideband speech.
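The pipeline (spectrogram, mean power per frame, its gradient, positive and negative parts, and a cross-correlation offset) can be sketched with NumPy alone. The abstract leaves the combination step loosely defined. The rule below is one hypothetical reading: each drop counts only to the extent that it is matched by a rise `lag` frames later, and the matched amounts are summed. Frame sizes, `max_lag`, and all function names are assumptions, and any detection threshold on the score would need to be calibrated.

```python
import numpy as np


def frame_power_db(x, frame_len=256, hop=128):
    """Mean power (dB) of each frame of a Hann-windowed magnitude-squared STFT."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)
    frames = np.stack([x[k * hop:k * hop + frame_len] * win for k in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # spectrogram: frames x bins
    return 10.0 * np.log10(np.mean(spec, axis=1) + 1e-12)


def chop_score(x, frame_len=256, hop=128, max_lag=20):
    """Return (score, lag); compare score against a threshold to flag chop."""
    g = np.diff(frame_power_db(x, frame_len, hop))  # gradient of mean power
    pos = np.maximum(g, 0.0)    # rising edges
    drop = -np.minimum(g, 0.0)  # falling edges, as positive magnitudes
    # Offset (in frames) maximising the cross-correlation of drops with later rises.
    lags = np.arange(1, max_lag + 1)
    xcorr = [np.dot(drop[:len(drop) - k], pos[k:]) for k in lags]
    lag = int(lags[int(np.argmax(xcorr))])
    # Hypothetical combination: sum each drop's overlap with the rise `lag` frames later.
    score = float(np.sum(np.minimum(drop[:len(drop) - lag], pos[lag:])))
    return score, lag
```

A steady tone yields a score near zero. Zeroing short stretches of the same tone produces large, paired drops and rises, and the score grows with the number and depth of the gaps.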
Abstract:
Provided are methods and systems for detecting the presence of a transient noise event in an audio stream using primarily or exclusively the incoming audio data. This approach offers improved temporal resolution and is computationally efficient. The methods and systems use a time-frequency representation of the audio signal as the basis of a predictive model that searches for outlying transient noise events. They then interpret the true detection state as a Hidden Markov Model (HMM), which models the temporal and frequency cohesion common among transient noise events.
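The following sketch combines a predictive outlier test with a two-state HMM. It assumes a deliberately simple predictor: each bin is predicted by the median of the previous few frames. A bin is outlying when it exceeds that prediction by a fixed margin. The number of outlying bins in a frame is scored under a binomial model for each state, which reflects frequency cohesion, and Viterbi decoding with sticky transitions reflects temporal cohesion. The predictor, thresholds, probabilities, and function names are all hypothetical, not the source's model.

```python
import numpy as np


def outlier_counts(mag, history=8, ratio_db=10.0):
    """Per frame, count bins that exceed a median-of-recent-frames prediction."""
    n_frames, _ = mag.shape
    counts = np.zeros(n_frames, dtype=int)
    for t in range(history, n_frames):
        pred = np.median(mag[t - history:t], axis=0)  # simple per-bin predictor
        err_db = 20.0 * np.log10((mag[t] + 1e-12) / (pred + 1e-12))
        counts[t] = int(np.sum(err_db > ratio_db))
    return counts


def viterbi_two_state(counts, n_bins, p_clean=0.02, p_trans=0.4, p_stay=0.9):
    """Most likely state path (0 = no transient, 1 = transient)."""
    counts = np.asarray(counts, dtype=float)
    p = np.array([p_clean, p_trans])
    # Binomial emission log-likelihood, dropping the shared binomial coefficient.
    ll = counts[:, None] * np.log(p) + (n_bins - counts)[:, None] * np.log(1 - p)
    log_a = np.log(np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]))
    n = len(counts)
    delta = np.log(0.5) + ll[0]
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + log_a  # rows: previous state, cols: next state
        back[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + ll[t]
    states = np.zeros(n, dtype=int)
    states[-1] = int(np.argmax(delta))
    for t in range(n - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    return states
```

Because the predictor uses only past frames of the same signal, the detector needs no side information, in line with the "incoming audio data only" goal.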
Abstract:
Provided are methods and systems for acoustic keystroke transient cancellation/suppression in user communication devices using a semi-blind adaptive filter model. The methods and systems address existing problems in transient noise suppression in two ways. First, they use a less-corrupted signal as side information on the transients. Second, they account for acoustic signal propagation, including reverberation effects, with dynamic models. The methods and systems take advantage of a synchronous reference microphone embedded in the keyboard of the user device, and use an adaptive filtering approach that exploits knowledge of this keybed microphone signal.
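The reference-based cancellation idea can be illustrated with a standard normalized LMS (NLMS) filter. The filter learns the path from the keybed microphone to the voice microphone and subtracts the predicted transient. This is a generic textbook stand-in rather than the source's semi-blind dynamic model. It also adapts continuously, whereas a practical system would likely gate or regularize adaptation while the user is speaking. The function name and defaults are hypothetical.

```python
import numpy as np


def nlms_cancel(primary, reference, n_taps=16, mu=0.5, eps=1e-6):
    """Cancel the reference-correlated part of `primary` with an NLMS filter.

    primary:   voice-mic samples (speech + keystroke transients)
    reference: keybed-mic samples, sample-synchronous with `primary`
    Returns (enhanced, w) where w approximates the keybed-to-voice-mic path.
    """
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)        # recent reference samples, newest first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - w @ buf  # primary minus the predicted transient
        w += (mu / (eps + buf @ buf)) * e * buf
        out[n] = e
    return out, w
```

The error signal `e` doubles as the enhanced output: whatever the reference cannot explain, ideally the speech, is passed through.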
Abstract:
Provided are methods and systems for providing situation-dependent transient noise suppression for audio signals. Different strategies (e.g., levels of aggressiveness) of transient suppression and signal restoration are applied to audio signals associated with participants in a video/audio conference depending on whether or not each participant is speaking (e.g., whether a voiced segment or an unvoiced/non-speech segment of audio is present). If no participants are speaking or there is an unvoiced/non-speech sound present, a more aggressive strategy for transient suppression and signal restoration is utilized. On the other hand, where voiced audio is detected (e.g., a participant is speaking), the methods and systems apply a softer, less aggressive suppression and restoration process.
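The switching logic can be sketched by picking a suppression strategy per frame from a voicing decision. The sketch assumes a crude voicing detector, a normalized autocorrelation peak in a typical pitch-lag range. It represents each strategy only by a hypothetical minimum gain, so the transient suppressor's requested gain is clamped gently during voiced speech and aggressively otherwise. The detector, the gain floors, and the function names are assumptions, and the signal-restoration part of each strategy is not modeled.

```python
import numpy as np

# Hypothetical per-strategy settings: the lowest gain (dB) suppression may apply.
STRATEGIES = {"aggressive": -30.0, "soft": -6.0}


def is_voiced(frame, fs, fmin=70.0, fmax=400.0, threshold=0.5):
    """Crude voicing check: normalised autocorrelation peak in the pitch-lag range."""
    frame = np.asarray(frame, dtype=float) - np.mean(frame)
    energy = float(np.dot(frame, frame))
    if energy < 1e-8:
        return False  # silence counts as non-speech
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:] / energy
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(frame) - 1)
    return bool(np.max(ac[lo:hi + 1]) > threshold)


def situation_gain(frame, fs, requested_gain_db):
    """Pick a strategy from voicing, then clamp the suppressor's requested gain."""
    strategy = "soft" if is_voiced(frame, fs) else "aggressive"
    gain_db = max(requested_gain_db, STRATEGIES[strategy])
    return strategy, 10.0 ** (gain_db / 20.0)
```

In a conference, this decision would be made separately for each participant's audio stream, as the abstract describes.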
Abstract:
Methods and systems are provided that use a model of human speech quality perception to give an objective measure for predicting subjective quality assessments. The Virtual Speech Quality Objective Listener (ViSQOL) model is a signal-based, full-reference metric that uses a spectro-temporal measure of similarity between a reference speech signal and a test speech signal. In particular, the model can detect and predict the level of clock drift and determine whether that drift will affect a listener's quality of experience.
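Clock drift between a reference and a test signal shows up as a time offset that grows linearly. The following sketch illustrates that idea only. It estimates the best integer lag for consecutive segments by cross-correlation and fits a line to lag versus time, and the slope is the drift. It does not reproduce ViSQOL's spectro-temporal similarity measure, and the segment sizes and function names are hypothetical. Deciding whether a given drift harms the quality of experience would need a threshold that the source does not state here.

```python
import numpy as np


def segment_lags(ref, test, seg_len=1024, max_lag=64):
    """Best integer lag of `test` relative to `ref` for consecutive segments."""
    centers, lags = [], []
    for start in range(max_lag, len(ref) - seg_len - max_lag, seg_len):
        r = ref[start:start + seg_len]
        scores = [np.dot(r, test[start + k:start + k + seg_len])
                  for k in range(-max_lag, max_lag + 1)]
        lags.append(int(np.argmax(scores)) - max_lag)
        centers.append(start + seg_len / 2.0)
    return np.array(centers), np.array(lags, dtype=float)


def estimate_drift_ppm(ref, test, seg_len=1024, max_lag=64):
    """Slope of lag versus time in parts per million (positive: test content arrives progressively later)."""
    centers, lags = segment_lags(ref, test, seg_len, max_lag)
    slope, _intercept = np.polyfit(centers, lags, 1)
    return slope * 1e6
```

The sketch expects `test` to be at least as long as `ref`. Integer lags limit precision per segment, but the linear fit over many segments averages out the rounding.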
Abstract:
Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse, and so cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time and average over all microphone pairs, which yields a suboptimal solution. The described post-filtering method implements signal models that handle white noise, diffuse noise, and point interferers. It also applies a globally optimized least-squares fit across all microphones in the array, giving a better solution than conventional methods. Experimental results show the described method outperforming conventional methods in various acoustic scenarios.
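The "global least squares instead of pairwise averaging" idea can be sketched at one frequency. The measured cross-PSD matrix is modeled as a target term, a point-interferer term, a diffuse term with sinc coherence, and a white term. The auto-PSDs and the real and imaginary parts of every cross-PSD are stacked into one linear system and solved jointly for the four PSDs. The sketch assumes that the target and interferer propagation delays are known and that the diffuse field is spherically isotropic. For simplicity it returns a Wiener gain for a single reference microphone rather than for a beamformer output. The function name and arguments are hypothetical.

```python
import numpy as np


def lsq_postfilter(phi, freq_hz, mic_pos, target_delays, interferer_delays, c=343.0):
    """Least-squares fit of a target + point-interferer + diffuse + white model.

    phi: (M, M) measured cross-PSD matrix at one frequency
    mic_pos: (M, 3) microphone coordinates in metres
    target_delays, interferer_delays: (M,) per-mic propagation delays in seconds
    Returns (psd, gain): psd = [target, interferer, diffuse, white] and a
    Wiener gain for a single reference microphone.
    """
    a = np.exp(-2j * np.pi * freq_hz * np.asarray(target_delays))
    d = np.exp(-2j * np.pi * freq_hz * np.asarray(interferer_delays))
    rows, rhs = [], []
    m = phi.shape[0]
    for i in range(m):
        for j in range(i, m):  # all pairs (and auto-PSDs) enter one system
            dist = np.linalg.norm(mic_pos[i] - mic_pos[j])
            gamma = np.sinc(2.0 * freq_hz * dist / c)  # diffuse-field coherence
            s_ij = a[i] * np.conj(a[j])
            v_ij = d[i] * np.conj(d[j])
            white = 1.0 if i == j else 0.0
            rows.append([s_ij.real, v_ij.real, gamma, white])
            rhs.append(phi[i, j].real)
            if i != j:
                rows.append([s_ij.imag, v_ij.imag, 0.0, 0.0])
                rhs.append(phi[i, j].imag)
    psd, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    psd = np.maximum(psd, 0.0)  # PSDs cannot be negative
    gain = psd[0] / max(float(np.sum(psd)), 1e-12)
    return psd, gain
```

Adding microphones adds equations but not unknowns, so every pair contributes to a single consistent estimate instead of being averaged after the fact.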