Abstract:
Methods, systems, and apparatuses are described for improved multi-microphone source tracking and noise suppression. In multi-microphone devices and systems, frequency domain acoustic echo cancellation is performed on each microphone input, and microphone levels and sensitivity are normalized. Methods, systems, and apparatuses are also described for improved acoustic scene analysis and source tracking using steered null error transforms, on-line adaptive acoustic scene modeling, and speaker-dependent information. Switched super-directive beamforming reinforces desired audio sources, and closed-form blocking matrices suppress desired audio sources based on spatial information derived from microphone pairings. Underlying statistics are tracked and used to update filters and models. Automatic detection of single-user and multi-user scenarios, and single-channel suppression using spatial information, non-spatial information, and residual echo are also described.
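To make the super-directive beamforming step concrete, the following is a minimal numpy sketch of one standard formulation (an MVDR solution under a spherically diffuse noise assumption). The function name, microphone geometry, and diagonal-loading constant are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def superdirective_weights(mic_positions, look_dir, freq, c=343.0):
    """Super-directive (MVDR-under-diffuse-noise) beamformer weights
    for a single frequency bin.

    mic_positions: (M, 3) array of microphone coordinates in metres.
    look_dir:      unit vector pointing toward the desired source.
    freq:          frequency in Hz.
    """
    M = len(mic_positions)
    # Steering vector: relative phases of a plane wave from look_dir.
    delays = (mic_positions @ look_dir) / c
    d = np.exp(-1j * 2 * np.pi * freq * delays)
    # Spherically diffuse noise coherence matrix: sinc(2*f*dist/c).
    dist = np.linalg.norm(
        mic_positions[:, None, :] - mic_positions[None, :, :], axis=-1)
    gamma = np.sinc(2 * freq * dist / c)   # np.sinc is sin(pi x)/(pi x)
    gamma = gamma + 1e-2 * np.eye(M)       # diagonal loading for robustness
    # MVDR solution: w = Gamma^-1 d / (d^H Gamma^-1 d).
    gi_d = np.linalg.solve(gamma, d)
    return gi_d / (d.conj() @ gi_d)
```

The distortionless constraint guarantees a unit response in the look direction, while the diffuse-noise coherence matrix attenuates energy arriving from elsewhere; a "switched" design as described would select among a set of such precomputed weight vectors per candidate source direction.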
Abstract:
Techniques described herein are directed to the enhancement of spectral features of an audio signal via adaptive modulation filtering. The adaptive modulation filtering process is based on observed modulation envelope autocorrelation coefficients obtained from the audio signal. The modulation envelope autocorrelation coefficients are used to determine parameters of an adaptive filter configured to filter the spectral features of the audio signal to provide filtered spectral features. The parameters are updated based on the observed modulation envelope autocorrelation coefficients to adapt to changing acoustic conditions, such as signal-to-noise ratio (SNR) or reverberation time. Accordingly, such acoustic conditions are not required to be estimated explicitly. Techniques described herein also allow for the estimation of useful side information, e.g., signal-to-noise ratios, based on the observed spectral features of the audio signal and the filtered spectral features, which can be used to improve speaker identification algorithms and/or other audio processing algorithms.
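One way such an adaptive modulation filter could look is sketched below: per-band lag-0 and lag-1 autocorrelations of the feature trajectories are tracked recursively, and their ratio sets the pole of a one-tap recursive smoother. This is a simplified stand-in for the described technique; the function name, the one-pole filter structure, and the smoothing constants are assumptions for illustration.

```python
import numpy as np

def adaptive_modulation_filter(feats, alpha=0.95, floor=0.0):
    """Adaptive one-pole smoothing of spectral-feature trajectories.

    feats: (frames, bands) array of spectral features (e.g. log energies).
    Lag-0/lag-1 autocorrelations of each band's modulation envelope are
    estimated recursively; the normalized lag-1 coefficient rho sets the
    filter pole, so the smoothing adapts to the observed statistics
    without an explicit SNR or reverberation-time estimate.
    """
    frames, bands = feats.shape
    r0 = np.ones(bands)          # lag-0 autocorrelation estimate
    r1 = np.zeros(bands)         # lag-1 autocorrelation estimate
    prev = np.zeros(bands)
    y = feats[0].copy()          # filter state
    out = np.empty_like(feats)
    for n in range(frames):
        x = feats[n]
        r0 = alpha * r0 + (1 - alpha) * x * x
        r1 = alpha * r1 + (1 - alpha) * x * prev
        rho = np.clip(r1 / np.maximum(r0, 1e-12), floor, 0.99)
        y = rho * y + (1 - rho) * x   # pole tracks observed correlation
        out[n] = y
        prev = x
    return out
```

Side information such as a per-band SNR proxy could then be derived by comparing the variance of `feats` against that of the filtered output, as the abstract suggests.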
Abstract:
Methods, systems, and apparatuses are described for isolated word training and detection. Isolated word training devices and systems are provided in which a user may provide a wake-up phrase from 1 to 3 times to train the device or system. A concatenated phoneme model of the user-provided wake-up phrase may be generated based on the provided wake-up phrase and a pre-trained phoneme model database. A word model of the wake-up phrase may be subsequently generated from the concatenated phoneme model and the provided wake-up phrase. Once trained, the user-provided wake-up phrase may be used to unlock the device or system and/or to wake up the device or system from a standby mode of operation. The word model of the user-provided wake-up phrase may be further adapted based on additional provisioning of the wake-up phrase.
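The training flow above can be illustrated with a deliberately simplified sketch: pre-trained per-phoneme templates (standing in for a phoneme model database) are concatenated into a word-level state sequence, scored against an utterance by dynamic time warping, and adapted toward later provisionings of the phrase. All function names and the template/DTW formulation are illustrative assumptions; a real system would use HMM or neural phoneme models.

```python
import numpy as np

def build_word_model(phoneme_means, phoneme_seq):
    """Concatenate pre-trained per-phoneme mean vectors (a stand-in for
    a phoneme model database) into a word-level state sequence."""
    return np.stack([phoneme_means[p] for p in phoneme_seq])

def dtw_score(model, feats):
    """Dynamic-time-warping distance between the word model's states and
    an utterance's feature frames (lower = better match)."""
    S, F = len(model), len(feats)
    D = np.full((S + 1, F + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, S + 1):
        for j in range(1, F + 1):
            cost = np.linalg.norm(model[i - 1] - feats[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[S, F] / F

def adapt_word_model(model, aligned_feats, rate=0.2):
    """Nudge each state toward newly provided aligned examples,
    mirroring adaptation on additional wake-phrase provisioning."""
    return (1 - rate) * model + rate * aligned_feats
```

At detection time, a phrase would be accepted (waking the device) when `dtw_score` falls below a tuned threshold.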
Abstract:
Methods, systems, and apparatuses are described for performing speaker-identification-assisted speech processing. In accordance with certain embodiments, a communication device includes speaker identification (SID) logic that is configured to identify a user of the communication device and/or the identity of a far-end speaker participating in a voice call with a user of the communication device. Knowledge of the identity of the user and/or far-end speaker is then used to improve the performance of one or more speech processing algorithms implemented on the communication device.
Abstract:
Techniques described herein are directed to performing back-end single-channel suppression of one or more types of interfering sources (e.g., additive noise) in an uplink path of a communication device. The back-end single-channel suppression techniques may suppress type(s) of additive noise using one or more suppression branches (e.g., a non-spatial (or stationary noise) branch, a spatial (or non-stationary noise) branch, a residual echo suppression branch, etc.). The non-spatial branch may be configured to suppress stationary noise from the single-channel audio signal, the spatial branch may be configured to suppress non-stationary noise from the single-channel audio signal, and the residual echo suppression branch may be configured to suppress residual echo from the single-channel audio signal. The spatial branch may be disabled based on an operational mode (e.g., a single-user speakerphone mode or a conference speakerphone mode) of the communication device or based on a determination that spatial information is ambiguous.
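A minimal sketch of how the branch outputs could be combined is shown below, assuming each branch produces a per-frequency-bin gain in [0, 1] that is applied multiplicatively to the single-channel spectrum. The function name, the multiplicative combination, and the bypass behavior are illustrative assumptions, not specifics from the abstract.

```python
import numpy as np

def combine_suppression_gains(spectrum, g_nonspatial, g_spatial,
                              g_residual_echo, spatial_enabled=True):
    """Apply per-bin gains from the three suppression branches to a
    single-channel spectrum. Branch gains are assumed to lie in [0, 1];
    the spatial branch is bypassed (unity gain) when disabled, e.g. in
    conference-speakerphone mode or when spatial cues are ambiguous."""
    g = np.asarray(g_nonspatial, dtype=float).copy()
    if spatial_enabled:
        g = g * g_spatial          # non-stationary (spatial) suppression
    g = g * g_residual_echo        # residual echo suppression
    return spectrum * np.clip(g, 0.0, 1.0)
```

Disabling the spatial branch in this way prevents an ambiguous spatial estimate from suppressing a desired talker, which matters most in the multi-talker conference scenario the abstract mentions.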
Abstract:
Methods, systems, and apparatuses are described for performing speaker-identification-assisted speech processing in a downlink path of a communication device. In accordance with certain embodiments, a communication device includes speaker identification (SID) logic that is configured to identify the identity of a far-end speaker participating in a voice call with a user of the communication device. Knowledge of the identity of the far-end speaker is then used to improve the performance of one or more downlink speech processing algorithms implemented on the communication device.