Abstract:
A mobile device uses external microphone signals to improve the estimate of background noise that it computes. In order to improve voice quality in a first signal that is produced by an internal microphone, the mobile device identifies an external microphone device within proximity of the mobile device. The mobile device establishes a wireless connection with the external microphone device. The mobile device receives a second signal from the external microphone device through the wireless connection. The second signal is produced by a microphone of the external microphone device. The mobile device generates a noise profile based on the second signal, and then suppresses background/ambient noise from the first signal based on the noise profile. Other embodiments are also described.
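As a rough illustration of the idea, the sketch below (plain Python/NumPy; the `suppress_noise` helper, the frame size, and the gain floor are illustrative choices, not details from the abstract) treats the external-microphone signal as the noise profile and applies a spectral-subtraction-style gain to the internal-microphone signal:

```python
import numpy as np

def suppress_noise(internal, external, frame=256):
    """Suppress background noise in `internal` using a per-frame noise
    profile estimated from `external`. Both inputs are 1-D float arrays
    of equal length; returns the enhanced signal (sketch only)."""
    out = np.zeros_like(internal)
    for start in range(0, len(internal) - frame + 1, frame):
        seg = internal[start:start + frame]
        noise = external[start:start + frame]
        S = np.fft.rfft(seg)
        # Noise profile: magnitude spectrum of the external-mic frame.
        N = np.abs(np.fft.rfft(noise))
        # Spectral-subtraction gain, floored to limit musical noise.
        gain = np.maximum(1.0 - N / (np.abs(S) + 1e-12), 0.1)
        out[start:start + frame] = np.fft.irfft(gain * S, n=frame)
    return out
```

Because the gain never exceeds 1, each frame's energy can only decrease, so the output is never louder than the noisy input.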
Abstract:
In one embodiment, a process for suppressing reverberation begins with a device of a user obtaining a reverberant speech signal from a voice of the user. The device determines a first estimated reverberation component of the reverberant speech signal. The device generates a first de-reverberated output signal with a first reverberation suppression based on the reverberant speech signal and the first estimated reverberation component. Then, the device generates a second, improved estimated reverberation component using the first de-reverberated output signal. The device generates a second de-reverberated output signal with a second reverberation suppression based on the reverberant speech signal and the second, improved estimated reverberation component.
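A minimal sketch of the two-pass idea, assuming a deliberately crude reverberation model (a single delayed, attenuated copy of the signal; the delay and decay constants are illustrative, not from the abstract):

```python
import numpy as np

def estimate_reverb(signal, delay, decay):
    """Crude late-reverberation estimate: a delayed, attenuated copy."""
    est = np.zeros_like(signal)
    est[delay:] = decay * signal[:-delay]
    return est

def two_stage_dereverb(reverberant, delay=100, decay=0.5):
    """Two-pass suppression: the second estimate is derived from the
    cleaner first-pass output, so it contains less 'reverb of reverb'."""
    # Stage 1: estimate reverberation from the reverberant input itself.
    r1 = estimate_reverb(reverberant, delay, decay)
    out1 = reverberant - r1
    # Stage 2: re-estimate from the first de-reverberated output, then
    # suppress again starting from the original reverberant signal.
    r2 = estimate_reverb(out1, delay, decay)
    out2 = reverberant - r2
    return out2
```

Under this toy model the residual error shrinks with each pass, which is the point of refining the reverberation estimate from the first output.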
Abstract:
An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
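The exchange can be sketched as follows; the payload fields, the `package_voice_command`/`handle_on_server` helpers, and the example phrase are illustrative assumptions, not details from the abstract:

```python
import json

def package_voice_command(audio_bytes, context):
    """Device side: bundle the captured command with contextual device
    state (e.g. current app, now-playing item) for upload."""
    return json.dumps({
        "audio_hex": audio_bytes.hex(),
        "context": context,
    })

def handle_on_server(payload, recognize):
    """Server side: run speech recognition on the audio, then use the
    contextual information to resolve an otherwise ambiguous command."""
    msg = json.loads(payload)
    text = recognize(bytes.fromhex(msg["audio_hex"]))
    if text == "play this" and "now_playing" in msg["context"]:
        # "this" is resolved from device context, not from the audio.
        return {"action": "play", "item": msg["context"]["now_playing"]}
    return {"action": "unknown"}
```

The design choice illustrated here is that the recognizer alone cannot resolve deictic commands like "play this"; the device-state snapshot taken at capture time supplies the referent.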
Abstract:
An audio system and method of using the audio system to augment spatial audio rendition is described. The audio system can include a device to receive user inputs designating positions on an augmented reality view of a listening environment. Sound source icons can be presented in the augmented reality view at the designated positions. The sound source icons can visually represent sound sources at locations in the listening environment that correspond to, but are different than, the positions. One or more processors of the audio system can apply head-related transfer functions, which correspond to the locations in the listening environment, to audio input signals to generate binaural audio signals. The audio system can include a headset that uses the binaural audio signals to render spatialized audio localizing sounds to the locations in the listening environment. Other aspects are also described and claimed.
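A minimal sketch of the rendering step, assuming a head-related impulse-response (HRIR) pair has already been looked up for each location designated in the AR view (the lookup itself is omitted, and the helper names are illustrative):

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Apply an HRIR pair for one source location to a mono input,
    producing a 2 x N array of left/right binaural channels."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

def render_scene(sources):
    """Mix several sources; each entry is (mono_signal, (hrir_l, hrir_r)),
    with the HRIR pair chosen from the position tapped in the AR view."""
    out = None
    for mono, (hl, hr) in sources:
        pair = render_binaural(mono, hl, hr)
        out = pair if out is None else out + pair
    return out
```

For a source off to the listener's left, the left-ear HRIR carries more energy, so the rendered left channel is louder, which is what localizes the sound.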
Abstract:
A system and method is described for determining whether a loudspeaker device has relocated, tilted, rotated, or changed environment such that one or more parameters for driving the loudspeaker may be modified and/or a complete reconfiguration of the loudspeaker system may be performed. In one embodiment, the system may include a set of sensors. The sensors provide readings that are analyzed to determine 1) whether the loudspeaker has moved since a previous analysis and/or 2) a distance of movement and/or a degree change in orientation of the loudspeaker since the previous analysis. Upon determining that the level of movement is below a threshold value, the system adjusts previous parameters used to drive one or more of the loudspeakers. By adjusting previous parameters instead of performing a complete recalibration, the system provides a more efficient technique for ensuring that the loudspeakers continue to produce accurate sound for the listener.
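The decision logic might be sketched as below; the sensor fields and the distance/angle thresholds are illustrative assumptions, not values from the abstract:

```python
import math

def reconfigure_decision(prev, curr, dist_thresh=0.5, angle_thresh=15.0):
    """Decide between tweaking existing driver parameters and a full
    recalibration, from sensor-reported position (meters) and yaw
    (degrees). Thresholds are arbitrary illustrative values."""
    dx = curr["pos"][0] - prev["pos"][0]
    dy = curr["pos"][1] - prev["pos"][1]
    dist = math.hypot(dx, dy)
    dangle = abs(curr["yaw"] - prev["yaw"])
    if dist == 0 and dangle == 0:
        return "no_change"
    if dist < dist_thresh and dangle < angle_thresh:
        return "adjust_parameters"   # small move: adjust previous parameters
    return "full_recalibration"      # large move or new room: reconfigure
```

The efficiency claim in the abstract corresponds to the middle branch: small movements reuse and adjust the existing parameters rather than triggering a complete recalibration.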
Abstract:
A method for controlling a speech enhancement process in a far-end device, while engaged in a voice or video telephony communication session over a communication link with a near-end device. A near-end user speech signal is produced, using a microphone to pick up speech of a near-end user, and is analyzed by an automatic speech recognizer (ASR) without being triggered by an ASR trigger phrase or button. The recognized words are compared to a library of phrases to select a matching phrase, where each phrase is associated with a message that represents an audio signal processing operation. The message associated with the matching phrase is sent to the far-end device, where it is used to configure the far-end device to adjust the speech enhancement process that produces the far-end speech signal. Other embodiments are also described.
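The phrase-matching step can be sketched as follows; the phrases and the message names in `PHRASE_LIBRARY` are invented for illustration:

```python
# Hypothetical phrase library: each phrase maps to a message naming an
# audio signal processing operation to be performed at the far end.
PHRASE_LIBRARY = {
    "i can't hear you": "increase_speech_gain",
    "you're very noisy": "increase_noise_suppression",
    "you sound muffled": "boost_high_frequencies",
}

def match_command(recognized_words):
    """Compare continuously recognized near-end speech against the
    phrase library; return the far-end control message, if any."""
    text = " ".join(recognized_words).lower()
    for phrase, message in PHRASE_LIBRARY.items():
        if phrase in text:
            return message
    return None
```

Note that the matching runs on ordinary conversational speech, with no wake phrase or button press, which is the point made in the abstract.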
Abstract:
An electronic system for audio noise processing and noise reduction comprises a first noise estimator, a second noise estimator, a selector, and an attenuator. The first noise estimator processes a first audio signal from a voice beamformer (VB) and generates a first noise estimate. The VB generates the first audio signal by beamforming audio signals from first and second audio pick-up channels. The second noise estimator processes, in parallel with the first noise estimator, a second audio signal from a noise beamformer (NB) and generates a second noise estimate. The NB generates the second audio signal by beamforming the audio signals from the first and second audio pick-up channels. The first and second audio signals include frequencies in first and second frequency regions. The selector's output noise estimate may be a) the second noise estimate in the first frequency region, and b) the first noise estimate in the second frequency region. The attenuator attenuates the first audio signal in accordance with the output noise estimate. Other embodiments are also described.
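A sketch of the selector and attenuator, assuming both noise estimates are per-frequency-bin magnitude arrays and the two frequency regions split at a single bin index (the split point and the Wiener-style gain rule are illustrative assumptions):

```python
import numpy as np

def combined_noise_estimate(est_from_vb, est_from_nb, split_bin):
    """Per-bin selector: use the noise-beam estimate below `split_bin`
    (first frequency region) and the voice-beam estimate at and above
    it (second frequency region)."""
    out = est_from_vb.copy()
    out[:split_bin] = est_from_nb[:split_bin]
    return out

def attenuate(spectrum, noise_est):
    """Attenuator: scale the voice-beam spectrum by a gain driven by
    the selected noise estimate, floored to avoid over-suppression."""
    gain = np.maximum(1.0 - noise_est / (np.abs(spectrum) + 1e-12), 0.05)
    return gain * spectrum
```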
Abstract:
A method for improving noise suppression for automatic speech recognition (ASR) starts with a microphone receiving an audio signal that includes a speech signal and a noise signal. In each frame, for each frequency band of the audio signal, a noise estimator detects the ambient noise level and generates a noise estimate value based on the estimated ambient noise level; a variable noise suppression target controller generates a suppression target value using the noise estimate value and a logistic function; a gain value calculator generates a gain value based on the suppression target value and the noise estimate value; and a combiner enhances the audio signal by the gain value to generate a clean audio signal in each frame for all frequency bands. The logistic function models a desired noise suppression level that varies based on the ambient noise level. The variable level of noise suppression includes low attenuation for low noise levels and progressively higher attenuation for higher noise levels. Other embodiments are also described.
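The logistic mapping can be sketched as below; the maximum attenuation, midpoint, and slope constants are illustrative, not values from the abstract:

```python
import math

def suppression_target_db(noise_db, max_atten_db=20.0,
                          midpoint_db=60.0, slope=0.15):
    """Logistic mapping from estimated ambient noise level (dB) to a
    suppression target (dB of attenuation): near zero when quiet,
    approaching `max_atten_db` as the noise level rises."""
    return max_atten_db / (1.0 + math.exp(-slope * (noise_db - midpoint_db)))

def gain_from_target(target_db):
    """Convert the suppression target (dB of attenuation) to the linear
    gain applied to the band."""
    return 10.0 ** (-target_db / 20.0)
```

This reproduces the behavior the abstract describes: low attenuation (gain near 1) for low noise levels and progressively heavier attenuation as the ambient noise level grows.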
Abstract:
An orientation detector can have a first microphone, a second microphone, and a reference microphone spaced from the first microphone and the second microphone. An orientation processor can be configured to determine an orientation of the first microphone, the second microphone, or both, relative to a user's mouth based on a comparison of a relative strength of a first signal associated with the first microphone to a relative strength of a second signal associated with the second microphone. A channel selector in a speech enhancer can select one signal from among several signals based at least in part on the orientation determined by the orientation processor. A mobile communication handset can include a microphone-based orientation detector of the type disclosed herein.
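A sketch of the comparison, assuming "relative strength" means each microphone's mean power in dB relative to the reference microphone (the helper names are illustrative):

```python
import numpy as np

def detect_orientation(sig_first, sig_second, sig_ref):
    """Decide which microphone faces the user's mouth by comparing each
    signal's level against the reference microphone's level."""
    def level_db(x):
        return 10.0 * np.log10(np.mean(np.square(x)) + 1e-12)
    rel_first = level_db(sig_first) - level_db(sig_ref)
    rel_second = level_db(sig_second) - level_db(sig_ref)
    return ("first_toward_mouth" if rel_first > rel_second
            else "second_toward_mouth")

def select_channel(sig_first, sig_second, sig_ref):
    """Channel selector: forward the microphone the orientation detector
    says is closer to the mouth."""
    orient = detect_orientation(sig_first, sig_second, sig_ref)
    return sig_first if orient == "first_toward_mouth" else sig_second
```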
Abstract:
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
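The beam-then-select flow can be sketched as follows; `recognize` stands in for a real recognition engine and is assumed to return a (transcription, likelihood score) candidate per beamformed stream:

```python
def best_transcription(beams, recognize):
    """Run the recognizer on each beamformed stream (one per look
    direction) and keep the highest-likelihood transcription.
    `beams` is any iterable of audio streams; `recognize` maps a
    stream to a (text, score) pair."""
    candidates = [recognize(beam) for beam in beams]
    return max(candidates, key=lambda c: c[1])[0]
```

The selector here simply takes the maximum-score candidate; a real system could also combine candidates across streams before deciding.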