Abstract:
A plurality of microphones of a communication device is grouped into multiple microphone groups, such that each microphone group includes two or more microphones. For each microphone group, output of the corresponding microphones is processed to form an acoustic null in a corresponding spatial direction, such that sound from the corresponding spatial direction is attenuated in the processed output. One of the microphone groups is selected, based on various factors, so as to maximize echo attenuation and the rejection of reverberant components of the room. The selected microphone group is then used to detect sound from a near end talker of the communication device.
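The null forming and group selection described in this abstract can be illustrated with a minimal sketch. It assumes two-microphone groups, integer sample delays, and a delay-and-subtract null; the selection criterion (lowest residual energy while only echo is present) is an assumption standing in for the unspecified "various factors". The names `null_pair`, `energy`, and `select_group` are hypothetical.

```python
def null_pair(a, b, delay):
    """Delay-and-subtract null for one two-microphone group.

    Cancels sound that reaches microphone `b` exactly `delay` samples
    after it reaches microphone `a` (assumed integer delay, delay >= 0).
    """
    return [b[n] - (a[n - delay] if n >= delay else 0.0) for n in range(len(b))]


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def select_group(groups):
    """Pick the group whose nulled output has the least energy.

    `groups` maps a group name to (a, b, delay). The inputs are assumed
    to be captured while only loudspeaker echo is present, so low output
    energy is used here as a proxy for high echo attenuation.
    """
    return min(groups, key=lambda name: energy(null_pair(*groups[name])))
```

For example, if the echo reaches the second microphone one sample after the first, a group steered with `delay=1` yields zero residual and is preferred over a group steered with `delay=2`.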
Abstract:
An automatic speech recognition engine receives an acoustic-echo processed signal from an acoustic-echo processing (AEP) module, where said echo processed signal contains mainly the speech from the near-end talker. The automatic speech recognition engine analyzes the content of the acoustic-echo processed signal to determine whether words or keywords are present. Based upon the results of this analysis, the automatic speech recognition engine produces a value reflecting the likelihood that some words or keywords are detected. Said value is provided to the AEP module. Based upon the value, the AEP module determines if there is double talk and processes the incoming signals accordingly to enhance its performance.
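The feedback loop in this abstract can be sketched as follows. The abstract does not say how the AEP module uses the likelihood value, so this sketch assumes a simple threshold test and assumes that the module responds to double talk by freezing filter adaptation, which is a common practice rather than something the abstract states. The function names, the threshold, and the step size are hypothetical.

```python
def is_double_talk(keyword_likelihood, far_end_active, threshold=0.5):
    """Declare double talk when the far end is playing audio and the
    recognizer reports a high likelihood of near-end words.

    `threshold` is an assumed tuning value, not taken from the abstract.
    """
    return far_end_active and keyword_likelihood >= threshold


def adaptation_step_size(keyword_likelihood, far_end_active, base_mu=0.5):
    """Return the adaptive-filter step size for the current frame.

    Assumption: adaptation is frozen (step size 0) during double talk so
    near-end speech does not corrupt the echo-path estimate.
    """
    if is_double_talk(keyword_likelihood, far_end_active):
        return 0.0
    return base_mu
```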
Abstract:
Techniques for enhancing an acoustic echo canceller based on visual cues are described herein. The techniques include changing adaptation of a filter of the acoustic echo canceller, calibrating the filter, or reducing background noise from an audio signal processed by the acoustic echo canceller. The changing, calibrating, and reducing are responsive to visual cues that describe acoustic characteristics of a location of a device that includes the acoustic echo canceller. Such visual cues may indicate that no human being is present at the location, that some subject(s) are engaged in speaking or sound generating activities, or that motion associated with an echo path change has occurred at the location.
Abstract:
Techniques for utilizing blind source separation as a front-end to an acoustic echo canceller are described herein. The techniques include removing a first portion of an acoustic echo from an audio signal using blind source separation and a reference signal. The techniques then further remove a second portion of the acoustic echo using an acoustic echo canceller and the reference signal. Further, output of the blind source separation may be used to improve double-talk detection.
Abstract:
An augmented reality environment allows interaction between virtual and real objects. Beamforming techniques are applied to signals acquired by an array of microphones to allow for simultaneous spatial tracking and signal acquisition from multiple users. Localization information, such as that from other sensors in the environment, may be used to select a particular set of beamformer coefficients and the resulting beampattern focused on a signal source. Alternatively, a series of beampatterns may be used iteratively to localize the signal source in a computationally efficient fashion. The beamformer coefficients may be pre-computed.
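Selecting among pre-computed beams can be illustrated with a minimal sketch. It assumes a delay-and-sum beamformer whose "coefficients" are non-negative integer steering delays, and it assumes the best beam is the one with the highest mean output power; neither detail comes from the abstract. The names `delay_and_sum`, `best_beam`, and the beam table layout are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Advance each channel by its steering delay and average the result.

    `delays` holds one non-negative integer sample delay per channel.
    """
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


def best_beam(channels, beam_table):
    """Return the name of the pre-computed beam with the highest mean power.

    `beam_table` maps a beam name to its per-channel steering delays.
    """
    def mean_power(name):
        out = delay_and_sum(channels, beam_table[name])
        return sum(v * v for v in out) / len(out)

    return max(beam_table, key=mean_power)
```

Scanning a coarse table first and then a finer table around the winning beam would be one way to make the search iterative, again as an assumption about how such a scheme might be arranged.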
Abstract:
An audio processing system configured to generate, based at least in part on captured sound, an audio signal that includes a speech component corresponding to a user's speech utterance and an audio component corresponding to audio output of another device is described herein. The audio processing system is also configured to receive a reference signal that corresponds to the audio output of the other device. The reference signal may be received as ultrasonic audio output of the other device or from a remote server. The audio processing system then processes the generated audio signal to remove at least a part of the generated audio signal that corresponds to the reference signal.
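Removing the part of a captured signal that corresponds to a reference can be sketched with a normalized least-mean-squares (NLMS) adaptive filter. NLMS is a standard technique swapped in here for illustration; the abstract does not name the method it uses. The function name `nlms_cancel` and the parameter values are assumptions.

```python
def nlms_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the reference from the mic signal.

    mic: captured samples (speech plus the other device's audio)
    ref: reference samples for the other device's audio
    Returns the residual signal after cancellation.
    """
    w = [0.0] * taps    # adaptive filter weights
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # estimated reference part
        e = d - y                                   # residual after removal
        norm = sum(xi * xi for xi in buf) + eps     # eps avoids divide-by-zero
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out
```

In this sketch the residual is also the adaptation error, so any near-end speech in `mic` would disturb adaptation; a practical system would need some form of double-talk handling.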
Abstract:
Acoustic signals may be localized such that their position in space is determined. Time-difference-of-arrival data from multiple microphones may be used for this localization. Signal data from the microphones may be degraded by reverberation and other environmental distortions, resulting in erroneous localization. By detecting a portion of the signal resulting from sound directly reaching a microphone rather than from a reverberation, accuracy of the localization is improved.
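One simple way to favor the direct sound over reverberation is to time only the first strong arrival in each channel, since reflections travel farther and arrive later. The sketch below uses that onset heuristic as an assumption; the abstract does not specify how the direct portion is detected. The names `first_arrival` and `direct_path_tdoa` and the relative threshold are hypothetical.

```python
def first_arrival(signal, rel_threshold=0.5):
    """Index of the first sample whose magnitude reaches a fraction of the peak.

    Returns None for an all-zero signal. `rel_threshold` is an assumed
    tuning value.
    """
    peak = max((abs(s) for s in signal), default=0.0)
    if peak == 0.0:
        return None
    for i, s in enumerate(signal):
        if abs(s) >= rel_threshold * peak:
            return i
    return None


def direct_path_tdoa(sig_a, sig_b, rel_threshold=0.5):
    """Time difference of arrival in samples, using first arrivals only.

    A positive result means the sound reached microphone B after A.
    """
    ia = first_arrival(sig_a, rel_threshold)
    ib = first_arrival(sig_b, rel_threshold)
    if ia is None or ib is None:
        return None
    return ib - ia
```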
Abstract:
The location of a sound within a given spatial volume may be used in applications such as augmented reality environments. An artificial neural network processes time-difference-of-arrival (TDOA) data from a known microphone array to determine a spatial location of the sound. The neural network may be located locally or available as a cloud service. The artificial neural network is trained with perturbed and non-perturbed TDOA data.
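Training data of the kind this abstract mentions could be generated as in the sketch below. It assumes free-field propagation at 343 m/s, TDOAs measured relative to the first microphone, and Gaussian jitter as the perturbation; none of these choices is stated in the abstract. The names `ideal_tdoas` and `perturb` are hypothetical, and the network itself is not shown.

```python
import math
import random

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def ideal_tdoas(source, mics, c=SPEED_OF_SOUND_M_S):
    """Non-perturbed TDOAs (seconds) of each microphone relative to mics[0]."""
    t0 = math.dist(source, mics[0]) / c
    return [math.dist(source, m) / c - t0 for m in mics[1:]]


def perturb(tdoas, sigma_s, rng):
    """Perturbed copy of a TDOA vector with zero-mean Gaussian jitter."""
    return [t + rng.gauss(0.0, sigma_s) for t in tdoas]
```

Each training pair would then consist of a clean or perturbed TDOA vector as input and the known source position as the target.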
Abstract:
Techniques are provided for sending and receiving key frames and key frame request messages. At a video conference bridge, a key frame request message is received from a first endpoint device. The key frame request message comprises a request for a key frame from a second endpoint device. When a prior key frame request message is received before the key frame request message, a key frame request time value is determined that corresponds to an amount of time between receiving the key frame request message and receiving the prior key frame request message. This value is compared to a threshold time value. When the key frame request time value is greater than the threshold time value, a key frame request forwarding message is generated and sent to the second endpoint device to request the key frame from the second endpoint device.
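The forwarding decision described in this abstract can be sketched as a small stateful check. The class name `KeyFrameRequestThrottle` is hypothetical, and forwarding the very first request (when no prior request exists) is an assumption, since the abstract only covers the case where a prior request was received.

```python
from typing import Optional


class KeyFrameRequestThrottle:
    """Decides whether a bridge forwards a key frame request."""

    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s
        self._last_request_s: Optional[float] = None

    def should_forward(self, now_s: float) -> bool:
        """Return True if the request arriving at `now_s` should be forwarded."""
        prior = self._last_request_s
        self._last_request_s = now_s  # every request becomes the new "prior"
        if prior is None:
            # Assumption: with no prior request there is nothing to compare
            # against, so the request is forwarded.
            return True
        # Forward only when the gap since the prior request exceeds the threshold.
        return (now_s - prior) > self.threshold_s
```

With a one-second threshold, requests arriving at 0.0 s, 0.5 s, and 2.0 s would be forwarded, suppressed, and forwarded respectively.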