Abstract:
Apparatuses, methods, computer-readable media, and systems are described for combined dynamic processing and speaker protection for minimizing distortion in audio playback. In some embodiments, at least one compressed audio signal is received, at least one threshold for a speaker is retrieved, modifications to audio signal compression are determined based on the at least one compressed audio signal and the at least one threshold, information embodying the modifications is transmitted to a dynamic processor, and, using the dynamic processor, at least one modified compressed audio signal is produced for the speaker based on the information.
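The flow described above (receive a compressed signal, retrieve a speaker threshold, determine a modification, hand it to the dynamic processor) might be sketched as follows. This is only an illustration under stated assumptions: the threshold is treated as a peak amplitude limit, the modification is a single scalar gain, and the names `determine_modification` and `dynamic_processor` are hypothetical rather than taken from the abstract.

```python
def determine_modification(compressed_block, speaker_threshold):
    """Return a scalar gain that keeps the block's peak within the threshold.

    Assumption: the speaker threshold is a peak amplitude limit.
    """
    peak = max((abs(s) for s in compressed_block), default=0.0)
    if peak <= speaker_threshold:
        return 1.0  # already within the limit, no modification needed
    return speaker_threshold / peak


def dynamic_processor(compressed_block, gain):
    """Apply the transmitted modification (here a plain gain) to the block."""
    return [s * gain for s in compressed_block]


# Hypothetical block of already-compressed samples and a hypothetical limit.
block = [0.2, -0.9, 0.5]
threshold = 0.6

gain = determine_modification(block, threshold)
protected = dynamic_processor(block, gain)
```

A real system would more likely adjust compressor parameters (threshold, ratio, time constants) per frequency band than apply one broadband gain; the scalar keeps the sketch short.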
Abstract:
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
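As a rough sketch of the selector stage, the snippet below picks one transcription from the candidates produced for each beamformed stream. It assumes each candidate carries a log-likelihood score and that the highest score wins; the `Candidate` type, the stream labels, and the scoring rule are hypothetical, since the abstract does not say how the selector decides.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    stream: str            # hypothetical label for the look direction
    text: str              # transcription hypothesis
    log_likelihood: float  # recognizer score, higher is better (assumption)


def select_transcription(candidates_per_stream):
    """Return the highest-scoring candidate across all streams."""
    all_candidates = [c for stream in candidates_per_stream for c in stream]
    if not all_candidates:
        raise ValueError("no transcription candidates supplied")
    return max(all_candidates, key=lambda c: c.log_likelihood)


# Two hypothetical look directions, each with its own candidate list.
streams = [
    [
        Candidate("beam_0deg", "turn on the light", -12.5),
        Candidate("beam_0deg", "turn on the right", -14.0),
    ],
    [Candidate("beam_90deg", "turn on the lights", -9.8)],
]

best = select_transcription(streams)
```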
Abstract:
An apparatus for linear and nonlinear acoustic echo control includes a loudspeaker, first, second, and third microphones, a beamformer, and a first echo canceller. The loudspeaker outputs a loudspeaker signal that includes a reference signal. The first microphone and the second microphone are collocated with the loudspeaker, receive at least one of a near-end speaker signal from a near-end speaker and the loudspeaker signal, and generate first and second microphone uplink signals, respectively. The third microphone receives the near-end speaker signal and generates a third microphone uplink signal. The beamformer receives the first and second microphone uplink signals, directs a beam towards the loudspeaker and drives a null towards the near-end speaker, and generates a beamformer output. The first echo canceller receives the third microphone uplink signal and the beamformer output, and cancels echoes in the third microphone uplink signal based on the beamformer output to generate an echo-cancelled signal. Other embodiments are described.
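To illustrate how an echo canceller can use the beamformer output as its reference, here is a minimal sketch based on a normalized LMS (NLMS) adaptive filter. NLMS is a swapped-in, commonly used technique; the abstract does not name the adaptation algorithm. The function name, tap count, step size, and the synthetic two-tap echo path are all assumptions for the demonstration.

```python
import random


def nlms_echo_cancel(mic, reference, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # echo-cancelled output sample
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Synthetic demo: the "beamformer output" is white noise, and the third
# microphone picks it up through a hypothetical echo path [0.6, 0.3].
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * ref[n] + (0.3 * ref[n - 1] if n > 0 else 0.0)
       for n in range(len(ref))]

residual, weights = nlms_echo_cancel(mic, ref)
```

In this noise-free setup the learned weights approach the echo path and the residual decays towards zero. With near-end speech present, a practical canceller would also need double-talk handling, which is omitted here.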
Abstract:
Systems and methods are described for a speech recognition system having a speech processor that is trained to recognize speech by considering (1) a raw microphone signal that includes an echo signal and (2) different types of echo information signals from an echo cancellation system (and optionally different types of ambient noise suppression signals from a noise suppressor). The different types of echo information signals may include those used for echo cancellation and those having echo information. The speech recognition system may convert the raw microphone signal and the different types of echo information signals (and optional noise suppression signals) into spectral features in the form of feature vectors, and may include a concatenator that combines the feature vectors into a total vector (for a period of time) that is used both to train the speech processor and, during use of the speech processor, to recognize speech.
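The feature conversion and concatenation step could look roughly like the sketch below. It assumes log-power DFT bins as the spectral features and one frame per signal for the period of time in question; the function names, the bin count, and the three example signals are hypothetical, and a real front end would more likely use an FFT with mel filterbanks.

```python
import cmath
import math


def spectral_features(frame, n_bins=8):
    """Log-power of the first `n_bins` DFT bins of one frame (naive DFT)."""
    n = len(frame)
    feats = []
    for k in range(n_bins):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        feats.append(math.log(abs(coeff) ** 2 + 1e-12))
    return feats


def total_vector(frames, n_bins=8):
    """Concatenate per-signal feature vectors into one total vector."""
    total = []
    for frame in frames:
        total.extend(spectral_features(frame, n_bins))
    return total


# Hypothetical time-aligned frames: raw microphone, echo estimate, residual.
raw_mic = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
echo_estimate = [0.5 * s for s in raw_mic]
residual = [m - e for m, e in zip(raw_mic, echo_estimate)]

vec = total_vector([raw_mic, echo_estimate, residual])
```

The same total vector layout would be used at training time and at recognition time so the speech processor sees consistent inputs.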
Abstract:
Automatic gain control (AGC) systems disclosed herein can incorporate a confidence metric that can estimate the accuracy of gain adjustments calculated by an automatic gain control module. The confidence metric may be based on a percentage of valid audio samples in a given period of time. Based on the confidence metric, the AGC response may be reduced, delayed, frozen, or otherwise altered from the baseline gain adjustment. A time-averaging process may be used to estimate the input signal power level and determine an appropriate baseline gain adjustment. Additionally, weighting functions can be adjusted to prevent overestimation of the signal power.
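A minimal sketch of a confidence-weighted gain update follows. It assumes the confidence metric is the fraction of valid samples in the block, that the gain change is frozen below a cutoff and otherwise scaled by the confidence, and that the power estimate is a plain average over valid samples. The function name, the target level, and the cutoff value are hypothetical.

```python
import math


def agc_gain_change_db(samples, valid_flags, target_db=-20.0,
                       min_confidence=0.5):
    """Return (gain change in dB, confidence) for one block of samples."""
    valid = [s for s, ok in zip(samples, valid_flags) if ok]
    confidence = len(valid) / len(samples) if samples else 0.0
    if not valid or confidence < min_confidence:
        return 0.0, confidence  # freeze: too few valid samples to trust
    # Time-averaged power over the valid samples only.
    power = sum(s * s for s in valid) / len(valid)
    level_db = 10.0 * math.log10(power + 1e-12)
    baseline_change = target_db - level_db
    # Reduce the response in proportion to the confidence.
    return confidence * baseline_change, confidence


quiet_block = [0.01] * 100  # roughly -40 dB relative to full scale

full_gain, full_conf = agc_gain_change_db(quiet_block, [True] * 100)
frozen_gain, low_conf = agc_gain_change_db(
    quiet_block, [True] * 25 + [False] * 75)
```

With every sample valid, the block is raised by about 20 dB towards the target; with only a quarter of the samples valid, the gain is left unchanged.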
Abstract:
Systems and methods for determining the operating condition of multiple microphones of an electronic device are disclosed. A system can include a plurality of microphones operative to receive signals, a microphone condition detector, and a plurality of microphone condition determination sources. The microphone condition detector can determine a condition for each of the plurality of microphones by using the received signals and accessing at least one microphone condition determination source.
Abstract:
A method of improving an audio signal in the spectral domain starts by receiving an audio signal that includes signals from sources including a speech source and a music source. The audio signal is tuned for output by a sound output device. Portions of the audio signal are analyzed in the spectral domain to determine whether adjustments are required. Analyzing portions of the audio signal includes determining whether an anomaly is present in a frequency band of the audio signal in the spectral domain by using at least one metric. The metrics include band energy ratios, spectral centroid, spectral tilt, spectral flux, spectral variance, absolute thresholds, and relative thresholds. The audio signal is adjusted to improve the audio signal in the spectral domain when the audio signal is determined to require adjustments. Adjusting the audio signal includes adjusting values of the metric in the frequency band that is determined to include the anomaly to correspond to a clustering of metric values for the audio signal in the spectral domain. Other embodiments are also described.
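Two of the listed metrics, spectral centroid and band energy ratios, can be computed from a magnitude spectrum as sketched below, together with a simple relative-threshold test for an anomalous band. The detection rule (flag a band whose energy ratio exceeds a multiple of the median ratio) is an assumption for illustration, as are the function names and the example spectrum.

```python
def spectral_centroid(magnitudes, bin_hz):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(i * bin_hz * m for i, m in enumerate(magnitudes)) / total


def band_energy_ratios(magnitudes, n_bands):
    """Share of total energy falling in each of `n_bands` equal-width bands."""
    size = len(magnitudes) // n_bands
    energies = [sum(m * m for m in magnitudes[b * size:(b + 1) * size])
                for b in range(n_bands)]
    total = sum(energies) or 1.0
    return [e / total for e in energies]


def anomalous_bands(ratios, relative_threshold=3.0):
    """Indices of bands whose ratio exceeds a multiple of the median ratio."""
    median = sorted(ratios)[len(ratios) // 2]
    return [i for i, r in enumerate(ratios)
            if median > 0 and r > relative_threshold * median]


# Hypothetical flat spectrum with one strong peak in bin 5.
magnitudes = [1.0] * 8
magnitudes[5] = 10.0

centroid = spectral_centroid(magnitudes, bin_hz=100.0)
ratios = band_energy_ratios(magnitudes, n_bands=4)
flagged = anomalous_bands(ratios)
```

Here the third band (index 2) holds most of the energy and is flagged. An adjustment stage would then pull that band's metric value back towards the cluster of values seen in the other bands.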
Abstract:
An electronic device displays a messaging interface that allows a participant in a message conversation to capture, send, and/or play media content. The media content includes images, video, and/or audio. The media content is captured, sent, and/or played based on the electronic device detecting one or more conditions.