Abstract:
The disclosure is directed to pre-processing audio signals. In one implementation, an electronic device (102) receives an audio signal that has audio information, obtains auxiliary information (such as location, velocity, direction, light, proximity of objects, and temperature), and determines, based on the audio information and the auxiliary information, a type of audio environment in which the electronic device (102) is operating. The device (102) selects an audio pre-processing procedure based on the determined audio environment type and pre-processes the audio signal according to the selected pre-processing procedure. The device (102) may then perform speech recognition on the pre-processed audio signal.
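The selection step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `AuxInfo` fields, the thresholds, the environment labels, and the procedure names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuxInfo:
    """Hypothetical auxiliary readings; the field names are assumptions."""
    speed_mps: float    # e.g. derived from location or velocity data
    ambient_lux: float  # e.g. from a light sensor


def classify_environment(rms_level: float, aux: AuxInfo) -> str:
    """Guess the audio environment type from audio level plus auxiliary info."""
    if aux.speed_mps > 8.0:           # assumed threshold for "moving in a vehicle"
        return "in_vehicle"
    if rms_level > 0.2:               # assumed threshold for "noisy"
        return "noisy_outdoor" if aux.ambient_lux > 1000.0 else "noisy_indoor"
    return "quiet"


# Hypothetical mapping from environment type to a pre-processing procedure name.
PREPROCESSORS = {
    "in_vehicle": "road_noise_suppression",
    "noisy_outdoor": "wind_noise_suppression",
    "noisy_indoor": "babble_noise_suppression",
    "quiet": "passthrough",
}


def select_preprocessing(rms_level: float, aux: AuxInfo) -> str:
    """Select the pre-processing procedure for the determined environment type."""
    return PREPROCESSORS[classify_environment(rms_level, aux)]
```

The selected procedure would then be applied to the audio signal before it is handed to speech recognition.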
Abstract:
A method and apparatus for adapting acoustic processing in a communication device (102) include capturing (302) at least one acoustic signal using acoustic hardware (218, 224) of the communication device (102), characterizing (304) an acoustic environment external to the communication device (102) using the at least one captured acoustic signal, and adapting (306) acoustic processing within the communication device (102) based on the characterized acoustic environment.
Abstract:
A mobile device is adapted for automatic speech recognition (ASR). A user interface for interaction with a user includes an input microphone for obtaining speech inputs from the user for automatic speech recognition, and an output interface for system output to the user based on ASR results that correspond to the speech input. A local controller obtains a sample of non-ASR audio from the input microphone for ASR-adaptation to channel-specific ASR characteristics, and then provides a representation of the non-ASR audio to a remote ASR server for server-side adaptation to the channel-specific ASR characteristics, and then provides a representation of an unknown ASR speech input from the input microphone to the remote ASR server for determining ASR results corresponding to the unknown ASR speech input, and then provides the system output to the output interface.
Abstract:
Provided is a method for adaptively enhancing an end-user's perceived quality, or quality of experience (QoE), of speech and other audio under ambient noise conditions. The method comprises the steps of determining the ambient noise characteristics on a continuous basis to capture the time-varying nature of ambient noise, and adaptively determining the optimal signal shaping to be applied to the audio/speech signal to compensate for the ambient noise impairment. The signal shaping uses an infinite impulse response (IIR) filter that performs the signal modification with a low delay; a multi-level automatic gain control (AGC); and a controlled amplitude clipping module that ensures samples stay below a certain limit. The modified signal is then output for playback through a loudspeaker or the like.
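The filter, gain, and clipping chain can be sketched as below. This is an illustration under stated assumptions: the abstract does not specify the filter response, the gain levels, or the limit, so the first-order high-pass, the three gain steps, and the default limit of 1.0 are all hypothetical choices.

```python
def one_pole_highpass(samples, alpha=0.95):
    """Assumed first-order IIR high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def multilevel_gain(noise_rms):
    """Pick one of several fixed gain levels from the measured ambient noise (assumed steps)."""
    if noise_rms < 0.05:
        return 1.0
    if noise_rms < 0.2:
        return 2.0
    return 4.0


def clip(samples, limit=1.0):
    """Controlled amplitude clipping: keep every sample within [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


def shape(samples, noise_rms, limit=1.0):
    """Apply IIR filtering, noise-dependent gain, then clipping, in that order."""
    filtered = one_pole_highpass(samples)
    gain = multilevel_gain(noise_rms)
    return clip([gain * s for s in filtered], limit)
```

A recursive filter of this kind needs only the previous input and output sample, which is why an IIR structure can keep the added delay low.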
Abstract:
A method for restoring a processed speech signal by an electronic device is described. The method includes obtaining at least one audio signal. The method also includes performing bin-wise voice activity detection based on the at least one audio signal. The method further includes restoring the processed speech signal based on the bin-wise voice activity detection.
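As a rough sketch of the idea, bin-wise voice activity detection makes one speech/no-speech decision per frequency bin rather than per frame. The per-bin power threshold and the restoration rule below (keeping a voiced bin from falling under a fraction of its original magnitude) are assumptions for illustration; the abstract does not state how either step is performed.

```python
def binwise_vad(bin_power, noise_power, snr_threshold=2.0):
    """Flag a bin as voiced when its power exceeds the noise estimate by an assumed factor."""
    return [p > snr_threshold * n for p, n in zip(bin_power, noise_power)]


def restore(processed_mag, original_mag, voice_bins, floor=0.5):
    """In voiced bins, keep the processed magnitude at or above floor * original (assumed rule)."""
    return [
        max(p, floor * o) if voiced else p
        for p, o, voiced in zip(processed_mag, original_mag, voice_bins)
    ]
```

For example, with bin powers `[4.0, 1.0, 9.0]` and noise estimates `[1.0, 1.0, 2.0]`, the first and third bins are flagged as voiced, and only those bins are eligible for restoration.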
Abstract:
The present disclosure includes processing a signal to generate a first sub-set of data, transmitting the first sub-set of data for generation of a reconstructed audio signal, the reconstructed audio signal having a fidelity relative to the signal, processing the signal to generate a second sub-set of data and a third sub-set of data, the second sub-set of data defining a second portion of the signal and comprising data that is different than data of the first sub-set of data, and the third sub-set of data defining a third portion of the signal and comprising data that is different than data of the first and second sub-sets of data, comparing a priority of the second sub-set of data to a priority of the third sub-set of data, and transmitting one of the second sub-set of data and the third sub-set of data over the network for improving the fidelity.
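The priority comparison can be sketched as follows. The `SubSet` type, its fields, and the tie-breaking rule (prefer the second sub-set on equal priority) are hypothetical; the abstract only states that the two priorities are compared and one sub-set is transmitted.

```python
from dataclasses import dataclass


@dataclass
class SubSet:
    """Hypothetical container for one sub-set of signal data."""
    name: str
    priority: int
    payload: bytes


def pick_enhancement(second: SubSet, third: SubSet) -> SubSet:
    """Compare priorities and return the sub-set to transmit for improving fidelity."""
    return second if second.priority >= third.priority else third


def transmit_plan(first: SubSet, second: SubSet, third: SubSet) -> list:
    """The first sub-set is always sent; one enhancement sub-set follows it."""
    return [first.name, pick_enhancement(second, third).name]
```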
Abstract:
An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
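The gating logic, adapting the user speech model only when background audio is low, can be sketched as below. Representing the model as a plain mean feature vector and updating it with a moving average are simplifying assumptions; the threshold and rate values are hypothetical.

```python
def maybe_adapt(model_mean, frame_features, background_level, threshold=0.1, rate=0.2):
    """Return an adapted model only when background audio is below the threshold.

    model_mean and frame_features are equal-length feature vectors (assumed
    representation); rate controls how far the model moves toward the new data.
    """
    if background_level >= threshold:
        # Too much background audio: leave the user speech model unchanged.
        return list(model_mean)
    return [(1.0 - rate) * m + rate * f for m, f in zip(model_mean, frame_features)]
```

Skipping adaptation on noisy input keeps background audio from being absorbed into the model of the user's speech.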
Abstract:
A method for adjusting a voice recognition system and a voice recognition system are disclosed, wherein the voice recognition system comprises a speaker and a microphone, and wherein the method comprises the steps of: memorizing an audio frequency signal; playing back the audio frequency signal by means of the speaker; generating a detection signal by detecting the audio frequency signal by means of the microphone; and adjusting parameters of the voice recognition system dependent on the detection signal.
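One way to picture the adjustment step is as a playback calibration: compare the stored signal with what the microphone detected and derive a parameter from the difference. The sketch below assumes the adjusted parameter is a single input gain computed from RMS levels; the abstract does not say which parameters are adjusted or how.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_input_gain(reference, detected):
    """Return the gain that would bring the detected level up to the reference level."""
    detected_rms = rms(detected)
    if detected_rms == 0.0:
        raise ValueError("microphone detected no signal")
    return rms(reference) / detected_rms
```

Here `reference` stands for the memorized audio frequency signal and `detected` for the detection signal captured by the microphone during playback.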