Abstract:
Processes are described herein for transforming an audio mixture in which a specific component is affected by reverberation into a specific dry component (i.e. unaffected by the reverberation) and a background component. In the process described herein, the long-term effects of reverberation are explicitly taken into account by modelling the spectrogram of the specific component as the result of a matrix convolution along time between the spectrogram of the specific dry component and a reverberation matrix. Parameters of the model are estimated iteratively by minimizing a cost function measuring the divergence between the spectrogram of the mixture signal and the model of the spectrogram of the mixture signal.
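The spectrogram model in this abstract can be illustrated with a minimal sketch. It assumes the "matrix convolution along time" acts independently on each frequency row, and it uses the generalized Kullback-Leibler divergence as the cost function; the abstract names neither choice, and all function names are hypothetical. The iterative parameter updates are not shown.

```python
import math

def reverberant_spectrogram(dry, reverb):
    """Convolve each frequency row of `dry` along time with the same row of `reverb`.

    dry:    n_freq x n_frames spectrogram of the dry component
    reverb: n_freq x n_taps reverberation matrix (assumed per-frequency taps)
    """
    n_freq, n_frames = len(dry), len(dry[0])
    n_taps = len(reverb[0])
    wet = [[0.0] * n_frames for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_frames):
            # Only taps that reach back to an existing frame contribute.
            for tau in range(min(n_taps, t + 1)):
                wet[f][t] += dry[f][t - tau] * reverb[f][tau]
    return wet

def mixture_model(dry, reverb, background):
    """Model of the mixture spectrogram: reverberated component plus background."""
    wet = reverberant_spectrogram(dry, reverb)
    return [[w + b for w, b in zip(w_row, b_row)]
            for w_row, b_row in zip(wet, background)]

def kl_divergence(observed, model, eps=1e-12):
    """Generalized KL divergence (an assumed choice of cost function)."""
    total = 0.0
    for o_row, m_row in zip(observed, model):
        for v, m in zip(o_row, m_row):
            total += v * math.log((v + eps) / (m + eps)) - v + m
    return total
```

An estimation loop would repeatedly adjust the dry spectrogram, the reverberation matrix, and the background so that `kl_divergence(observed, mixture_model(...))` decreases.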
Abstract:
Systems and methods facilitating removal of content from audio files are described. A method includes identifying a sound recording in a first audio file, identifying a reference file having at least a defined level of similarity to the sound recording, and processing the first audio file to remove the sound recording and generate a second audio file. In some embodiments, winner-take-all coding and Hough transforms are employed for determining alignment and rate adjustment of the reference file in the first audio file. After alignment, the reference file is filtered in the frequency domain to increase similarity between the reference file and the sound recording. The frequency domain representation (FR) of the filtered version is subtracted from the FR of the first audio file and the result is converted to a time representation of the second audio file. In some embodiments, spectral subtraction is also performed to generate a further improved second audio file.
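The filter-then-subtract step can be sketched as follows. This is a minimal illustration that assumes the reference is already aligned and rate-adjusted, and that the frequency-domain filter is a known gain per bin (`bin_gains`, a hypothetical parameter). How the filter is derived, the winner-take-all and Hough-transform alignment, and the optional spectral subtraction are not shown. A naive DFT is used only to keep the example dependency-free.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of `dft`."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def remove_reference(mixture, reference, bin_gains):
    """Subtract the filtered reference from the mixture in the frequency domain.

    mixture, reference: equal-length, already-aligned time-domain signals
    bin_gains: one (possibly complex) gain per frequency bin, standing in
               for the similarity-increasing filter (assumed to be given)
    """
    mix_fr = dft(mixture)
    ref_fr = dft(reference)
    residual_fr = [m - g * r for m, r, g in zip(mix_fr, ref_fr, bin_gains)]
    # Convert back to a time representation; the imaginary part is rounding noise
    # when the inputs are real and the gains are conjugate-symmetric.
    return [v.real for v in idft(residual_fr)]
```

For example, if the mixture is some other content plus half the reference, a flat gain of 0.5 in every bin recovers the other content.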
Abstract:
A noise elimination circuit of particular application in enhancing vocal clarity in a teleconference includes a first voice processing circuit, a second voice processing circuit, and a subtracter. The first voice processing circuit receives and processes a first voice from a first microphone and the second voice processing circuit receives and processes the same voice from a second microphone (second voice). The first voice and the second voice include voice signals and noises. The subtracter is electrically connected to the two voice processing circuits to receive the first voice and the second voice respectively processed by the first voice processing circuit and the second voice processing circuit. The subtracter subtracts the second voice from the first voice, and outputs a clear voice from which noise has been eliminated.
Abstract:
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
Abstract:
A feature compensation apparatus includes a feature extractor configured to extract corrupt speech features from a corrupt speech signal that contains additive noise and consists of two or more frames; a noise estimator configured to estimate noise features based on the extracted corrupt speech features and compensated speech features; a probability calculator configured to calculate a correlation between adjacent frames of the corrupt speech signal; and a speech feature compensator configured to generate compensated speech features by eliminating noise features of the extracted corrupt speech features while taking into consideration the correlation between adjacent frames of the corrupt speech signal and the estimated noise features, and to transmit the generated compensated speech features to the noise estimator.
Abstract:
In a method for coding of information for enhancing a background noise representation, voice activity of an input speech signal is determined. A noisiness parameter is determined for an inactive speech signal, wherein the noisiness parameter is based on a ratio of prediction gains of two Linear Predictive Coder (LPC) prediction filters with different orders. The noisiness parameter is quantized, and the quantized noisiness parameter is encoded for transmission.
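A noisiness parameter of this kind can be sketched as the ratio of the prediction gains of a low-order and a high-order LPC filter. The orders (2 and 8), the direction of the ratio, and the function names below are assumptions; the abstract does not specify them, and quantization and encoding are not shown. The Levinson-Durbin recursion supplies the residual energy at every order, and the prediction gain at order p is the frame energy divided by that residual energy.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def residual_energies(r, order):
    """Levinson-Durbin recursion: prediction residual energy for orders 0..order."""
    a = [1.0] + [0.0] * order        # error-filter coefficients, a[0] = 1
    err = [r[0]]
    for m in range(1, order + 1):
        if err[-1] <= 0.0:           # perfectly predictable or silent: stop refining
            err.append(0.0)
            continue
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err[-1]           # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err.append(err[-1] * (1.0 - k * k))
    return err

def noisiness(frame, low_order=2, high_order=8):
    """Ratio gain(low_order) / gain(high_order), which equals E_high / E_low.

    Values near 1 mean the higher-order filter predicts no better than the
    lower-order one, as for noise-like input. Returning 1.0 for degenerate
    frames is an arbitrary convention of this sketch.
    """
    r = autocorrelation(frame, high_order)
    if r[0] <= 0.0:
        return 1.0
    err = residual_energies(r, high_order)
    if err[low_order] <= 0.0:
        return 1.0
    return err[high_order] / err[low_order]
```

Under this definition the parameter lies in the interval (0, 1] for non-degenerate frames, which makes it convenient to quantize with a small number of bits.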
Abstract:
Acoustic noise suppression is provided in multiple-microphone systems using Voice Activity Detectors (VAD). A host system receives acoustic signals via multiple microphones. The system also receives information on the vibration of human tissue associated with human voicing activity via the VAD. In response, the system generates a transfer function representative of the received acoustic signals upon determining that voicing information is absent from the received acoustic signals during at least one specified period of time. The system removes noise from the received acoustic signals using the transfer function, thereby producing a denoised acoustic data stream.
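One way to picture the transfer-function step is sketched below. It assumes two microphones whose signals are already split into frequency-domain frames, a per-frame VAD flag, and a per-bin least-squares estimate of the transfer function relating the reference microphone to the primary microphone. These details and all names are assumptions for illustration; the abstract does not state how the transfer function is computed or applied.

```python
def estimate_transfer(primary, reference, voiced):
    """Per-bin least-squares H such that primary ≈ H * reference.

    primary, reference: lists of frames, each frame a list of complex bins
    voiced: one boolean per frame from the VAD; only frames with no
            voicing contribute to the estimate
    """
    n_bins = len(primary[0])
    num = [0j] * n_bins
    den = [0.0] * n_bins
    for p_frame, r_frame, is_voiced in zip(primary, reference, voiced):
        if is_voiced:
            continue
        for k in range(n_bins):
            num[k] += p_frame[k] * r_frame[k].conjugate()
            den[k] += abs(r_frame[k]) ** 2
    # Bins with no reference energy get a zero transfer function.
    return [n / d if d > 0.0 else 0j for n, d in zip(num, den)]

def denoise(primary, reference, transfer):
    """Remove the noise predicted from the reference microphone, bin by bin."""
    return [[p - h * r for p, r, h in zip(p_frame, r_frame, transfer)]
            for p_frame, r_frame in zip(primary, reference)]
```

In this sketch the estimate is formed only from frames the VAD marks as unvoiced, so speech in the primary microphone does not bias it; the same transfer function is then applied to all frames.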
Abstract:
The application relates to a hearing device comprising a) an input unit for delivering a time varying electric input signal representing an audio signal comprising at least two sound sources, b) a cyclic analysis buffer unit of length A adapted for storing the last A audio samples, c) a cyclic synthesis buffer unit of length L, where L is smaller than A, adapted for storing the last L audio samples, which are intended to be separated into individual sound sources, d) a database having stored recorded sound examples from said at least two sound sources, each entry in the database being termed an atom, the atoms originating from audio samples from first and second buffers corresponding in size to said synthesis and analysis buffer units, where for each atom, the audio samples from the first buffer overlap with the audio samples from the second buffer, and where atoms originating from the first buffer constitute a reconstruction dictionary, and where atoms originating from the second buffer constitute an analysis dictionary, and e) a sound source separation unit for separating said electric input signal to provide separated signals representing said at least two sound sources, the sound source separation unit being configured to determine the optimal representation (W) of the last A samples given the atoms in the analysis dictionary of the database, and to generate said at least two sound sources by combining atoms in the reconstruction dictionary of the database using the optimal representation (W). The application further relates to a method of separating audio sources. The invention may e.g. be used for hearing devices, e.g. hearing aids, headsets, ear phones, active ear protection systems, handsfree telephone systems, mobile telephones, teleconferencing systems, public address systems, classroom amplification systems, etc.
Abstract:
Signal processing solutions take advantage of microphones located on different devices and improve the quality of transmitted voice signals in a communication system. With usage of various devices such as Bluetooth headsets, wired headsets and the like in conjunction with mobile handsets, multiple microphones located on different devices are exploited for improving performance and/or voice quality in a communication system. Audio signals are recorded by microphones on different devices and processed to produce various benefits, such as improved voice quality, background noise reduction, voice activity detection and the like.