Abstract:
A system for removing noise from an audio signal is described. For example, noise caused by content playing in the background during a voice command or phone call may be removed from the audio signal representing the voice command or phone call. Removing this noise may improve the signal-to-noise ratio of the audio signal.
Abstract:
Processes are described herein for transforming an audio mixture, in which a specific component is affected by reverberation, into a specific dry component (i.e., unaffected by the reverberation) and a background component. In these processes, the long-term effects of reverberation are explicitly taken into account by modelling the spectrogram of the specific component as the result of a matrix convolution along time between the spectrogram of the specific dry component and a reverberation matrix. Parameters of the model are estimated iteratively by minimizing a cost function that measures the divergence between the spectrogram of the mixture signal and the model of the spectrogram of the mixture signal.
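One way to write down the model described in this abstract is sketched below. The symbols are assumptions introduced only for illustration and do not come from the abstract: V is the mixture spectrogram, S the dry-component spectrogram, R the reverberation matrix with T time taps, and d a divergence that the abstract leaves unspecified.

```latex
% Modelled mixture spectrogram: reverberated specific component plus background
\hat{V}_{f,t} = \hat{V}^{\mathrm{rev}}_{f,t} + \hat{V}^{\mathrm{bg}}_{f,t},
\qquad
\hat{V}^{\mathrm{rev}}_{f,t} = \sum_{\tau=1}^{T} S_{f,\,t-\tau+1}\, R_{f,\tau}

% Parameters are estimated by iteratively minimizing the divergence to the observation
C = \sum_{f,t} d\!\left( V_{f,t} \,\middle|\, \hat{V}_{f,t} \right)
```

The sum over the tap index is the convolution along time applied independently in each frequency band f, which is how the long-term reverberation tail enters the model.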
Abstract:
Systems and methods facilitating removal of content from audio files are described. A method includes identifying a sound recording in a first audio file, identifying a reference file having at least a defined level of similarity to the sound recording, and processing the first audio file to remove the sound recording and generate a second audio file. In some embodiments, winner-take-all coding and Hough transforms are employed to determine the alignment and rate adjustment of the reference file within the first audio file. After alignment, the reference file is filtered in the frequency domain to increase similarity between the reference file and the sound recording. The frequency domain representation (FR) of the filtered version is subtracted from the FR of the first audio file, and the result is converted to a time-domain representation to produce the second audio file. In some embodiments, spectral subtraction is also performed to generate a further improved second audio file.
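The spectral subtraction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the reference is already aligned and filtered, works on a single short frame, and uses a naive DFT; the function names and the toy signals are hypothetical and are not taken from the abstract.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short demo frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(mixture, reference):
    """Subtract the reference magnitude per bin, floor at zero, keep the mixture phase."""
    out = []
    for m, r in zip(dft(mixture), dft(reference)):
        magnitude = max(abs(m) - abs(r), 0.0)
        out.append(cmath.rect(magnitude, cmath.phase(m)))
    return idft(out)


# Toy frame: a "voice" tone in bin 3 mixed with a "music" tone in bin 10.
n = 64
voice = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
music = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
mixture = [v + m for v, m in zip(voice, music)]
cleaned = spectral_subtract(mixture, music)
```

In this toy case the two tones occupy different bins, so `cleaned` closely matches `voice`. Real recordings overlap in frequency, which is presumably why the abstract also describes alignment and filtering before subtraction.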
Abstract:
The invention relates to a method and an associated device (1) for separating one or more particular digital audio source signals (si) contained in a mixed multichannel digital audio signal (smix) obtained by mixing a plurality of digital audio source signals (s1, ..., sp). According to the invention, the modulus of the amplitude or the normalized power of the particular source signal(s) (si) is determined from representative values of said particular source signal(s) contained in the mixed signal. Linearly constrained minimum variance spatial filtering is then performed on the mixed signal in order to obtain each particular source signal (s′i), said filtering being based on the distribution of said particular source signal between at least two channels of the mixed signal, with the modulus of the amplitude or the normalized power of said particular source signal used as a linear constraint of the filter.
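For orientation, the textbook form of a linearly constrained minimum variance (LCMV) filter is given below. The notation is an assumption and does not come from the abstract: w is the filter weight vector, R_x the covariance matrix of the mixed channels, C the constraint matrix and g the desired response vector. How the estimated modulus or normalized power of the source enters C and g is defined by the invention and is not reproduced here.

```latex
% Minimize output power subject to linear constraints
\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}_x \mathbf{w}
\quad \text{subject to} \quad \mathbf{C}^{H} \mathbf{w} = \mathbf{g}

% Closed-form solution (Lagrange multipliers)
\mathbf{w}_{\mathrm{LCMV}} = \mathbf{R}_x^{-1} \mathbf{C}
\left( \mathbf{C}^{H} \mathbf{R}_x^{-1} \mathbf{C} \right)^{-1} \mathbf{g}
```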
Abstract:
The disclosure provides a method and an apparatus for detecting voice activity in an input audio signal composed of frames. A noise characteristic of the input signal is determined based on a received frame of the input audio signal. A voice activity detection (VAD) parameter is derived based on the noise characteristic of the input audio signal using an adaptive function. The derived VAD parameter is compared with a threshold value to provide a voice activity detection decision. The input audio signal is processed according to the voice activity detection decision.
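The decision flow in this abstract (noise characteristic, adaptive VAD parameter, threshold comparison) can be sketched as follows. Everything specific here is an assumption made for illustration: the noise characteristic is a smoothed frame energy, the adaptive function is a made-up weighting of the frame SNR, and the threshold and smoothing constants are arbitrary.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def adaptive_weight(noise_energy):
    """Hypothetical adaptive function: trust the SNR less when the noise floor is high."""
    noise_db = 10.0 * math.log10(noise_energy)
    return 1.0 if noise_db <= -30.0 else 0.7


def detect_voice(frames, threshold_db=6.0, alpha=0.9):
    """Return one boolean per frame: True when the frame is judged to contain voice."""
    # Assumption: the first frame contains noise only and seeds the noise estimate.
    noise = frame_energy(frames[0]) + 1e-12
    decisions = []
    for frame in frames:
        energy = frame_energy(frame) + 1e-12
        snr_db = 10.0 * math.log10(energy / noise)
        vad_param = adaptive_weight(noise) * snr_db
        active = vad_param > threshold_db
        if not active:
            # Update the noise characteristic only during inactive frames.
            noise = alpha * noise + (1.0 - alpha) * energy
        decisions.append(active)
    return decisions


quiet = [0.01, -0.01] * 80   # low-level background frame
loud = [0.5, -0.5] * 80      # high-level "speech" frame
decisions = detect_voice([quiet, quiet, quiet, loud, loud, quiet, quiet])
```

Downstream processing would then branch on each entry of `decisions`, for example by encoding active frames at full rate and inactive frames as background noise.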
Abstract:
In a method for coding of information for enhancing a background noise representation, voice activity of an input speech signal is determined. A noisiness parameter is determined for an inactive speech signal, wherein the noisiness parameter is based on a ratio of prediction gains of two Linear Predictive Coder (LPC) prediction filters with different orders. The noisiness parameter is quantized, and the quantized noisiness parameter is encoded for transmission.
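A noisiness parameter of the kind described here can be sketched with the Levinson-Durbin recursion, which yields the prediction error energy at every order in one pass. The orders (2 and 16), the direction of the ratio and the 3-bit uniform quantizer are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math
import random


def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag (biased, unnormalized)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_errors(r, order):
    """Prediction error energies E[0..order] from the Levinson-Durbin recursion."""
    a = [1.0] + [0.0] * order
    err = [r[0]]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err[-1]                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err.append(err[-1] * (1.0 - k * k))
    return err


def noisiness(signal, low_order=2, high_order=16):
    """Ratio of prediction gains: gain(low) / gain(high) = E[high] / E[low]."""
    err = levinson_errors(autocorr(signal, high_order), high_order)
    return err[high_order] / err[low_order]


def quantize(value, bits=3):
    """Hypothetical uniform scalar quantizer over [0, 1]."""
    levels = (1 << bits) - 1
    return round(min(max(value, 0.0), 1.0) * levels)


rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]
tonal = [
    math.sin(0.3 * n) + math.sin(1.1 * n) + math.sin(2.3 * n) + 0.01 * rng.gauss(0.0, 1.0)
    for n in range(2000)
]
noisy_value = noisiness(white)
tonal_value = noisiness(tonal)
```

With this definition a value near 1 means the higher-order predictor adds little, as for white noise, while a small value indicates spectral structure that only the higher-order filter captures.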
Abstract:
Acoustic noise suppression is provided in multiple-microphone systems using Voice Activity Detectors (VAD). A host system receives acoustic signals via multiple microphones. The system also receives information on the vibration of human tissue associated with human voicing activity via the VAD. In response, the system generates a transfer function representative of the received acoustic signals upon determining that voicing information is absent from the received acoustic signals during at least one specified period of time. The system removes noise from the received acoustic signals using the transfer function, thereby producing a denoised acoustic data stream.
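The idea of learning a transfer function while voicing is absent and then using it to remove noise can be sketched in a heavily simplified form. The assumptions are considerable: the transfer function is reduced to a single real gain between a reference microphone and a primary microphone, the reference microphone picks up noise only, and the per-sample voicing flags stand in for the VAD output. The names are hypothetical.

```python
import math


def estimate_gain(primary, reference, voiced):
    """Least-squares gain from reference to primary, using unvoiced samples only."""
    num = sum(p * r for p, r, v in zip(primary, reference, voiced) if not v)
    den = sum(r * r for r, v in zip(reference, voiced) if not v)
    return num / den if den else 0.0


def denoise(primary, reference, voiced):
    """Subtract the predicted noise (gain * reference) from the primary channel."""
    h = estimate_gain(primary, reference, voiced)
    return [p - h * r for p, r in zip(primary, reference)]


length = 150
voiced = [50 <= t < 100 for t in range(length)]            # stand-in for VAD output
speech = [0.3 * math.sin(0.2 * t) if v else 0.0 for t, v in zip(range(length), voiced)]
noise = [math.sin(0.7 * t) for t in range(length)]
primary = [s + 0.8 * n for s, n in zip(speech, noise)]     # speech plus scaled noise
reference = noise                                          # noise-only microphone
cleaned = denoise(primary, reference, voiced)
```

A practical system would estimate a frequency-dependent transfer function and update it over time; the single gain only shows where the voicing information enters the estimate.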
Abstract:
An apparatus comprising an input configured to receive an audio signal comprising at least two audio shots separated by an audio shot boundary, and a comparator configured to compare the audio signal against a reference audio signal and to determine a location of the audio shot boundary within the audio signal based on the comparison of the audio signal against the reference audio signal.