Abstract:
Systems and methods for noise reduction are provided, including operations on noisy speech signals, such as speech signals subject to speech processing, speech recognition, and speech transmission for voice communication. In one embodiment, a system for noise suppression includes an input smoothing filter to smooth magnitudes of an input spectrum, a desired noise shape determination block configured to determine a desired noise shape of a noise spectrum dependent on the smoothed-magnitude input spectrum, and a suppression factors determination block configured to determine a set of suppression factors based on the desired noise shape and the smoothed-magnitude input spectrum. In one embodiment, a filter coefficient determination block is configured to determine noise suppression filter coefficients from the desired noise shape of the noise spectrum. System configurations and processes are also provided for formant detection.
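The three blocks named in this abstract can be illustrated with a minimal sketch. The abstract does not say how the desired noise shape is derived, so the rule below (capping the smoothed spectrum at a hypothetical `floor_level`) and the moving-average smoother are assumptions chosen only to make the data flow concrete.

```python
def smooth_magnitudes(mags, width=3):
    """Moving-average smoothing across frequency bins (shorter window at the edges)."""
    half = width // 2
    out = []
    for k in range(len(mags)):
        window = mags[max(0, k - half): k + half + 1]
        out.append(sum(window) / len(window))
    return out


def desired_noise_shape(smoothed, floor_level):
    # Assumption: the target noise shape is the smoothed spectrum capped at
    # floor_level. The abstract only says the shape depends on the smoothed spectrum.
    return [min(m, floor_level) for m in smoothed]


def suppression_factors(smoothed, desired, eps=1e-12):
    """Per-bin gain that scales the smoothed spectrum down to the desired shape."""
    return [min(1.0, d / max(s, eps)) for s, d in zip(smoothed, desired)]


mags = [1.0, 1.0, 4.0, 1.0, 1.0]
smoothed = smooth_magnitudes(mags)
gains = suppression_factors(smoothed, desired_noise_shape(smoothed, floor_level=1.0))
```

Bins whose smoothed magnitude already lies at or below the target keep a gain of 1.0, while louder bins are attenuated in proportion to how far they exceed it.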
Abstract:
An audio processing system includes a speaker, a plurality of microphones, and an audio processing device. The audio processing device includes: a plurality of filters that pass, from the audio signals collected by the plurality of microphones, respective first bands included in the band of the audio output from the speaker; a plurality of delayers that delay the audio signals passed through the plurality of filters by delay times corresponding to the respective first bands; a correlation value calculator that calculates a correlation value between the plurality of audio signals delayed by the plurality of delayers and an audio signal of the audio output from the speaker; and a determinator that determines the presence or absence of an abnormality in the plurality of microphones and the speaker based on the correlation value.
Abstract:
Digital signal processing techniques are described for automatically reducing audible noise in a sound recording that contains speech. A noise suppression system uses two types of noise estimators, a more aggressive one and a less aggressive one. Decisions are made on how to select or combine their outputs into a usable noise estimate under different speech and noise conditions. A 2-channel noise estimator is described. Other embodiments are also described and claimed.
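One way to picture "select or combine" is a per-bin blend of the two estimates. The sketch below is an assumption, not the method the abstract claims: it takes a hypothetical `speech_likelihood` in [0, 1] and leans on the less aggressive estimate when speech is likely, so that speech is less likely to be treated as noise.

```python
def combine_noise_estimates(aggressive, conservative, speech_likelihood):
    """Blend two per-bin noise estimates.

    speech_likelihood = 0.0 selects the aggressive estimate outright,
    1.0 selects the conservative one, and values in between mix them.
    """
    w = min(max(speech_likelihood, 0.0), 1.0)  # clamp to [0, 1]
    return [(1.0 - w) * a + w * c for a, c in zip(aggressive, conservative)]


noise_only = combine_noise_estimates([4.0, 2.0], [2.0, 1.0], 0.0)
speech_present = combine_noise_estimates([4.0, 2.0], [2.0, 1.0], 1.0)
uncertain = combine_noise_estimates([4.0, 2.0], [2.0, 1.0], 0.5)
```

Setting the weight to exactly 0 or 1 reduces the blend to hard selection, so the same function covers both the "select" and the "combine" case.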
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
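The step from two filtered channels to a single combined channel can be sketched as filter-and-sum beamforming. This is an assumption about how the channels are combined, since the abstract does not spell it out. In the sketch the filter taps `h1` and `h2` are supplied by hand, whereas the abstract describes generating the filter parameters from both channels of audio data.

```python
def fir_filter(x, h):
    """Causal FIR filter: y[n] = sum over k of h[k] * x[n - k], same length as x."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def filter_and_sum(ch1, ch2, h1, h2):
    """Filter each channel with its own taps, then add the results sample by sample."""
    y1 = fir_filter(ch1, h1)
    y2 = fir_filter(ch2, h2)
    return [a + b for a, b in zip(y1, y2)]


# Channel 1 hears the impulse one sample earlier than channel 2.
ch1 = [1.0, 0.0, 0.0, 0.0]
ch2 = [0.0, 1.0, 0.0, 0.0]
# h1 delays channel 1 by one sample so both channels line up before summing.
combined = filter_and_sum(ch1, ch2, h1=[0.0, 1.0], h2=[1.0])
```

With these hand-picked taps the two impulses add coherently at sample index 1, which is the alignment effect that adaptive filter parameters aim to achieve for the target speaker.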
Abstract:
Systems and methods for analyzing audio data are provided. The audio data may be analyzed to identify language register. For example, the audio data may be analyzed to identify the language register of a selected speaker, such as the language register of a wearer of a wearable audio sensor, of a speaker engaged in conversation with the wearer of the wearable audio sensor, and so forth. For example, the audio data may be analyzed to obtain textual information, and the textual information may be analyzed to identify the language register. Feedback and reports may be provided based on the identified language register.
Abstract:
Systems and methods for analyzing audio data are provided. The audio data may be analyzed to detect articulation errors. For example, the audio data may be analyzed to detect articulation errors of a selected speaker, such as articulation errors of a wearer of a wearable audio sensor, of a speaker engaged in conversation with the wearer of the wearable audio sensor, and so forth. Feedback and reports may be provided based on the detected articulation errors.
Abstract:
A noise reduction apparatus according to the present invention includes: a sudden sound information storage unit that stores, as sudden sound information, an input signal that was input before a current input signal, the stored input signal having a signal level of voice components equal to or smaller than a predetermined threshold and including a sudden sound to be suppressed; a phase difference calculation unit that calculates a phase difference between the sudden sound information and a sudden sound in the current input signal based on a maximum value of a correlation value between the sudden sound information and the current input signal; an addition signal generation unit that shifts a phase of the sudden sound information based on the phase difference to generate an addition signal; and a sudden sound suppression unit that adds the addition signal and the current input signal to output an output signal.
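The correlate, shift, and add pipeline can be illustrated in the time domain. This is a minimal sketch under two assumptions not stated in the abstract: the phase difference is treated as an integer sample lag, and the shifted template is inverted so that adding it cancels the sudden sound.

```python
def best_lag(template, signal):
    """Return the lag at which the template correlates most strongly with the signal."""
    best, best_val = 0, float("-inf")
    for lag in range(len(signal) - len(template) + 1):
        val = sum(t * signal[lag + i] for i, t in enumerate(template))
        if val > best_val:
            best, best_val = lag, val
    return best


def suppress_sudden_sound(template, signal):
    """Shift the stored sudden sound to the best lag, invert it, and add it to the input."""
    lag = best_lag(template, signal)
    addition = [0.0] * len(signal)
    for i, t in enumerate(template):
        addition[lag + i] = -t  # assumption: inverted so that the sum cancels
    return [s + a for s, a in zip(signal, addition)]


stored_click = [1.0, -1.0, 0.5]                  # earlier input with no voice content
current = [0.0, 0.0, 1.0, -1.0, 0.5, 0.0]        # same click, two samples in
cleaned = suppress_sudden_sound(stored_click, current)
```

In this idealized case the click in the current input is an exact copy of the stored one, so the output is silent. With real signals the match is only approximate and some residual remains.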
Abstract:
A method and apparatus for enhancing modulation of certain speech sounds, such as trill sounds, are provided for radios that utilize digital vocoders. A digitized speech stream is sampled and the sampling is adjusted to determine, detect, and enhance trill nulls in the digitized voice stream by one or more of: frame shifting the digitized speech input stream prior to vocoding, time expanding a digitized speech stream prior to vocoding, time compressing a digitized speech output stream after vocoding, and modulation enhancement and filtering of the digitized speech output stream after vocoding.
Abstract:
Methods and systems for source separation based on determining a number of bases for a nonnegative matrix factorization (NMF) model are disclosed. A method includes receiving, at a computing device, a mixed signal including a combination of first signal data and second signal data. The method also includes generating, by the computing device, a time-frequency representation of the mixed signal. The method further includes determining, by applying a structured stochastic variational inference (SSVI) algorithm to the NMF model, a number of bases for a dictionary of signal-related components of the mixed signal. The method uses the number of bases and the time-frequency representation to construct the dictionary and an activation matrix of weights, the weights indicating how active each one of the signal-related components is at a given time. The method then uses the dictionary and the activation matrix to separate the first signal data from the second signal data.
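The final separation step can be sketched once a dictionary W and an activation matrix H are available. The sketch below does not implement the SSVI-based choice of the number of bases. It assumes W and H are already given, that the bases belonging to the first source are known (the hypothetical `source1_bases` argument), and that a Wiener-like soft mask is used, which is a common choice and not something the abstract specifies.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def separate(V, W, H, source1_bases, eps=1e-12):
    """Split the time-frequency matrix V into two sources with a soft mask.

    V: frequency x time magnitudes, W: frequency x bases, H: bases x time.
    """
    full = matmul(W, H)
    W1 = [[row[k] for k in source1_bases] for row in W]
    H1 = [H[k] for k in source1_bases]
    part = matmul(W1, H1)
    rows, cols = len(V), len(V[0])
    s1 = [
        [V[i][j] * part[i][j] / max(full[i][j], eps) for j in range(cols)]
        for i in range(rows)
    ]
    s2 = [[V[i][j] - s1[i][j] for j in range(cols)] for i in range(rows)]
    return s1, s2


W = [[1.0, 0.0], [0.0, 1.0]]   # two bases, one per frequency bin
H = [[2.0, 0.0], [0.0, 3.0]]   # basis 0 active in frame 0, basis 1 in frame 1
V = matmul(W, H)
first, second = separate(V, W, H, source1_bases=[0])
```

Each row of H indicates how active one basis is over time, so restricting the reconstruction to a subset of bases attributes the corresponding share of V to that source.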
Abstract:
The present document relates to a voice activity detection (VAD) method, methods used for voice activity detection, and apparatus thereof. The VAD method includes: obtaining sub-band signals and spectrum amplitudes of a current frame; computing values of an energy feature and a spectral centroid feature of the current frame according to the sub-band signals; computing a signal-to-noise ratio parameter of the current frame according to a background noise energy estimated from a previous frame, an energy of SNR sub-bands, and the energy feature of the current frame; and computing a VAD decision result according to a tonality signal flag, the signal-to-noise ratio parameter, the spectral centroid feature, and the frame energy feature. The methods and apparatus of the present document can improve the accuracy of detecting non-stationary noise (such as office noise) and music.
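The features listed in this abstract can be sketched as follows. The feature formulas are simplified textbook definitions, and the decision rule with its thresholds (`snr_thr`, `centroid_thr`, `energy_thr`) is entirely hypothetical. The abstract names the inputs to the decision but not how they are combined.

```python
import math


def frame_energy(subband_energies):
    """Total frame energy as the sum of sub-band energies."""
    return sum(subband_energies)


def spectral_centroid(amplitudes):
    """Amplitude-weighted mean bin index (in bins, not Hz)."""
    total = sum(amplitudes)
    if total == 0:
        return 0.0
    return sum(k * a for k, a in enumerate(amplitudes)) / total


def snr_db(energy, noise_energy, eps=1e-12):
    """Frame SNR in dB against a background noise energy from an earlier frame."""
    return 10.0 * math.log10(max(energy, eps) / max(noise_energy, eps))


def vad_decision(snr, centroid, energy, tonal,
                 snr_thr=6.0, centroid_thr=1.0, energy_thr=1e-3):
    # Hypothetical rule: reject near-silent frames, accept tonal frames
    # (e.g. music), otherwise require both SNR and centroid above thresholds.
    if energy < energy_thr:
        return False
    if tonal:
        return True
    return snr > snr_thr and centroid > centroid_thr


energy = frame_energy([60.0, 40.0])
snr = snr_db(energy, noise_energy=1.0)
centroid = spectral_centroid([0.0, 1.0, 0.0, 1.0])
active = vad_decision(snr, centroid, energy, tonal=False)
```

Keeping each feature in its own function makes it straightforward to swap in the exact definitions from a specific VAD specification while leaving the decision logic unchanged.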