Abstract:
A method and system for separating noise from speech in real time are provided to improve the intelligibility of speech for a variety of communication devices and hearing aids. A plurality of frame-level features are extracted from a speech signal and form the input to a classifier. The classifier is a deep neural network comprising multiple hidden layers and an output layer with multiple output units. The classifier classifies a plurality of time-frequency units of the speech simultaneously. The classifier output constitutes an estimated ideal binary mask, which a fast gammatone filter bank uses to resynthesize the separated speech into an enhanced speech waveform.
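The classification step described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the layer sizes, the ReLU hidden activation, the sigmoid output, the 0.5 threshold, and the function names (`estimate_binary_mask`, `apply_mask`, `demo`) are all assumptions chosen for illustration, the weights are random rather than trained, and a plain matrix of time-frequency units stands in for the output of a gammatone filter bank.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def estimate_binary_mask(features, weights, biases, threshold=0.5):
    """Map frame-level features to a binary mask, one output unit per channel.

    features: array of shape (n_frames, n_features)
    weights, biases: per-layer parameters; the last pair is the output layer
    """
    h = features
    # Hidden layers (ReLU is an assumption; the abstract names no activation).
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    # All output units are computed at once, so every time-frequency unit
    # in a frame is classified simultaneously.
    probs = sigmoid(h @ weights[-1] + biases[-1])
    return (probs >= threshold).astype(np.float64)


def apply_mask(tf_units, mask):
    """Keep units labeled 1 (speech-dominant) and zero out the rest."""
    return tf_units * mask


def demo():
    rng = np.random.default_rng(0)
    n_frames, n_features, n_hidden, n_channels = 5, 8, 16, 4
    sizes = [n_features, n_hidden, n_hidden, n_channels]
    # Random, untrained parameters, used only to exercise the shapes.
    weights = [rng.standard_normal((a, b)) * 0.1
               for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    features = rng.standard_normal((n_frames, n_features))
    tf_units = rng.standard_normal((n_frames, n_channels))
    mask = estimate_binary_mask(features, weights, biases)
    return mask, apply_mask(tf_units, mask)
```

Calling `demo()` returns a mask and a masked time-frequency matrix, each with one row per frame and one column per frequency channel. In a full system the masked units would then be passed to the resynthesis stage to produce the enhanced waveform.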