摘要:
A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.
摘要:
A method and apparatus classify a portion of an alternative sensor signal as either containing noise or not containing noise. The portions of the alternative sensor signal that are classified as containing noise are not used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor. The portions of the alternative sensor signal that are classified as not containing noise are used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor.
摘要:
A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.
摘要:
A method and apparatus classify a portion of an alternative sensor signal as either containing noise or not containing noise. The portions of the alternative sensor signal that are classified as containing noise are not used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor. The portions of the alternative sensor signal that are classified as not containing noise are used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor.
摘要:
A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.
摘要:
A synthesized alternative sensor signal is produced from an alternative sensor signal. The synthesized alternative sensor signal is computed using vocal tract resonances estimated based on the alternative sensor signal, and using a waveform synthesis technique that converts the estimated vocal tract resonance sequence into a spectral magnitude sequence. The synthesized alternative sensor signal and the alternative sensor signal are used to estimate a clean speech value.
摘要:
A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.
摘要:
A synthesized alternative sensor signal is produced from an alternative sensor signal. The synthesized alternative sensor signal is computed using vocal tract resonances estimated based on the alternative sensor signal, and using a waveform synthesis technique that converts the estimated vocal tract resonance sequence into a spectral magnitude sequence. The synthesized alternative sensor signal and the alternative sensor signal are used to estimate a clean speech value.
摘要:
A noisy audio signal, with user input device noise, is received. Particular frames in the audio signal that are corrupted by user input device noise are identified and removed. The removed audio data is then reconstructed to obtain a clean audio signal.
摘要:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for graph-based semi-supervised learning of structured tagging models. In one aspect, a method includes creating a graph having a plurality of unique vertices in which vertices in a first set of vertices represent n-grams that are each associated with a respective part-of-speech and that were derived from labeled source domain text, and in which vertices in a different second set of vertices represent n-grams that are not associated with a part-of-speech and that were derived from unlabeled target domain text. A respective measure of similarity is calculated between the vertices in each of the pairs based at least partially on a distance between the respective features of the pair used to weight a graph edge between the pair.