Abstract:
The present invention relates to a method for recovering target speech from mixed signals of the target speech and noise observed in a real-world environment, based on split spectra and the locational information of the sound sources. The method includes: a first step of receiving target speech from a target speech source and noise from a noise source, and forming mixed signals of the target speech and the noise at a first microphone and at a second microphone; a second step of performing the Fourier transform of the mixed signals from the time domain to the frequency domain, decomposing the mixed signals into two separated signals UA and UB by Independent Component Analysis, and, based on the transmission path characteristics of the four different paths from the target speech source and the noise source to the first and second microphones, generating from the separated signal UA a pair of split spectra vA1 and vA2, corresponding to the signals received at the first and second microphones respectively, and from the separated signal UB another pair of split spectra vB1 and vB2, likewise corresponding to the first and second microphones respectively; and a third step of extracting a recovered spectrum of the target speech by analyzing the split spectra with criteria based on sound transmission characteristics that depend on the four different distances between the first and second microphones and the target speech and noise sources, and performing the inverse Fourier transform of the recovered spectrum from the frequency domain to the time domain to recover the target speech.
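The second and third steps can be illustrated for a single frequency bin. The sketch below is not taken from the abstract: it assumes that the split spectra are obtained by projecting each separated signal back to the microphones through the inverse of a 2x2 separation matrix `W` (a hypothetical name), and that the target source is closer to the first microphone, so that the target component is the one whose amplitude drops from microphone 1 to microphone 2. Function names and the selection rule are illustrative assumptions only.

```python
def split_spectra(W, x):
    """Split spectra for one frequency bin.

    W: assumed 2x2 separation matrix ((a, b), (c, d)) of complex numbers.
    x: (x1, x2), mixed-signal spectra at microphones 1 and 2.
    Returns ((uA, uB), (vA1, vA2), (vB1, vB2)).
    """
    (a, b), (c, d) = W
    det = a * d - b * c
    if det == 0:
        raise ValueError("separation matrix is singular")
    # Separated signals: U = W x
    u_a = a * x[0] + b * x[1]
    u_b = c * x[0] + d * x[1]
    # Inverse of W
    inv = ((d / det, -b / det), (-c / det, a / det))
    # Back-project each separated signal on its own to both microphones
    v_a = (inv[0][0] * u_a, inv[1][0] * u_a)  # W^-1 (uA, 0)^T
    v_b = (inv[0][1] * u_b, inv[1][1] * u_b)  # W^-1 (0, uB)^T
    return (u_a, u_b), v_a, v_b


def pick_target_component(v_a, v_b):
    """Assumed criterion: the target is nearer to microphone 1, so its
    amplitude at microphone 1 exceeds its amplitude at microphone 2 by
    more than the noise component's does."""
    d_a = abs(v_a[0]) - abs(v_a[1])
    d_b = abs(v_b[0]) - abs(v_b[1])
    return v_a[0] if d_a > d_b else v_b[0]
```

Under this assumption the two split-spectrum pairs sum to the observed microphone spectra, and the result does not depend on the arbitrary order or scale of the separated signals, which is why the selection can be made on amplitude differences between the microphones.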
Abstract:
A method for recovering target speech by extracting the signal components that fall in a speech segment, which is determined from separated signals obtained through Independent Component Analysis, thereby minimizing the residual noise in the recovered target speech. The method comprises: a first step of receiving target speech emitted from one sound source and noise emitted from another sound source, and extracting estimated spectra Y* corresponding to the target speech by Independent Component Analysis; a second step of separating from the estimated spectra Y* an estimated spectrum series group y* from which the noise is removed, by applying separation judgment criteria based on the kurtosis of the amplitude distribution of each estimated spectrum series in Y*; a third step of detecting a speech segment and a noise segment in the total sum F of all the estimated spectrum series in y*, by applying detection judgment criteria based on a predetermined threshold value T that is determined by the maximum value of F; and a fourth step of extracting the components falling in the speech segment from the estimated spectra Y* to generate a recovered spectrum group of the target speech, from which the target speech is recovered.
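The second to fourth steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed procedure: each estimated spectrum series is represented as a list of non-negative amplitudes over time frames, a series is treated as speech-like when its kurtosis exceeds a cut-off (`min_kurtosis`, hypothetical), and T is taken as a fixed fraction (`ratio`, hypothetical) of the maximum of F. The abstract only states that the criteria are based on kurtosis and that T is determined by the maximum of F.

```python
def kurtosis(values):
    """Non-excess kurtosis m4 / m2^2 (about 3 for Gaussian data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def select_speech_like(Y, min_kurtosis=3.0):
    """Step 2 (assumed rule): keep series with a peaked amplitude distribution."""
    return [series for series in Y if kurtosis(series) > min_kurtosis]


def detect_speech_frames(y, ratio=0.1):
    """Step 3: sum the selected series per frame and threshold the sum.

    Returns (mask, F, T); mask[t] is True for frames in the speech segment.
    """
    if not y:
        raise ValueError("no series selected")
    n_frames = len(y[0])
    F = [sum(series[t] for series in y) for t in range(n_frames)]
    T = ratio * max(F)  # assumed: T is a fixed fraction of max(F)
    return [f > T for f in F], F, T


def gate(Y, mask):
    """Step 4: keep only the components of Y that fall in the speech segment."""
    return [[v if keep else 0.0 for v, keep in zip(series, mask)] for series in Y]
```

Note that the gate is applied to every series in Y*, not only to the selected group y*: the selected series serve to locate the speech segment, while the recovered spectrum group is drawn from all estimated spectra.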
Abstract:
The present invention provides a method for recovering target speech based on the shapes of the amplitude distributions of split spectra obtained by blind signal separation. The method includes: a first step of receiving target speech emitted from one sound source and noise emitted from another sound source, and forming mixed signals of the target speech and the noise at a first microphone and at a second microphone; a second step of performing the Fourier transform of the mixed signals from the time domain to the frequency domain, decomposing the mixed signals into two separated signals U1 and U2 by Independent Component Analysis, and, based on the transmission path characteristics of the four different paths from the two sound sources to the first and second microphones, generating the split spectra v11, v12, v21, and v22 from the separated signals U1 and U2; and a third step of extracting estimated spectra Z* corresponding to the target speech to generate a recovered spectrum group of the target speech, by analyzing the split spectra v11, v12, v21, and v22 with criteria based on the shape of the amplitude distribution of each split spectrum, and performing the inverse Fourier transform of the recovered spectrum group from the frequency domain to the time domain to recover the target speech.
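One way to compare the shapes of amplitude distributions is sketched below. The abstract does not name the shape measure, so the sketch assumes the entropy of an amplitude histogram: a speech-like series, with many near-silent frames and a few strong ones, gives a peaked histogram and low entropy, whereas stationary noise spreads over more bins. The choice between v11 and v22 as the two candidates, the bin count `n_bins`, and the function names are all illustrative assumptions.

```python
import math


def amplitude_entropy(amplitudes, n_bins=10):
    """Entropy (in nats) of a histogram of amplitudes scaled to [0, max]."""
    top = max(amplitudes)
    if top <= 0:
        return 0.0
    counts = [0] * n_bins
    for a in amplitudes:
        # Clamp the maximum amplitude into the last bin
        idx = min(int(a / top * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(amplitudes)
    return -sum(c / n * math.log(c / n) for c in counts if c)


def pick_target_series(v11, v22, n_bins=10):
    """Assumed criterion for one frequency bin: the series (over time frames)
    with the more peaked, lower-entropy amplitude distribution is taken as
    the target speech estimate."""
    e11 = amplitude_entropy([abs(z) for z in v11], n_bins)
    e22 = amplitude_entropy([abs(z) for z in v22], n_bins)
    return v11 if e11 < e22 else v22
```

Applying such a rule in each frequency bin yields one estimated spectrum series per bin; collecting them gives a candidate for the estimated spectra Z* before the inverse Fourier transform.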