Morphological pure speech detection using valley percentage

Granted invention patent

US06205422B1 Morphological pure speech detection using valley percentage (in force)

Patent title: Morphological pure speech detection using valley percentage
Application number: US09201705

Filing date: 1998-11-30
Publication number: US06205422B1

Publication date: 2001-03-20
Inventors: Chuang Gu, Ming-Chieh Lee, Wei-ge Chen
Applicants: Chuang Gu, Ming-Chieh Lee, Wei-ge Chen
Main classification: G10L1102
IPC classification: G10L1102

Abstract:

A human speech detection method detects pure-speech signals in an audio signal containing a mixture of pure-speech and non-speech or mixed-speech signals. The method accurately detects the pure-speech signals by computing a novel Valley Percentage feature from the audio signal and then classifying the audio signals into pure-speech and non-speech (or mixed-speech) classifications. The Valley Percentage is a measurement of the low energy parts of the audio signal (the valley) in comparison to the high energy parts of the audio signal (the mountain). To classify the audio signal, the method performs a threshold decision on the value of the Valley Percentage. Using a binary mask, a high Valley Percentage is classified as pure-speech and a low Valley Percentage is classified as non-speech (or mixed-speech). The method further employs morphological filters to improve the accuracy of human speech detection. Before detection, a morphological closing filter may be employed to eliminate unwanted noise from the audio signal. After detection, a combination of morphological closing and opening filters may be employed to remove aberrant pure-speech and non-speech classifications from the binary mask resulting from impulsive audio signals in order to more accurately detect the boundaries between the pure-speech and non-speech portions of the audio signal. A number of parameters may be employed by the method to further improve the accuracy of human speech detection. For implementation in supervised digital audio signal applications, these parameters may be optimized by training the application a priori. For implementation in an unsupervised environment, adaptive determination of these parameters is also possible.
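
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: the Valley Percentage is taken to be the fraction of samples in a sliding window whose energy falls below a fixed fraction of that window's maximum, and the function names, window sizes, and thresholds (`valley_percentage`, `detect_pure_speech`, `frac`, `vp_threshold`, and so on) are hypothetical rather than taken from the patent.

```python
def dilate(values, size):
    """Sliding-window maximum (morphological dilation, flat structuring element)."""
    half, n = size // 2, len(values)
    return [max(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def erode(values, size):
    """Sliding-window minimum (morphological erosion, flat structuring element)."""
    half, n = size // 2, len(values)
    return [min(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def closing(values, size):
    """Dilation followed by erosion: fills gaps narrower than the window."""
    return erode(dilate(values, size), size)


def opening(values, size):
    """Erosion followed by dilation: removes spikes narrower than the window."""
    return dilate(erode(values, size), size)


def valley_percentage(energy, window, frac):
    """Assumed definition: share of samples in a centered window whose energy
    is below frac * (maximum energy in that window)."""
    half, n = window // 2, len(energy)
    vp = []
    for i in range(n):
        seg = energy[max(0, i - half):min(n, i + half + 1)]
        cutoff = frac * max(seg)
        vp.append(sum(1 for e in seg if e < cutoff) / len(seg))
    return vp


def detect_pure_speech(energy, window=5, frac=0.2, vp_threshold=0.3,
                       pre_size=1, post_size=3):
    """Return a binary mask (1 = pure speech) for a list of energy values.
    All default parameter values are illustrative, not from the patent."""
    # Optional pre-filter: closing on the energy signal (pre_size=1 is a no-op).
    energy = closing(energy, pre_size)
    # Threshold decision: high Valley Percentage -> pure speech.
    vp = valley_percentage(energy, window, frac)
    mask = [1 if v >= vp_threshold else 0 for v in vp]
    # Post-filter: closing then opening removes isolated misclassifications.
    return opening(closing(mask, post_size), post_size)
```

In this sketch, a signal with frequent pauses between energy bursts produces a high Valley Percentage and is marked as pure speech, while a signal with steady energy (for example, continuous music) yields a value near zero. The parameters would need tuning on real data, whether by training a priori or by adaptive determination as the abstract suggests.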
