Voice activity segmentation device, voice activity segmentation method, and voice activity segmentation program

Granted invention patent

US09293131B2 Voice activity segmentation device, voice activity segmentation method, and voice activity segmentation program (in force)



Patent title: Voice activity segmentation device, voice activity segmentation method, and voice activity segmentation program
Application number: US13814141

Filing date: 2011-08-02
Publication (grant) number: US09293131B2

Publication (grant) date: 2016-03-22
Inventors: Takayuki Arakawa, Daisuke Tanaka
Applicants: Takayuki Arakawa, Daisuke Tanaka
Applicant address: JP Tokyo
Patentee: NEC CORPORATION
Current patentee: NEC CORPORATION
Current patentee address: JP Tokyo
Priority: JP2010-179180, 2010-08-10
International application: PCT/JP2011/068003 WO, 2011-08-02
International publication: WO2012/020717 WO, 2012-02-16
Main classification: G10L15/20
IPC classification: G10L15/20; G10L15/04; G10L25/87; G10L25/78


Abstract:

Provided are a noise-robust voice activity segmentation device that updates the parameters used to determine voice-active segments without burdening the user, a voice activity segmentation method, and a voice activity segmentation program. The voice activity segmentation device comprises: a first voice activity segmentation means that determines a voice-active segment (first voice-active segment) and a voice-inactive segment (first voice-inactive segment) in a time series of input sound by comparing a threshold value with a feature value of that time series; a second voice activity segmentation means that superimposes reference speech acquired from a reference speech storage means on the time series of the first voice-inactive segment and then determines a voice-active segment and a voice-inactive segment in the superimposed time series by comparing the threshold value with a feature value of the superimposed time series; and a threshold value update means that updates the threshold value so as to decrease the discrepancy rate between the determination result of the second voice activity segmentation means and the correct segmentation calculated from the reference speech.
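The abstract describes a feedback loop: segment the input, superimpose known reference speech on the parts judged voice-inactive, segment that mixture with the same threshold, and move the threshold so that the result agrees better with the known correct segmentation. The Python sketch below illustrates that loop under assumptions the abstract does not state: the feature value is frame log-energy in dB, the correct segmentation is obtained by thresholding the clean reference speech at a fixed level `ref_threshold`, the inactive samples are tiled to the reference's length before mixing, and the threshold moves by a fixed `step` toward whichever error (false alarm or miss) dominates. This step rule is a simple stand-in, not necessarily the update rule claimed in the patent, and every function and parameter name is hypothetical.

```python
import math

FRAME_LEN = 160  # hypothetical frame size: 10 ms at an assumed 16 kHz sampling rate


def frame_energies_db(signal, frame_len=FRAME_LEN):
    """Log power (dB) of each non-overlapping frame; the assumed 'feature value'."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # epsilon avoids log(0)
    return energies


def segment(signal, threshold, frame_len=FRAME_LEN):
    """Compare the feature with the threshold: True marks a voice-active frame."""
    return [e > threshold for e in frame_energies_db(signal, frame_len)]


def inactive_samples(signal, labels, frame_len=FRAME_LEN):
    """Concatenate the samples of every frame labelled voice-inactive."""
    out = []
    for i, active in enumerate(labels):
        if not active:
            out.extend(signal[i * frame_len:(i + 1) * frame_len])
    return out


def discrepancy(decided, correct):
    """Return (false alarms, misses, discrepancy rate) over the compared frames."""
    n = min(len(decided), len(correct))
    fa = sum(1 for d, c in zip(decided, correct) if d and not c)
    miss = sum(1 for d, c in zip(decided, correct) if c and not d)
    return fa, miss, (fa + miss) / n if n else 0.0


def update_threshold(input_signal, reference, threshold, ref_threshold,
                     step=0.5, iters=50, frame_len=FRAME_LEN):
    """Hypothetical step-rule threshold update driven by superimposed reference speech."""
    # Correct segmentation computed from the clean reference speech (assumed rule).
    correct = segment(reference, ref_threshold, frame_len)
    best_threshold, best_rate = threshold, float("inf")
    for _ in range(iters):
        # First segmentation: keep only the samples of frames judged voice-inactive.
        first = segment(input_signal, threshold, frame_len)
        noise = inactive_samples(input_signal, first, frame_len)
        if not noise:
            break  # nothing to superimpose the reference onto
        # Superimpose the reference on the inactive samples (noise tiled to length).
        mixed = [r + noise[i % len(noise)] for i, r in enumerate(reference)]
        # Second segmentation on the mixture, scored against the correct labels.
        fa, miss, rate = discrepancy(segment(mixed, threshold, frame_len), correct)
        if rate < best_rate:
            best_threshold, best_rate = threshold, rate
        if fa > miss:
            threshold += step  # too many false alarms: raise the threshold
        elif miss > fa:
            threshold -= step  # too many misses: lower the threshold
        else:
            break  # errors balanced (or none): stop
    return best_threshold
```

For example, with a noise-only input and a reference that alternates silent frames and pure-tone frames, starting from a threshold above the tone level produces only misses, so the loop lowers the threshold in 0.5 dB steps until the tone frames in the mixture are detected. Returning the best threshold seen, rather than the last one, keeps the result from drifting when the fixed step makes the threshold oscillate.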


Publication/grant documents:

US20130132078A1 VOICE ACTIVITY SEGMENTATION DEVICE, VOICE ACTIVITY SEGMENTATION METHOD, AND VOICE ACTIVITY SEGMENTATION PROGRAM. Publication/grant date: 2013-05-23


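IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/20	. Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, or of stress-induced speech (G10L21/02 takes precedence)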