Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction

Patent application

US20060200351A1 Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction (in force)



Patent title: Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction
Application number: US11069474

Filing date: 2005-03-01
Publication number: US20060200351A1

Publication date: 2006-09-07
Inventors: Alejandro Acero, Dong Yu, Li Deng
Applicants: Alejandro Acero, Dong Yu, Li Deng
Applicant address: US WA Redmond
Assignee: Microsoft Corporation
Current assignee: Microsoft Corporation
Current assignee address: US WA Redmond
Main classification: G10L15/04
IPC classification: G10L15/04


Abstract:

A structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation. In the first stage, the dynamics of formants, or vocal tract resonances (VTRs), are generated using prior information about the resonance targets in the phone sequence: the segmental target sequence is passed as input to a bi-directional finite impulse response (FIR) temporal filter. In the second stage, the dynamics of speech cepstra are predicted analytically from the FIR-filtered VTR targets. The combined two-stage system thus generates correlated and causally related VTR and cepstral dynamics, in which phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also gives the probability of the acoustic observation given a phone sequence. Using this probability, different phone sequences can be compared and ranked by their respective probability values, which permits the use of the model for phonetic recognition.
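The two stages and the ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the filter kernel, the cepstral mapping, or the noise model, so the following are all assumptions made for illustration: the exponential kernel `gamma**|k - tau|`, the target and bandwidth values, the sampling rate, the standard all-pole resonance-to-cepstrum formula, and the fixed-variance Gaussian score. All function and constant names are hypothetical.

```python
import math

# Hypothetical F1/F2 targets in Hz; illustrative values, not taken from the patent.
TARGETS = {"iy": (300.0, 2300.0), "aa": (750.0, 1200.0), "uw": (320.0, 900.0)}
BANDWIDTHS = (80.0, 120.0)  # assumed fixed resonance bandwidths in Hz
FS = 8000.0                 # assumed sampling rate in Hz


def expand_targets(phones, durations):
    """Stage 1 input: piecewise-constant target track, one tuple per frame."""
    return [TARGETS[p] for p, d in zip(phones, durations) for _ in range(d)]


def bidirectional_fir(track, gamma=0.6, half_width=5):
    """Stage 1: symmetric (non-causal) FIR smoothing with weights gamma**|k - tau|."""
    n, dims = len(track), len(track[0])
    smoothed = []
    for k in range(n):
        taus = range(max(0, k - half_width), min(n, k + half_width + 1))
        weights = [gamma ** abs(k - tau) for tau in taus]
        norm = sum(weights)  # renormalize so a constant target passes through unchanged
        smoothed.append(tuple(
            sum(w * track[tau][d] for w, tau in zip(weights, taus)) / norm
            for d in range(dims)
        ))
    return smoothed


def predict_cepstrum(freqs, order=12):
    """Stage 2: analytic cepstrum of an all-pole spectrum with these resonances."""
    return [
        (2.0 / n) * sum(
            math.exp(-math.pi * n * b / FS) * math.cos(2.0 * math.pi * n * f / FS)
            for f, b in zip(freqs, BANDWIDTHS)
        )
        for n in range(1, order + 1)
    ]


def log_likelihood(phones, durations, observed, variance=0.01):
    """Score a hypothesis: Gaussian log-probability of observed cepstra given the phones."""
    vtr = bidirectional_fir(expand_targets(phones, durations))
    total = 0.0
    for frame_obs, frame_vtr in zip(observed, vtr):
        pred = predict_cepstrum(frame_vtr, order=len(frame_obs))
        total += sum(
            -0.5 * (math.log(2.0 * math.pi * variance) + (o - p) ** 2 / variance)
            for o, p in zip(frame_obs, pred)
        )
    return total


# Usage: synthesize "observed" cepstra from one phone sequence, then rank hypotheses.
durations = [6, 6]
observed = [predict_cepstrum(f)
            for f in bidirectional_fir(expand_targets(["iy", "aa"], durations))]
hypotheses = [["uw", "aa"], ["aa", "iy"], ["iy", "aa"]]
ranked = sorted(hypotheses,
                key=lambda h: log_likelihood(h, durations, observed),
                reverse=True)
```

Because the kernel looks both backward and forward in time, a short segment flanked by longer ones never reaches its own target. This is how the sketch represents reduction explicitly in the resonance track, while the cepstra inherit it only through the analytic mapping.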


Published/granted documents

US07409346B2 Two-stage implementation for phonetic recognition using a bi-directional target-filtering model of speech coarticulation and reduction Publication/grant date: 2008-08-05


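IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/04	.Segmentation; word boundary detection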