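Speaker recognition method based on a Gaussian mixture model embedded with a time delay neural network

Invention publication

CN102034472A Speaker recognition method based on a Gaussian mixture model embedded with a time delay neural network. Status: Invalid - Withdrawn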


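Patent title: Speaker recognition method based on Gaussian mixture model embedded with time delay neural network
Application number: CN200910035424.0
Filing date: 2009-09-28
Publication number: CN102034472A
Publication date: 2011-04-27
Inventors: Dai Hongxia (戴红霞), Wang Jilin (王吉林), Yu Hua (余华), Wei Xin (魏昕), Zhao Li (赵力)
Applicants: Dai Hongxia (戴红霞), Wang Jilin (王吉林), Yu Hua (余华)
Applicant address: Room 301, No. 39 Yuexiu Garden, Wuxi, Jiangsu Province
Patentees: Dai Hongxia (戴红霞), Wang Jilin (王吉林), Yu Hua (余华)
Current patentees: Dai Hongxia (戴红霞), Wang Jilin (王吉林), Yu Hua (余华)
Current patentee address: Room 301, No. 39 Yuexiu Garden, Wuxi, Jiangsu Province
Main classification: G10L15/00
IPC classification: G10L15/00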

Abstract:

The invention discloses a speaker recognition method based on a Gaussian mixture model (GMM) embedded with a time delay neural network (TDNN). The method combines the respective strengths of the two models by embedding the TDNN in the GMM. The TDNN exploits the temporal ordering of the input feature vectors: the time delay network transforms the input, and the residual between the TDNN input and output vectors is computed. This residual is then used to update the GMM through expectation-maximization (EM) training. In addition, a likelihood is computed from the updated GMM parameters and the residual, and the TDNN parameters are updated by back-propagation with an inertia (momentum) term, so that the GMM and TDNN parameters are updated alternately. Experiments show that the method achieves a higher recognition rate than the baseline GMM under various signal-to-noise ratio conditions.
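The alternating update described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration and not the patented implementation: the single hidden layer, the one-frame context on each side, the diagonal covariances, the variance floor, and every name (`stack_context`, `tdnn_forward`, `gmm_posteriors`, `em_step`, `tdnn_step`, `train`) and hyperparameter are assumptions made for the example. The abstract does not specify the network topology or the exact objective; here the TDNN is updated by gradient ascent on the mean GMM log-likelihood of the residuals.

```python
import numpy as np

def stack_context(X, left=1, right=1):
    """Stack each frame with its time-delayed neighbours (edge frames repeated)."""
    T = len(X)
    padded = np.concatenate([np.repeat(X[:1], left, 0), X, np.repeat(X[-1:], right, 0)])
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])

def tdnn_forward(params, C):
    """One tanh hidden layer over the stacked context, linear output."""
    H = np.tanh(C @ params["W1"] + params["b1"])
    return H, H @ params["W2"] + params["b2"]

def gmm_posteriors(R, w, mu, var):
    """Responsibilities (T, K) and per-frame log-likelihood (T,) of a diagonal GMM."""
    diff = R[:, None, :] - mu[None]                                   # (T, K, D)
    log_comp = (np.log(w) - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * np.sum(diff ** 2 / var, axis=2))              # (T, K)
    m = log_comp.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))      # log-sum-exp
    return np.exp(log_comp - log_lik[:, None]), log_lik

def em_step(R, w, mu, var, floor=1e-3):
    """One EM iteration of the GMM on the residuals R (variance floor is an assumption)."""
    gamma, _ = gmm_posteriors(R, w, mu, var)
    Nk = gamma.sum(axis=0) + 1e-10
    mu = (gamma.T @ R) / Nk[:, None]
    var = (gamma.T @ R ** 2) / Nk[:, None] - mu ** 2
    return Nk / Nk.sum(), mu, np.maximum(var, floor)

def tdnn_step(params, vel, X, C, w, mu, var, lr=0.01, momentum=0.9):
    """One back-propagation step with momentum that raises the mean log-likelihood."""
    H, Y = tdnn_forward(params, C)
    R = X - Y                                                         # residual
    gamma, log_lik = gmm_posteriors(R, w, mu, var)
    # d log p(r)/dr = sum_k gamma_k (mu_k - r) / var_k; r = x - y flips the sign for y.
    dR = np.einsum("tk,tkd->td", gamma, (mu[None] - R[:, None, :]) / var[None])
    dY = -dR / len(X)
    dH = (dY @ params["W2"].T) * (1 - H ** 2)
    grads = {"W1": C.T @ dH, "b1": dH.sum(0), "W2": H.T @ dY, "b2": dY.sum(0)}
    for k in params:
        vel[k] = momentum * vel[k] + lr * grads[k]                    # inertia term
        params[k] += vel[k]                                           # gradient ascent
    return log_lik.mean()

def train(X, K=2, hidden=8, iters=20, seed=0):
    """Alternate GMM (EM) and TDNN (momentum back-propagation) updates."""
    rng = np.random.default_rng(seed)
    C, D = stack_context(X), X.shape[1]
    params = {"W1": rng.normal(0, 0.1, (C.shape[1], hidden)), "b1": np.zeros(hidden),
              "W2": rng.normal(0, 0.1, (hidden, D)), "b2": np.zeros(D)}
    vel = {k: np.zeros_like(v) for k, v in params.items()}
    R = X - tdnn_forward(params, C)[1]
    w = np.full(K, 1.0 / K)
    mu = R[rng.choice(len(R), K, replace=False)]
    var = np.tile(R.var(axis=0) + 1e-3, (K, 1))
    history = []
    for _ in range(iters):
        R = X - tdnn_forward(params, C)[1]        # residual of TDNN input vs. output
        w, mu, var = em_step(R, w, mu, var)       # GMM update on the residual
        history.append(tdnn_step(params, vel, X, C, w, mu, var))  # TDNN update
    return params, (w, mu, var), history
```

In this sketch one such model pair would be trained per enrolled speaker from that speaker's feature frames, for example `params, gmm, history = train(features)` with `features` of shape `(frames, dims)`. `history` records the mean log-likelihood measured before each TDNN update, which is useful for checking that training stays numerically stable.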


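IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)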