Deep-learning-technology-based automatic accent classification method and apparatus

Invention publication

CN105632501A Deep-learning-technology-based automatic accent classification method and apparatus (status: lapsed - patent right terminated)


Patent title: Deep-learning-technology-based automatic accent classification method and apparatus
Original title (Chinese): 一种基于深度学习技术的自动口音分类方法及装置
Application number: CN201511021329.7

Filing date: 2015-12-30
Publication (announcement) number: CN105632501A

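Publication (announcement) date: 2016-06-01
Inventors: 刘文举, 陈明明, 张邯平, 高鹏, 董理科, 刘晓飞, 乔利玮, 王桐
Applicants: 中国科学院自动化研究所, 国网山西省电力公司电力科学研究院, 山西振中电力股份有限公司
Applicant address: No. 95 Zhongguancun East Road, Haidian District, Beijing
Patentees: 中国科学院自动化研究所, 国网山西省电力公司电力科学研究院, 山西振中电力股份有限公司
Current patentees: 中国科学院自动化研究所, 国网山西省电力公司电力科学研究院, 山西振中电力股份有限公司
Current patentee address: No. 95 Zhongguancun East Road, Haidian District, Beijing
Agency: 中科专利商标代理有限责任公司
Agent: 宋焰琴
Main classification: G10L15/32
IPC classification: G10L15/32; G10L25/24; G10L15/16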

Abstract:

The invention discloses a deep-learning-based automatic accent classification method and apparatus. The method comprises: removing silence from all accented speech in the training set and extracting mel-frequency cepstral coefficient (MFCC) features; training, on the extracted MFCC features, deep neural networks for the various kinds of accented speech to describe their acoustic characteristics, where a deep neural network means a feedforward artificial neural network with at least two hidden layers; computing, for each frame of the speech to be recognized, the probability score of every accent class on the deep neural network, and assigning the frame the accent class label with the highest probability score; and taking a majority vote over the per-frame accent classes to obtain the accent class of the speech to be recognized. The invention can make effective use of context information and can therefore provide better classification performance than traditional shallow models.
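The recognition steps in the abstract (per-frame probability scores, per-frame label by highest score, majority vote over frames) can be illustrated with a minimal NumPy sketch. Everything not stated in the abstract is an assumption made for illustration: a single network with a softmax output over accent classes, ReLU hidden activations, the function names `dnn_posteriors` and `classify_utterance`, and the tie-breaking rule. Silence removal, MFCC extraction, and training are not shown.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dnn_posteriors(frames, weights, biases):
    """Feedforward pass over a (num_frames, feat_dim) matrix of frame features.

    All layers except the last are hidden layers; ReLU is an assumed
    activation (the abstract does not name one). The last layer is a
    softmax over accent classes, giving one probability row per frame.
    """
    h = frames
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    return softmax(h @ weights[-1] + biases[-1])


def classify_utterance(posteriors):
    """Label each frame by its highest score, then take a majority vote.

    Returns (utterance_label, frame_labels). Ties go to the lowest class
    index, which is an arbitrary choice made for this sketch.
    """
    frame_labels = posteriors.argmax(axis=1)
    votes = np.bincount(frame_labels, minlength=posteriors.shape[1])
    return int(votes.argmax()), frame_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 13-dim features, two hidden layers of 8 units, 3 accents.
    sizes = [13, 8, 8, 3]
    weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    frames = rng.normal(size=(20, 13))  # stand-in for real MFCC frames
    label, frame_labels = classify_utterance(dnn_posteriors(frames, weights, biases))
    print("utterance accent class:", label)
```

With random weights the printed class is meaningless; the sketch only shows how frame-level decisions are combined into one utterance-level decision.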

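Publication/grant documents

CN105632501B Deep-learning-technology-based automatic accent classification method and apparatus. Publication/grant date: 2019-09-03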


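IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/28	. Constructional details of speech recognition systems
G10L15/32	.. Multiple recognisers used in sequence or in parallel; score combination systems therefor, e.g. voting systems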