Voice Data Enhancing Method and Device in Voice Recognition Based on Recurrent Neural Network

Invention Application

WO2019024008A1 Voice Data Enhancing Method and Device in Voice Recognition Based on Recurrent Neural Network (status: pending, published)


Patent Title (Chinese original): 基于循环神经网络语音识别中语音数据增强方法及装置
Patent Title (English): VOICE DATA ENHANCING METHOD AND DEVICE IN VOICE RECOGNITION BASED ON RECURRENT NEURAL NETWORK
Application No.: PCT/CN2017/095668

Application Date: 2017-08-02
Publication No.: WO2019024008A1

Publication Date: 2019-02-07
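Inventors: 赵媛媛 (Zhao Yuanyuan), 徐爽 (Xu Shuang), 徐波 (Xu Bo)
Applicant: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Applicant Address: No. 95 Zhongguancun East Road, Haidian District, Beijing 100190, CN
Assignee: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Current Assignee: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Current Assignee Address: No. 95 Zhongguancun East Road, Haidian District, Beijing 100190, CN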
Agency: 北京瀚仁知识产权代理事务所 (普通合伙) (Beijing Hanren Intellectual Property Agency, General Partnership)
Main IPC: G10L15/16
IPC: G10L15/16; G10L15/06; G10L15/02; G10L15/20

Abstract:

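A speech data enhancement (i.e. data augmentation) method based on a recurrent neural network, in the field of speech recognition processing. It addresses the problem that recurrent neural networks used in speech recognition over-model the dependencies between words, by simulating the irregular grammatical phenomena that occur in speech. The method comprises: extracting, from the input speech data, acoustic features that represent the energy value at each frequency of the speech, to generate acoustic feature vectors (201); obtaining a sentence label sequence for the speech data from a preset annotation file and the acoustic feature vectors (202); applying decision clustering to the preset annotation file and the sentence label sequence to obtain an alignment file produced by the decision clustering operation (203); generating a first random number γ in [0, 1] and comparing it with a preset adjustment ratio α (204); and, if the first random number γ is greater than the adjustment ratio α, performing enhancement on the speech data at the positions indicated by a boundary file (205). The method can quickly and conveniently increase the irregular, colloquial (spoken-language) phenomena present in the training data.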
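
Step (201) only says that the acoustic features represent the energy at each frequency; the abstract does not name the feature type. As a minimal sketch, assuming log power-spectrum features computed on Hann-windowed frames (400-sample frames with a 160-sample hop, i.e. 25 ms / 10 ms at 16 kHz, which are common defaults and not values taken from this patent), the extraction could look like this in pure Python. The function names are hypothetical, and the naive DFT is for clarity only; a production front end would use an FFT and typically a mel filterbank.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_power_spectrum(frame):
    """Log energy of each DFT bin 0..N/2 of a Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage between bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed))
        energies.append(math.log(abs(coeff) ** 2 + 1e-10))  # floor avoids log(0)
    return energies


def extract_features(samples, frame_len=400, hop=160):
    """One feature vector per frame (assumed 25 ms frames, 10 ms hop at 16 kHz)."""
    return [log_power_spectrum(f) for f in frame_signal(samples, frame_len, hop)]
```

With the defaults, a 1,600-sample 1 kHz tone sampled at 16 kHz yields 8 frames of 201 values each; since each bin spans 16000 / 400 = 40 Hz, the energy peaks in bin 25.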

Abstract (English):

A voice data enhancing method based on a recurrent neural network, in the field of voice recognition processing, aims to solve the problem that recurrent neural networks used in voice recognition excessively model the dependence between words; it does so by simulating the irregular grammatical phenomena of speech. The method comprises: extracting, from input voice data, acoustic features identifying the energy value at each frequency of the voice, to generate acoustic feature vectors (201); obtaining a sentence label sequence of the voice data according to a preset labeling file and the acoustic feature vectors (202); obtaining an alignment file after a decision clustering operation by applying decision clustering to the preset labeling file and the sentence label sequence (203); generating a first random number γ in [0, 1] and comparing it with a preset adjusting proportion α (204); and, if the first random number γ is greater than the adjusting proportion α, performing enhancement processing on the voice data at the positions indicated by a boundary file (205). The method makes it possible to increase irregular spoken-language phenomena in the training data quickly and conveniently.
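
Steps (203) to (205) leave two things open: how the boundary file is derived from the alignment, and what the enhancement operation actually is. The sketch below fills both gaps with labeled assumptions: it treats the alignment as one word label per frame, takes runs of identical labels as the boundary positions, and, as a purely hypothetical enhancement, duplicates one aligned word segment to mimic a spoken repetition. Only the gating rule (enhance when γ > α) comes from the abstract. Note that with γ drawn uniformly, enhancement happens with probability 1 - α, so a larger α means fewer enhanced utterances. The function names are hypothetical.

```python
import random


def word_boundaries(frame_labels):
    """Return (start, end) frame spans of runs of identical labels.

    Assumption: frame_labels holds one word label per frame, as a
    frame-level alignment would; this stands in for the boundary file.
    """
    spans = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            spans.append((start, i))
            start = i
    return spans


def maybe_enhance(features, frame_labels, alpha, rng):
    """Gate enhancement with a random number, as in steps (204)-(205).

    The segment duplication below is a hypothetical enhancement
    operation, not the one specified by the patent.
    """
    gamma = rng.random()  # first random number; random() draws from [0, 1)
    if gamma <= alpha or not frame_labels:
        return features, frame_labels  # keep the utterance unchanged
    start, end = rng.choice(word_boundaries(frame_labels))
    # Insert a copy of the chosen word's frames right after it (a "repetition").
    new_features = features[:end] + features[start:end] + features[end:]
    new_labels = frame_labels[:end] + frame_labels[start:end] + frame_labels[end:]
    return new_features, new_labels
```

Passing an explicit random.Random instance, e.g. maybe_enhance(feats, labels, 0.7, random.Random(0)), keeps the gating reproducible across training runs.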


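IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/16	..using artificial neural networks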