Audio feature extraction method based on KL divergence

Invention publication


Patent title: Audio feature extraction method based on KL divergence
Patent title (original): 一种基于KL散度的音频特征提取方法
Application number: CN201810930863.7

Filing date: 2018-08-15
Publication number: CN109036382A

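Publication date: 2018-12-18
Inventors: Yang Yuhong (杨玉红), Zhang Huiyu (张会玉), Feng Jiaqian (冯佳倩), Hu Ruimin (胡瑞敏), Ai Haojun (艾浩军), Tu Weiping (涂卫平), Wang Xiaochen (王晓晨)
Applicant: Wuhan University (武汉大学)
Applicant address: Wuhan University, Luojia Hill, Wuchang District, Wuhan, Hubei Province
Patentee: Wuhan University
Current patentee: Wuhan University
Current patentee address: Wuhan University, Luojia Hill, Wuchang District, Wuhan, Hubei Province
Agency: Wuhan Kehao Intellectual Property Agency (武汉科皓知识产权代理事务所)
Agent: Wei Bo (魏波)
Main classification: G10L15/02
IPC classification: G10L15/02; G10L25/21; G10L25/30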

Abstract:

The invention discloses an audio feature extraction method based on KL divergence. The method comprises: reading the audio files in a training set class by class and converting them into the corresponding power spectra; computing the inter-class KL divergence matrix of the different acoustic scenes in the training set; dividing the frequency groups of a KL filter and designing the KL filter; passing the training-set power spectra through the KL filter to extract spectral features based on inter-class KL divergence; taking the logarithm of the KL spectral features, normalizing them, and feeding them to a convolutional neural network for training to obtain an acoustic model; reading the audio files of a test set, converting them into the corresponding power spectra, and extracting the corresponding KL spectral features through the KL filter; and feeding the test-set KL spectral features to the trained acoustic model for testing and evaluation to obtain the final accuracy of the acoustic scene classification model. Applied to other acoustic scene classification tasks, the invention can achieve better performance than the conventional Mel feature extraction method, which is based on the human ear.
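The feature-extraction steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented procedure: the abstract does not say how the frequency groups are divided or what shape the filters take, so the sketch assumes band edges that split the cumulative per-bin inter-class divergence into equal shares, and assumes rectangular filters. All function names are hypothetical, the power spectra are synthetic, and the convolutional network stage is omitted.

```python
import numpy as np

def class_distributions(power, labels, n_classes, eps=1e-12):
    """Mean power spectrum per class, normalized to sum to 1 over frequency bins."""
    means = np.stack([power[labels == c].mean(axis=0) for c in range(n_classes)])
    means = means + eps
    return means / means.sum(axis=1, keepdims=True)

def kl_matrix(p):
    """Inter-class KL matrix: D[i, j] = sum_k p_i[k] * log(p_i[k] / p_j[k])."""
    logp = np.log(p)
    return np.sum(p[:, None, :] * (logp[:, None, :] - logp[None, :, :]), axis=2)

def bin_scores(p):
    """Per-bin symmetrised divergence summed over class pairs (assumed criterion)."""
    logp = np.log(p)
    diff = p[:, None, :] - p[None, :, :]
    # Every unordered pair appears twice in the double sum, hence the division by 2.
    return np.sum(diff * (logp[:, None, :] - logp[None, :, :]), axis=(0, 1)) / 2.0

def kl_filterbank(scores, n_bands):
    """Rectangular filters; each band holds roughly 1/n_bands of the total score."""
    cum = np.cumsum(scores) / np.sum(scores)
    targets = np.arange(1, n_bands) / n_bands
    inner = np.searchsorted(cum, targets) + 1
    # Duplicate edges are merged, so fewer than n_bands bands may be returned.
    edges = np.unique(np.concatenate(([0], inner, [len(scores)])))
    bank = np.zeros((len(edges) - 1, len(scores)))
    for b, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        bank[b, lo:hi] = 1.0
    return bank

def kl_features(power, bank, eps=1e-10):
    """Log band energies, standardized per band to zero mean and unit variance."""
    logspec = np.log(power @ bank.T + eps)
    return (logspec - logspec.mean(axis=0)) / (logspec.std(axis=0) + eps)

# Synthetic stand-in for training-set power spectra: 3 classes, 64 frequency bins.
rng = np.random.default_rng(0)
n_bins, n_classes = 64, 3
labels = np.repeat(np.arange(n_classes), 50)
power = rng.gamma(2.0, 1.0, size=(len(labels), n_bins))
for c in range(n_classes):
    # Give each class extra energy in a different frequency region.
    power[labels == c, 10 * c:10 * c + 8] *= 5.0

p = class_distributions(power, labels, n_classes)
D = kl_matrix(p)
bank = kl_filterbank(bin_scores(p), n_bands=8)
feats = kl_features(power, bank)
```

Under this assumed rule, bands come out narrow where the class spectra differ most and wide where they are similar, in contrast to a Mel filter bank, whose spacing is fixed by a perceptual scale. In a real train/test setup the normalization statistics would be computed on the training set and reused for the test set.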

Publication/grant documents

CN109036382B Audio feature extraction method based on KL divergence. Publication/grant date: 2020-06-09


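IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/02	. Feature extraction for speech recognition; selection of recognition unit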