Acoustic Scene Identification Method Based on Data Augmentation

Invention publication


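Patent title: Acoustic scene identification method based on data augmentation (一种基于数据增强的声场景辨识方法)
Patent title (English): Acoustic scene identification method based on data enhancement
Application number: CN201910201430.2

Filing date: 2019-03-18
Publication number: CN109978034A

Publication date: 2019-07-05
Inventors: Li Yanxiong, Zhang Yuhan, Wang Wucheng, Liu Mingle
Applicant: South China University of Technology
Applicant address: No. 381 Wushan Road, Tianhe District, Guangzhou, Guangdong Province
Patentee: South China University of Technology
Current patentee: South China University of Technology
Current patentee address: No. 381 Wushan Road, Tianhe District, Guangzhou, Guangdong Province
Agency: Guangzhou Huaxue Intellectual Property Agency Co., Ltd.
Agent: Li Bin
Main classification: G06K9/62
IPC classification: G06K9/62 ; G10L21/0208 ; G10L25/03 ; G10L25/27 ; G10L25/45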

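Abstract:

The invention discloses an acoustic scene identification method based on data augmentation, comprising the following steps. First, audio samples from different acoustic scenes are collected and labeled. Next, the audio samples are preprocessed by pre-emphasis, framing, and windowing. Data augmentation follows: the harmonic source and the percussive source of each audio sample are extracted to obtain a larger set of audio samples; log-Mel filter bank features are extracted from each audio sample and from its harmonic and percussive sources; the three features are stacked into one three-channel high-dimensional feature; and a mixed-augmentation technique is then used to construct richer training samples. Finally, the three-channel high-dimensional features are fed to an Xception network for classification, which identifies the acoustic scene of each audio sample. The data augmentation method effectively improves the generalization ability of the Xception network classifier and stabilizes network training. When identifying acoustic scenes, the method achieves better identification results.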
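The preprocessing step named in the abstract (pre-emphasis, framing, windowing) can be sketched as follows. This is a minimal, dependency-free illustration, not the patented implementation: the pre-emphasis coefficient 0.97, the Hamming window, and the frame and hop lengths are assumptions, since the abstract does not state them, and all function names are hypothetical.

```python
import math


def pre_emphasize(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; coeff=0.97 is an assumed, commonly used value."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return [signal[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]


def hamming(frame_len):
    """Hamming window coefficients (the window type is an assumption)."""
    if frame_len == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def preprocess(signal, frame_len=400, hop_len=200, coeff=0.97):
    """Pre-emphasize, frame, and window a 1-D list of samples."""
    window = hamming(frame_len)
    frames = frame_signal(pre_emphasize(signal, coeff), frame_len, hop_len)
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

For example, `preprocess(samples, frame_len=400, hop_len=200)` corresponds to 25 ms frames with a 12.5 ms hop at an assumed 16 kHz sampling rate.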

Abstract (English):

The invention discloses an acoustic scene identification method based on data enhancement. The method comprises the following steps: firstly, collecting and marking audio samples of different sound scenes; then preprocessing is carried out, and pre-emphasis, framing and windowing processing are carried out on the audio samples; data enhancement is then performed, extracting a harmonic source and an impact source of each audio sample to obtain more sufficient audio samples, extracting logarithmic Mel filter bank characteristics from the audio samples and the harmonic sources and the impact sources of the audio samples, stacking the three characteristics into a three-channel high-dimensional characteristic, and constructing more abundant training samples by adopting a hybrid enhancement technology; and finally, inputting the three-channel high-dimensional features into an Xception network for judgment, and identifying the sound scene corresponding to each audio sample. According to the data enhancement method, the generalization capability of the Xception network classifier can be effectively improved, and the training process of the network is stabilized. When the acoustic scene is identified, the method can obtain a better identification effect.
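The stacking and sample-mixing steps in the abstract can be illustrated with a short sketch. It assumes that the "hybrid enhancement technology" refers to mixup-style interpolation, in which two training samples and their one-hot labels are blended with a weight drawn from a Beta(alpha, alpha) distribution; the abstract does not confirm this. The function names and `alpha=0.2` are hypothetical, and the feature maps are plain nested lists standing in for log-Mel features.

```python
import random


def stack_channels(log_mel, log_mel_harmonic, log_mel_percussive):
    """Stack three equally sized 2-D feature maps into one [3][bands][frames] feature."""
    maps = (log_mel, log_mel_harmonic, log_mel_percussive)
    shapes = {(len(m), len(m[0])) for m in maps}
    if len(shapes) != 1:
        raise ValueError("all three feature maps must share one shape")
    return list(maps)


def mixup(feat_a, label_a, feat_b, label_b, alpha=0.2, rng=random):
    """Blend two samples: x = lam * x_a + (1 - lam) * x_b, and likewise for the labels.

    lam is drawn from Beta(alpha, alpha); alpha=0.2 is an assumed value.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_feat = [
        [
            [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ch_a, ch_b)
        ]
        for ch_a, ch_b in zip(feat_a, feat_b)
    ]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_feat, mixed_label, lam
```

The blended label is a soft target, so a classifier trained on such samples would typically use a cross-entropy loss that accepts probability vectors rather than class indices.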

Publication / grant documents

CN109978034B Acoustic scene identification method based on data augmentation. Publication/grant date: 2020-12-22


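IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)
G06K9/62	. Methods or arrangements for recognition using electronic means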