Multi-person scene speaker-voice matching method

Invention publication


Patent title: 多人场景人声匹配方法
Patent title (English): Multi-person scene speaker-voice matching method
Application number: CN201910918342.4

Filing date: 2019-09-26
Publication (announcement) number: CN110648667A

Publication (announcement) date: 2020-01-03
Inventors: 唐立军, 杨家全, 周年荣, 张林山, 李浩涛, 杨洋, 冯勇, 严玉廷, 李孟阳, 罗恩博, 梁俊宇, 袁兴宇, 李响, 何婕, 栾思平
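Applicant: 云南电网有限责任公司电力科学研究院
Applicant address: No. 105 Yunda West Road, Economic and Technological Development Zone, Kunming, Yunnan Province
Patentee: 云南电网有限责任公司电力科学研究院
Current patentee: 云南电网有限责任公司电力科学研究院
Current patentee address: No. 105 Yunda West Road, Economic and Technological Development Zone, Kunming, Yunnan Province
Agency: 北京弘权知识产权代理事务所
Agents: 逯长明; 许伟群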
Main classification: G10L15/26
IPC classifications: G10L15/26; G10L15/04; G10L15/24; G10L15/28; G10L21/0308; G06K9/00; G06K9/62

Abstract:

Embodiments of this application provide a multi-person scene speaker-voice matching method, comprising: dividing the audio to be matched into multiple sound segments; performing speech recognition on the sound segments to obtain the speech segments they contain; obtaining the video segment corresponding to each speech segment; performing face detection on the video segment to obtain all predicted speakers for the speech segment; obtaining, from the pixel differences between adjacent grayscale frames in the video segment, hit information for each predicted speaker in those adjacent frames; and counting, from the hit information, the number of hits for each predicted speaker in the video segment, the predicted speaker with the most hits being the target speaker of the speech segment. The method automatically binds each speech segment to its target speaker, which can greatly reduce the subsequent manual work of matching speech to speakers and helps make audio-visual cognition technology practical.
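The hit-counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how a "hit" is decided, so the sketch assumes that, for each pair of adjacent grayscale frames, the predicted speaker whose face region shows the largest mean absolute pixel difference scores one hit, provided that difference exceeds a threshold. The names `region_motion`, `match_speaker`, `make_frame`, the `threshold` parameter, and the fixed per-speaker face boxes are all hypothetical; frames are plain nested lists of grayscale values so the example has no dependencies.

```python
def region_motion(prev, curr, box):
    """Mean absolute pixel difference inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    area = (bottom - top) * (right - left)
    if area <= 0:
        return 0.0
    total = 0
    for y in range(top, bottom):
        for x in range(left, right):
            total += abs(curr[y][x] - prev[y][x])
    return total / area


def match_speaker(gray_frames, face_boxes, threshold=5.0):
    """Return (target_speaker, hit_counts) for one speech segment.

    gray_frames: list of 2-D lists of grayscale values (the video segment).
    face_boxes:  dict mapping predicted speaker -> face box. Assumed fixed
                 for the whole segment (an assumption of this sketch).
    threshold:   assumed minimum motion for a frame pair to count as a hit.
    """
    hits = {name: 0 for name in face_boxes}
    if not face_boxes:
        return None, hits
    # Walk over adjacent frame pairs and award one hit per pair at most.
    for prev, curr in zip(gray_frames, gray_frames[1:]):
        motion = {name: region_motion(prev, curr, box)
                  for name, box in face_boxes.items()}
        best = max(motion, key=motion.get)
        if motion[best] > threshold:
            hits[best] += 1
    if max(hits.values()) == 0:
        return None, hits  # nobody moved enough in this segment
    # The predicted speaker with the most hits is taken as the target speaker.
    return max(hits, key=hits.get), hits


def make_frame(a_val, b_val):
    """Build a 4x8 test frame: left half is speaker A's face, right half B's."""
    return [[a_val] * 4 + [b_val] * 4 for _ in range(4)]


if __name__ == "__main__":
    frames = [make_frame(0, 0), make_frame(50, 0),
              make_frame(0, 0), make_frame(0, 50)]
    boxes = {"A": (0, 0, 4, 4), "B": (0, 4, 4, 8)}
    print(match_speaker(frames, boxes))
```

In the toy segment, speaker A's region changes across the first two frame pairs and speaker B's across the third, so A collects more hits and is selected. A real system would obtain the face boxes from a face detector on each frame and would likely restrict the difference to the mouth area; both are outside what the abstract specifies.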

Publication/grant documents

CN110648667B Multi-person scene speaker-voice matching method. Publication/grant date: 2022-04-08


IPC classification:

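G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/26	. Speech to text systems (G10L15/08 takes precedence)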