Speaker recognition method, device and system based on artificial intelligence

Invention publication

Patent title: Speaker recognition method, device and system based on artificial intelligence
Patent title (original Chinese): 基于人工智能的说话人识别方法及装置、系统
Application number: CN201910833635.2

Filing date: 2019-06-17
Publication (announcement) number: CN110660102A

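Publication (announcement) date: 2020-01-07
Inventors: 揭泽群, 葛政, 刘威
Applicant: Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Applicant address: 35F, Tencent Building, Kejizhongyi Road, Hi-tech Park, Nanshan District, Shenzhen, Guangdong
Patentee: Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Current patentee: Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Current patentee address: 35F, Tencent Building, Kejizhongyi Road, Hi-tech Park, Nanshan District, Shenzhen, Guangdong
Agency: 深圳市隆天联鼎知识产权代理有限公司
Agent: 刘抗美
Main classification: G06T7/73
IPC classification: G06T7/73 ; G06K9/00 ; G06F16/29 ; G06F16/23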

Abstract:

The invention relates to the technical field of image processing, and in particular to an artificial-intelligence-based speaker recognition method, device, and system, and to an electronic device. The recognition method comprises: obtaining an image to be detected and performing face recognition on it to obtain at least one face coordinate; identifying an audio acquisition device in the image to obtain first coordinate data of the audio acquisition device; calculating displacement data from the first coordinate data and historical coordinate data of the audio acquisition device, and calculating the precise coordinates of the audio acquisition device from the displacement data; and calculating the inter-object distance between the precise coordinates and each of the at least one face coordinate, and taking the object corresponding to the face coordinate with the smallest inter-object distance as the speaker. Once the faces and the unique audio acquisition device in the image have been determined, the technical solution can use the historical coordinate data to check and refine the coordinates of the audio acquisition device, improving the accuracy of speaker recognition.
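
The steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not say how the displacement data is turned into a precise coordinate, so the threshold rule, the averaging step, and the names `refine_device_coordinate`, `pick_speaker`, and `max_jump` are all assumptions made for the example. Face and device detection are assumed to have already produced 2D pixel coordinates.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def refine_device_coordinate(
    detected: Point,
    history: Sequence[Point],
    max_jump: float = 50.0,  # hypothetical threshold in pixels, not from the patent
) -> Point:
    """Refine the device coordinate using coordinates from earlier frames."""
    if not history:
        # No history yet: the current detection is all we have.
        return detected
    # Assumed reference position: the mean of the historical coordinates.
    ref_x = sum(p[0] for p in history) / len(history)
    ref_y = sum(p[1] for p in history) / len(history)
    # Displacement between the current detection and the reference position.
    displacement = math.hypot(detected[0] - ref_x, detected[1] - ref_y)
    if displacement > max_jump:
        # Assumed rule: an implausibly large jump is treated as a bad
        # detection, so the history-based estimate is kept instead.
        return (ref_x, ref_y)
    # Otherwise average detection and reference to damp frame-to-frame jitter.
    return ((detected[0] + ref_x) / 2, (detected[1] + ref_y) / 2)


def pick_speaker(face_coords: Sequence[Point], device: Point) -> int:
    """Return the index of the face coordinate closest to the device."""
    if not face_coords:
        raise ValueError("at least one face coordinate is required")
    return min(
        range(len(face_coords)),
        key=lambda i: math.hypot(
            face_coords[i][0] - device[0], face_coords[i][1] - device[1]
        ),
    )


# Example with made-up coordinates: three faces and one microphone.
faces = [(100.0, 120.0), (400.0, 130.0), (700.0, 125.0)]
history = [(410.0, 300.0), (412.0, 298.0)]
device = refine_device_coordinate((408.0, 302.0), history)
speaker_index = pick_speaker(faces, device)  # index of the nearest face
```

Separating the refinement step from the nearest-face selection mirrors the order of steps in the abstract and lets the refinement rule be swapped for whatever the full claims specify.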

Publication/grant documents

CN110660102B Speaker recognition method, device and system based on artificial intelligence. Publication/grant date: 2020-10-27

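IPC classification:

G	Physics
G06	Computing; calculating or counting
G06T	Image data processing or generation, in general
G06T7/00	Image analysis
G06T7/70	. Determining position or orientation of objects or cameras (camera calibration G06T7/80)
G06T7/73	.. Using feature-based methods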