Multi-tap minimum variance distortionless response beamformer with neural networks for target speech separation

Granted invention patent

US11423906B2 Multi-tap minimum variance distortionless response beamformer with neural networks for target speech separation (in force)


Patent title: Multi-tap minimum variance distortionless response beamformer with neural networks for target speech separation
Application number: US16926138

Filing date: 2020-07-10
Publication number: US11423906B2

Publication date: 2022-08-23
Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Chao Weng, Jianming Liu, Dong Yu
Applicant: TENCENT AMERICA LLC
Applicant address: Palo Alto, CA, US
Patentee: TENCENT AMERICA LLC
Current patentee: TENCENT AMERICA LLC
Current patentee address: Palo Alto, CA, US
Agent: Sughrue Mion, PLLC
Main classification: G10L15/25
IPC classification: G10L15/25


Abstract:

A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers are received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.
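For readers unfamiliar with the minimum variance distortionless response (MVDR) function named in the abstract, the following is a minimal sketch of the classic single-tap MVDR beamformer for one frequency bin, w = R^{-1} d / (d^H R^{-1} d), where R is the noise spatial covariance matrix and d is the target steering vector. It assumes R and d are already known; how the patented system derives them, the multi-tap extension, the neural networks, and the use of video data are not modeled. All function names (`solve`, `mvdr_weights`, `apply_beamformer`) are hypothetical and not taken from the patent.

```python
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy [a | b] so the caller's inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def mvdr_weights(noise_cov: Matrix, steering: Vector) -> Vector:
    """Classic MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one bin.

    Assumes noise_cov is Hermitian positive definite (hypothetical input).
    """
    r_inv_d = solve(noise_cov, steering)
    # d^H R^{-1} d is real and positive for Hermitian positive definite R.
    denom = sum(d.conjugate() * v for d, v in zip(steering, r_inv_d))
    return [v / denom for v in r_inv_d]


def apply_beamformer(weights: Vector, frame: Vector) -> complex:
    """Beamformer output y = w^H x for one STFT bin of a multichannel frame."""
    return sum(w.conjugate() * x for w, x in zip(weights, frame))
```

Usage: with `noise = [[2+0j, 0.5+0.5j], [0.5-0.5j, 1+0j]]` and `steer = [1+0j, 0.8-0.6j]`, `apply_beamformer(mvdr_weights(noise, steer), steer)` is approximately 1, which is the distortionless constraint w^H d = 1: a signal arriving along the steering vector passes unchanged while noise power is minimized. A linear solve is used instead of an explicit matrix inverse because it is cheaper and numerically more stable.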

Publication/grant documents

US20220013123A1 MULTI-TAP MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER WITH NEURAL NETWORKS FOR TARGET SPEECH SEPARATION. Publication/grant date: 2022-01-13


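IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/24	.Speech recognition using non-acoustical features
G10L15/25	..using position of the lips, movement of the lips or face analysis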