- Patent title: MULTI-MODAL FRAMEWORK FOR MULTI-CHANNEL TARGET SPEECH SEPERATION
- Application number: US16901487
- Application date: 2020-06-15
- Publication number: US20210390970A1
- Publication date: 2021-12-16
- Inventors: Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
- Applicant: TENCENT AMERICA LLC
- Applicant address: Palo Alto, CA, US
- Patentee: TENCENT AMERICA LLC
- Current patentee: TENCENT AMERICA LLC
- Current patentee address: Palo Alto, CA, US
- Main classification: G10L21/0272
- IPC classifications: G10L21/0272; G10L17/00; G06K9/00; G06T11/60; G06T7/00; G06T7/20; G06N3/02
Abstract:
A method, computer program, and computer system for separating a target voice from among a plurality of speakers is provided. Video data associated with the plurality of speakers and audio data associated with each of the one or more speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.
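The abstract states only that the target voice is identified "based on the received audio data and the extracted video feature data"; it does not say how the two modalities are combined. The sketch below is a hypothetical, much simpler stand-in and not the patented method: it assumes each visible speaker has a per-frame lip-movement feature, averages multi-channel audio energy into one envelope, and picks the speaker whose lip motion correlates best with that envelope. The names `pearson`, `frame_energy`, and `identify_target`, as well as the feature layout, are illustrative assumptions.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def frame_energy(audio: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame energy averaged over channels; audio[c][t] is a magnitude
    for channel c at frame t (hypothetical multi-channel layout)."""
    num_channels = len(audio)
    num_frames = len(audio[0])
    return [
        sum(audio[c][t] ** 2 for c in range(num_channels)) / num_channels
        for t in range(num_frames)
    ]


def identify_target(audio: Sequence[Sequence[float]],
                    lip_features: Dict[str, Sequence[float]]) -> str:
    """Return the speaker whose lip-motion series best matches the audio envelope."""
    envelope = frame_energy(audio)
    scores = {spk: pearson(envelope, feats) for spk, feats in lip_features.items()}
    return max(scores, key=lambda spk: scores[spk])


if __name__ == "__main__":
    # Two microphone channels, five frames; energy rises when speaker "B" opens the mouth.
    audio = [[0.1, 0.9, 0.2, 0.8, 0.1],
             [0.2, 1.0, 0.1, 0.9, 0.2]]
    lips = {"A": [0.9, 0.1, 0.8, 0.2, 0.9],
            "B": [0.1, 0.9, 0.2, 0.8, 0.1]}
    print(identify_target(audio, lips))
```

A real audio-visual separation system would more likely fuse learned audio and visual embeddings inside a neural network and output a separation mask; the correlation heuristic is used here only to make the "audio plus video features selects the target" idea concrete.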