MULTI-MODAL FRAMEWORK FOR MULTI-CHANNEL TARGET SPEECH SEPARATION
Abstract:
A method, computer program, and computer system for separating a target voice from among a plurality of speakers are provided. Video data associated with the plurality of speakers and audio data associated with each of the plurality of speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.
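To make the final step concrete, the following is a minimal sketch of how audio data and video feature data can jointly identify a target speaker. It is not the patented multi-modal framework. It substitutes a simple audio-visual synchrony heuristic, assuming that (a) each speaker already has its own audio stream, (b) the extracted video feature is a per-frame lip-motion magnitude for the target face, aligned one-to-one with fixed-length audio frames, and (c) the target is the speaker whose frame-energy envelope correlates most strongly with that lip motion. The names `frame_energy`, `pearson`, `identify_target`, and `lip_motion` are hypothetical and used only for illustration.

```python
import math
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame; trailing partial frames are dropped."""
    n_frames = len(samples) // frame_len
    return [
        math.sqrt(sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation over the overlapping prefix; returns 0.0 if undefined."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = list(a[:n]), list(b[:n])
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def identify_target(
    audio_by_speaker: Dict[str, Sequence[float]],
    lip_motion: Sequence[float],  # hypothetical per-frame video feature of the target face
    frame_len: int,
) -> str:
    """Return the speaker whose audio envelope best tracks the target's lip motion.

    Illustrative heuristic only (assumed), not the method claimed in the abstract.
    """
    scores = {
        speaker: pearson(frame_energy(signal, frame_len), lip_motion)
        for speaker, signal in audio_by_speaker.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Speaker A is active in frames 0 and 2; speaker B in frames 1 and 3.
    audio = {
        "A": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
        "B": [0.0, 0.0, 1.0, -1.0, 0.0, 0.0, 1.0, -1.0],
    }
    lips = [0.9, 0.1, 0.8, 0.2]  # target's mouth moves in frames 0 and 2
    print(identify_target(audio, lips, frame_len=2))
```

In this toy input the target's lips move in the same frames where speaker A is loud, so the sketch selects "A". A real system would replace the hand-written envelope and correlation with learned audio and visual embeddings, but the selection principle (score each candidate voice against the visual evidence, keep the best match) is the same idea in miniature.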