AUDIO-VISUAL SPEECH SEPARATION

Invention patent application

US20200335121A1 AUDIO-VISUAL SPEECH SEPARATION Under examination (published)

Patent title: AUDIO-VISUAL SPEECH SEPARATION
Application number: US16761707

Filing date: 2018-11-21
Publication number: US20200335121A1

Publication date: 2020-10-22
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
Applicant: GOOGLE LLC
International application: PCT/US2018/062330 WO 20181121
Main classification: G10L21/10
IPC classifications: G10L21/10; G10L21/18; G10L15/16; G06K9/00; G06K9/62

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
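The data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the applicant's implementation: the random linear projections stand in for learned networks, and every function name, dimension, and parameter here is hypothetical. Two further assumptions are made for illustration only: video frames and spectrogram frames are taken to be aligned one to one, and each isolated spectrogram is obtained by multiplying the speaker's mask element-wise with the mixture spectrogram.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_params(n_speakers, embed_dim, visual_dim, audio_dim, freq_bins):
    """Random weights standing in for trained networks (illustrative only)."""
    av_dim = n_speakers * visual_dim + audio_dim
    return {
        "w_visual": rng.normal(size=(embed_dim, visual_dim)),
        "w_audio": rng.normal(size=(freq_bins, audio_dim)),
        # One mask projection per speaker.
        "w_masks": [rng.normal(size=(av_dim, freq_bins)) for _ in range(n_speakers)],
    }


def visual_features(face_embeddings, w_visual):
    """Map per-frame face embeddings (frames, embed_dim) to visual features."""
    return np.tanh(face_embeddings @ w_visual)


def audio_embedding(spectrogram, w_audio):
    """Map a spectrogram (frames, freq_bins) to an audio embedding."""
    return np.tanh(spectrogram @ w_audio)


def separate(face_embeddings_per_speaker, spectrogram, params):
    """Return one isolated spectrogram per speaker."""
    visual = [visual_features(e, params["w_visual"])
              for e in face_embeddings_per_speaker]
    audio = audio_embedding(spectrogram, params["w_audio"])
    # Combine visual features of all speakers with the audio embedding.
    av_embedding = np.concatenate(visual + [audio], axis=1)
    isolated = []
    for w_mask in params["w_masks"]:
        # Sigmoid keeps each mask value between 0 and 1.
        mask = 1.0 / (1.0 + np.exp(-(av_embedding @ w_mask)))
        # Assumption: the mask is applied element-wise to the mixture.
        isolated.append(mask * spectrogram)
    return isolated


n_speakers, frames, embed_dim, freq_bins = 2, 5, 8, 6
params = make_params(n_speakers, embed_dim, visual_dim=4, audio_dim=3,
                     freq_bins=freq_bins)
faces = [rng.normal(size=(frames, embed_dim)) for _ in range(n_speakers)]
spec = np.abs(rng.normal(size=(frames, freq_bins)))  # magnitude spectrogram
outputs = separate(faces, spec, params)
print([o.shape for o in outputs])
```

Each output has the same shape as the input spectrogram, one per detected speaker. A real system would replace the projections with trained models and convert each isolated spectrogram back to a waveform.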

Published/granted documents

US11456005B2 Audio-visual speech separation Publication/grant date: 2022-09-27

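IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L21/00	Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence)
G10L21/06	.Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids (G10L15/26 takes precedence)
G10L21/10	..Transforming into visible information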