Audio-visual separation of on-screen sounds based on machine learning models

Invention Grant

US11756570B2 Audio-visual separation of on-screen sounds based on machine learning models (Active)

Patent Title: Audio-visual separation of on-screen sounds based on machine learning models
Application No.: US17214186

Application Date: 2021-03-26
Publication No.: US11756570B2

Publication Date: 2023-09-12
Inventor: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R Hershey
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: McDonnell Boehnen Hulbert & Berghoff LLP
Main IPC: G10L25/57
IPC: G10L25/57; G06N3/088; G10L25/30; G06V20/40; G06F18/214

Abstract:

Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
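
The data flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the trained neural network is replaced by simple signal-processing placeholders, and every function name, the embedding size, and the probability-weighted remixing step are hypothetical choices made for this example.

```python
import numpy as np


def separate_sources(mixture, num_sources):
    # Placeholder for the separation network (assumption): split the spectrum
    # into contiguous frequency bands so the estimated sources sum to the mixture.
    spectrum = np.fft.rfft(mixture)
    edges = np.linspace(0, spectrum.shape[0], num_sources + 1).astype(int)
    sources = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(spectrum)
        band[lo:hi] = spectrum[lo:hi]
        sources.append(np.fft.irfft(band, n=mixture.shape[0]))
    return np.stack(sources)


def embed_audio(source, dim):
    # Placeholder audio embedding (assumption): log energy in `dim` frequency bands.
    power = np.abs(np.fft.rfft(source)) ** 2
    return np.log1p(np.array([b.sum() for b in np.array_split(power, dim)]))


def embed_video(frames, dim):
    # Placeholder video embedding (assumption): average the frames over time,
    # then average the pixels in `dim` equal chunks of the flattened image.
    pooled = frames.mean(axis=0).ravel()
    return np.array([c.mean() for c in np.array_split(pooled, dim)])


def on_screen_probabilities(audio_embs, video_emb):
    # Score each estimated source against the video embedding:
    # cosine similarity squashed to (0, 1) with a sigmoid.
    a = audio_embs / (np.linalg.norm(audio_embs, axis=1, keepdims=True) + 1e-8)
    v = video_emb / (np.linalg.norm(video_emb) + 1e-8)
    return 1.0 / (1.0 + np.exp(-(a @ v)))


def on_screen_separation(mixture, frames, num_sources=4, dim=8):
    sources = separate_sources(mixture, num_sources)
    audio_embs = np.stack([embed_audio(s, dim) for s in sources])
    video_emb = embed_video(frames, dim)
    probs = on_screen_probabilities(audio_embs, video_emb)
    # Assumed remixing rule: weight each source by its on-screen probability.
    on_screen = (probs[:, None] * sources).sum(axis=0)
    return sources, probs, on_screen


rng = np.random.default_rng(0)
mixture = rng.standard_normal(1600)   # stand-in audio waveform
frames = rng.random((5, 8, 8))        # stand-in video: 5 frames of 8x8 pixels
sources, probs, on_screen = on_screen_separation(mixture, frames)
print(sources.shape, probs.shape, on_screen.shape)
```

In a real system each placeholder would be a learned model; the sketch only shows how the estimated sources, the per-source audio embeddings, and the single video embedding combine into an on-screen estimate of the waveform.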

Public/Granted literature

US20220310113A1 Audio-Visual Separation of On-Screen Sounds Based on Machine Learning Models Public/Granted day: 2022-09-29

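IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L25/00	Speech or voice analysis techniques not restricted to groups G10L 15/00-G10L 21/00 (semiconductor-based muting amplifiers in which a speech detector senses certain special characteristics of a signal, e.g. sensing the absence of a signal, are classified in H03G3/34)
G10L25/48	.specially adapted for particular use
G10L25/51	..for comparison or discrimination
G10L25/57	...for processing of video signals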