Patent search: ap:("Google LLC") AND inv:"Scott Wisdom"

1.

Invention publication
Audio-Visual Separation of On-Screen Sounds based on Machine Learning Models (status: under examination, published)

Publication No.: US20230386502A1

Publication date: 2023-11-30

Application No.: US18226545

Filing date: 2023-07-26

Applicant: Google LLC

Inventor: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

IPC: G10L25/57, G06N3/088, G10L25/30, G06V20/40, G06F18/214

CPC classification number: G10L25/57, G06N3/088, G10L25/30, G06V20/40, G06F18/214

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
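The last two steps of the abstract (deciding which estimated sources are on-screen, then producing a waveform made of only those sources) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the separated sources and the embeddings are supplied as toy arrays rather than produced by a neural network, the dot-product-plus-sigmoid score stands in for whatever learned classifier the claims cover, and the function names `on_screen_probabilities` and `remix_on_screen` are hypothetical.

```python
import numpy as np


def on_screen_probabilities(audio_embeddings, video_embedding):
    """Score each estimated source against the video embedding.

    Assumption: a dot product followed by a sigmoid stands in for the
    learned classifier; the abstract does not specify the scoring function.
    audio_embeddings has shape (M, D), video_embedding has shape (D,).
    """
    logits = audio_embeddings @ video_embedding  # shape (M,)
    return 1.0 / (1.0 + np.exp(-logits))


def remix_on_screen(sources, probabilities, threshold=None):
    """Combine separated sources into an on-screen waveform estimate.

    sources has shape (M, T). With threshold=None the sources are mixed
    with soft weights (the probabilities); otherwise each source is kept
    or dropped by comparing its probability with the threshold.
    """
    if threshold is None:
        weights = probabilities
    else:
        weights = (probabilities >= threshold).astype(sources.dtype)
    return weights @ sources  # shape (T,)


# Toy inputs (hypothetical): two separated sources of three samples each,
# with 2-dimensional embeddings.
sources = np.array([[1.0, 2.0, 3.0],
                    [10.0, 20.0, 30.0]])
audio_embeddings = np.array([[5.0, 0.0],
                             [-5.0, 0.0]])
video_embedding = np.array([2.0, 0.0])

probs = on_screen_probabilities(audio_embeddings, video_embedding)
on_screen = remix_on_screen(sources, probs, threshold=0.5)
print(on_screen)
```

In this toy setup the first source's embedding aligns with the video embedding and the second opposes it, so only the first source survives the hard threshold. Passing `threshold=None` instead gives a soft mix weighted by the probabilities.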

2.

Invention application
Audio-Visual Separation of On-Screen Sounds Based on Machine Learning Models (status: in force)

Publication No.: US20220310113A1

Publication date: 2022-09-29

Application No.: US17214186

Filing date: 2021-03-26

Applicant: Google LLC

Inventor: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

IPC: G10L25/57, G06K9/00, G06K9/62, G10L25/30, G06N3/08

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

3.

Granted invention patent
Audio-visual separation of on-screen sounds based on machine learning models (status: in force)

Publication No.: US12217768B2

Publication date: 2025-02-04

Application No.: US18226545

Filing date: 2023-07-26

Applicant: Google LLC

Inventor: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

IPC: G10L25/57, G06F18/214, G06N3/088, G06V20/40, G10L25/30

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

4.

Granted invention patent
Audio-visual separation of on-screen sounds based on machine learning models (status: in force)

Publication No.: US11756570B2

Publication date: 2023-09-12

Application No.: US17214186

Filing date: 2021-03-26

Applicant: Google LLC

Inventor: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

IPC: G10L25/57, G06N3/088, G10L25/30, G06V20/40, G06F18/214

CPC classification number: G10L25/57, G06F18/214, G06N3/088, G06V20/40, G10L25/30

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
