Patent search ap:("Netflix Page Inc.") AND inv:"Mahdi Kalayeh"

1.

Invention publication
METHODS AND SYSTEMS FOR LEARNING LANGUAGE-INVARIANT AUDIOVISUAL REPRESENTATIONS (Under examination - published)

Publication (announcement) number: US20240161500A1

Publication (announcement) date: 2024-05-16

Application number: US18505081

Filing date: 2023-11-08

Applicant: Netflix, Inc.

Inventor: Nihkil Singh, Iroro Orife, Chih-Wei Wu, Mahdi Kalayeh

IPC: G06V20/40, G06V10/82

CPC classification number: G06V20/41, G06V10/82

Abstract: The disclosed computer-implemented methods and systems include training a machine-learning model to accurately generate representations of similar scenes from long-form videos that have semantically different speech audio. For example, the methods and systems described herein generate machine-learning model training data including video clips and corresponding audio spectrograms. To augment this data, the methods and systems described herein further include dubbed audio spectrograms with the training data such that each video clip corresponds with a primary language audio spectrogram and a secondary language audio spectrogram. By applying a machine-learning model to this training data, the systems and methods described herein teach the machine-learning model to de-emphasize speech audio when generating audiovisual representations corresponding to scenes from long-form video. Various other methods, systems, and computer-readable media are also disclosed.
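
The pairing described in the abstract, where each video clip corresponds with a primary language audio spectrogram and a secondary (dubbed) language audio spectrogram, can be illustrated with a minimal sketch. This is not the method claimed in the application: the names `TrainingExample` and `build_examples`, the dictionary-based inputs, the nested-list spectrogram type, and the choice to skip clips that lack a dubbed track are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical stand-in for a spectrogram: time frames x frequency bins.
Spectrogram = List[List[float]]


@dataclass(frozen=True)
class TrainingExample:
    """One clip paired with two audio tracks (illustrative structure only)."""
    clip_id: str
    primary_spec: Spectrogram    # original-language audio
    secondary_spec: Spectrogram  # dubbed-language audio


def build_examples(
    clip_ids: Sequence[str],
    primary: Dict[str, Spectrogram],
    secondary: Dict[str, Spectrogram],
) -> List[TrainingExample]:
    """Pair each clip with both spectrograms.

    Assumption: clips missing either track are skipped, so every
    returned example has a primary and a secondary spectrogram.
    """
    examples = []
    for clip_id in clip_ids:
        if clip_id in primary and clip_id in secondary:
            examples.append(
                TrainingExample(clip_id, primary[clip_id], secondary[clip_id])
            )
    return examples
```

With two clips where only the first has a dubbed track, `build_examples` returns a single example for that first clip. How a model would then consume the two tracks (for instance, treating them as alternative audio for the same visual content) is not specified in the abstract and is left out of the sketch.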
