METHODS AND SYSTEMS FOR LEARNING LANGUAGE-INVARIANT AUDIOVISUAL REPRESENTATIONS

    Publication number: US20240161500A1

    Publication date: 2024-05-16

    Application number: US18505081

    Application date: 2023-11-08

    Applicant: Netflix, Inc.

    CPC classification number: G06V20/41 G06V10/82

    Abstract: The disclosed computer-implemented methods and systems include training a machine-learning model to accurately generate representations of similar scenes from long-form videos that have semantically different speech audio. For example, the methods and systems described herein generate machine-learning model training data including video clips and corresponding audio spectrograms. To augment this data, the methods and systems described herein further include dubbed audio spectrograms in the training data, such that each video clip corresponds to both a primary-language audio spectrogram and a secondary-language audio spectrogram. By applying a machine-learning model to this training data, the systems and methods described herein teach the machine-learning model to de-emphasize speech audio when generating audiovisual representations corresponding to scenes from long-form videos. Various other methods, systems, and computer-readable media are also disclosed.
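
    The pairing of each video clip with a primary-language and a dubbed spectrogram might be sketched as follows. This is only an illustrative sketch, not the method claimed in the application: the names `TrainingExample` and `build_training_examples`, the nested-list spectrogram type, and the rule of skipping clips that lack a dub are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder type (assumption): time frames x frequency bins.
Spectrogram = List[List[float]]


@dataclass(frozen=True)
class TrainingExample:
    """Hypothetical container for one clip and its two audio tracks."""
    clip_id: str                      # identifies the video clip
    primary_spectrogram: Spectrogram  # primary-language audio
    dubbed_spectrogram: Spectrogram   # secondary-language (dubbed) audio


def build_training_examples(
    clip_ids: List[str],
    primary: Dict[str, Spectrogram],
    dubbed: Dict[str, Spectrogram],
) -> List[TrainingExample]:
    """Pair each clip with both language tracks.

    Clips missing either track are skipped (an assumption of this sketch).
    """
    examples = []
    for clip_id in clip_ids:
        if clip_id in primary and clip_id in dubbed:
            examples.append(
                TrainingExample(clip_id, primary[clip_id], dubbed[clip_id])
            )
    return examples


# Toy data: scene_002 has no dubbed track, so it is left out.
clips = ["scene_001", "scene_002"]
primary = {"scene_001": [[0.1, 0.2]], "scene_002": [[0.3, 0.4]]}
dubbed = {"scene_001": [[0.5, 0.6]]}
examples = build_training_examples(clips, primary, dubbed)
```

    Each resulting example gives a model the same visual content with two different speech tracks, which is the property the abstract relies on to de-emphasize speech audio.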
