SYSTEM AND METHOD FOR ENHANCING MACHINE LEARNING MODEL FOR AUDIO/VIDEO UNDERSTANDING USING GATED MULTI-LEVEL ATTENTION AND TEMPORAL ADVERSARIAL TRAINING

    Publication number: US20220300740A1

    Publication date: 2022-09-22

    Application number: US17387889

    Application date: 2021-07-28

    Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.

    System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training

    Publication number: US11989939B2

    Publication date: 2024-05-21

    Application number: US17387889

    Application date: 2021-07-28

    CPC classification number: G06V20/41 G06F18/214

    Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
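The flow described in the abstract (a global representation, a local representation built from different portions of the content, a combination of the two, then classification) can be illustrated with a small sketch. This is not the patented implementation. It assumes the content has already been reduced to per-frame feature vectors, that attention is a softmax over dot products with a single query vector, and that the combination is a scalar sigmoid gate, which is one reading of the "gated" wording in the title. Every name here (`attention_pool`, `gated_multilevel_representation`, `classify`, `query`, `gate_weights`, `segment_len`, `class_weights`) is hypothetical, and in a trained model the weights would be learned rather than fixed by hand.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(frames, query):
    # Weighted average of frame vectors; weights are softmax(frame . query).
    weights = softmax([dot(f, query) for f in frames])
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


def gated_multilevel_representation(frames, query, gate_weights, segment_len):
    # Global representation: attend over every frame at once.
    global_rep = attention_pool(frames, query)

    # Local representation: attend within each fixed-length segment,
    # then average the per-segment vectors (an assumed choice).
    segments = [frames[i:i + segment_len]
                for i in range(0, len(frames), segment_len)]
    segment_reps = [attention_pool(seg, query) for seg in segments]
    dim = len(query)
    local_rep = [sum(r[i] for r in segment_reps) / len(segment_reps)
                 for i in range(dim)]

    # Assumed gate: a scalar sigmoid over the concatenated representations.
    gate = 1.0 / (1.0 + math.exp(-dot(gate_weights, global_rep + local_rep)))
    output = [gate * g + (1.0 - gate) * l
              for g, l in zip(global_rep, local_rep)]
    return output, gate


def classify(representation, class_weights):
    # Linear scores per class; return the index of the highest score.
    scores = [dot(representation, w) for w in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy input: four 2-dimensional frame features (illustrative values only).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
output, gate = gated_multilevel_representation(
    frames, query=[1.0, 0.0], gate_weights=[0.0, 0.0, 0.0, 0.0], segment_len=2)
label = classify(output, class_weights=[[1.0, 0.0], [0.0, 1.0]])
```

With all-zero `gate_weights` the gate evaluates to 0.5, so the output is the plain average of the global and local representations. Non-zero weights would let the model lean toward whichever level is more informative for a given input.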
