VIDEO-TEXT MODELING WITH ZERO-SHOT TRANSFER FROM CONTRASTIVE CAPTIONERS

    Publication No.: US20250124708A1

    Publication Date: 2025-04-17

    Application No.: US18694604

    Application Date: 2023-12-08

    Applicant: Google LLC

    Abstract: Provided is an efficient approach to establishing a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning, and video question-answering. Some example implementations include a model which can be referred to as VideoCoCa. Example implementations reuse a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, a cross-frame attention layer or a perceiver resampler) and finetune the modified architecture on video-text data, aspects of the present disclosure leverage findings that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to “flattened frame embeddings”, yielding a strong zero-shot transfer baseline for many video-text tasks.
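
    The "flattened frame embeddings" idea in the abstract can be illustrated with a rough sketch. The code below is not the patented implementation: it assumes a single-head attention pooler with no learned projections, random arrays standing in for per-frame image-encoder outputs, and illustrative query counts. The names `attentional_pool` and `flatten_frames` are hypothetical. It shows only that a pooler written for one image's tokens accepts the concatenated tokens of all frames unchanged.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentional_pool(tokens, queries):
    """Simplified single-head attention pooling (assumption: no learned
    key/value projections). tokens: (N, D), queries: (Q, D) -> (Q, D)."""
    d = tokens.shape[-1]
    scores = queries @ tokens.T / np.sqrt(d)  # (Q, N) query-token similarity
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ tokens                   # weighted average of tokens

def flatten_frames(frame_tokens):
    """Concatenate per-frame token sequences: (T, N, D) -> (T * N, D)."""
    t, n, d = frame_tokens.shape
    return frame_tokens.reshape(t * n, d)

# Hypothetical sizes: 8 frames, 16 tokens per frame, 32-dim embeddings.
rng = np.random.default_rng(0)
T, N, D = 8, 16, 32
frame_tokens = rng.normal(size=(T, N, D))    # stand-in for image-encoder outputs
contrastive_query = rng.normal(size=(1, D))  # stand-in for a pretrained query
generative_queries = rng.normal(size=(4, D)) # illustrative count only

video_tokens = flatten_frames(frame_tokens)
video_embedding = attentional_pool(video_tokens, contrastive_query)
caption_context = attentional_pool(video_tokens, generative_queries)
```

    Because the pooler places no constraint on the number of input tokens, the same queries that summarize one image can summarize a whole clip, which is consistent with the zero-shot transfer the abstract describes.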

MEDIA TREND IDENTIFICATION IN SHORT-FORM VIDEO PLATFORMS

    Publication No.: US20250118060A1

    Publication Date: 2025-04-10

    Application No.: US18900473

    Application Date: 2024-09-27

    Applicant: Google LLC

    Abstract: Methods and systems for media trend identification on content sharing platforms are provided herein. A set of audiovisual embeddings that represent audiovisual features of a media item is obtained. A set of textual embeddings that represent textual features of the media item is obtained. The obtained set of audiovisual embeddings and the obtained set of textual embeddings are provided as an input to an artificial intelligence (AI) model trained to predict whether a respective media item is associated with one or more media trends of a platform based on given embeddings for the media item. One or more outputs of the AI model are obtained. A determination is made, based on the one or more outputs of the AI model, whether the media item is associated with the one or more media trends of the platform.
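
    The pipeline in the abstract (obtain two sets of embeddings, feed them to a model, decide from its outputs) can be sketched as follows. The abstract does not specify the model, so this is only an assumed stand-in: mean pooling of each embedding set, concatenation, one logistic layer with random weights, and a fixed threshold. The function names, dimensions, and the 0.5 threshold are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_trend_scores(av_emb, text_emb, weights, bias):
    """Assumed model: mean-pool each embedding set, concatenate, then apply
    one logistic layer. Returns one score in (0, 1) per candidate trend."""
    features = np.concatenate([av_emb.mean(axis=0), text_emb.mean(axis=0)])
    return sigmoid(weights @ features + bias)

def associated_trends(scores, threshold=0.5):
    """Indices of trends whose score meets the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

# Hypothetical shapes: 10 audiovisual embeddings of size 64,
# 5 textual embeddings of size 32, 3 candidate trends.
rng = np.random.default_rng(0)
av = rng.normal(size=(10, 64))
text = rng.normal(size=(5, 32))
num_trends = 3
W = rng.normal(size=(num_trends, 64 + 32)) * 0.1  # untrained stand-in weights
b = np.zeros(num_trends)

scores = predict_trend_scores(av, text, W, b)
trends = associated_trends(scores)
```

    Independent per-trend scores allow a media item to be associated with several trends at once, which matches the "one or more media trends" wording; a trained model would replace the random weights used here.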
