Patent search ap:("Google LLC") AND inv:"Tao Zhu"

1.

Patent application
VIDEO-TEXT MODELING WITH ZERO-SHOT TRANSFER FROM CONTRASTIVE CAPTIONERS (in force)

Publication No.: US20250124708A1

Publication date: 2025-04-17

Application No.: US18694604

Filing date: 2023-12-08

Applicant: Google LLC

Inventor: Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Jiahui Yu

IPC: G06V20/40, G06F16/583

Abstract: Provided is an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. Some example implementations include a model which can be referred to as VideoCoCa. Example implementations reuse a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with little or minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, aspects of the present disclosure leverage findings that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to “flattened frame embeddings”, yielding a strong zero-shot transfer baseline for many video-text tasks.
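
The "flattened frame embeddings" idea in the abstract above can be illustrated with a minimal sketch. It assumes each frame has already been encoded into a list of token embeddings, and it uses a single-query dot-product attention pool with no learned projections. The function names `flatten_frames` and `attentional_pool` are hypothetical and chosen for illustration; the actual pooling layers of the CoCa model are not specified in this listing and are likely more elaborate.

```python
import math

def flatten_frames(frame_tokens):
    """Concatenate per-frame token embeddings [T][N][D] into one sequence [T*N][D]."""
    return [tok for frame in frame_tokens for tok in frame]

def attentional_pool(query, tokens):
    """Pool a token sequence into one vector with single-query softmax attention.

    Assumption: keys and values are the raw tokens (no learned projections).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]

# Toy input: 2 frames, 2 tokens per frame, embedding dimension 2.
video = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
flat = flatten_frames(video)                  # 4 tokens in one sequence
pooled = attentional_pool([0.0, 0.0], flat)   # zero query gives uniform weights
```

The point of the sketch is that the pooling function never sees the frame dimension: a pooler written for the tokens of one image accepts the flattened tokens of many frames unchanged.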

2.

Patent application
MEDIA ITEM CHARACTERIZATION BASED ON MULTIMODAL EMBEDDINGS (in force)

Publication No.: US20250111671A1

Publication date: 2025-04-03

Application No.: US18900457

Filing date: 2024-09-27

Applicant: Google LLC

Inventor: Tao Zhu, Jiahui Yu, Jingchen Feng, Kai Chen, Pooya Abolghasemi, Gagan Bansal, Jieren Xu, Hui Miao, Yaping Zhang, Shuchao Bi, Yonghui Wu, Claire Cui, Rohan Anil

IPC: G06V20/40, G06F40/284, G10L25/57

Abstract: Methods and systems for media item characterization based on multimodal embeddings are provided herein. A media item including a sequence of video frames is identified. A set of video embeddings representing visual features of the sequence of video frames is obtained. A set of audio embeddings representing audio features of the sequence of video frames is obtained. A set of audiovisual embeddings is generated based on the set of video embeddings and the set of audio embeddings. Each of the set of audiovisual embeddings represents a visual feature and an audio feature of a respective video frame of the sequence of video frames. One or more media characteristics associated with the media item are determined based on the set of audiovisual embeddings.
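
One way to picture the per-frame audiovisual embeddings described above is the sketch below. The abstract does not say how the video and audio embeddings are combined, so plain concatenation is an assumption made here for illustration, and `fuse_audiovisual` is a hypothetical name.

```python
def fuse_audiovisual(video_embeddings, audio_embeddings):
    """Build one audiovisual embedding per frame.

    Assumption: fusion is simple concatenation of the frame's video
    embedding and its time-aligned audio embedding.
    """
    if len(video_embeddings) != len(audio_embeddings):
        raise ValueError("expected one audio embedding per video frame")
    return [list(v) + list(a) for v, a in zip(video_embeddings, audio_embeddings)]

# Toy input: 2 frames, 2-dimensional video and 1-dimensional audio embeddings.
video_emb = [[0.1, 0.2], [0.3, 0.4]]
audio_emb = [[0.9], [0.8]]
audiovisual = fuse_audiovisual(video_emb, audio_emb)
```

Each resulting vector carries both a visual and an audio feature for its frame, which is the property the abstract requires; a learned fusion layer could replace the concatenation without changing the interface.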

3.

Patent application
MEDIA TREND IDENTIFICATION IN SHORT-FORM VIDEO PLATFORMS (in force)

Publication No.: US20250118060A1

Publication date: 2025-04-10

Application No.: US18900473

Filing date: 2024-09-27

Applicant: Google LLC

Inventor: Mingyan Gao, Tao Zhu, Hui Miao, Ye Jin, Bibang Liu, Qiao Zhang, Jeffrey Daniel Forrester

IPC: G06V10/80, G06F40/284, G06V10/82, G06V20/40, G10L15/02, G10L25/18, G10L25/57

Abstract: Methods and systems for media trend identification of content sharing platforms are provided herein. A set of audiovisual embeddings that represent audiovisual features of a media item is obtained. A set of textual embeddings that represent textual features of the media item is obtained. The obtained set of audiovisual embeddings and the obtained set of textual embeddings are provided as an input to an artificial intelligence (AI) model trained to predict whether a respective media item is associated with one or more media trends of a platform based on given embeddings for the media item. One or more outputs of the AI model are obtained. A determination is made, based on the one or more outputs of the AI model, whether the media item is associated with the one or more media trends of the platform.
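
The flow in this abstract (embeddings in, model output, then a determination) can be sketched as follows. The abstract does not describe the AI model, so a logistic scorer over mean-pooled embeddings stands in for it here. The weights, the bias, the 0.5 threshold, and the names `mean_pool`, `trend_probability`, and `is_trend` are all hypothetical.

```python
import math

def mean_pool(embeddings):
    """Average a set of equal-length embeddings into one vector."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def trend_probability(av_embeddings, text_embeddings, weights, bias):
    """Score one media item against one trend with a stand-in logistic model."""
    features = mean_pool(av_embeddings) + mean_pool(text_embeddings)
    if len(weights) != len(features):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def is_trend(probability, threshold=0.5):
    """Turn the model output into a yes/no determination (threshold is assumed)."""
    return probability >= threshold

# Toy input: two audiovisual embeddings (dim 2) and two textual embeddings (dim 1).
av = [[1.0, 0.0], [1.0, 2.0]]
text = [[0.5], [1.5]]
prob = trend_probability(av, text, weights=[1.0, -1.0, 0.0], bias=0.0)
decision = is_trend(prob)
```

Scoring several trends would amount to keeping one weight vector per trend and repeating the call, which matches the "one or more media trends" wording without changing the structure.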
