Patent search: ap:("Google LLC") AND inv:"Xingyi Zhou"

1.

Invention application
Dense Video Object Captioning from Disjoint Vision (Status: Active)

Publication number: US20250053753A1

Publication date: 2025-02-13

Application number: US18448508

Filing date: 2023-08-11

Applicant: Google LLC

Inventors: Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Luise Schmid

IPC: G06F40/40, G06T7/246, G06V10/22, G06V10/774, G06V10/776, G06V20/40

Abstract: Provided are a new task and model for dense video object captioning—detecting, tracking, and captioning trajectories of all objects in a video. This task unifies spatial and temporal understanding of the video, and requires fine-grained language description. Example implementations of the proposed model for dense video object captioning can be trained end-to-end and can include different models for spatial localization, tracking, and captioning. As such, some example implementations of the present disclosure can train the proposed model with a mixture of disjoint tasks, and leverage diverse, large-scale datasets which supervise different parts of an example proposed model. This results in noteworthy zero-shot performance.
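The abstract states that the model can be trained on a mixture of disjoint tasks, with each dataset supervising only some parts of the model, but it does not say how the losses are combined. One common way to do this is to mask out the loss terms that a given dataset does not annotate. The sketch below illustrates that idea under stated assumptions: the component names (`detection`, `tracking`, `captioning`), the `Example` record, and both helper functions are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical loss components; the abstract only names localization,
# tracking, and captioning, not the actual heads or loss functions.
COMPONENTS = ("detection", "tracking", "captioning")


@dataclass
class Example:
    source: str                  # hypothetical name of the originating dataset
    losses: Dict[str, float]     # per-component loss computed by the model
    supervised: FrozenSet[str]   # components this dataset actually annotates


def masked_loss(ex: Example) -> float:
    """Sum only the loss terms that the example's dataset supervises."""
    return sum(ex.losses[c] for c in COMPONENTS if c in ex.supervised)


def batch_loss(batch: List[Example]) -> Dict[str, float]:
    """Average each component over only the examples that supervise it."""
    totals = {c: 0.0 for c in COMPONENTS}
    counts = {c: 0 for c in COMPONENTS}
    for ex in batch:
        for c in ex.supervised:
            totals[c] += ex.losses[c]
            counts[c] += 1
    # Components with no supervision in this batch contribute nothing.
    return {c: totals[c] / counts[c] for c in COMPONENTS if counts[c] > 0}


if __name__ == "__main__":
    # Three hypothetical sources, each supervising a different subset.
    det_only = Example("image_detection", {"detection": 1.0, "tracking": 5.0, "captioning": 7.0},
                       frozenset({"detection"}))
    det_track = Example("video_tracking", {"detection": 3.0, "tracking": 2.0, "captioning": 9.0},
                        frozenset({"detection", "tracking"}))
    det_cap = Example("region_captions", {"detection": 2.0, "tracking": 4.0, "captioning": 1.5},
                      frozenset({"detection", "captioning"}))
    print(batch_loss([det_only, det_track, det_cap]))
    # prints {'detection': 2.0, 'tracking': 2.0, 'captioning': 1.5}
```

Averaging each component over only the examples that supervise it keeps a sparsely annotated task (here, tracking) from being diluted by examples that carry no labels for it; a real system might weight or sample datasets differently.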
