TEXT-CONDITIONED VIDEO REPRESENTATION

Invention publication

US20230351753A1 TEXT-CONDITIONED VIDEO REPRESENTATION Under examination (published)

Patent title: TEXT-CONDITIONED VIDEO REPRESENTATION
Application number: US17894738

Filing date: 2022-08-24
Publication number: US20230351753A1

Publication date: 2023-11-02
Inventors: Satya Krishna Gorti, Junwei Ma, Guangwei Yu, Maksims Volkovs, Keyvan Golestan Irani, Noël Vouitsis
Applicant: THE TORONTO-DOMINION BANK
Applicant address: CA Toronto
Patent holder: THE TORONTO-DOMINION BANK
Current patent holder: THE TORONTO-DOMINION BANK
Current patent holder address: CA Toronto
Main classification: G06V20/40
IPC classification: G06V20/40

Abstract:

A text-video recommendation model determines relevance of a text to a video in a text-video pair (e.g., as a relevance score) with a text embedding and a text-conditioned video embedding. The text-conditioned video embedding is a representation of the video used for evaluating the relevance of the video to the text, where the representation itself is a function of the text it is evaluated for. As such, the input text may be used to weigh or attend to different frames of the video in determining the text-conditioned video embedding. The representation of the video may thus differ for different input texts for comparison. The text-conditioned video embedding may be determined in various ways, such as with a set of the most-similar frames to the input text (the top-k frames) or may be based on an attention function based on query, key, and value projections.
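The two variants named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the pooling rule, or the attention normalization, so the choices here are assumptions made for illustration only: cosine similarity for scoring, mean pooling over the top-k frames, and scaled dot-product attention with a softmax. All function names and the projection matrices `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Assumed similarity measure; the abstract does not name one.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def matvec(matrix, v):
    return [dot(row, v) for row in matrix]


def topk_video_embedding(text, frames, k):
    """Average the k frame embeddings most similar to the text (assumed pooling)."""
    ranked = sorted(frames, key=lambda f: cosine(text, f), reverse=True)
    top = ranked[:k]
    return weighted_sum([1.0 / len(top)] * len(top), top)


def attention_video_embedding(text, frames, w_q, w_k, w_v):
    """Use the text as the query and the frames as keys and values."""
    q = matvec(w_q, text)
    keys = [matvec(w_k, f) for f in frames]
    values = [matvec(w_v, f) for f in frames]
    scale = math.sqrt(len(q))  # assumed scaled dot-product form
    weights = softmax([dot(q, key) / scale for key in keys])
    return weighted_sum(weights, values)


def relevance(text, video_embedding):
    """Relevance score of the text to its text-conditioned video embedding."""
    return cosine(text, video_embedding)
```

In this sketch the same list of frame embeddings yields a different video embedding for each input text, which is the property the abstract describes: frames close to the text dominate the representation, either by selection (top-k) or by weighting (attention).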

IPC classification:

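G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V20/00	Scenes; scene-specific elements (control of digital cameras H04N5/232)
G06V20/40	. in video content (extraction of overlay text G06V20/62) (video retrieval G06F16/70) (processing of video elementary streams in video servers H04N21/234) (processing of video elementary streams in video clients H04N21/44)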