- Patent title: TEXT-CONDITIONED VIDEO REPRESENTATION
- Application number: US17894738
- Filing date: 2022-08-24
- Publication number: US20230351753A1
- Publication date: 2023-11-02
- Inventors: Satya Krishna Gorti, Junwei Ma, Guangwei Yu, Maksims Volkovs, Keyvan Golestan Irani, Noël Vouitsis
- Applicant: THE TORONTO-DOMINION BANK
- Applicant address: CA Toronto
- Assignee: THE TORONTO-DOMINION BANK
- Current assignee: THE TORONTO-DOMINION BANK
- Current assignee address: CA Toronto
- Main classification: G06V20/40
- IPC classification: G06V20/40
Abstract:
A text-video recommendation model determines relevance of a text to a video in a text-video pair (e.g., as a relevance score) with a text embedding and a text-conditioned video embedding. The text-conditioned video embedding is a representation of the video used for evaluating the relevance of the video to the text, where the representation itself is a function of the text it is evaluated for. As such, the input text may be used to weight or attend to different frames of the video in determining the text-conditioned video embedding. The representation of the video may thus differ for different input texts for comparison. The text-conditioned video embedding may be determined in various ways, such as from the set of frames most similar to the input text (the top-k frames) or from an attention function based on query, key, and value projections.
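
The two pooling strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. Several things here are assumptions made for illustration only: the function names, the use of cosine similarity for both frame selection and the relevance score, mean pooling over the selected frames, single-head scaled dot-product attention, and projection matrices `W_q`, `W_k`, `W_v` that keep the embedding dimension unchanged so the pooled video embedding can be compared with the text embedding.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def topk_video_embedding(text, frames, k):
    """Average the k frames most similar to the text (assumed: cosine similarity).

    text:   (d,) text embedding
    frames: (F, d) per-frame embeddings
    """
    sims = l2_normalize(frames) @ l2_normalize(text)  # (F,) one score per frame
    top = np.argsort(sims)[-k:]                       # indices of the k best frames
    return frames[top].mean(axis=0)                   # (d,)


def attention_video_embedding(text, frames, W_q, W_k, W_v):
    """Pool frames with attention: the text is the query, frames give keys and values."""
    q = text @ W_q                                    # (d,) query projection
    keys = frames @ W_k                               # (F, d) key projections
    values = frames @ W_v                             # (F, d) value projections
    logits = keys @ q / np.sqrt(q.shape[-1])          # (F,) scaled dot products
    weights = np.exp(logits - logits.max())           # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights                  # (d,) embedding, (F,) weights


def relevance_score(text, video_embedding):
    """Assumed scoring rule: cosine similarity between the two embeddings."""
    return float(l2_normalize(text) @ l2_normalize(video_embedding))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(6, 8))   # 6 frames, 8-dimensional embeddings
    text = rng.normal(size=8)
    eye = np.eye(8)                    # identity projections, for the demo only

    v_topk = topk_video_embedding(text, frames, k=2)
    v_attn, w = attention_video_embedding(text, frames, eye, eye, eye)
    print("top-k score:", relevance_score(text, v_topk))
    print("attention score:", relevance_score(text, v_attn))
    print("frame weights:", np.round(w, 3))
```

In both functions the video embedding depends on `text`, so the same video yields a different representation for each candidate text, which is the property the abstract describes.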