1.
Publication No.: US20240362272A1
Publication date: 2024-10-31
Application No.: US18647338
Filing date: 2024-04-26
Applicant: Twelve Labs, Inc.
Inventors: Seung Joon Lee, Raehyuk Jung, Seongyun Lee, Minjoon Seo, Jaehyuk Yi
IPC classification: G06F16/735, G06F40/40, G06V20/40
CPC classification: G06F16/735, G06F40/40, G06V20/46
Abstract: A video analysis system receives one or more queries from users of client devices. The video analysis system trains a machine-learned video encoder and/or a decoder coupled to receive video data and a prompt including a user query and generate an output for responding to the user query. A set of video embeddings is generated by extracting frame data, audio data, or text data from the video content, and applying a machine-learned video encoder to the frame data, the audio data, or the text data. The video analysis system also generates a set of prompt embeddings representing at least a portion of the query in a latent space. The video analysis system applies at least a component of a machine-learned decoder to an input tensor to generate an output including a set of output embeddings.
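
The data flow described in the abstract (video embeddings, prompt embeddings, an input tensor, a decoder component, output embeddings) can be illustrated with a minimal sketch. This is not the patented implementation: the abstract does not say how the input tensor is formed, so concatenating the two embedding sets along the sequence axis is an assumption. The hash-based embeddings, the single unparameterized attention pass, and all names (`DIM`, `encode_video`, `embed_prompt`, `attention_layer`, `answer_embeddings`) are hypothetical stand-ins for trained models.

```python
import hashlib
import math

DIM = 4  # hypothetical latent dimension, chosen only for illustration


def _vec(key: str) -> list[float]:
    """Deterministic pseudo-embedding for a string (stand-in for a trained model)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return [digest[i] / 255.0 - 0.5 for i in range(DIM)]


def encode_video(items: list[str]) -> list[list[float]]:
    """Toy 'video encoder': one embedding per extracted frame, audio, or text item."""
    return [_vec("video:" + item) for item in items]


def embed_prompt(query: str) -> list[list[float]]:
    """Toy prompt embedding: one vector per whitespace-separated token of the query."""
    return [_vec("text:" + tok) for tok in query.lower().split()]


def attention_layer(x: list[list[float]]) -> list[list[float]]:
    """One unparameterized self-attention pass, standing in for a decoder component."""
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w / total * v[d] for w, v in zip(weights, x)) for d in range(DIM)]
        )
    return out


def answer_embeddings(items: list[str], query: str) -> list[list[float]]:
    """Video embeddings + prompt embeddings -> input tensor -> output embeddings."""
    video = encode_video(items)
    prompt = embed_prompt(query)
    # Assumption: the input tensor is the two sets concatenated along the sequence axis.
    input_tensor = video + prompt
    return attention_layer(input_tensor)
```

Calling `answer_embeddings(["frame0", "frame1", "frame2"], "what happens here")` returns six output vectors of length `DIM`, one per position of the concatenated input (three video items plus three query tokens). A real system would replace each stand-in with a learned model and decode the output embeddings into a response.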