Publication No.: US20250139971A1
Publication Date: 2025-05-01
Application No.: US18410363
Filing Date: 2024-01-11
Applicant: Samsung Electronics Co., Ltd.
Inventor: Saket Gurukar, Du Tran
IPC: G06V20/40
Abstract: A method includes obtaining a video and a relational space-time query and identifying at least one type of the relational space-time query. The at least one identified type of the relational space-time query represents at least one of: an activity type, an object type, or a time type. The method also includes learning correlations among activities, objects, and time in the video using one or more cross-attention models. The method further includes obtaining one or more predictions generated using one or more outputs of the one or more cross-attention models based on the at least one identified type of the relational space-time query. In addition, the method includes generating a response to the relational space-time query based on the one or more predictions.
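
Illustrative sketch (not part of the publication): the abstract does not disclose an implementation, so the Python below is only a toy illustration of the general flow it describes, under loudly stated assumptions. It uses standard scaled dot-product cross-attention, in which query tokens attend over video tokens, and then selects a prediction head according to the identified query type. The function names (`cross_attention`, `respond`), the hand-written feature vectors, the label lists, and the "pick the largest pooled feature" head are all hypothetical and are not taken from the patent.

```python
import math

QUERY_TYPES = ("activity", "object", "time")  # the three types named in the abstract


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query vector is scored against every key; the softmax of those
    scores weights the value vectors. Returns one output row per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        row = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(row)
    return outputs


def mean_pool(rows):
    """Average a list of equal-length vectors into a single vector."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def respond(query_type, query_tokens, video_tokens, labels):
    """Hypothetical end-to-end step: attend, pool, and decode one answer.

    Assumption: the prediction head is simply the index of the largest
    pooled feature, mapped to a label list chosen by the query type.
    """
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type!r}")
    fused = cross_attention(query_tokens, video_tokens, video_tokens)
    pooled = mean_pool(fused)
    best = max(range(len(pooled)), key=pooled.__getitem__)
    return labels[query_type][best]


if __name__ == "__main__":
    # Toy 2-dimensional "video tokens" and label sets, invented for illustration.
    video = [[1.0, 0.0], [0.0, 1.0]]
    labels = {
        "activity": ["pouring", "typing"],
        "object": ["cup", "phone"],
        "time": ["0-5 s", "5-10 s"],
    }
    print(respond("object", [[10.0, 0.0]], video, labels))
    print(respond("time", [[0.0, 10.0]], video, labels))
```

In a real system the token features would come from learned video and text encoders and the heads would be trained classifiers or regressors; the sketch only shows how a query type can route the output of a shared attention step to a type-specific prediction.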