-
Publication Number: US20200272823A1
Publication Date: 2020-08-27
Application Number: US16625172
Application Date: 2018-11-05
Applicant: Google LLC
Inventor: Ting Liu , Gautam Prasad , Phuc Xuan Nguyen , Bohyung Han
Abstract: Systems and methods for a weakly supervised action localization model are provided. Example models according to example aspects of the present disclosure can localize and/or classify actions in untrimmed videos using machine-learned models, such as convolutional neural networks. The example models can predict temporal intervals of human actions given only video-level class labels, with no requirement for temporal localization information of actions. The example models can recognize actions and identify a sparse set of keyframes associated with actions through adaptive temporal pooling of video frames, wherein the loss function of the model is composed of a classification error and a sparsity of frame selection. Following action recognition with sparse keyframe attention, temporal proposals for actions can be extracted using temporal class activation mappings, and final time intervals corresponding to target actions can be estimated.
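The abstract describes attention-based (adaptive) temporal pooling trained with a loss combining a classification error and a sparsity term on frame selection. Below is a minimal sketch of that idea in PyTorch, assuming per-segment features have already been extracted by a pretrained video network; the module names, layer sizes, and the sparsity weight are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch of attention-weighted temporal pooling with a
# classification + sparsity loss. All names and hyperparameters here
# are assumptions for illustration, not taken from the patent.
import torch
import torch.nn as nn

class SparseTemporalPooling(nn.Module):
    def __init__(self, feat_dim: int = 1024, num_classes: int = 20):
        super().__init__()
        # Attention branch: one scalar weight per temporal segment.
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )
        # Video-level classifier applied to the attention-pooled feature.
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, segments: torch.Tensor):
        # segments: (batch, num_segments, feat_dim) pre-extracted features
        attn = self.attention(segments)                      # (B, T, 1)
        pooled = (attn * segments).sum(1) / attn.sum(1).clamp(min=1e-6)
        logits = self.classifier(pooled)                     # (B, num_classes)
        return logits, attn

def loss_fn(logits, attn, labels, sparsity_weight: float = 1e-4):
    # Classification error on video-level labels plus an L1 sparsity term
    # that encourages attending to a small set of keyframes.
    cls_loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    sparsity_loss = attn.abs().mean()
    return cls_loss + sparsity_weight * sparsity_loss
```

The sparsity term is what makes the frame selection "sparse": with only video-level labels as supervision, the classification loss alone would be satisfied by spreading attention broadly, so the L1 penalty pushes the attention weights toward a few keyframes.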
-
Publication Number: US20230215169A1
Publication Date: 2023-07-06
Application Number: US18181806
Application Date: 2023-03-10
Applicant: Google LLC
Inventor: Ting Liu , Gautam Prasad , Phuc Xuan Nguyen , Bohyung Han
IPC: G06V20/40 , G06F18/214 , G06F18/243
CPC classification number: G06V20/40 , G06F18/214 , G06F18/24317 , G06V20/44
Abstract: Systems and methods for a weakly supervised action localization model are provided. Example models according to example aspects of the present disclosure can localize and/or classify actions in untrimmed videos using machine-learned models, such as convolutional neural networks. The example models can predict temporal intervals of human actions given only video-level class labels, with no requirement for temporal localization information of actions. The example models can recognize actions and identify a sparse set of keyframes associated with actions through adaptive temporal pooling of video frames, wherein the loss function of the model is composed of a classification error and a sparsity of frame selection. Following action recognition with sparse keyframe attention, temporal proposals for actions can be extracted using temporal class activation mappings, and final time intervals corresponding to target actions can be estimated.
-
Publication Number: US11881022B2
Publication Date: 2024-01-23
Application Number: US18181806
Application Date: 2023-03-10
Applicant: Google LLC
Inventor: Ting Liu , Gautam Prasad , Phuc Xuan Nguyen , Bohyung Han
IPC: G06V20/40 , G06F18/214 , G06F18/243
CPC classification number: G06V20/40 , G06F18/214 , G06F18/24317 , G06V20/44
Abstract: Systems and methods for a weakly supervised action localization model are provided. Example models according to example aspects of the present disclosure can localize and/or classify actions in untrimmed videos using machine-learned models, such as convolutional neural networks. The example models can predict temporal intervals of human actions given only video-level class labels, with no requirement for temporal localization information of actions. The example models can recognize actions and identify a sparse set of keyframes associated with actions through adaptive temporal pooling of video frames, wherein the loss function of the model is composed of a classification error and a sparsity of frame selection. Following action recognition with sparse keyframe attention, temporal proposals for actions can be extracted using temporal class activation mappings, and final time intervals corresponding to target actions can be estimated.
-
Publication Number: US11640710B2
Publication Date: 2023-05-02
Application Number: US16625172
Application Date: 2018-11-05
Applicant: Google LLC
Inventor: Ting Liu , Gautam Prasad , Phuc Xuan Nguyen , Bohyung Han
Abstract: Systems and methods for a weakly supervised action localization model are provided. Example models according to example aspects of the present disclosure can localize and/or classify actions in untrimmed videos using machine-learned models, such as convolutional neural networks. The example models can predict temporal intervals of human actions given only video-level class labels, with no requirement for temporal localization information of actions. The example models can recognize actions and identify a sparse set of keyframes associated with actions through adaptive temporal pooling of video frames, wherein the loss function of the model is composed of a classification error and a sparsity of frame selection. Following action recognition with sparse keyframe attention, temporal proposals for actions can be extracted using temporal class activation mappings, and final time intervals corresponding to target actions can be estimated.
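The abstract also mentions extracting temporal proposals from temporal class activation mappings and estimating final time intervals for target actions. The following is a hedged sketch of one way to turn an attention-weighted activation sequence into contiguous proposals; the threshold, normalization, and grouping rule are assumptions chosen for illustration, not specifics from the patent.

```python
# Hypothetical sketch: threshold an attention-weighted temporal class
# activation sequence and group contiguous segments into proposals.
import numpy as np

def extract_proposals(tcam: np.ndarray, attn: np.ndarray,
                      threshold: float = 0.5,
                      seconds_per_segment: float = 1.0):
    """tcam: (num_segments,) class activation scores for one target class.
    attn: (num_segments,) attention weights from the recognition stage."""
    # Weight the class activations by the sparse attention, then normalize
    # to [0, 1] so a single threshold can be applied.
    scores = tcam * attn
    scores = (scores - scores.min()) / (scores.max() - scores.min() + 1e-6)
    active = scores >= threshold

    proposals = []
    start = None
    for t, flag in enumerate(active):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            proposals.append((start * seconds_per_segment,
                              t * seconds_per_segment,
                              float(scores[start:t].mean())))
            start = None
    if start is not None:
        proposals.append((start * seconds_per_segment,
                          len(active) * seconds_per_segment,
                          float(scores[start:].mean())))
    return proposals  # list of (start_sec, end_sec, confidence)
```

Each proposal's confidence is taken here as the mean normalized score over its interval; a final estimate of the target action's time interval could then be the highest-confidence proposal per class.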