-
Publication No.: US12192543B2
Publication Date: 2025-01-07
Application No.: US18393664
Application Date: 2023-12-21
Applicant: Microsoft Technology Licensing, LLC
Inventor: Gaurav Mittal, Ye Yu, Mei Chen, Junwen Chen
IPC: H04N21/23, G06T7/246, G06V20/40, H04N21/234
Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
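The steps named in the abstract (grouping, attention weighting of history, fused prediction) can be illustrated with a minimal sketch. The abstract does not say how the attention weights are computed or how the prediction is generated, so everything concrete here is an assumption made for illustration: the sigmoid gate over a scaled dot product with the current frame, the mean pooling, the linear classifier, and all names (`gated_history_predict`, `num_present`, `class_weights`) are hypothetical and are not taken from the patent.

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def gated_history_predict(frames, num_present, class_weights):
    """Illustrative gated-history action prediction.

    frames: per-frame feature vectors, oldest first; the last one is the
            current frame.
    num_present: how many trailing frames form the "present" set (>= 1).
    class_weights: one weight vector per action class, each of length
            2 * feature_dim (assumed linear classifier).
    Returns (gates, probabilities).
    """
    # Group frames into historical and present sets.
    history, present = frames[:-num_present], frames[-num_present:]
    current = present[-1]
    dim = len(current)

    # One attention weight per historical frame. Assumption: a sigmoid
    # gate over the scaled dot product with the current frame.
    scale = math.sqrt(dim)
    gates = [1.0 / (1.0 + math.exp(-dot(h, current) / scale)) for h in history]

    # Weight each historical frame by its gate.
    weighted = [[g * x for x in h] for g, h in zip(gates, history)]

    # Fuse pooled weighted history with pooled present frames (concatenation).
    fused = mean_vector(weighted, dim) + mean_vector(present, dim)

    # Score each action class and normalize with a softmax.
    scores = [dot(w, fused) for w in class_weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return gates, [e / total for e in exps]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    class_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    gates, probs = gated_history_predict(frames, 2, class_weights)
    print("gates:", gates)
    print("action probabilities:", probs)
```

In this toy setup, a historical frame whose features align with the current frame receives a gate above 0.5, an orthogonal one receives exactly 0.5, and an opposing one receives less, which is one simple way to express "how informative a video frame is for predicting action in the current video frame".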
-
Publication No.: US11895343B2
Publication Date: 2024-02-06
Application No.: US17852310
Application Date: 2022-06-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Gaurav Mittal, Ye Yu, Mei Chen, Junwen Chen
IPC: H04N21/23, H04N21/234, G06V20/40, G06T7/246
CPC classification number: H04N21/23418, G06T7/246, G06V20/46, G06T2207/10021
Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
-