Publication No.: US20240104915A1
Publication date: 2024-03-28
Application No.: US18459824
Filing date: 2023-09-01
Applicant: Intel Corporation
Inventor: Anthony Daniel Rhodes, Byungsu Min, Subarna Tripathi, Giuseppe Raffa, Sovan Biswas
CPC classification number: G06V10/82, G06V10/751, G06V10/86, G06V20/46, G06V20/49
Abstract: Machine learning models can process a video and generate outputs such as action segmentation, which assigns portions of the video to a particular action, or action classification, which assigns an action class to each frame of the video. Some machine learning models can make accurate predictions for short videos but may not be well suited to action segmentation of long-duration, structured videos. An effective machine learning model may include a hybrid architecture involving a temporal convolutional network and a bi-directional graph neural network. The machine learning model can process long-duration, structured videos by using a temporal convolutional network as a first-pass action segmentation model to generate rich, frame-wise features. The frame-wise features can be converted into a graph having forward edges and backward edges. A graph neural network can process the graph to refine a final fine-grained per-frame action prediction.
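The graph step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the claimed implementation. It assumes each frame is one node, that each frame is linked to the next `window` frames by forward edges (mirrored as backward edges), and that refinement is a single mean-aggregation layer with separate weights per edge direction. The names `build_temporal_edges`, `mean_aggregate`, `refine`, and `window`, and the random stand-in for the temporal convolutional network output, are all hypothetical.

```python
import numpy as np

def build_temporal_edges(num_frames, window=2):
    """Return (forward, backward) edge arrays, each of shape (2, E).

    Row 0 holds source frame indices and row 1 holds target frame indices.
    The fixed temporal window is an assumption made for this sketch.
    """
    pairs = [(i, j)
             for i in range(num_frames)
             for j in range(i + 1, min(i + window, num_frames - 1) + 1)]
    fwd = np.array(pairs, dtype=np.int64).reshape(-1, 2).T
    bwd = fwd[::-1].copy()  # swap source and target rows to reverse direction
    return fwd, bwd

def mean_aggregate(features, edges):
    """Average the features of source nodes into each target node."""
    src, dst = edges
    out = np.zeros_like(features)
    np.add.at(out, dst, features[src])  # unbuffered scatter-add per target
    counts = np.bincount(dst, minlength=len(features)).astype(features.dtype)
    return out / np.maximum(counts, 1.0)[:, None]

def refine(features, fwd, bwd, w_self, w_fwd, w_bwd):
    """One hypothetical bi-directional layer; returns a class index per frame."""
    logits = (features @ w_self
              + mean_aggregate(features, fwd) @ w_fwd
              + mean_aggregate(features, bwd) @ w_bwd)
    return logits.argmax(axis=1)

# Stand-in for first-pass frame-wise features: 5 frames, 3 action classes.
rng = np.random.default_rng(0)
frame_features = rng.normal(size=(5, 3))
fwd_edges, bwd_edges = build_temporal_edges(num_frames=5, window=2)
weights = [rng.normal(size=(3, 3)) for _ in range(3)]
per_frame_actions = refine(frame_features, fwd_edges, bwd_edges, *weights)
```

Keeping the two edge directions in separate arrays lets the layer learn different weights for past and future context, which is one plausible reading of "forward edges and backward edges"; a trained model would learn the weight matrices rather than sample them.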