ACTION RECOGNITION METHOD AND APPARATUS BASED ON SPATIO-TEMPORAL SELF-ATTENTION
Abstract:
The present disclosure provides an action recognition method including: acquiring video features for input videos; generating a bounding box surrounding a person who may be a target of action recognition; pooling the video features based on bounding box information; extracting at least one spatial feature map from the pooled video features; extracting at least one temporal feature map from the pooled video features; concatenating the at least one spatial feature map and the at least one temporal feature map to generate a concatenated feature map; and performing human action recognition based on the concatenated feature map.
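The steps recited above can be illustrated with a minimal sketch in plain Python. It is not the claimed implementation. The abstract does not say how pooling, feature-map extraction, or classification are carried out, so every choice here is an assumption made for illustration: the pooling is a simple crop of the feature grid to the bounding box, the spatial and temporal maps come from a parameter-free scaled dot-product self-attention, and the classifier is a single linear layer. All function names (roi_crop, spatial_feature_map, temporal_feature_map, recognize) and the toy data are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Parameter-free scaled dot-product self-attention (assumed, no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def roi_crop(features, box):
    """Pool by cropping: features[t][y][x] is a C-dim list, box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [[row[x0:x1] for row in frame[y0:y1]] for frame in features]


def spatial_feature_map(roi):
    """Average over time at each location, then attend across locations."""
    frames, height, width = len(roi), len(roi[0]), len(roi[0][0])
    tokens = [mean_vec([roi[t][y][x] for t in range(frames)])
              for y in range(height) for x in range(width)]
    return self_attention(tokens)


def temporal_feature_map(roi):
    """Average over space in each frame, then attend across frames."""
    tokens = [mean_vec([cell for row in frame for cell in row]) for frame in roi]
    return self_attention(tokens)


def recognize(features, box, class_weights):
    """Crop, extract both maps, concatenate their summaries, and score each class."""
    roi = roi_crop(features, box)
    spatial = mean_vec(spatial_feature_map(roi))
    temporal = mean_vec(temporal_feature_map(roi))
    fused = spatial + temporal  # list concatenation: 2 * C values
    logits = [sum(w * f for w, f in zip(ws, fused)) for ws in class_weights]
    return fused, softmax(logits)


# Toy input: 3 frames of a 4x4 grid with 2 channels (made-up values).
features = [[[[float(t + y), float(x)] for x in range(4)] for y in range(4)]
            for t in range(3)]
box = (1, 1, 3, 3)  # hypothetical person box in feature-grid coordinates
class_weights = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.1, 0.0, 0.2]]  # two made-up classes

fused, probs = recognize(features, box, class_weights)
print(len(fused), probs)
```

In this sketch the concatenated vector has twice the channel count because the spatial and temporal maps are each averaged to one vector before fusion. A practical system would more likely use learned projections and a detector-supplied bounding box, neither of which is specified in the abstract.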