INCREMENTAL VIDEO HIGHLIGHTS DETECTION SYSTEM AND METHOD
Abstract:
Systems and methods are provided that include a processor executing a video classifying program to receive an input video, sample video frames from the input video, extract frame-wise spatial features from the video frames using a convolutional neural network, extract a frame-wise temporal feature for each video frame, aggregate the frame-wise spatial features and the frame-wise temporal feature for each video frame to provide a temporal context to the frame-wise spatial features, input the aggregated frame-wise spatial features and frame-wise temporal feature for each video frame into a transformer encoder to obtain temporal-aware feature representations of the video frames, input the feature representations into a feedforward network model to obtain feedforward-transformed features, obtain a parameter by inputting each feedforward-transformed feature and a set of highlight prototypes into a function comparing the feedforward-transformed features to the set of highlight prototypes, and classify the video frames as highlights based on the obtained parameter.
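
The final two steps of the abstract (comparing each feedforward-transformed feature to the set of highlight prototypes, then classifying frames from the resulting parameter) can be illustrated with a minimal sketch. The abstract does not name the comparison function or the decision rule, so cosine similarity, the maximum over prototypes, and the fixed `threshold` used here are assumptions chosen for illustration, and the names `cosine_similarity`, `highlight_scores`, and `classify_frames` are hypothetical. The feature vectors are toy values standing in for the output of the feedforward network model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed comparison function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def highlight_scores(features, prototypes):
    """Per-frame parameter: highest similarity to any highlight prototype (assumption)."""
    return [max(cosine_similarity(f, p) for p in prototypes) for f in features]


def classify_frames(features, prototypes, threshold=0.5):
    """Mark a frame as a highlight when its score reaches the (hypothetical) threshold."""
    return [score >= threshold for score in highlight_scores(features, prototypes)]


# Toy feedforward-transformed features for three frames and one highlight prototype.
features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
prototypes = [[1.0, 0.0]]
labels = classify_frames(features, prototypes)
```

In this toy setup only the first frame points in the same direction as the prototype, so it is the only frame labeled as a highlight. A real system could use a different comparison function (for example a distance) or a learned decision rule in place of the fixed threshold.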