Method and apparatus for summarization of unsupervised video with efficient key frame selection reward functions
Abstract:
Disclosed are a method and an apparatus for unsupervised video summarization with efficient key frame selection reward functions. Frame-level visual features are extracted from an input video. An attention weight is computed and, using the attention weight, an importance score is represented as a frame tracking probability for selecting a key frame. A temporal consistency reward function and a representativeness reward function for selecting key frames are obtained based on the visual similarity distance and the temporal distance between key frames. An attention-based video summarization network is trained with these two reward functions to predict an importance score for selecting the key frames of a video summary. A video summary is created by selecting key frames according to the predicted importance scores, the quality of the created summary is evaluated, and policy gradient learning is performed on the attention-based video summarization network. Regularization and reconstruction losses are calculated from the importance scores of the selected key frames to control the probability of selecting a key frame. A video summary is created based on the calculated regularization and reconstruction losses.
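The abstract names the two reward functions but does not give their formulas. As an illustration only, the following is a minimal sketch of how rewards of this kind could be computed. It assumes a representativeness reward of the form exp(-mean distance from each frame to its nearest selected key frame) and a hypothetical temporal reward built the same way from frame-index distances. The function names, the Euclidean distance, and both formulas are assumptions for illustration and are not taken from the disclosure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def representativeness_reward(features, selected):
    """Assumed form: exp(-mean visual distance to the nearest key frame).

    features: list of per-frame feature vectors.
    selected: list of indices of the chosen key frames.
    Returns a value in (0, 1]; higher means the key frames cover the
    visual content of the video more closely.
    """
    if not selected:
        return 0.0
    total = 0.0
    for f in features:
        total += min(euclidean(f, features[k]) for k in selected)
    return math.exp(-total / len(features))


def temporal_reward(num_frames, selected):
    """Hypothetical temporal term: exp(-normalized mean index distance).

    Each frame index is compared with the nearest key frame index, and
    the mean distance is normalized by the video length.
    """
    if not selected:
        return 0.0
    total = sum(min(abs(t - k) for k in selected) for t in range(num_frames))
    return math.exp(-total / (num_frames * num_frames))


if __name__ == "__main__":
    # Toy video: two visually distinct segments of two frames each.
    feats = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    keys = [0, 2]  # one key frame per segment
    print(representativeness_reward(feats, keys))
    print(temporal_reward(len(feats), keys))
```

In a policy gradient setup such as the one the abstract describes, a scalar reward like the sum of these two terms would typically weight the log-probability of the sampled key frame selection. How the disclosed method combines them is not stated in the abstract.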