DEEP REINFORCEMENT LEARNING FRAMEWORK FOR CHARACTERIZING VIDEO CONTENT
摘要:
Methods and systems for performing sequence level prediction of a video scene are described. Video information in a video scene is represented as a sequence of features depicted each frame. One or more scene affective labels are provided at the end of the sequence. Each label pertains to the entire sequence of frames of data. An action is taken with an agent controlled by a machine learning algorithm for a current frame of the sequence at a current time step. An output of the action represents affective label prediction for the frame at the current time step. A pool of actions taken up until the current time step including the action taken with the agent is transformed into a predicted affective history for a subsequent time step. A reward is generated on predicted actions up to the current time step by comparing the predicted actions against corresponding annotated scene affective labels.
信息查询
0/0