Abstract:
A computer-implemented method for selecting representative frames for videos is provided. The method includes receiving a video and identifying a set of features for each of the frames of the video. The features including frame-based features and semantic features. The semantic features identifying likelihoods of semantic concepts being present as content in the frames of the video. A set of video segments for the video is subsequently generated. Each video segment includes a chronological subset of frames from the video and each frame is associated with at least one of the semantic features. The method generates a score for each frame of the subset of frames for each video segment based at least on the semantic features, and selecting a representative frame for each video segment based on the scores of the frames in the video segment. The representative frame represents and summarizes the video segment.
Abstract:
A system and methodology provide for annotating videos with entities and associated probabilities of existence of the entities within video frames. A computer-implemented method identifies an entity from a plurality of entities identifying characteristics of video items. The computer-implemented method selects a set of features correlated with the entity based on a value of a feature of a plurality of features, determines a classifier for the entity using the set of features, and determines an aggregation calibration function for the entity based on the set of features. The computer-implemented method selects a video frame from a video item, where the video frame having associated features, and determines a probability of existence of the entity based on the associated features using the classifier and the aggregation calibration function.