Abstract:
A computer-implemented method for selecting representative frames for videos is provided. The method includes receiving a video and identifying a set of features for each of the frames of the video. The features include frame-based features and semantic features. The semantic features identify likelihoods of semantic concepts being present as content in the frames of the video. A set of video segments for the video is subsequently generated. Each video segment includes a chronological subset of frames from the video, and each frame is associated with at least one of the semantic features. The method generates a score for each frame of the subset of frames for each video segment based at least on the semantic features, and selects a representative frame for each video segment based on the scores of the frames in the video segment. The representative frame represents and summarizes the video segment.
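The per-segment selection described above can be sketched as follows; this is a minimal illustration, not the patented implementation, and the scoring function (here a simple sum of semantic-concept likelihoods) and the frame/segment data are hypothetical.

```python
def select_representative_frames(segments, score):
    """For each segment (a chronological list of frames), pick the
    highest-scoring frame as that segment's representative."""
    return [max(segment, key=score) for segment in segments]

# Hypothetical frames: each carries semantic-concept likelihoods.
frames = [
    {"id": 0, "semantics": {"dog": 0.1, "beach": 0.7}},
    {"id": 1, "semantics": {"dog": 0.9, "beach": 0.2}},
    {"id": 2, "semantics": {"dog": 0.4, "beach": 0.4}},
]
segments = [frames[:2], frames[2:]]

# Toy scoring function: sum of semantic likelihoods per frame.
def score(frame):
    return sum(frame["semantics"].values())

reps = select_representative_frames(segments, score)
```

In practice the score could combine frame-based features (sharpness, brightness) with the semantic likelihoods, but the argmax-per-segment structure stays the same.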
Abstract:
A system and methodology provide for annotating videos with entities and associated probabilities of existence of the entities within video frames. A computer-implemented method identifies an entity from a plurality of entities identifying characteristics of video items. The computer-implemented method selects a set of features correlated with the entity based on a value of a feature of a plurality of features, determines a classifier for the entity using the set of features, and determines an aggregation calibration function for the entity based on the set of features. The computer-implemented method selects a video frame from a video item, where the video frame has associated features, and determines a probability of existence of the entity based on the associated features using the classifier and the aggregation calibration function.
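The classifier-plus-calibration pipeline can be sketched as below. This is a hedged illustration only: the abstract does not specify the classifier family or calibration form, so a linear classifier over the entity's selected features and a sigmoid calibration are assumptions, as are the feature names and weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classifier_score(weights, features):
    # Hypothetical linear classifier restricted to the entity's
    # selected, correlated features.
    return sum(weights[f] * v for f, v in features.items() if f in weights)

def calibrate(raw, a=1.0, b=0.0):
    # Hypothetical calibration function mapping the raw classifier
    # output to a probability of existence in [0, 1].
    return sigmoid(a * raw + b)

# Toy entity "dog": selected features and their learned weights.
weights = {"fur_texture": 2.0, "bark_audio": 1.5}
frame_features = {"fur_texture": 0.8, "bark_audio": 0.6, "sky": 0.9}
p = calibrate(classifier_score(weights, frame_features))
```

Note that the unselected feature (`sky`) is ignored, mirroring the abstract's feature-selection step.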
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting representative frames for videos. One of the methods includes receiving a search query; determining a query representation for the search query; obtaining data identifying a plurality of responsive videos for the search query, wherein each responsive video comprises a plurality of frames, wherein each frame has a respective frame representation; selecting, for each responsive video, a representative frame from the responsive video using the query representation and the frame representations for the frames in the responsive video; and generating a response to the search query, wherein the response to the search query includes a respective video search result for each of the responsive videos, and wherein the respective video search result for each of the responsive videos includes a presentation of the representative video frame from the responsive video.
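The frame-selection step above matches each frame's representation against the query representation. A minimal sketch follows; dot-product similarity is an assumption (the abstract does not name the similarity measure), and the vectors are toy data.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def representative_frame_index(query_repr, frame_reprs):
    # Pick the frame whose representation best matches the query
    # representation (dot-product similarity assumed here).
    return max(range(len(frame_reprs)),
               key=lambda i: dot(query_repr, frame_reprs[i]))

query = [1.0, 0.0, 0.5]                       # query representation
frame_reprs = [[0.2, 0.9, 0.1],               # frame representations
               [0.8, 0.1, 0.6],
               [0.1, 0.1, 0.1]]
idx = representative_frame_index(query, frame_reprs)
```

The selected frame would then be presented as the thumbnail in that video's search result.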
Abstract:
A method includes receiving, by a processing device of a content sharing platform, a video content, selecting at least one video frame from the video content, subsampling the at least one video frame to generate a first representation of the at least one video frame, selecting a sub-region of the at least one video frame to generate a second representation of the at least one video frame, and applying a convolutional neural network to the first and second representations of the at least one video frame to generate an annotation for the video content.
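The two frame representations can be sketched as below: a subsampled (low-resolution, whole-frame) view and a sub-region view. The stride-based subsampling and the center crop are assumptions for illustration; the abstract does not specify either, nor the CNN architecture, so the network step is left as a comment.

```python
def subsample(frame, stride=2):
    # First representation: spatially subsample the full frame
    # (keeps the global view at reduced resolution).
    return [row[::stride] for row in frame[::stride]]

def center_crop(frame, size):
    # Second representation: a sub-region of the frame
    # (a center crop is assumed here; keeps local detail).
    h, w = len(frame), len(frame[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]

# Toy 4x4 "frame" of pixel values.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
low_res = subsample(frame)       # 2x2 global view
detail = center_crop(frame, 2)   # 2x2 central sub-region
# Both representations would then be fed to the convolutional
# neural network to generate annotations for the video content.
```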
Abstract:
Facilitation of content entity annotation while maintaining joint quality, coverage and/or completeness performance conditions is provided. In one example, a system includes an aggregation component that aggregates signals indicative of initial entities for content and initial scores associated with the initial entities generated by one or more content annotation sources; and a mapping component that maps the initial scores to calibrated scores within a defined range. The system also includes a linear aggregation component that: applies selected weights to the calibrated scores, wherein the selected weights are based on joint performance conditions; and combines the weighted, calibrated scores based on a selected linear aggregation model of a plurality of linear aggregation models to generate a final score. The system also includes an annotation component that determines whether to annotate the content with one of the initial entities based on a comparison of the final score and a defined threshold value.
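The aggregation described above can be sketched as a weighted linear combination of calibrated per-source scores, compared against a threshold. This is a minimal illustration: the weight normalization, the example scores, and the threshold value are assumptions, and the mapping of raw scores into the calibrated range is taken as already done.

```python
def aggregate(calibrated_scores, weights, threshold=0.5):
    # Weighted linear aggregation of calibrated scores from the
    # annotation sources, followed by the annotation decision.
    total_w = sum(weights)
    final = sum(w * s for w, s in zip(weights, calibrated_scores)) / total_w
    return final, final >= threshold

# Calibrated scores for one candidate entity from three sources,
# with weights chosen under the joint performance conditions.
scores = [0.9, 0.4, 0.7]
weights = [0.5, 0.2, 0.3]
final, annotate = aggregate(scores, weights)
```

In the abstract's system, the linear aggregation model itself is selected from a plurality of models; the sketch fixes one such model for clarity.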