Abstract:
Implementations disclose predicting video start times for maximizing user engagement. A method includes receiving a first content item comprising content item segments, processing the first content item using a trained machine learning model that is trained based on interaction signals and audio-visual content features of a training set of training segments of training content items, and obtaining, based on the processing of the first content item using the trained machine learning model, one or more outputs comprising salience scores for the content item segments, the salience scores indicating which content item segment of the content item segments is to be selected as a starting point for playback of the first content item.
Abstract:
Implementations disclose predicting video start times for maximizing user engagement. A method includes applying a machine-learned model to audio-visual content features of segments of a target content item, the machine-learned model trained based on user interaction signals and audio-visual content features of a training set of content item segments, calculating, based on applying the machine-learned model, a salience score for each of the segments of the target content item, and selecting, based on the calculated salience scores, one of the segments of the target content item as a starting point for playback of the target content item.
Abstract:
A solution is provided for temporally segmenting a video based on analysis of entities identified in the video frames of the video. The video is decoded into multiple video frames and multiple video frames are selected for annotation. The annotation process identifies entities present in a sample video frame and each identified entity has a timestamp and confidence score indicating the likelihood that the entity is accurately identified. For each identified entity, a time series comprising of timestamps and corresponding confidence scores is generated and smoothed to reduce annotation noise. One or more segments containing an entity over the length of the video are obtained by detecting boundaries of the segments in the time series of the entity. From the individual temporal segmentation for each identified entity in the video, an overall temporal segmentation for the video is generated, where the overall temporal segmentation reflects the semantics of the video.
Abstract:
A computer-implemented method for selecting representative frames for videos is provided. The method includes receiving a video and identifying a set of features for each of the frames of the video. The features including frame-based features and semantic features. The semantic features identifying likelihoods of semantic concepts being present as content in the frames of the video. A set of video segments for the video is subsequently generated. Each video segment includes a chronological subset of frames from the video and each frame is associated with at least one of the semantic features. The method generates a score for each frame of the subset of frames for each video segment based at least on the semantic features, and selecting a representative frame for each video segment based on the scores of the frames in the video segment. The representative frame represents and summarizes the video segment.
Abstract:
A system and methodology provide for annotating videos with entities and associated probabilities of existence of the entities within video frames. A computer-implemented method identifies an entity from a plurality of entities identifying characteristics of video items. The computer-implemented method selects a set of features correlated with the entity based on a value of a feature of a plurality of features, determines a classifier for the entity using the set of features, and determines an aggregation calibration function for the entity based on the set of features. The computer-implemented method selects a video frame from a video item, where the video frame having associated features, and determines a probability of existence of the entity based on the associated features using the classifier and the aggregation calibration function.
Abstract:
A system and method for identifying and predicting subjective attributes for entities (e.g., media clips, movies, television shows, images, newspaper articles, blog entries, persons, organizations, commercial businesses, etc.) are disclosed. In one aspect, subjective attributes for a first media item are identified based on a reaction to the first media item, and relevancy scores for the subjective attributes with respect to the first media item are determined. A classifier is trained using (i) a training input comprising a set of features for the first media item, and a target output for the training input, the target output comprising the respective relevancy scores for the subjective attributes with respect to the first media item.
Abstract:
A solution is provided for temporally segmenting a video based on analysis of entities identified in the video frames of the video. The video is decoded into multiple video frames and multiple video frames are selected for annotation. The annotation process identifies entities present in a sample video frame and each identified entity has a timestamp and confidence score indicating the likelihood that the entity is accurately identified. For each identified entity, a time series comprising of timestamps and corresponding confidence scores is generated and smoothed to reduce annotation noise. One or more segments containing an entity over the length of the video are obtained by detecting boundaries of the segments in the time series of the entity. From the individual temporal segmentation for each identified entity in the video, an overall temporal segmentation for the video is generated, where the overall temporal segmentation reflects the semantics of the video.
Abstract:
A system and methodology provide for annotating videos with entities and associated probabilities of existence of the entities within video frames. A computer-implemented method selects an entity from a plurality of entities identifying characteristics of a video item, where the video item has associated metadata. The computer-implemented method receives probabilities of existence of the entity in video frames of the video item, and selects a video frame determined to comprise the entity responsive to determining the video frame having a probability of existence of the entity greater than zero. The computer-implemented method determines a scaling factor for the probability of existence of the entity using the metadata of the video item, and determines an adjusted probability of existence of the entity by using the scaling factor to adjust the probability of existence of the entity. The computer-implemented method labels the video frame with the adjusted probability of existence.
Abstract:
Methods, systems, and media for presenting comments based on correlation with content are provided. In some implementations, a method for presenting ranked comments is provided, the method comprising: receiving, using a hardware processor, content data related to an item of content; receiving, using the hardware processor, comment data related to a comment associated with the item of content; determining, using the hardware processor, a degree of correlation between at least a portion of the comment data and one or more portions of the content data; determining, using the hardware processor, a priority for the comment based on the degree of correlation; and presenting, using the hardware processor, the comment based on the priority.
Abstract:
Techniques identify time-sensitive content and present the time-sensitive content to communication devices of users interested or potentially interested in the time-sensitive content. A content management component analyzes video or audio content, and extracts information from the content and determines whether the content is time-sensitive content, such as recent news-related content, based on analysis of the content and extracted information. The content management component evaluates user-related information and the extracted information, and determines whether a user(s) is likely to be interested in the time-sensitive content based on the evaluation results. The content management component sends a notification to the communication device(s) of the user(s) in response to determining the user(s) is likely to be interested in the time-sensitive content.