Abstract:
This disclosure describes various exemplary methods and computer program products for transductive multi-label classification in detecting video concepts for information retrieval. It describes using a hidden Markov random field formulation to detect labels for concepts in video content and modeling the multi-label interdependence between the labels with a pairwise Markov random field. The process groups the labels into several parts to speed up labeling inference and calculates a conditional probability score for the labels; the conditional probability scores are then ordered for ranking in a video retrieval evaluation.
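The idea of scoring interdependent labels and then ordering the scores can be illustrated with a toy pairwise Markov random field. This is only a sketch under stated assumptions: the potentials `unary` and `pairwise`, the function names, and the brute-force enumeration are all hypothetical and are not taken from the disclosure, which instead groups labels into parts to speed up inference.

```python
from itertools import product
from math import exp


def label_marginals(unary, pairwise):
    """Marginal probability that each binary label is on, in a toy pairwise MRF.

    unary:    list of per-label scores (hypothetical evidence from the video).
    pairwise: dict mapping (i, j) to a coupling weight that rewards labels
              i and j being on together (hypothetical interdependence).
    Enumerates all 2**n labelings, so it is only usable for a handful of labels.
    """
    n = len(unary)
    on_mass = [0.0] * n  # unnormalized probability mass with label i on
    z = 0.0              # partition function
    for y in product((0, 1), repeat=n):
        score = sum(unary[i] * y[i] for i in range(n))
        score += sum(w * y[i] * y[j] for (i, j), w in pairwise.items())
        p = exp(score)
        z += p
        for i in range(n):
            if y[i]:
                on_mass[i] += p
    return [m / z for m in on_mass]


def rank_labels(names, marginals):
    """Order labels by descending probability score."""
    return sorted(zip(names, marginals), key=lambda item: item[1], reverse=True)


names = ["road", "car", "boat"]
marginals = label_marginals([2.0, 0.0, -2.0], {(0, 1): 1.5})
ranking = rank_labels(names, marginals)
```

In this toy setup, "car" has no evidence of its own (unary score 0), yet its probability rises above 0.5 because it is coupled to the well-supported "road" label. That is the kind of interdependence a pairwise model is meant to capture.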
Abstract:
Described is perceptually near-lossless video summarization for use in maintaining video summaries, which operates to substantially reconstruct an original video in a generally perceptually near-lossless manner. A video stream is summarized with little information loss by using a relatively small piece of summary metadata. The summary metadata comprises an image set of synthesized mosaics and representative keyframes, audio data, and metadata about video structure and motion. In one implementation, the metadata is computed and maintained (e.g., as a file) to summarize a relatively large video sequence, by segmenting a video shot into subshots, and selecting keyframes and mosaics based upon motion data corresponding to those subshots. The motion data is maintained as a semantic description associated with the image set. To reconstruct the video, the metadata is processed, including simulating motion using the image set and the semantic description, which recovers the audiovisual content without any significant information loss.
Abstract:
Many internet users consume content through online videos. For example, users may view movies, television shows, music videos, and/or homemade videos. It may be advantageous to provide additional information to users consuming the online videos. Unfortunately, many current techniques may be unable to provide additional information relevant to the online videos from outside sources. Accordingly, one or more systems and/or techniques for determining a set of additional information relevant to an online video are disclosed herein. In particular, visual, textual, audio, and/or other features may be extracted from an online video (e.g., original content of the online video and/or embedded advertisements). Using the extracted features, additional information (e.g., images, advertisements, etc.) may be determined based upon matching the extracted features with content of a database. The additional information may be presented to a user consuming the online video.
Abstract:
Techniques for image search using contextual information related to a user query are described. A user query including at least one of textual data or image data from a collection of data displayed by a computing device is received from a user. At least one other subset of data selected from the collection of data is received as contextual information that is related to and different from the user query. Data files such as image files are retrieved and ranked based on the user query to provide a pre-ranked set of data files. The pre-ranked data files are then ranked based on the contextual information to provide a re-ranked set of data files to be displayed to the user.
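The two-stage ranking described above can be sketched as follows. This is a minimal illustration, not the described system: the tag-based file records, the Jaccard `overlap` score, and the `top_k` cutoff are all assumptions introduced here for the example.

```python
def overlap(a, b):
    """Jaccard similarity between two collections of tokens."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(files, query_terms, context_terms, top_k=3):
    """Rank by the query first, then re-rank the survivors by context.

    files is a list of dicts with hypothetical "name" and "tags" keys.
    """
    # Stage 1: pre-rank every file against the user query alone.
    pre_ranked = sorted(
        files, key=lambda f: overlap(f["tags"], query_terms), reverse=True
    )[:top_k]
    # Stage 2: re-rank only the pre-ranked set using the contextual information.
    return sorted(
        pre_ranked, key=lambda f: overlap(f["tags"], context_terms), reverse=True
    )


files = [
    {"name": "a.jpg", "tags": ["jaguar", "car"]},
    {"name": "b.jpg", "tags": ["jaguar", "cat", "jungle"]},
    {"name": "c.jpg", "tags": ["dog"]},
]
results = search(files, ["jaguar"], ["jungle", "cat"], top_k=2)
```

Here the query "jaguar" alone puts the car image first, while the surrounding context ("jungle", "cat") moves the animal image to the top of the re-ranked list.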
Abstract:
Systems and methods are described for creating a video booklet that allows browsing and search of a video library. In one implementation, each video in the video library is divided into segments. Each segment is represented by a thumbnail image. Signatures of the representative thumbnails are extracted and stored in a database. The thumbnail images are then printed into an artistic paper booklet. A user can photograph one of the thumbnails in the paper booklet to automatically play the video segment corresponding to the thumbnail. Active shape modeling is used to identify and restore the photo information to the form of a thumbnail image from which a signature can be extracted for comparison with the database.
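The lookup step, comparing a signature extracted from a photographed thumbnail against the stored signatures, might look like the sketch below. The integer bit-string signatures, the Hamming distance, and the `max_distance` threshold are assumptions made for illustration; the abstract does not say how signatures are represented or compared.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded signatures."""
    return bin(a ^ b).count("1")


def lookup_segment(photo_sig, database, max_distance=4):
    """Return the segment whose stored signature is closest to photo_sig.

    database is a list of (signature, segment) pairs, where segment is a
    hypothetical (video_id, start_seconds) tuple. Returns None when nothing
    is within max_distance bits, so a poor photo does not trigger playback.
    """
    best = min(database, key=lambda entry: hamming(photo_sig, entry[0]), default=None)
    if best is None or hamming(photo_sig, best[0]) > max_distance:
        return None
    return best[1]


database = [
    (0b10110010, ("video_a", 0)),
    (0b01001101, ("video_a", 30)),
]
match = lookup_segment(0b10110011, database)
no_match = lookup_segment(0b00000000, database, max_distance=1)
```

A distance threshold is used rather than exact equality because a signature taken from a photograph of printed paper is unlikely to match the stored one bit for bit.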
Abstract:
Described is a technology by which an image is classified (e.g., grouped and/or labeled), based on multi-label multi-instance data learning-based classification according to semantic labels and regions. An image is processed in an integrated framework into multi-label multi-instance data, including region and image labels. The framework determines local association data based on each region of an image. Other multi-label multi-instance data is based on relationships between region labels of the image, relationships between image labels of the image, and relationships between the region and image labels. These data are combined to classify the image. Training is also described.
Abstract:
Visual concepts contained within a video clip are classified based upon a set of target concepts. The clip is segmented into shots and a multi-layer multi-instance (MLMI) structured metadata representation of each shot is constructed. A set of pre-generated trained models of the target concepts is validated using a set of training shots. An MLMI kernel is recursively generated which models the MLMI structured metadata representation of each shot by comparing prescribed pairs of shots. The MLMI kernel is subsequently utilized to generate a learned objective decision function which learns a classifier for determining if a particular shot (that is not in the set of training shots) contains instances of the target concepts. A regularization framework can also be utilized in conjunction with the MLMI kernel to generate modified learned objective decision functions. The regularization framework introduces explicit constraints which serve to maximize the precision of the classifier.
Abstract:
Exemplary media browsing, searching and authoring tools allow for media interaction over a web. An exemplary method includes acquiring digital video data, coding the digital video data using scalable video coding to generate scalable coded digital video data, analyzing the scalable coded digital video data using one or more video filters to generate information pertaining to the scalable coded digital video data and providing web access to the information. Various other exemplary technologies are disclosed.
Abstract:
Described herein is technology for, among other things, selecting a representative thumbnail from a video clip. The technology involves analyzing frames of the video clip to determine which frames are stable, the result of the analysis being a number of segments of stable frames. From the stable segments, a number of candidate segments are selected, where candidate segments are those segments determined to a degree of certainty to be program content. The representative thumbnail is then selected from among the frames of the candidate segments.
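The three steps above (find stable segments, keep candidate segments, pick a frame) can be sketched as below. The per-frame difference scores, the `threshold`, and the use of a minimum segment length as a stand-in for "determined to be program content" are all assumptions for illustration; the abstract does not specify how stability or program content is decided, nor which frame within a candidate segment is chosen.

```python
def stable_segments(frame_diffs, threshold):
    """Return inclusive (start, end) index pairs for runs of stable frames.

    frame_diffs[i] is a hypothetical measure of how much frame i differs
    from its predecessor; a frame counts as stable when it is below threshold.
    """
    segments, start = [], None
    for i, diff in enumerate(frame_diffs):
        if diff < threshold:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(frame_diffs) - 1))
    return segments


def pick_thumbnail(frame_diffs, threshold=0.1, min_len=3):
    """Return the index of a representative frame, or None if none qualifies."""
    # Assumption: segments of at least min_len frames are treated as candidates.
    candidates = [
        (s, e) for s, e in stable_segments(frame_diffs, threshold)
        if e - s + 1 >= min_len
    ]
    if not candidates:
        return None
    # Assumption: use the middle frame of the longest candidate segment.
    s, e = max(candidates, key=lambda seg: seg[1] - seg[0])
    return (s + e) // 2


diffs = [0.5, 0.01, 0.02, 0.9, 0.01, 0.02, 0.03, 0.01, 0.8]
thumbnail_index = pick_thumbnail(diffs)
```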