Abstract:
Visual concepts contained within a video clip are classified based upon a set of target concepts. The clip is segmented into shots and a multi-layer multi-instance (MLMI) structured metadata representation of each shot is constructed. A set of pre-generated trained models of the target concepts is validated using a set of training shots. An MLMI kernel is recursively generated which models the MLMI structured metadata representation of each shot by comparing prescribed pairs of shots. The MLMI kernel is subsequently utilized to generate a learned objective decision function which learns a classifier for determining if a particular shot (that is not in the set of training shots) contains instances of the target concepts. A regularization framework can also be utilized in conjunction with the MLMI kernel to generate modified learned objective decision functions. The regularization framework introduces explicit constraints which serve to maximize the precision of the classifier.
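The recursive comparison of two shots can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration. The `Node` layout is hypothetical: one tree per shot, with a feature vector plus child instances. The RBF base kernel with parameter `gamma` and the choice to average the kernel over all pairs of children at each layer are also assumed. The sketch shows only the general pattern of recursively comparing two multi-layer, multi-instance structures. It is not the specific MLMI kernel of the abstract.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical MLMI node: a feature vector plus child instances (next layer)."""
    features: List[float]
    children: List["Node"] = field(default_factory=list)


def rbf(x: List[float], y: List[float], gamma: float = 1.0) -> float:
    # Base kernel on the feature vectors attached to two nodes (assumed RBF).
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def mlmi_kernel(u: Node, v: Node, gamma: float = 1.0) -> float:
    # Similarity of the two nodes themselves.
    k = rbf(u.features, v.features, gamma)
    # Multi-instance step: average the kernel over all pairs of children,
    # recursing one layer down. A node with no children contributes nothing here.
    if u.children and v.children:
        total = sum(mlmi_kernel(a, b, gamma) for a in u.children for b in v.children)
        k += total / (len(u.children) * len(v.children))
    return k
```

The node term is positive semi-definite. The averaged child term is a mean-map inner product and is positive semi-definite too, so their sum is as well. A Gram matrix built from `mlmi_kernel` over training shots could therefore, in principle, be passed to any kernel classifier that accepts precomputed kernels.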
Abstract:
Techniques for recommending music and advertising to enhance a user's experience while photo browsing are described. In some instances, songs and ads are ranked for relevance to at least one photo from a photo album. The songs, ads and photo(s) from the photo album are then mapped to a style and mood ontology to obtain vector-based representations. The vector-based representations can include real valued terms, each term associated with a human condition defined by the ontology. A re-ranking process generates a relevancy term for each song and each ad indicating relevancy to the photo album. The relevancy terms can be calculated by summing weighted terms from the ranking and the mapping. Recommended music and ads may then be provided to a user, as the user browses a series of photos obtained from the photo album. The ads may be seamlessly embedded into the music in a nonintrusive manner.
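The re-ranking step computes a relevancy term by summing weighted terms from the ranking and the mapping. It might look like the following sketch. The weights `w_rank` and `w_mood` are hypothetical. So is the use of cosine similarity between ontology vectors, and so is summarizing the album by the mean of its photo vectors. None of these choices are details given in the abstract.

```python
import math
from typing import Dict, List, Tuple


def album_vector(photo_vecs: List[List[float]]) -> List[float]:
    # Assumed album representation: element-wise mean of the photos' ontology vectors.
    n = len(photo_vecs)
    return [sum(col) / n for col in zip(*photo_vecs)]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rerank(items: Dict[str, Tuple[float, List[float]]],
           album_vec: List[float],
           w_rank: float = 0.5,
           w_mood: float = 0.5) -> List[Tuple[str, float]]:
    """items maps a song or ad id to (initial ranking score, ontology vector)."""
    scored = []
    for item_id, (rank_score, vec) in items.items():
        # Relevancy term: weighted sum of the ranking score and the ontology similarity.
        relevancy = w_rank * rank_score + w_mood * cosine(vec, album_vec)
        scored.append((item_id, relevancy))
    # Highest relevancy first.
    return sorted(scored, key=lambda p: p[1], reverse=True)
```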
Abstract:
Video advertising overlay technique embodiments are presented that generally detect a set of spatio-temporal nonintrusive positions within a series of consecutive video frames in shots of a digital video and then overlay contextually relevant ads on these positions. In one general embodiment, this is accomplished by decomposing the video into a series of shots and then identifying a video advertisement for each shot in a selected set of the shots. The identified video advertisement is the one determined to be most relevant to the content of the shot. An overlay area is also identified in each of the shots. The selected overlay area is the one, among a plurality of prescribed areas, that is least intrusive to a viewer of the video. Each identified video advertisement is then scheduled to be overlaid in the identified overlay area of its shot whenever that shot is played.
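A minimal sketch of the per-shot selection and scheduling step is below. It assumes, hypothetically, that two sets of scores have already been computed for every shot: a relevance score for each candidate ad and an intrusiveness score for each prescribed overlay area. How those scores are obtained is outside the sketch. The `Shot` type and the score dictionaries are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Shot:
    shot_id: int
    ad_relevance: Dict[str, float]        # hypothetical: relevance of each candidate ad
    area_intrusiveness: Dict[str, float]  # hypothetical: intrusiveness of each prescribed area


def schedule_overlays(shots: List[Shot]) -> Dict[int, Tuple[str, str]]:
    schedule: Dict[int, Tuple[str, str]] = {}
    for shot in shots:
        if not shot.ad_relevance or not shot.area_intrusiveness:
            continue  # shot not in the selected set for advertising
        # Most relevant ad and least intrusive area for this shot.
        ad = max(shot.ad_relevance, key=shot.ad_relevance.get)
        area = min(shot.area_intrusiveness, key=shot.area_intrusiveness.get)
        schedule[shot.shot_id] = (ad, area)
    return schedule
```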
Abstract:
A method, computer-readable storage media, and a user interface are described for creating a video collage synthesized from video content. The techniques include selecting representative images from the video content, extracting and resizing regions of interest (ROIs) from those representative images, and arranging the regions of interest seamlessly on a canvas while preserving the temporal structure of the video content. The described method, computer-readable storage media, and user interface enhance the user's experience of browsing a compact video collage.
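One simple way to arrange resized regions of interest while preserving temporal order is row packing, sketched below. This is a hypothetical stand-in for the collage layout. Each ROI is resized to a common `row_height` with its aspect ratio kept. ROIs are then placed left to right in timestamp order, wrapping to a new row when `canvas_width` would be exceeded. Seam-free blending between neighboring regions is not modeled.

```python
from typing import List, Tuple

Placement = Tuple[float, float, float, float, float]  # (timestamp, x, y, width, height)


def layout_rois(rois: List[Tuple[float, float, float]],
                canvas_width: float,
                row_height: float) -> List[Placement]:
    """Place ROIs, given as (timestamp, width, height), on the canvas in temporal order."""
    placements: List[Placement] = []
    x = y = 0.0
    for t, w, h in sorted(rois, key=lambda r: r[0]):  # preserve temporal order
        scaled_w = w * row_height / h  # common height, aspect ratio preserved
        if x > 0 and x + scaled_w > canvas_width:
            x, y = 0.0, y + row_height  # wrap to the next row
        placements.append((t, x, y, scaled_w, row_height))
        x += scaled_w
    return placements
```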
Abstract:
The sponsored multi-media blogging technique is an advertising-driven service on a computing device, such as a mobile phone, that makes the multi-media micro-blog or blog an effective carrier for advertising. The data collected while employing the technique is used for user intent mining and for increasing advertisement relevance in mobile advertising projects. Users of the technique benefit from a natural interface for composing multi-media micro-blogs/blogs and for sharing experiences instantly. Advertisers benefit from the brand impressions promoted by contextual advertising in rich-media micro-blogs/blogs.
Abstract:
Techniques for image selection and region of interest analysis are described herein. A pair of two or more users is configured, and an image is displayed to the pair. The image can be a still image (i.e., a picture) or a moving image (i.e., video). In some instances, a plurality of advertisements is suggested for possible association with the image. Input is received from both users in the pair, indicating a positive or a negative association between each advertisement and the image. When the pair positively rates an advertisement, the advertisement is associated with the image. A plurality of regions of interest within the image may be suggested. In response, positive or negative input is received from the pair indicating whether each of the plurality of regions of interest is appropriately suggested for placement of an advertisement.
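The pair-agreement rule can be expressed in a few lines. The sketch assumes, as one reading of the abstract, that an advertisement or a suggested region of interest is accepted only when every user in the pair rates it positively. The helper `agreed` and the vote format are hypothetical.

```python
from typing import Dict, List


def agreed(votes: Dict[str, List[bool]]) -> List[str]:
    """Return the items (ads or candidate regions) that every user in the pair rated positively."""
    # An item with no ratings yet is not accepted.
    return [item for item, ratings in votes.items() if ratings and all(ratings)]
```

The same helper applies unchanged to region-of-interest suggestions, since both cases reduce to collecting positive or negative input from each user.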
Abstract:
Systems and methods are described for detecting capture-intention in order to analyze video content. In one implementation, a system decomposes video structure into sub-shots, extracts intention-oriented features from the sub-shots, delineates intention units via the extracted features, and classifies the intention units into intention categories via the extracted features. A video library can be organized via the categorized intention units.
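The delineation and classification steps could be approximated as follows. This is an illustrative assumption, not the described system. Consecutive sub-shots are merged into one intention unit while their feature vectors stay within a Euclidean `threshold` of each other. Each unit is then assigned to the category whose hypothetical centroid is nearest to the unit's mean feature vector. Category names and centroids are placeholders.

```python
import math
from typing import Dict, List

Vector = List[float]


def delineate_units(features: List[Vector], threshold: float) -> List[List[int]]:
    """Group consecutive sub-shots with similar features into intention units."""
    units: List[List[int]] = []
    for i, f in enumerate(features):
        if units and math.dist(features[i - 1], f) <= threshold:
            units[-1].append(i)  # continue the current unit
        else:
            units.append([i])  # start a new unit
    return units


def classify_unit(unit: List[int], features: List[Vector],
                  centroids: Dict[str, Vector]) -> str:
    # Average the unit's sub-shot features, then pick the nearest category centroid.
    dim = len(features[unit[0]])
    mean = [sum(features[i][d] for i in unit) / len(unit) for d in range(dim)]
    return min(centroids, key=lambda c: math.dist(mean, centroids[c]))
```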
Abstract:
Embodiments that provide multi-video synthesis are disclosed. In accordance with one embodiment, multi-video synthesis includes breaking a main video into a plurality of main frames and breaking a supplementary video into a plurality of supplementary frames. The multi-video synthesis also includes assigning one or more supplementary frames to each of a plurality of states of a Hidden Markov Model (HMM), where each of the plurality of states corresponds to one or more main frames. The multi-video synthesis further includes determining optimal frames in the plurality of main frames for insertion of the plurality of supplementary frames based on the plurality of states and visual properties. The optimal frames include optimal insertion positions. The multi-video synthesis additionally includes inserting the plurality of supplementary frames into the optimal insertion positions to form a synthesized video.
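Finding optimal insertion positions can be framed as a left-to-right dynamic program. This is equivalent to Viterbi decoding when costs are read as negative log-likelihoods. The sketch below is a simplified stand-in for the HMM formulation in the abstract. It assumes a hypothetical cost matrix `cost[j][i]` that scores how visually compatible supplementary frame `j` is with main-frame position `i`. It also requires the supplementary frames to keep their order at strictly increasing positions.

```python
from typing import List


def best_insertion_positions(cost: List[List[float]]) -> List[int]:
    """Return strictly increasing main-frame positions, one per supplementary
    frame, that minimize the total (hypothetical) insertion cost."""
    k, n = len(cost), len(cost[0])
    if k > n:
        raise ValueError("more supplementary frames than insertion positions")
    inf = float("inf")
    # best[j][i]: lowest total cost with supplementary frame j placed at position i.
    best = [[inf] * n for _ in range(k)]
    back = [[-1] * n for _ in range(k)]
    best[0] = list(cost[0])
    for j in range(1, k):
        run_min, run_arg = inf, -1
        for i in range(1, n):
            # Track the cheapest predecessor among positions 0..i-1.
            if best[j - 1][i - 1] < run_min:
                run_min, run_arg = best[j - 1][i - 1], i - 1
            if run_arg >= 0:
                best[j][i] = run_min + cost[j][i]
                back[j][i] = run_arg
    # Backtrack from the cheapest final position.
    i = min(range(n), key=lambda x: best[k - 1][x])
    path = [i]
    for j in range(k - 1, 0, -1):
        i = back[j][i]
        path.append(i)
    return path[::-1]
```

In an HMM reading, each position is a state and transitions only move forward. Costs would come from the visual properties mentioned in the abstract, but any per-frame compatibility score can be plugged in.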