摘要:
A “music video parser” automatically detects and segments music videos in a combined audio-video media stream. Automatic detection and segmentation is achieved by integrating shot boundary detection, video text detection and audio analysis to automatically detect temporal boundaries of each music video in the media stream. In one embodiment, song identification information, such as, for example, a song name, artist name, album name, etc., is automatically extracted from the media stream using video optical character recognition (OCR). This information is then used in alternate embodiments for cataloging, indexing and selecting particular music videos, and in maintaining statistics such as the times particular music videos were played, and the number of times each music video was played.
摘要:
Systems and methods to automatically edit a video to generate a video summary are described. In one aspect, sub-shots are extracted from the video. Importance measures are calculated for at least a portion of the extracted sub-shots. Respective relative distributions for sub-shots having relatively higher importance measures as compared to importance measures of other sub-shots are determined. Based on the determined relative distributions, sub-shots that do not exhibit a uniform distribution with respect to other sub-shots in the particular ones are dropped. The remaining sub-shots are connected with respective transitions to generate the video summary.
摘要:
Systems and methods for learning-based automatic commercial content detection are described. In one aspect, program data is divided into multiple segments. The segments are analyzed to determine visual, audio, and context-based feature sets that differentiate commercial content from non-commercial content. The context-based features are a function of single-side left and/or right neighborhoods of segments of the multiple segments.
摘要:
Systems and methods for learning-based automatic commercial content detection are described. In one aspect, the systems and methods include a training component and an analyzing component. The training component trains a commercial content classification model using a kernel support vector machine. The analyzing component analyzes program data such as video and audio data using the commercial content classification model and one or more of single-side left neighborhood(s) and right neighborhood(s) of program data segments. Based on this analysis, each of the program data segments are classified as being commercial or non-commercial segments.
摘要:
Systems and methods for learning-based automatic commercial content detection are described. In one aspect, the systems and methods include a training component and an analyzing component. The training component trains a commercial content classification model using a kernel support vector machine. The analyzing component analyzes program data such as video and audio data using the commercial content classification model and one or more of single-side left neighborhood(s) and right neighborhood(s) of program data segments. Based on this analysis, each of the program data segments are classified as being commercial or non-commercial segments.
摘要:
Systems and methods are described that implement personalized karaoke, wherein a user's personal home video and photographs are used to form a background for the lyrics during a karaoke performance. An exemplary karaoke apparatus is configured to segment visual content to produce a plurality of sub-shots and to segment music to produce a plurality of music sub-clips. Having produced the visual content sub-shots and music sub-clips, the exemplary karaoke apparatus shortens some of the plurality of sub-shots to a length of a corresponding music sub-clip from within the plurality of music sub-clips. The plurality of sub-shots is then displayed as a background to lyrics associated with the music, thereby adding interest to a karaoke performance.
摘要:
Methods and apparatuses are provided for automatically generating video data based on still image data. Certain aspects of the video may also be configured to correspond to audio features identified within associated audio data.
摘要:
A computing device configured to determine that one or more regions of an image are associated with a tag of the image is described herein. The computing device is further configured to determine one or more attribute tags describing at least one of the content or context of the one or more regions. Upon determining the attribute tags, the computing device associates the attribute tags with the tag to enable image searching based on the tag and attribute tags.
摘要:
Techniques for construction of a visual codebook are described herein. Feature points may be extracted from large numbers of images. In one example, images providing N feature points may be used to construct a codebook of K words. The centers of each of K clusters of feature points may be initialized. In a looping or iterative manner, an assignment step assigns each feature point to a cluster and an update step locates a center of each cluster. The feature points may be assigned to a cluster based on a lesser of a distance to a center of a previously assigned cluster and a distance to a center derived by operation of an approximate nearest neighbor algorithm having aspects of randomization. The loop terminates when the feature points have sufficiently converged to their respective clusters. Centers of the clusters represent visual words, which may be used to construct the visual codebook.
摘要:
A computing device configured to determine that one or more regions of an image are associated with a tag of the image is described herein. The computing device is further configured to determine one or more attribute tags describing at least one of the content or context of the one or more regions. Upon determining the attribute tags, the computing device associates the attribute tags with the tag to enable image searching based on the tag and attribute tags.