Abstract:
Methods and systems for automated annotation of persons in video content are disclosed. In one embodiment, a method of identifying faces in a video includes the steps of: generating face tracks from input video streams; selecting key face images for each face track; clustering the face tracks to generate face clusters; creating face models from the face clusters; and correlating the face models with a face model database. In another embodiment, a system for identifying faces in a video includes a video face identifier module and a face model database whose entries each pair a face model with a corresponding name. In yet another embodiment, the system for identifying faces in a video can also include a face model generator.
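The clustering and correlation steps can be illustrated with a minimal sketch. It assumes each face track has already been reduced to a fixed-length embedding (for example, the mean of its key face image embeddings). The abstract does not specify a clustering algorithm or a matching rule, so this sketch substitutes greedy cosine-similarity clustering and nearest-neighbor lookup; the function names and both thresholds are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Element-wise mean; used here as a deliberately simple face model."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_tracks(tracks: List[Vector], threshold: float = 0.9) -> List[List[int]]:
    """Greedy clustering: each track joins the first cluster whose centroid is
    similar enough, otherwise it starts a new cluster. Returns track indices."""
    clusters: List[List[int]] = []
    for i, track in enumerate(tracks):
        for members in clusters:
            if cosine(track, centroid([tracks[j] for j in members])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def identify_clusters(
    tracks: List[Vector],
    clusters: List[List[int]],
    database: Dict[str, Vector],
    threshold: float = 0.8,
) -> List[Optional[str]]:
    """Create one face model per cluster and correlate it with the database:
    return the best-matching name per cluster, or None if nothing is close enough."""
    names: List[Optional[str]] = []
    for members in clusters:
        model = centroid([tracks[j] for j in members])
        best_name, best_score = None, threshold
        for name, reference in database.items():
            score = cosine(model, reference)
            if score >= best_score:
                best_name, best_score = name, score
        names.append(best_name)
    return names
```

For example, with tracks `[[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]`, the first two tracks form one cluster and the third forms another; against a database `{"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}` the two clusters are labeled `Alice` and `Bob`. Returning `None` for weak matches leaves unknown faces unannotated instead of forcing a wrong name.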
Abstract:
A video demographics analysis system selects a training set of videos with which to correlate viewer demographics with video content data. The system extracts demographic data from the profiles of viewers of the training videos and builds a set of demographic distributions; it also extracts video data from the training videos themselves. A machine learning process then correlates each viewer's demographics with the video data of the videos that viewer watched. Using the resulting prediction model, a new video about which there is no a priori knowledge can be associated with a predicted demographic distribution specifying the probabilities that the video appeals to different types of people within a given demographic category, such as people of different ages within an age demographic category.
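The abstract does not name the machine learning process, so the sketch below swaps in a simple stand-in: a new video's demographic distribution is predicted as the similarity-weighted average of the distributions of the k training videos whose video data is closest (a k-nearest-neighbor regressor). The feature vectors, the age buckets, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Distribution = Dict[str, float]


def demographic_distribution(viewer_buckets: List[str], buckets: Sequence[str]) -> Distribution:
    """Fraction of a training video's viewers that fall in each demographic bucket."""
    total = len(viewer_buckets)
    return {b: viewer_buckets.count(b) / total for b in buckets}


def predict_distribution(
    features: Sequence[float],
    training: List[Tuple[Sequence[float], Distribution]],
    k: int = 2,
) -> Distribution:
    """Stand-in prediction model: similarity-weighted average of the demographic
    distributions of the k training videos whose video data is closest."""
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    weights = [1.0 / (1.0 + math.dist(features, f)) for f, _ in nearest]
    total = sum(weights)
    buckets = nearest[0][1].keys()
    return {
        b: sum(w * dist[b] for w, (_, dist) in zip(weights, nearest)) / total
        for b in buckets
    }
```

Because every training distribution sums to 1 and the weights are normalized, the prediction is itself a valid distribution over the same buckets, which is what lets it be read as "probability that the video appeals to each age group".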
Abstract:
An object recognition system performs a number of rounds of dimensionality reduction and consistency learning on visual content items such as videos and still images, producing a set of feature vectors that accurately predict whether a visual object represented by a given object name is present within a visual content item. The feature vectors are stored in association with the object name they represent and with an indication of the number of rounds of dimensionality reduction and consistency learning that produced them. The feature vectors and this indication can be used for various purposes, such as quickly identifying visual content items that contain a visual representation of a given object name.
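The abstract does not detail the dimensionality reduction or consistency learning steps, so this sketch covers only the storage and lookup it describes: each learned set of feature vectors is stored with its object name and round count, and content items (represented by already-reduced feature vectors) are ranked by their strongest response to the stored vectors for a queried name. The `ObjectModel` record, the dot-product score, the threshold, and the choice to prefer the model with the most rounds are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObjectModel:
    object_name: str
    rounds: int                 # rounds of dimensionality reduction + consistency learning
    vectors: List[List[float]]  # feature vectors learned for this object name


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class ModelStore:
    """Hypothetical store keyed by object name."""

    def __init__(self) -> None:
        self._models: Dict[str, List[ObjectModel]] = {}

    def add(self, model: ObjectModel) -> None:
        self._models.setdefault(model.object_name, []).append(model)

    def latest(self, object_name: str) -> ObjectModel:
        """Model produced by the most rounds (assumed here to be the most refined)."""
        return max(self._models[object_name], key=lambda m: m.rounds)

    def find_items(self, object_name: str, items: Dict[str, List[float]],
                   min_score: float = 0.5) -> List[str]:
        """IDs of items whose best response to any stored vector reaches
        min_score, highest-scoring first."""
        model = self.latest(object_name)
        scored = []
        for item_id, features in items.items():
            best = max(dot(v, features) for v in model.vectors)
            if best >= min_score:
                scored.append((best, item_id))
        return [item_id for _, item_id in sorted(scored, reverse=True)]
```

Keeping the round count next to the vectors lets a caller choose between models from different training stages without retraining, which is one plausible use of the "indication" the abstract mentions.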
Abstract:
A classifier training system trains adapted classifiers for classifying videos based at least in part on scores produced by applying text-based classifiers to the textual metadata of the videos. Each classifier corresponds to a particular category and, when applied to a given video, indicates whether the video represents that category. The classifier training system applies the text-based classifiers to the textual metadata of the videos to obtain the scores, extracts features from the content of the videos, and combines the scores and the content features for each video into a set of hybrid features. The adapted classifiers are then trained on the hybrid features. Adapting the text-based classifiers from the textual domain to the video domain allows accurate video classifiers (the adapted classifiers) to be trained without a large training set of authoritatively labeled videos.
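The hybrid-feature construction can be sketched as follows for a single category. The text-based classifier is a toy keyword scorer and the adapted classifier is plain logistic regression trained by batch gradient descent; both are stand-ins, since the abstract names neither model. Function names and hyperparameters are hypothetical.

```python
import math
from typing import Callable, List, Sequence

TextClassifier = Callable[[str], float]  # maps textual metadata to a category score


def hybrid_features(metadata: str, content: Sequence[float],
                    text_classifiers: List[TextClassifier]) -> List[float]:
    """Scores from the text-based classifiers followed by the content features."""
    return [clf(metadata) for clf in text_classifiers] + list(content)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_adapted_classifier(X: List[List[float]], y: List[int],
                             epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Logistic regression by batch gradient descent; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def score(w: List[float], x: List[float]) -> float:
    """Probability that the video belongs to the classifier's category."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])
```

With a keyword scorer such as `lambda text: 1.0 if "soccer" in text.lower() else 0.0`, a video titled "Match recap" gets a text score of 0, but its content features can still carry it into the category; that is the benefit of training on the combined features rather than on either source alone.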
Abstract:
This disclosure relates to audio identification using ordinal transformations. A media matching component receives a sample audio file, which can include, for example, a cover song. The media matching component includes a vector component that computes a set of vectors from auditory feature values of the sample audio file. A hashing component applies a hash function to the set of vectors to generate a fingerprint for the sample audio file, comprising a set of sub-fingerprints. The fingerprint is invariant to variations including, but not limited to, variations in key, instrumentation, encoding format, performers, performance conditions, and arrangement, as well as recording and processing variations. An identification component uses the fingerprint and/or sub-fingerprints to determine whether any reference audio files are similar to the sample audio file, and identifies any similar reference audio files.
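A minimal sketch of ordinal hashing, assuming each frame of the sample has already been reduced to a short vector of auditory feature values (for example, band energies). Each frame is replaced by the permutation that sorts it, and that rank pattern is hashed into a sub-fingerprint. Using `zlib.crc32` as the hash and a simple aligned-overlap score are assumptions, not the disclosure's hash function; invariance to key changes is not modeled here.

```python
import zlib
from typing import Dict, List, Sequence


def ordinal_pattern(values: Sequence[float]) -> bytes:
    """Indices of the feature values in ascending order (the frame's rank pattern).
    Encoded as bytes, so this sketch assumes fewer than 256 features per frame."""
    return bytes(sorted(range(len(values)), key=lambda i: values[i]))


def fingerprint(frames: List[Sequence[float]]) -> List[int]:
    """One sub-fingerprint per frame: a CRC-32 hash of the frame's rank pattern."""
    return [zlib.crc32(ordinal_pattern(frame)) for frame in frames]


def similarity(fp_a: List[int], fp_b: List[int]) -> float:
    """Fraction of time-aligned sub-fingerprints that are identical."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(fp_a, fp_b)) / n


def identify(sample: List[int], references: Dict[str, List[int]],
             threshold: float = 0.8) -> List[str]:
    """Names of reference audio files whose fingerprints are similar to the sample."""
    return [name for name, ref in references.items()
            if similarity(sample, ref) >= threshold]
```

Because ranks are unchanged by any strictly increasing transformation of the values, a uniform gain change (multiplying every feature value by 3, say) produces exactly the same fingerprint, which illustrates the kind of processing invariance the abstract claims.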
Abstract:
Systems and methods are disclosed for measuring consistency between two objects based on the ranks of their elements rather than on the values of those elements. The objects being compared can be represented by d-dimensional feature vectors U and V, where each dimension has an associated value. U and V can be converted to rank vectors P and Q, in which the value in each dimension of U and V is replaced by its ordered rank or a function thereof. The consistency between U and V can then be assessed by determining the consistency between P and Q, which can be more efficient and more accurate, particularly for illumination-invariant comparisons.
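As one concrete consistency measure on P and Q, the sketch below uses Spearman's rank correlation, $\rho = 1 - \frac{6\sum_{i=1}^{d}(P_i - Q_i)^2}{d(d^2-1)}$. The abstract does not say which measure is used, so Spearman's $\rho$ is an assumption, and ties are broken by position rather than averaged, which is a simplification.

```python
from typing import List, Sequence


def rank_vector(u: Sequence[float]) -> List[int]:
    """Replace each value by its rank (1 = smallest); ties are broken by position."""
    order = sorted(range(len(u)), key=lambda i: u[i])
    ranks = [0] * len(u)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def rank_consistency(u: Sequence[float], v: Sequence[float]) -> float:
    """Spearman's rho between the rank vectors P and Q of U and V (no-ties formula)."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need two vectors of equal length d >= 2")
    p, q = rank_vector(u), rank_vector(v)
    d = len(p)
    squared_diffs = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return 1.0 - 6.0 * squared_diffs / (d * (d * d - 1))
```

For U = `[10, 40, 25, 80, 55]`, the rank vector is `[1, 3, 2, 5, 4]`. A brighter, higher-contrast version such as V = 2U + 30 differs substantially in raw values, but every rank is unchanged, so the consistency is exactly 1.0; this is the illumination-invariance property the abstract highlights.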
Abstract:
A system and method detect matches between portions of video content. A matching module receives an input video fingerprint representing an input video and a set of reference fingerprints representing reference videos in a reference database. The matching module compares the input fingerprint with the reference fingerprints to generate a list of candidate segments from the reference video set. Each candidate segment is a time-localized portion of a reference video that potentially matches the input video. A classifier is applied to each candidate segment to classify it as a matching or non-matching segment. A result is then output identifying a matching portion of a reference video from the reference video set, based on the segments classified as matches.
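The two-stage flow (generate candidates, then classify each one) can be sketched under the assumption that a fingerprint is a list of per-frame integer hashes. Candidates come from offset voting: each input frame hash found in a reference votes for the time offset between the two positions. The "classifier" here is only a threshold on the fraction of aligned frames that agree, a stand-in for the trained classifier the abstract describes; all names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    ref_id: str
    offset: int  # reference frame index minus input frame index
    start: int   # first reference frame of the candidate (inclusive)
    end: int     # last reference frame of the candidate (exclusive)


def candidate_segments(input_fp: List[int], refs: Dict[str, List[int]],
                       min_votes: int = 2) -> List[Segment]:
    """Offset voting: every input frame hash found in a reference votes for the
    time offset between the two; well-supported offsets become candidates."""
    candidates = []
    for ref_id, ref_fp in refs.items():
        positions = defaultdict(list)  # hash -> frame indices in this reference
        for j, h in enumerate(ref_fp):
            positions[h].append(j)
        votes = Counter(j - i for i, h in enumerate(input_fp) for j in positions.get(h, []))
        for offset, count in votes.items():
            start, end = max(offset, 0), min(offset + len(input_fp), len(ref_fp))
            if count >= min_votes and end > start:
                candidates.append(Segment(ref_id, offset, start, end))
    return candidates


def is_match(seg: Segment, input_fp: List[int], refs: Dict[str, List[int]],
             min_ratio: float = 0.8) -> bool:
    """Stand-in classifier: fraction of aligned frames whose hashes agree."""
    ref_fp = refs[seg.ref_id]
    agree = sum(ref_fp[j] == input_fp[j - seg.offset] for j in range(seg.start, seg.end))
    return agree / (seg.end - seg.start) >= min_ratio


def find_matches(input_fp: List[int], refs: Dict[str, List[int]]) -> List[Segment]:
    """Keep only the candidate segments the classifier labels as matches."""
    return [s for s in candidate_segments(input_fp, refs) if is_match(s, input_fp, refs)]
```

Separating the cheap, permissive candidate step from the stricter per-segment decision mirrors the abstract's structure: a few coincidental hash collisions can nominate a segment, but only a consistently aligned one survives classification.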