Abstract:
An advertisement image classification system trains a binary classifier to classify images as advertisement images or non-advertisement images and then uses the binary classifier to classify images of web pages as advertisement images or non-advertisement images. During a training phase, the classification system generates training data of feature vectors representing the images and labels indicating whether an image is an advertisement image or a non-advertisement image. The classification system trains a binary classifier to classify images using the training data. During a classification phase, the classification system inputs a web page with an image and generates a feature vector for the image. The classification system then applies the trained binary classifier to the feature vector to generate a score indicating whether the image is an advertisement image or a non-advertisement image.
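The two phases described in this abstract can be illustrated with a minimal sketch. The abstract does not name the classifier type or the features, so the logistic-regression model, the two toy features, and all function names below are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(features, labels, epochs=500, lr=0.5):
    """Training phase: fit logistic-regression weights (bias stored last)
    by batch gradient descent. Logistic regression is an assumed stand-in
    for the unspecified binary classifier."""
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            err = score(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad)]
    return w

def score(w, x):
    """Classification phase: score in (0, 1); higher means more ad-like."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

# Hypothetical feature vectors: [banner-like aspect ratio, links off-site].
# Label 1 = advertisement image, 0 = non-advertisement image.
train_x = [[1.0, 1.0], [0.9, 1.0], [0.8, 0.9],
           [0.1, 0.0], [0.2, 0.1], [0.3, 0.0]]
train_y = [1, 1, 1, 0, 0, 0]

weights = train(train_x, train_y)
ad_score = score(weights, [0.95, 1.0])      # image taken from a web page
photo_score = score(weights, [0.15, 0.0])
```

A threshold on the score (0.5 is a common choice for logistic regression) would turn it into the advertisement or non-advertisement decision.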
Abstract:
Word correlations are estimated using a content-based method, which uses visual features of image representations of the words. The image representations of the subject words may be generated by retrieving images from data sources (such as the Internet) using image search with the subject words as query words. One aspect of the techniques is based on calculating the visual distance or visual similarity between the sets of retrieved images corresponding to each query word. Another aspect is based on calculating the visual consistency among the set of retrieved images corresponding to a conjunctive query word. Combining the content-based method with a text-based method may produce even better results.
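The two measures mentioned in this abstract can be sketched as follows. The abstract does not define either measure, so the choices here are assumptions: each retrieved image is reduced to a small feature vector (for example a color histogram), set-to-set similarity is the cosine between set centroids, and consistency is the mean pairwise cosine within one set. All names are hypothetical.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def set_similarity(images_a, images_b):
    """Visual similarity between the image sets retrieved for two words."""
    return cosine(centroid(images_a), centroid(images_b))

def consistency(images):
    """Visual consistency of the images retrieved for one conjunctive
    query, taken here as the mean pairwise cosine similarity."""
    pairs = list(combinations(images, 2))
    if not pairs:
        return 0.0  # undefined for fewer than two images
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Toy feature vectors standing in for images returned by image search.
tiger = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]
lion = [[0.75, 0.15, 0.1], [0.7, 0.1, 0.2]]
ocean = [[0.1, 0.1, 0.8], [0.05, 0.2, 0.75]]

related = set_similarity(tiger, lion)
unrelated = set_similarity(tiger, ocean)
```

Under these assumptions, two correlated words yield similar image sets, and a conjunctive query made of correlated words yields a visually consistent result set.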
Abstract:
A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to traditional relevance models, which calculate the joint probability of words and images over a training image database, DCMRM estimates the joint probability by calculating the expectation over words in a predefined lexicon. DCMRM may be advantageous because a predefined lexicon potentially has better behavior than a training image database. DCMRM also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by using image search techniques on web data as well as on available training data.
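The expectation over lexicon words can be illustrated with a small sketch. The abstract gives no formula, so the factorization used here is an assumption: the score of a candidate word w for an image is the sum, over lexicon words v, of P(v) times the word-to-image relation P(image | v) times the word-to-word relation P(w | v). The probability tables are invented toy values and the function name is hypothetical.

```python
def annotate(lexicon, prior, word_to_image, word_to_word, top_k=2):
    """Rank candidate words by an assumed joint score
    sum_v P(v) * P(image | v) * P(w | v), with v ranging over the lexicon."""
    scores = {
        w: sum(prior[v] * word_to_image[v] * word_to_word[(w, v)]
               for v in lexicon)
        for w in lexicon
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

lexicon = ["tiger", "grass", "ocean"]

# P(v): uniform prior over lexicon words (toy value).
prior = {v: 1.0 / len(lexicon) for v in lexicon}

# Word-to-image relation P(image | v) for one unlabeled image (toy values).
word_to_image = {"tiger": 0.6, "grass": 0.3, "ocean": 0.1}

# Word-to-word relation P(w | v), keyed as (w, v) (toy values).
word_to_word = {
    ("tiger", "tiger"): 0.6, ("grass", "tiger"): 0.3, ("ocean", "tiger"): 0.1,
    ("tiger", "grass"): 0.3, ("grass", "grass"): 0.6, ("ocean", "grass"): 0.1,
    ("tiger", "ocean"): 0.1, ("grass", "ocean"): 0.1, ("ocean", "ocean"): 0.8,
}

annotations = annotate(lexicon, prior, word_to_image, word_to_word)
```

In a full system, both tables would be estimated from image search results and training data, as the abstract describes, rather than written by hand.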
Abstract:
A duplicate image detection system generates an image table that maps hash codes of images to their corresponding images. The image table may group images according to their group identifiers generated from the most significant elements of the hash codes based on significance of the elements in representing an image. The image table thus segregates images by their group identifiers. To detect a duplicate image of a target image, the detection system generates a target hash code for the target image. The detection system then identifies the group of the target image based on the group identifier of the target hash code. After identifying the group identifier, the detection system searches the corresponding group table to identify hash codes whose values are similar to the target hash code. The detection system then selects the images associated with those similar hash codes as being duplicates of the target image.
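The grouped lookup described in this abstract can be sketched as below. The abstract does not say how hash codes are computed or how similarity is measured, so this sketch makes assumptions: hash codes are given as bit tuples ordered from most to least significant element, the group identifier is the first `GROUP_ELEMENTS` elements, and similarity is a Hamming-distance threshold. All names and constants are hypothetical.

```python
from collections import defaultdict

GROUP_ELEMENTS = 4   # assumed number of most significant elements per group id
MAX_DISTANCE = 2     # assumed Hamming-distance threshold for "similar"

def group_id(code):
    """Group identifier: the most significant elements of the hash code."""
    return code[:GROUP_ELEMENTS]

def build_table(hashes):
    """Image table: group identifier -> list of (hash code, image name)."""
    table = defaultdict(list)
    for image, code in hashes.items():
        table[group_id(code)].append((code, image))
    return table

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def find_duplicates(table, target_code):
    """Search only the target's group for hash codes close to the target."""
    group = table.get(group_id(target_code), [])
    return [image for code, image in group
            if hamming(code, target_code) <= MAX_DISTANCE]

# Toy hash codes; a real system would derive them from image content.
hashes = {
    "a.jpg":         (1, 0, 1, 1, 0, 0, 1, 0),
    "a_resized.jpg": (1, 0, 1, 1, 0, 1, 1, 0),
    "b.jpg":         (0, 1, 0, 0, 1, 1, 0, 1),
    "c.jpg":         (1, 0, 1, 1, 1, 1, 0, 1),
}
table = build_table(hashes)
duplicates = find_duplicates(table, (1, 0, 1, 1, 0, 0, 1, 1))
```

Restricting the search to one group keeps each lookup small; the trade-off in this sketch is that a duplicate whose most significant elements differ from the target's would land in another group and be missed.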