Abstract:
A system and computer program product are provided for improving the utility of video recommendations in a content system via de-duplication of highly similar thumbnail images. For each video added to an online content system, a thumbnail image is generated and stored. For each such thumbnail image, a compressed representation is computed. During playback of a video, a set of related videos is generated. For each video in the set, the corresponding thumbnail image and its compressed representation are retrieved. A measure of visual distance is computed for each pair of representations in the set, and measures indicating excess similarity are identified. Similarity is reduced by selectively removing some of the representations. An identification of the thumbnail images and videos corresponding to the remaining representations is produced.
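
The steps summarized above can be illustrated with a minimal sketch. The abstract does not specify the compressed representation, the distance measure, or the removal policy. The sketch therefore assumes, purely for illustration, a mean-threshold bit signature (`average_hash`), Hamming distance (`hamming`), and a policy that removes the lower-ranked video of each overly similar pair. All function names, the `min_distance` parameter, and the sample data are hypothetical.

```python
from itertools import combinations


def average_hash(pixels):
    """Assumed compressed representation: one bit per pixel of a grayscale
    thumbnail (rows of 0-255 ints), set when the pixel exceeds the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Assumed visual distance: number of differing bits."""
    return sum(x != y for x, y in zip(a, b))


def deduplicate(related, min_distance):
    """related: list of (video_id, representation) in ranked order,
    with unique video ids. Returns the ids of the videos that remain."""
    reps = dict(related)
    # Compute the distance for each pair and flag pairs that are too similar.
    too_similar = [
        (a, b)
        for a, b in combinations(reps, 2)
        if hamming(reps[a], reps[b]) < min_distance
    ]
    # Assumed removal policy: drop the lower-ranked member of each flagged
    # pair, unless one member of the pair has already been removed.
    removed = set()
    for a, b in too_similar:
        if a not in removed and b not in removed:
            removed.add(b)
    return [vid for vid, _ in related if vid not in removed]


# Hypothetical 2x2 thumbnails: v1 and v2 are near-duplicates, v3 differs.
thumbnails = {
    "v1": [[10, 200], [10, 200]],
    "v2": [[12, 198], [11, 201]],
    "v3": [[200, 10], [200, 10]],
}
related = [(vid, average_hash(img)) for vid, img in thumbnails.items()]
remaining = deduplicate(related, min_distance=2)
print(remaining)  # ['v1', 'v3']
```

Keeping the higher-ranked video of each similar pair preserves the original recommendation order among the remaining videos. Other removal policies would fit the abstract equally well.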