Abstract:
An algorithm is provided which uses a model-based concept of a cluster and scores each item by the probability that it was generated from the same distribution as one or more query items. Each item is represented by a feature vector x_i comprising a plurality of digitally represented features x_ij. The method includes: receiving an input identifying the query items; for each of the other items, computing a score which is a function of the conditional probability that the feature vectors x_i of the query items were generated from the generating distribution formula (I), given that the respective other item was generated from the generating distribution formula (I); and returning a score for each of the other items, a list of some or all of the other items sorted by their respective scores, or a list of the n other items which have the highest scores.
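
The scoring and ranking steps can be illustrated with a minimal sketch. The abstract does not fix the generating distribution, so the code below assumes binary features x_ij modeled as independent Bernoulli variables with a Beta(alpha, beta) prior; that choice, the hyperparameter defaults, and the function names `score_items` and `top_n` are illustrative assumptions rather than the claimed method. Under that assumption the score is the log of p(query items | item) / p(query items), which is one possible function of the conditional probability described above.

```python
import math


def score_items(items, query_idx, alpha=1.0, beta=1.0):
    """Score every non-query item against the query items.

    ASSUMPTION: binary features, independent Bernoulli likelihood and a
    Beta(alpha, beta) prior per feature. The abstract does not specify this.
    Returns {item index: log score}, where the log score equals
    log p(query | item) - log p(query).
    """
    query = set(query_idx)
    n_query = len(query)
    n_feat = len(items[0])
    # Number of query items in which each feature is switched on.
    counts = [sum(items[i][j] for i in query) for j in range(n_feat)]

    scores = {}
    for i, x in enumerate(items):
        if i in query:
            continue
        s = 0.0
        for j in range(n_feat):
            a_post = alpha + counts[j]                # posterior "on" count
            b_post = beta + n_query - counts[j]       # posterior "off" count
            s += math.log(alpha + beta) - math.log(alpha + beta + n_query)
            s += math.log(a_post / alpha) if x[j] else math.log(b_post / beta)
        scores[i] = s
    return scores


def top_n(scores, n):
    """Return the n (index, score) pairs with the highest scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical data: items 0 and 1 are the query; item 2 resembles them.
example_items = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
example_scores = score_items(example_items, [0, 1])
example_ranking = top_n(example_scores, 2)
```

In this toy data the item that shares the query items' features ranks first, and `top_n` corresponds to the "n other items which have the highest scores" output; returning `example_scores` directly corresponds to the first output option.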