System and method for storing and querying document collections
Abstract:
A system for storing document collections in a manner that facilitates efficient querying. Each document vector is hashed, by applying a suitable hash function to the components of the vector. The hash function maps the vector to a particular hash value, corresponding to a particular hyperbox in the multidimensional space to which the vectors belong. The vector, or a pointer to the vector, is then stored in a hash table in association with the vector's hash value. Subsequently, given a document of interest, documents similar to the document of interest may be found by hashing the vector of the document of interest, and then returning the vectors that are associated, in the hash table, with the resulting hash value.
Information query
Patent Agency Ranking
0/0