Method and apparatus for indexing document content and content comparison with World Wide Web search service
    1.
    Granted invention patent
    Method and apparatus for indexing document content and content comparison with World Wide Web search service (status: in force)

    Publication number: US06757675B2

    Publication date: 2004-06-29

    Application number: US10365839

    Filing date: 2003-02-12

    IPC classification: G06F1730

    Abstract: Methods and related systems for indexing the contents of documents for comparison with the contents of other documents to identify matching content. A method for comparing the contents of a query document to the content on the World Wide Web is set forth. The contents of a query document are indexed and compared to content from the World Wide Web which is continuously retrieved and indexed. The method for indexing may comprise selecting substrings from the document, hashing the substrings to generate a plurality of hash values having a known range of values, selecting certain hash values to save from the generated hash values, and sorting the saved hash values. Methods for selecting certain hash values to save are set forth.
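
The four indexing steps named in the abstract (select substrings, hash them into a known range, keep only certain hash values, sort what was kept) can be illustrated with a minimal Python sketch. The abstract does not say how substrings are chosen, which hash function is used, or which selection rule is applied, so everything concrete here is an assumption made for illustration: the sliding window of `k` characters, the truncated SHA-1 hash, the 32-bit range, the "keep values divisible by `modulus`" rule, and the function names `hash_substring`, `index_document` and `shared_hashes`. Duplicate hash values are also dropped, which is a simplification rather than something the abstract states.

```python
import hashlib

HASH_BITS = 32  # assumed "known range": 0 .. 2**32 - 1


def hash_substring(s: str) -> int:
    """Map a substring to an integer in [0, 2**HASH_BITS)."""
    digest = hashlib.sha1(s.encode("utf-8")).digest()
    return int.from_bytes(digest[: HASH_BITS // 8], "big")


def index_document(text: str, k: int = 8, modulus: int = 4) -> list[int]:
    """Return the sorted hash values saved for `text` (illustrative only)."""
    # Step 1: select substrings (assumption: every window of k characters).
    substrings = (text[i:i + k] for i in range(len(text) - k + 1))
    # Step 2: hash each substring into the known range.
    hashes = (hash_substring(s) for s in substrings)
    # Step 3: select which hash values to save
    # (assumption: keep those divisible by `modulus`; duplicates are dropped).
    kept = {h for h in hashes if h % modulus == 0}
    # Step 4: sort the saved hash values.
    return sorted(kept)


def shared_hashes(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted, duplicate-free hash lists in one linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

Under these assumptions, a query document and a retrieved web page would each be passed through `index_document`, and a non-empty result from `shared_hashes` would suggest overlapping content. Sorting the saved values is what lets the comparison run as a single merge-style pass over both lists.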
