发明授权
US08707312B1 Document reuse in a search engine crawler 有权
搜索引擎抓取工具中的文档重用

Document reuse in a search engine crawler
摘要:
A search engine crawler includes a scheduler for determining which documents to download from their respective host servers. Some documents, known to be stable based on one or more record from prior crawls, are reused from a document repository. A reuse flag is set in a scheduler record that also contains a document identifier, the reuse flag indicating whether the document should be retrieved from a first database, such as the World Wide Web, or a second database, such as a document repository. A set of such scheduler records are used during a crawl by the search engine crawler to determine which database to use when retrieving the documents identified in the scheduler records.
信息查询
0/0