Granted invention patent
- Patent title: Document reuse in a search engine crawler
- Patent title (Chinese): 搜索引擎抓取工具中的文档重用
- Application number: US10882955
- Filing date: 2004-06-30
- Publication (announcement) number: US08707312B1
- Publication (announcement) date: 2014-04-22
- Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
- Applicants: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
- Applicant address: Mountain View, CA, US
- Patentee: Google Inc.
- Current patentee: Google Inc.
- Current patentee address: Mountain View, CA, US
- Agent: Morgan, Lewis & Bockius LLP
- Main classification: G06F9/46
- IPC classification: G06F9/46
Abstract:
A search engine crawler includes a scheduler for determining which documents to download from their respective host servers. Some documents, known to be stable based on one or more records from prior crawls, are reused from a document repository. A reuse flag is set in a scheduler record that also contains a document identifier; the reuse flag indicates whether the document should be retrieved from a first database, such as the World Wide Web, or a second database, such as a document repository. A set of such scheduler records is used during a crawl by the search engine crawler to determine which database to use when retrieving the documents identified in the scheduler records.
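The scheduler record described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `SchedulerRecord`, `Source`, `is_stable`, `build_record`, and `plan_crawl` are hypothetical, as are the example URLs. The stability rule (identical content checksums over the last few crawls) is an assumption made only for illustration, since the abstract does not say how stability is determined.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    WEB = "web"                # the "first database" in the abstract
    REPOSITORY = "repository"  # the "second database" in the abstract


@dataclass(frozen=True)
class SchedulerRecord:
    doc_id: str  # document identifier (assumed here to be a URL)
    reuse: bool  # reuse flag: True means take the copy from the repository


def is_stable(checksums, min_crawls=3):
    """Assumed rule: content unchanged over the last `min_crawls` prior crawls."""
    recent = checksums[-min_crawls:]
    return len(recent) == min_crawls and len(set(recent)) == 1


def build_record(doc_id, checksums):
    """Set the reuse flag from the (hypothetical) prior-crawl history."""
    return SchedulerRecord(doc_id=doc_id, reuse=is_stable(checksums))


def plan_crawl(records):
    """Map each document identifier to the database it should be read from."""
    return {
        r.doc_id: Source.REPOSITORY if r.reuse else Source.WEB
        for r in records
    }


records = [
    build_record("http://example.com/stable", ["c1", "c1", "c1"]),
    build_record("http://example.com/news", ["c1", "c2", "c3"]),
    build_record("http://example.com/new-page", ["c1"]),  # too little history
]
plan = plan_crawl(records)
```

In this sketch only the first document is scheduled for reuse from the repository. The other two are scheduled for download from their host servers, because their history either shows changes or is too short to judge.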