Document reuse in a search engine crawler

Granted invention patent

US08707312B1 Document reuse in a search engine crawler (status: in force)



Patent title: Document reuse in a search engine crawler
Application number: US10882955

Filing date: 2004-06-30
Publication number: US08707312B1

Publication date: 2014-04-22
Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
Applicants: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
Applicant address: US CA Mountain View
Assignee: Google Inc.
Current assignee: Google Inc.
Current assignee address: US CA Mountain View
Agent: Morgan, Lewis & Bockius LLP
Main classification: G06F9/46
IPC classification: G06F9/46


Abstract:

A search engine crawler includes a scheduler for determining which documents to download from their respective host servers. Some documents, known to be stable based on one or more records from prior crawls, are reused from a document repository. A reuse flag is set in a scheduler record that also contains a document identifier; the reuse flag indicates whether the document should be retrieved from a first database, such as the World Wide Web, or a second database, such as a document repository. A set of such scheduler records is used during a crawl by the search engine crawler to determine which database to use when retrieving the documents identified in the scheduler records.
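
The scheduler record described in the abstract can be illustrated with a short Python sketch. This is not the patented implementation: the names `SchedulerRecord`, `decide_reuse`, and `retrieve` are hypothetical, the stability rule (the same content checksum over the last few crawls) is an assumed stand-in for whatever the prior-crawl records actually hold, and falling back to the web when a document is missing from the repository is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SchedulerRecord:
    """One scheduler entry: a document identifier plus a reuse flag."""
    doc_id: str   # document identifier (a URL in this sketch)
    reuse: bool   # True: read from the repository; False: download from the web


def decide_reuse(checksums: List[str], min_unchanged: int = 3) -> bool:
    """Assumed stability rule: the document is stable if its content
    checksum was identical in each of the last `min_unchanged` crawls."""
    recent = checksums[-min_unchanged:]
    return len(recent) == min_unchanged and len(set(recent)) == 1


def retrieve(
    record: SchedulerRecord,
    fetch_from_web: Callable[[str], str],
    repository: Dict[str, str],
) -> str:
    """Pick the source named by the reuse flag.

    Assumption: if the flag is set but the repository has no copy,
    fall back to downloading from the host server.
    """
    if record.reuse and record.doc_id in repository:
        return repository[record.doc_id]
    return fetch_from_web(record.doc_id)
```

During a crawl, a scheduler along these lines would build one record per document, setting `reuse` from the prior-crawl history, and the crawler would call `retrieve` for each record so that stable documents never trigger a request to their host servers.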



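IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F9/00	Arrangements for program control, e.g. control units (program control for peripheral devices: see G06F13/10)
G06F9/06	.using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
G06F9/46	..Multiprogramming arrangements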