Collaborative team crawling: Large scale information gathering over the internet

Granted invention patent

US06182085B2 Collaborative team crawling: Large scale information gathering over the internet (status: lapsed)



Patent title: Collaborative team crawling: Large scale information gathering over the internet
Application number: US09086379
Filing date: 1998-05-28
Publication number: US06182085B2
Publication date: 2001-01-30
Inventors: Matthias Eichstaedt, Daniel Alexander Ford, Tobin Jon Lehman, Qi Lu, Shang-Hua Teng
Applicants: Matthias Eichstaedt, Daniel Alexander Ford, Tobin Jon Lehman, Qi Lu, Shang-Hua Teng
Main classification: G06F 17/30
IPC classification: G06F 17/30


Abstract:

A distributed collection of web crawlers gathers information over a large portion of cyberspace. These crawlers share the overall crawling workload through a cyberspace partition scheme. They also collaborate with each other through load balancing to maximally utilize the computing resources of each crawler. The invention takes advantage of the hierarchical nature of the cyberspace namespace and uses the syntactic components of the URL structure as the main vehicle for dividing and assigning the crawling workload to individual crawlers. The partition scheme is completely distributed: each crawler makes partitioning decisions based on its own crawling status and a globally replicated partition tree data structure.
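The abstract describes the partition scheme only at a high level. What follows is a minimal sketch under stated assumptions, not the patented method, of how a URL's syntactic components could key a partition tree. Host labels are reversed so the most general part comes first (edu, example, cs, www), path segments follow, and each tree node may name the crawler that owns that subtree. A URL belongs to the owner of the deepest matching node. The names url_key, PartitionTree, and rebalance, the "pending URLs" load metric, and the least-loaded reassignment policy are all hypothetical. The tree here is a single in-memory object, so replication across crawlers is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlsplit


def url_key(url: str) -> list[str]:
    """Split a URL into hierarchical components, most general first.

    Host labels are reversed (www.cs.example.edu -> edu, example, cs, www)
    and followed by the non-empty path segments.
    """
    parts = urlsplit(url)
    host_labels = (parts.hostname or "").split(".")
    path_segments = [seg for seg in parts.path.split("/") if seg]
    return list(reversed(host_labels)) + path_segments


@dataclass
class PartitionNode:
    owner: str | None = None  # crawler that owns this subtree, if assigned here
    children: dict[str, PartitionNode] = field(default_factory=dict)


class PartitionTree:
    """Hypothetical partition tree: a URL belongs to the owner of the
    deepest node on its component path that names an owner."""

    def __init__(self, default_owner: str) -> None:
        self.root = PartitionNode(owner=default_owner)

    def assign(self, prefix: list[str], owner: str) -> None:
        """Give the subtree under `prefix` to `owner`, creating nodes as needed."""
        node = self.root
        for comp in prefix:
            node = node.children.setdefault(comp, PartitionNode())
        node.owner = owner

    def owner_of(self, url: str) -> str:
        """Walk the URL's components down the tree, keeping the last owner seen."""
        node = self.root
        owner = self.root.owner
        for comp in url_key(url):
            child = node.children.get(comp)
            if child is None:
                break
            node = child
            if node.owner is not None:
                owner = node.owner
        return owner


def rebalance(tree: PartitionTree, loads: dict[str, int], prefix: list[str]) -> str:
    """Hand the subtree at `prefix` to the least-loaded crawler.

    `loads` (crawler name -> pending URLs) is an assumed load metric.
    """
    target = min(loads, key=loads.get)
    tree.assign(prefix, target)
    return target


tree = PartitionTree(default_owner="crawler-A")
tree.assign(["edu"], "crawler-B")
tree.assign(["com", "example"], "crawler-C")
print(tree.owner_of("http://www.cs.example.edu/people/"))  # crawler-B
print(tree.owner_of("https://shop.example.com/cart"))  # crawler-C
print(tree.owner_of("http://example.org/"))  # crawler-A

# Suppose crawler-B is overloaded: it moves example.edu to the least-loaded peer.
print(rebalance(tree, {"crawler-A": 2, "crawler-B": 9, "crawler-C": 7}, ["edu", "example"]))  # crawler-A
print(tree.owner_of("http://www.cs.example.edu/people/"))  # crawler-A
```

Reversing the host labels lets the tree follow the hierarchical namespace the abstract mentions, so a whole top-level domain or a single site can be handed off by assigning one node. In a distributed setting, each crawler would hold its own replica of the tree and would need to apply the same reassignments for lookups to agree; that synchronization is outside this sketch.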

