Granted invention patent
US06182085B2 Collaborative team crawling: Large scale information gathering over the internet (no longer in force)

Collaborative team crawling: Large scale information gathering over the internet
Abstract:
A distributed collection of web crawlers gathers information over a large portion of cyberspace. The crawlers share the overall crawling workload through a cyberspace partition scheme, and they collaborate through load balancing to make maximal use of each crawler's computing resources. The invention takes advantage of the hierarchical nature of the cyberspace namespace and uses the syntactic components of the URL structure as the main vehicle for dividing and assigning crawling workload to individual crawlers. The partition scheme is completely distributed: each crawler makes its partitioning decisions based on its own crawling status and a globally replicated partition tree data structure.
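The abstract does not describe the layout of the partition tree, so the following is only an illustrative sketch of how a URL-based partition might look, not the patented method. It assumes that a URL is split into reversed host labels followed by path segments, and that a crawler owns every URL under the deepest tree node assigned to it. The names `url_components`, `PartitionNode`, `PartitionTree`, `assign`, and `owner_of`, as well as the crawler identifiers, are hypothetical.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into hierarchical components, most general first.

    Assumption: host labels are reversed (top-level domain first) and
    followed by the non-empty path segments.
    """
    parts = urlsplit(url)
    host_labels = [c for c in (parts.hostname or "").split(".")[::-1] if c]
    path_segments = [s for s in parts.path.split("/") if s]
    return host_labels + path_segments


class PartitionNode:
    def __init__(self, owner=None):
        self.owner = owner    # crawler id set at this node, or None to inherit
        self.children = {}    # URL component -> PartitionNode


class PartitionTree:
    """Hypothetical partition tree; each crawler would hold a replica."""

    def __init__(self, root_owner):
        self.root = PartitionNode(root_owner)

    def assign(self, prefix_url, owner):
        """Hand the subtree under prefix_url to the given crawler."""
        node = self.root
        for comp in url_components(prefix_url):
            node = node.children.setdefault(comp, PartitionNode())
        node.owner = owner

    def owner_of(self, url):
        """Return the crawler assigned at the deepest matching node."""
        node = self.root
        owner = node.owner
        for comp in url_components(url):
            node = node.children.get(comp)
            if node is None:
                break
            if node.owner is not None:
                owner = node.owner
        return owner


# Example: crawler-0 owns everything by default, then delegates subtrees.
tree = PartitionTree("crawler-0")
tree.assign("http://example.com", "crawler-1")
tree.assign("http://www.example.com/docs", "crawler-2")
```

In this sketch, load balancing would amount to a busy crawler calling `assign` on part of its own subtree and propagating the change to the other replicas; how the invention actually decides and replicates such changes is not stated in the abstract.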