Invention Grant
US07966337B2 System and method for prioritizing websites during a webcrawling process
Lapsed
- Patent Title: System and method for prioritizing websites during a webcrawling process
- Application No.: US12143885
- Application Date: 2008-06-23
- Publication No.: US07966337B2
- Publication Date: 2011-06-21
- Inventor: David L. Blackman, Michael Ching, Stephen Dill, Ivan Eduardo Gonzalez, Adam Marcus, Daniel Norin Meredith, Linda Anh Linh Nguyen
- Applicant: David L. Blackman, Michael Ching, Stephen Dill, Ivan Eduardo Gonzalez, Adam Marcus, Daniel Norin Meredith, Linda Anh Linh Nguyen
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Schmeiser, Olsen & Watts
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A system and method for prioritizing the fetch order of web pages. In the method, a web crawler extracts a set of candidate web pages to be crawled. Each candidate web page is associated with a website in a computer network. The method determines whether a first website score for the website exists in a website score database; if it does, the first website score is associated with the web pages in the set of candidate web pages. The set of candidate web pages is then prioritized according to the website score associated with each candidate web page. Content is retrieved from the set of candidate web pages, hyperlinks are extracted from the content, and the hyperlinks are stored in a memory unit.
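
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `prioritize` and `extract_links`, the use of a plain dictionary as the website score database, the use of the URL's host name as the website key, and the default score given to websites with no stored score are all assumptions made for illustration. Content retrieval is left out so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def prioritize(candidate_urls, website_scores, default_score=0.0):
    """Order candidate page URLs from highest to lowest website score.

    website_scores stands in for the website score database (assumption:
    a dict keyed by host name). Pages whose website has no stored score
    receive default_score, which is also an assumption.
    """
    def score(url):
        site = urlparse(url).netloc.lower()
        return website_scores.get(site, default_score)

    # sorted() is stable, so pages with equal scores keep their input order.
    return sorted(candidate_urls, key=score, reverse=True)


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html_text, base_url):
    """Return absolute URLs for the hyperlinks found in html_text."""
    collector = _LinkCollector()
    collector.feed(html_text)
    return [urljoin(base_url, href) for href in collector.hrefs]


if __name__ == "__main__":
    candidates = [
        "http://a.example/x",
        "http://b.example/y",
        "http://a.example/z",
        "http://c.example/q",
    ]
    scores = {"a.example": 0.2, "b.example": 0.9}
    for url in prioritize(candidates, scores):
        print(url)
```

In a crawler built along these lines, the links returned by `extract_links` would be stored and would feed the next set of candidate pages.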
Public/Granted literature
- US20080256046A1 SYSTEM AND METHOD FOR PRIORITIZING WEBSITES DURING A WEBCRAWLING PROCESS Public/Granted day: 2008-10-16