Invention Application
- Patent title: MICROHUBS AND ITS APPLICATIONS
- Patent title (Chinese): MICROHUBS及其应用
- Application number: US12348336
- Filing date: 2009-01-05
- Publication number: US20090119291A1
- Publication date: 2009-05-07
- Inventors: Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish C. Penmetsa, Andrew S. Tomkins
- Applicants: Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish C. Penmetsa, Andrew S. Tomkins
- Applicant address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: US NY Armonk
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A system and method of crawling at least one website comprising at least one URL includes maintaining a lookup structure comprising all of the URLs known to be on a website; calculating a hub score for each webpage of the website to be recrawled, wherein the hub score measures how likely the to-be-recrawled webpage is to include links to fresh content published on the website; sorting all the to-be-recrawled pages by their hub scores; and crawling the to-be-recrawled pages in order from highest hub score to lowest hub score. The calculating comprises computing a first value equaling a percentage of a number of new relative URLs on the to-be-recrawled page; computing a second value equaling a percentage of a previous hub score of the to-be-recrawled page; and computing the hub score as a sum of the first and the second values.
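The scoring and ordering steps in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation: the abstract only says "a percentage", so the weights `NEW_URL_WEIGHT` and `PREVIOUS_SCORE_WEIGHT` are hypothetical, as are the `Page` type and the function names. "New" URLs are assumed to mean links on the page that are absent from the lookup structure of known URLs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Hypothetical weights: the abstract does not state which percentages are used.
NEW_URL_WEIGHT = 0.7
PREVIOUS_SCORE_WEIGHT = 0.3


@dataclass
class Page:
    """Illustrative record for a page that is due to be recrawled."""
    url: str
    links: List[str]                 # URLs seen on the page at its last fetch
    previous_hub_score: float = 0.0  # score carried over from the prior crawl


def hub_score(page: Page, known_urls: Set[str]) -> float:
    """Sum of a weighted new-URL count and a weighted previous hub score."""
    # Assumption: a link is "new" if the site-wide lookup structure lacks it.
    new_links = {link for link in page.links if link not in known_urls}
    first_value = NEW_URL_WEIGHT * len(new_links)
    second_value = PREVIOUS_SCORE_WEIGHT * page.previous_hub_score
    return first_value + second_value


def recrawl_order(pages: Iterable[Page], known_urls: Set[str]) -> List[Page]:
    """Return pages sorted from highest hub score to lowest."""
    return sorted(pages, key=lambda p: hub_score(p, known_urls), reverse=True)
```

Under these assumed weights, a page that recently exposed many unseen links outranks a page whose score rests only on its history, which matches the stated goal of reaching fresh content first.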
Published/Granted Documents
- US08041705B2 Microhubs and its applications (publication/grant date: 2011-10-18)