Invention Grant
US08862569B2 Method and techniques for determining crawling schedule (in force)

  • Patent Title: Method and techniques for determining crawling schedule
  • Application No.: US13348438
    Application Date: 2012-01-11
  • Publication No.: US08862569B2
    Publication Date: 2014-10-14
  • Inventor: Cheng Xu, Qiying Lin, Xin Li
  • Applicant: Cheng Xu, Qiying Lin, Xin Li
  • Applicant Address: US CA Mountain View
  • Assignee: Google Inc.
  • Current Assignee: Google Inc.
  • Current Assignee Address: US CA Mountain View
  • Main IPC: G06F17/30
  • IPC: G06F17/30
Method and techniques for determining crawling schedule
Abstract:
Methods, systems and computer-readable storage media for determining a crawling schedule. In an aspect, a method includes obtaining crawl history data for a Web site having Web pages, determining a status of the Web pages, determining a total quantity of Web pages that have a status of deleted, calculating a probability that another Web page of the Web site will be removed based on the total quantity, and storing data associating the calculated probability with the Web site. The method can further include determining, for a plurality of sets of the previous time periods, a respective crawl penalty as a combination of a penalty for crawling the Web site and a penalty for showing a deleted Web page based on the calculated probability, and determining a re-crawl schedule based on the crawl penalties.
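The abstract names the steps but gives no formulas, so the following Python sketch is only one possible reading, not the claimed method. It assumes the deletion probability is the fraction of crawled pages with a status of deleted, and that the crawl penalty for a candidate re-crawl interval is the per-period crawling cost plus a cost for showing a deleted page, weighted by the chance that a page is removed before the next crawl. The names `PageRecord`, `deletion_probability`, `crawl_penalty`, `best_interval` and the cost weights `crawl_cost` and `stale_cost` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """One entry of crawl history (hypothetical structure)."""
    url: str
    status: str  # "live" or "deleted"


def deletion_probability(history):
    """Assumed estimate: share of crawled pages whose status is 'deleted'."""
    if not history:
        return 0.0
    deleted = sum(1 for rec in history if rec.status == "deleted")
    return deleted / len(history)


def crawl_penalty(interval, p_deleted, crawl_cost, stale_cost):
    """Assumed penalty for re-crawling every `interval` periods.

    Combines the crawling cost spread over the interval with the cost of
    showing a deleted page, weighted by the probability that a page is
    removed at some point during the interval.
    """
    per_period_crawl = crawl_cost / interval
    p_stale = 1.0 - (1.0 - p_deleted) ** interval
    return per_period_crawl + stale_cost * p_stale


def best_interval(p_deleted, candidates, crawl_cost=1.0, stale_cost=5.0):
    """Pick the candidate interval with the lowest assumed penalty."""
    return min(
        candidates,
        key=lambda n: crawl_penalty(n, p_deleted, crawl_cost, stale_cost),
    )


history = [
    PageRecord("https://example.com/a", "live"),
    PageRecord("https://example.com/b", "deleted"),
    PageRecord("https://example.com/c", "live"),
    PageRecord("https://example.com/d", "live"),
]
p = deletion_probability(history)
interval = best_interval(p, candidates=[1, 2, 4, 7])
```

Under these assumed weights, a site that deletes pages often is pushed toward short re-crawl intervals, while a site that rarely deletes pages can be re-crawled less frequently.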