Invention Grant
- Patent Title: Web crawler scheduler that utilizes sitemaps from websites
- Application No.: US14606882
- Application Date: 2015-01-27
- Publication No.: US09355177B2
- Publication Date: 2016-05-31
- Inventor: Sascha Benjamin Brawer, Max Ibel, Ralph Michael Keller, Narayanan Shivakumar
- Applicant: GOOGLE INC.
- Applicant Address: US CA Mountain View
- Assignee: Google, Inc.
- Current Assignee: Google, Inc.
- Current Assignee Address: US CA Mountain View
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
Systems and methods for scheduling documents for crawling are disclosed. Sitemap information for a first website identified by a sitemap is updated by downloading updated sitemap information for that website, and documents are scheduled for crawling in accordance with the updated sitemap information. The sitemap information includes one or more sitemap indexes. Each sitemap index includes a list of URLs corresponding to documents stored at a corresponding website in a plurality of websites that includes the first website. Each sitemap index also includes information identifying one or more of: a last modification date of a URL in the list of URLs, a change frequency of a document specified by the URL, a document title, an authority of the document, and a priority of the document.
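
The scheduling idea in the abstract can be illustrated with a minimal sketch. This is not the patented method: the `SitemapEntry` type, the `schedule` function, the interval table, and the rule for deciding when a URL is due are all assumptions made for illustration. The sketch takes per-URL sitemap hints (last modification date, change frequency, priority) and returns the URLs that appear due for crawling, highest priority first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical mapping from change-frequency hints to recrawl intervals.
CHANGEFREQ_INTERVAL = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


@dataclass
class SitemapEntry:
    """One URL from a sitemap, with the optional hints named in the abstract."""
    url: str
    lastmod: Optional[datetime] = None
    changefreq: Optional[str] = None
    priority: float = 0.5  # assumed default when the sitemap gives none


def schedule(entries: List[SitemapEntry],
             last_crawled: Dict[str, datetime],
             now: datetime) -> List[str]:
    """Return URLs due for crawling, ordered by descending priority."""
    due = []
    for entry in entries:
        crawled = last_crawled.get(entry.url)
        if crawled is None:
            # Never crawled before: always due.
            is_due = True
        elif entry.lastmod is not None and entry.lastmod > crawled:
            # The sitemap reports a modification after our last crawl.
            is_due = True
        else:
            # Otherwise fall back to the change-frequency hint, if any.
            interval = CHANGEFREQ_INTERVAL.get(entry.changefreq)
            is_due = interval is not None and now - crawled >= interval
        if is_due:
            due.append(entry)
    # sorted() is stable, so equal priorities keep their sitemap order.
    return [e.url for e in sorted(due, key=lambda e: -e.priority)]


now = datetime(2015, 1, 27)
entries = [
    SitemapEntry("a", lastmod=datetime(2015, 1, 20), priority=0.3),
    SitemapEntry("b", changefreq="daily", priority=0.9),
    SitemapEntry("c", changefreq="monthly", priority=1.0),
    SitemapEntry("d"),
]
last_crawled = {
    "a": datetime(2015, 1, 10),
    "b": datetime(2015, 1, 25),
    "c": datetime(2015, 1, 20),
}
result = schedule(entries, last_crawled, now)
print(result)  # ['b', 'd', 'a']
```

In this example, "c" is skipped because it was crawled a week ago and is hinted to change monthly, while "d" is scheduled because it has never been crawled. A real scheduler would likely also weigh crawl budgets and per-host politeness, which this sketch omits.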
Public/Granted literature
- US20150242508A1: Web Crawler Scheduler that Utilizes Sitemaps from Websites (Public/Granted day: 2015-08-27)