Patent search: ap:("Google Inc.") AND inv:"Sascha B. Brawer"

1.

Patent application
Web Crawler Scheduler that Utilizes Sitemaps from Websites (legal status: valid)

Publication number: US20130226898A1

Publication date: 2013-08-29

Application number: US13858872

Application date: 2013-04-08

Applicant: Google Inc.

Inventors: Sascha B. Brawer, Maximilian Ibel, Ralph Michael Keller, Narayanan Shivakumar

IPC: G06F17/30

CPC classification number: G06F17/30864

Abstract: Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites. The website has sitemap information that is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and scheduling documents for crawling in accordance with the updated sitemap information for the identified website.
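The abstract describes a loop: keep sitemap information for many websites, find the sites whose stored sitemap may be out of date, re-download those sitemaps, and queue their documents for crawling. The Python sketch below illustrates that loop under clear assumptions. The abstract does not say how staleness is judged, so this sketch assumes a simple fixed maximum age. `SitemapInfo`, `find_stale_sites`, `refresh_and_schedule`, and the injected `download` callable are all hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class SitemapInfo:
    """Hypothetical record of the sitemap data held for one website."""
    website: str
    fetched_at: datetime  # when this sitemap was last downloaded
    urls: List[str]       # document URLs listed in the sitemap


def find_stale_sites(sitemaps: Dict[str, SitemapInfo],
                     now: datetime,
                     max_age: timedelta) -> List[str]:
    """Return websites whose stored sitemap is older than max_age.

    Assumption: "potentially out of date" is modelled as a plain age
    threshold; the patent abstract does not specify the criterion.
    """
    return sorted(site for site, info in sitemaps.items()
                  if now - info.fetched_at > max_age)


def refresh_and_schedule(sitemaps: Dict[str, SitemapInfo],
                         now: datetime,
                         max_age: timedelta,
                         download: Callable[[str], List[str]]) -> List[str]:
    """Re-download stale sitemaps and return the URLs to queue for crawling.

    `download` is an injected, hypothetical fetcher that returns the URLs
    listed in a website's current sitemap.
    """
    crawl_queue: List[str] = []
    for site in find_stale_sites(sitemaps, now, max_age):
        fresh_urls = download(site)
        # Replace the stored sitemap information with the updated copy.
        sitemaps[site] = SitemapInfo(site, now, fresh_urls)
        # Schedule the documents named by the updated sitemap.
        crawl_queue.extend(fresh_urls)
    return crawl_queue
```

Passing `download` in as a parameter keeps the scheduling logic separate from network I/O. That makes the loop easy to test with a stub, and a real HTTP fetcher can be plugged in later.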


2.

Granted patent
Web crawler scheduler that utilizes sitemaps from websites (legal status: valid)

Publication number: US09002819B2

Publication date: 2015-04-07

Application number: US13858872

Application date: 2013-04-08

Applicant: Google Inc.

Inventors: Sascha B. Brawer, Maximilian Ibel, Ralph Michael Keller, Narayanan Shivakumar

IPC: G06F7/00, G06F17/30

CPC classification number: G06F17/30864

Abstract: Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites. The website has sitemap information that is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and scheduling documents for crawling in accordance with the updated sitemap information for the identified website.
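The final step of the abstract, "scheduling documents for crawling in accordance with the updated sitemap information," can be illustrated with the public Sitemaps protocol (sitemaps.org). The sketch below parses a `<urlset>` document using the protocol's `<loc>` and `<lastmod>` elements. It then queues only the URLs modified since they were last crawled, newest first. This selection and ordering policy is an assumption made for illustration; the abstract does not describe it. `parse_sitemap` and `schedule_from_sitemap` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict, List, Tuple

# XML namespace defined by the Sitemaps protocol (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> List[Tuple[str, str]]:
    """Return (loc, lastmod) pairs; lastmod is "" when the element is absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if loc:
            entries.append((loc, lastmod))
    return entries


def schedule_from_sitemap(xml_text: str,
                          last_crawled: Dict[str, date]) -> List[str]:
    """Return URLs due for crawling, most recently modified first.

    Assumed policy (not from the patent): crawl a URL if it was never
    crawled, has no lastmod, or its lastmod date is after the last crawl.
    """
    due: List[Tuple[date, str]] = []
    for loc, lastmod in parse_sitemap(xml_text):
        # lastmod uses W3C Datetime; only the YYYY-MM-DD prefix is compared here.
        modified = date.fromisoformat(lastmod[:10]) if lastmod else None
        previous = last_crawled.get(loc)
        if previous is None or modified is None or modified > previous:
            # URLs without lastmod sort last via date.min.
            due.append((modified or date.min, loc))
    due.sort(key=lambda item: item[0], reverse=True)
    return [loc for _, loc in due]
```

Comparing only the date part of `lastmod` avoids mixing timezone-aware and naive timestamps, at the cost of ignoring changes made within the same day.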

