Web Crawler Scheduler that Utilizes Sitemaps from Websites
    1.
    Invention application
    Web Crawler Scheduler that Utilizes Sitemaps from Websites (status: in force)

    Publication number: US20130226898A1

    Publication date: 2013-08-29

    Application number: US13858872

    Filing date: 2013-04-08

    Applicant: Google Inc.

    CPC classification number: G06F17/30864

    Abstract: Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites. The website has sitemap information that is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and scheduling documents for crawling in accordance with the updated sitemap information for the identified website.
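
The steps named in the abstract (obtain sitemap information, identify a website whose sitemap is potentially out of date, download an updated sitemap, schedule documents) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the patent: the `SitemapRecord` type, the age-threshold test for "potentially out of date", the injected `fetch_sitemap` callable, and the newest-first ordering of the crawl schedule are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class SitemapRecord:
    """Stored sitemap information for one website (hypothetical structure)."""
    site: str
    fetched_at: datetime
    # Maps each document URL to the last-modified time the sitemap reported.
    urls: Dict[str, datetime] = field(default_factory=dict)


def find_stale_sites(records: Dict[str, SitemapRecord],
                     now: datetime,
                     max_age: timedelta) -> List[str]:
    """Return sites whose stored sitemap is older than max_age.

    Assumption: age alone marks a sitemap as potentially out of date.
    """
    return [r.site for r in records.values() if now - r.fetched_at > max_age]


def refresh_and_schedule(records: Dict[str, SitemapRecord],
                         fetch_sitemap: Callable[[str], Dict[str, datetime]],
                         now: datetime,
                         max_age: timedelta) -> List[str]:
    """Re-download stale sitemaps and return URLs to crawl, newest first."""
    queue = []
    for site in find_stale_sites(records, now, max_age):
        old = records[site]
        new_urls = fetch_sitemap(site)  # download updated sitemap information
        for url, lastmod in new_urls.items():
            # Assumption: only new or changed documents are scheduled.
            if url not in old.urls or lastmod > old.urls[url]:
                queue.append((lastmod, url))
        records[site] = SitemapRecord(site, now, new_urls)
    queue.sort(reverse=True)  # most recently modified documents first
    return [url for _, url in queue]
```

Passing the downloader in as `fetch_sitemap` keeps the sketch free of network code, so the staleness check and the scheduling step can be exercised with a stub.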


    Web crawler scheduler that utilizes sitemaps from websites
    2.
    Granted patent
    Web crawler scheduler that utilizes sitemaps from websites (status: in force)

    Publication number: US09002819B2

    Publication date: 2015-04-07

    Application number: US13858872

    Filing date: 2013-04-08

    Applicant: Google Inc.

    CPC classification number: G06F17/30864

    Abstract: Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites. The website has sitemap information that is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and scheduling documents for crawling in accordance with the updated sitemap information for the identified website.

