GENERATING SITE MAPS
    1.
    Invention application
    Status: under examination (published)

    Publication No.: US20110093533A1

    Publication date: 2011-04-21

    Application No.: US12988078

    Filing date: 2008-04-17

    IPC classification: G06F15/173 G06F15/16

    CPC classification: G06F16/972 G06F16/958

    Abstract: Methods, systems, and apparatus, including computer program products, for generating sitemaps. The method includes scanning network traffic between a server and one or more clients requesting resources from the server, the network traffic including resource request messages from the one or more clients and resources served by the server in response to the resource request messages. The method also includes automatically extracting data from the traffic served by the server to the one or more clients, the extracted data including one or more Uniform Resource Locators that identify the resources served by the server to the one or more clients. The method automatically generates a sitemap from the extracted data, and stores the sitemap in a computer-readable memory.
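    The flow in this abstract (observe traffic, extract the URLs of served resources, generate a sitemap) can be illustrated with a minimal Python sketch. It is not the patent's claimed implementation: the function names `extract_urls` and `build_sitemap`, the dictionary layout of a captured exchange, and the choice to count only successful GET responses are all assumptions made for illustration. The output follows the public Sitemaps XML format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_urls(exchanges):
    """Return URLs of resources the server actually served.

    Each exchange is a hypothetical dict with keys
    'method', 'host', 'path' and 'status' (an assumed capture format).
    """
    urls = []
    for ex in exchanges:
        # Assumption: only successful GET responses count as served resources.
        if ex["method"] == "GET" and 200 <= ex["status"] < 300:
            url = f"http://{ex['host']}{ex['path']}"
            if url not in urls:  # keep first-seen order, drop duplicates
                urls.append(url)
    return urls


def build_sitemap(urls):
    """Serialize the URLs as a Sitemaps-protocol <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    captured = [
        {"method": "GET", "host": "example.com", "path": "/", "status": 200},
        {"method": "GET", "host": "example.com", "path": "/gone", "status": 404},
        {"method": "GET", "host": "example.com", "path": "/", "status": 200},
    ]
    print(build_sitemap(extract_urls(captured)))
```

    Storing the result, the last step in the abstract, could be as simple as writing the returned string to a file.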

    Sitemap generating client for web crawler
    3.
    Granted invention patent
    Status: in force

    Publication No.: US08037055B2

    Publication date: 2011-10-11

    Application No.: US12861663

    Filing date: 2010-08-23

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a sitemap generating client for web crawlers are described. The client accesses one or more sources of document information about the documents available on a website, such as the file system, access logs, or pre-made URL lists. Document information is extracted from the sources and one or more sitemaps are generated based on the extracted document information. A notification is transmitted to a remote computer, informing that the sitemap(s) are available for access and likely have been updated. If the remote computer is associated with a web crawler, the remote computer may access the sitemap(s) and use the sitemaps to schedule a crawl of documents included or available on the website.
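    As a rough illustration of the client described above, the following Python sketch gathers URLs from access-log lines and a pre-made URL list, generates a sitemap, and builds a notification URL. Everything here is an assumption for illustration: the log lines are taken to resemble the Common Log Format, and the function names and the ping endpoint `https://crawler.example/ping` are hypothetical rather than taken from the patent. No network request is actually sent.

```python
import re
from urllib.parse import urlencode, urljoin
from xml.sax.saxutils import escape

# Matches the request and status fields of a Common Log Format style line.
LOG_LINE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def urls_from_access_log(lines, base_url):
    """Extract absolute URLs for successfully served GET requests."""
    urls = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "200":
            urls.add(urljoin(base_url, match.group("path")))
    return urls


def generate_sitemap(urls):
    """Render a Sitemaps-protocol XML document for the given URLs."""
    entries = "".join(
        f"  <url><loc>{escape(u)}</loc></url>\n" for u in sorted(urls)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )


def notification_url(ping_endpoint, sitemap_url):
    """Build the URL a client could request to announce an updated sitemap."""
    return f"{ping_endpoint}?{urlencode({'sitemap': sitemap_url})}"


if __name__ == "__main__":
    log = [
        '1.2.3.4 - - [10/Oct/2010:13:55:36 +0000] "GET /a.html HTTP/1.1" 200 2326',
        '1.2.3.4 - - [10/Oct/2010:13:55:40 +0000] "GET /missing HTTP/1.1" 404 0',
    ]
    url_list = {"https://example.com/b.html"}  # stands in for a pre-made URL list
    urls = urls_from_access_log(log, "https://example.com") | url_list
    print(generate_sitemap(urls))
    print(notification_url("https://crawler.example/ping",
                           "https://example.com/sitemap.xml"))
```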

    Web crawler scheduler that utilizes sitemaps from websites
    4.
    Granted invention patent
    Status: in force

    Publication No.: US08037054B2

    Publication date: 2011-10-11

    Application No.: US12823358

    Filing date: 2010-06-25

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a web crawler scheduler that utilizes sitemaps from websites are described. A web crawler scheduling system receives a notification from a website or web server. In response to the notification, the system accesses one or more sitemap(s) for documents associated with the website or web server. The system schedules crawls of the documents based on information identified from the sitemaps. The system crawls at least a subset of the documents scheduled for crawling.
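    One way to picture "scheduling crawls based on information identified from the sitemaps" is the Python sketch below. The abstract does not say which sitemap fields drive the schedule, so the policy here is an assumption: a document is due when it has never been crawled, has no `lastmod`, or was modified after the last crawl, and due documents are ordered by the Sitemaps-protocol `priority` value. The function names and the `last_crawled` mapping are hypothetical, and `lastmod` is assumed to be a date-only value.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Yield (loc, lastmod, priority) for every <url> entry in a sitemap."""
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        # 0.5 is the default priority in the Sitemaps protocol.
        priority = url.findtext("sm:priority", default="0.5", namespaces=NS)
        yield loc, (date.fromisoformat(lastmod) if lastmod else None), float(priority)


def schedule_crawls(entries, last_crawled):
    """Return URLs due for crawling, highest priority first (assumed policy)."""
    due = []
    for loc, lastmod, priority in entries:
        crawled = last_crawled.get(loc)
        if crawled is None or lastmod is None or lastmod > crawled:
            due.append((-priority, loc))  # negate so higher priority sorts first
    return [loc for _, loc in sorted(due)]


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc>"
        "<lastmod>2010-06-20</lastmod><priority>0.8</priority></url>"
        "<url><loc>https://example.com/b</loc>"
        "<lastmod>2010-06-01</lastmod><priority>1.0</priority></url>"
        "</urlset>"
    )
    history = {
        "https://example.com/a": date(2010, 6, 10),
        "https://example.com/b": date(2010, 6, 10),
    }
    print(schedule_crawls(parse_sitemap(sitemap), history))
```

    In this sketch the notification step is left out: the caller is assumed to fetch the sitemap text after being notified and pass it to `parse_sitemap`.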

    Web Crawler Scheduler that Utilizes Sitemaps from Websites
    5.
    Invention application
    Status: in force

    Publication No.: US20100262592A1

    Publication date: 2010-10-14

    Application No.: US12823358

    Filing date: 2010-06-25

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a web crawler scheduler that utilizes sitemaps from websites are described. A web crawler scheduling system receives a notification from a website or web server. In response to the notification, the system accesses one or more sitemap(s) for documents associated with the website or web server. The system schedules crawls of the documents based on information identified from the sitemaps. The system crawls at least a subset of the documents scheduled for crawling.

    Web Crawler Scheduler that Utilizes Sitemaps from Websites
    7.
    Invention application
    Status: in force

    Publication No.: US20120036118A1

    Publication date: 2012-02-09

    Application No.: US13271160

    Filing date: 2011-10-11

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a web crawler scheduler that utilizes sitemaps from websites are described. A web crawler scheduling system receives a notification from a website or web server. In response to the notification, the system accesses one or more sitemap(s) for documents associated with the website or web server. The system schedules crawls of the documents based on information identified from the sitemaps. The system crawls at least a subset of the documents scheduled for crawling.

    Sitemap Generating Client for Web Crawler
    8.
    Invention application
    Status: in force

    Publication No.: US20100318508A1

    Publication date: 2010-12-16

    Application No.: US12861663

    Filing date: 2010-08-23

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a sitemap generating client for web crawlers are described. The client accesses one or more sources of document information about the documents available on a website, such as the file system, access logs, or pre-made URL lists. Document information is extracted from the sources and one or more sitemaps are generated based on the extracted document information. A notification is transmitted to a remote computer, informing that the sitemap(s) are available for access and likely have been updated. If the remote computer is associated with a web crawler, the remote computer may access the sitemap(s) and use the sitemaps to schedule a crawl of documents included or available on the website.
