Mobile sitemaps
    7.
    Invention application
    Mobile sitemaps (status: in force)

    Publication No.: US20070050338A1

    Publication date: 2007-03-01

    Application No.: US11415947

    Filing date: 2006-05-01

    IPC classification: G06F17/30

    Abstract: A method of analyzing documents or relationships between documents includes receiving a notification of an available metadata document containing information about one or more network-accessible documents, obtaining a document format indicator associated with the metadata document, selecting a document crawler using the document format indicator, and crawling at least some of the network-accessible documents using the selected document crawler.
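
    The selection step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `MetadataDocument` type, the format names `"html"` and `"xhtml-mobile"`, and the stub crawlers are hypothetical, and the stubs only tag a URL instead of fetching it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetadataDocument:
    """Hypothetical stand-in for a sitemap-like metadata document."""
    format_indicator: str   # e.g. "html" or "xhtml-mobile" (assumed names)
    urls: List[str]         # the network-accessible documents it lists


def crawl_desktop(url: str) -> str:
    # Stub: a real crawler would fetch the page with a desktop user agent.
    return f"desktop:{url}"


def crawl_mobile(url: str) -> str:
    # Stub: a real crawler would fetch the page as a mobile device would.
    return f"mobile:{url}"


# Registry mapping a format indicator to the crawler that handles it.
CRAWLERS: Dict[str, Callable[[str], str]] = {
    "html": crawl_desktop,
    "xhtml-mobile": crawl_mobile,
}


def handle_notification(doc: MetadataDocument, default: str = "html") -> List[str]:
    """Select a crawler from the format indicator, then crawl the listed URLs."""
    crawler = CRAWLERS.get(doc.format_indicator, CRAWLERS[default])
    return [crawler(url) for url in doc.urls]


if __name__ == "__main__":
    doc = MetadataDocument("xhtml-mobile", ["http://example.com/m/index"])
    print(handle_notification(doc))
```

    Falling back to a default crawler for an unknown indicator is a design choice of this sketch; the abstract does not say how unrecognized formats are handled.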

    Web crawler scheduler that utilizes sitemaps from websites
    8.
    Granted invention patent
    Web crawler scheduler that utilizes sitemaps from websites (status: in force)

    Publication No.: US08417686B2

    Publication date: 2013-04-09

    Application No.: US13271160

    Filing date: 2011-10-11

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a web crawler scheduler that utilizes sitemaps from websites are described. A web crawler scheduling system receives a notification from a website or web server. In response to the notification, the system accesses one or more sitemap(s) for documents associated with the website or web server. The system schedules crawls of the documents based on information identified from the sitemaps. The system crawls at least a subset of the documents scheduled for crawling.
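
    The flow from sitemap to crawl schedule might look roughly like the Python sketch below. Several things here are assumptions rather than details from the abstract: the sitemap is taken to follow the public sitemaps.org XML schema, ordering is by `priority` and then `lastmod`, and a fixed `budget` stands in for "at least a subset". The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace of the public sitemaps.org schema (assumed format).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> List[Dict[str, object]]:
    """Extract location, priority, and last-modified date for each <url>."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # skip entries without a location
        entries.append({
            "loc": loc,
            "priority": float(url.findtext("sm:priority", default="0.5", namespaces=NS)),
            "lastmod": url.findtext("sm:lastmod", default="", namespaces=NS).strip(),
        })
    return entries


def schedule_crawls(entries: List[Dict[str, object]], budget: int) -> List[str]:
    """Rank by priority, then by most recent lastmod; keep the first `budget` URLs."""
    ranked = sorted(entries, key=lambda e: (e["priority"], e["lastmod"]), reverse=True)
    return [e["loc"] for e in ranked[:budget]]


SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc><lastmod>2011-01-01</lastmod><priority>0.3</priority></url>
  <url><loc>http://example.com/b</loc><lastmod>2011-06-01</lastmod><priority>0.9</priority></url>
  <url><loc>http://example.com/c</loc><lastmod>2011-09-01</lastmod><priority>0.9</priority></url>
</urlset>
"""

if __name__ == "__main__":
    print(schedule_crawls(parse_sitemap(SAMPLE), budget=2))
```

    Sorting ISO-format dates as strings works because `YYYY-MM-DD` orders lexicographically; a production scheduler would likely also weigh change frequency and crawl history.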

    Sitemap generating client for web crawler
    9.
    Granted invention patent
    Sitemap generating client for web crawler (status: in force)

    Publication No.: US07801881B1

    Publication date: 2010-09-21

    Application No.: US11172692

    Filing date: 2005-06-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a sitemap generating client for web crawlers are described. The client accesses one or more sources of document information about the documents available on a website, such as the file system, access logs, or pre-made URL lists. Document information is extracted from the sources and one or more sitemaps are generated based on the extracted document information. A notification is transmitted to a remote computer, informing that the sitemap(s) are available for access and likely have been updated. If the remote computer is associated with a web crawler, the remote computer may access the sitemap(s) and use the sitemaps to schedule a crawl of documents included or available on the website.
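
    A rough Python sketch of such a client follows, covering only the file-system source. It assumes the sitemaps.org XML format for output, and the ping endpoint with its `sitemap` query parameter is purely hypothetical; the sketch builds the notification URL but does not send any request.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List, Tuple
from urllib.parse import quote, urlencode

# Namespace of the public sitemaps.org schema (assumed output format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def collect_from_filesystem(doc_root: str, base_url: str) -> List[Tuple[str, str]]:
    """Walk the document root and return sorted (URL, last-modified date) pairs."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
            mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            url = base_url.rstrip("/") + "/" + quote(rel)
            entries.append((url, mtime.strftime("%Y-%m-%d")))
    return sorted(entries)


def build_sitemap(entries: List[Tuple[str, str]]) -> str:
    """Serialize the extracted document information as a sitemap XML string."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def notification_url(ping_endpoint: str, sitemap_url: str) -> str:
    """Build the URL a client could request to announce an updated sitemap.

    The endpoint and parameter name are hypothetical placeholders.
    """
    return ping_endpoint + "?" + urlencode({"sitemap": sitemap_url})
```

    Access logs and pre-made URL lists, the other sources named in the abstract, could feed the same `build_sitemap` function once reduced to (URL, date) pairs.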

    Sitemap generating client for web crawler
    10.
    Granted invention patent
    Sitemap generating client for web crawler (status: in force)

    Publication No.: US08037055B2

    Publication date: 2011-10-11

    Application No.: US12861663

    Filing date: 2010-08-23

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Methods and systems for a sitemap generating client for web crawlers are described. The client accesses one or more sources of document information about the documents available on a website, such as the file system, access logs, or pre-made URL lists. Document information is extracted from the sources and one or more sitemaps are generated based on the extracted document information. A notification is transmitted to a remote computer, informing that the sitemap(s) are available for access and likely have been updated. If the remote computer is associated with a web crawler, the remote computer may access the sitemap(s) and use the sitemaps to schedule a crawl of documents included or available on the website.
