OPTIMIZING WEB CRAWLING WITH USER HISTORY
    Patent application
    Legal status: in force

    Publication number: US20130041881A1

    Publication date: 2013-02-14

    Application number: US13206256

    Filing date: 2011-08-09

    IPC classification: G06F7/00 G06F17/30

    Abstract: A politeness manager estimates traffic to web sites based on historical log data generated and sent by plug-ins or toolbars on client web browsers. The historical log data records the dates and times at which the web browsers visit different web sites, and it is used to determine during which timeframes a specific web site is busy and during which it is not. Crawl rates for different timeframes of a web site are determined from the historical log data, and web crawlers are scheduled to crawl the web site according to those crawl rates, minimizing the chance that web crawler requests cause the site to crash.
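    The abstract does not say how the log data is turned into crawl rates. The Python sketch below shows one possible reading under loudly stated assumptions: visits are bucketed by hour of day, each hour's crawl rate falls linearly from a hypothetical `max_rate` (an hour with no observed visits) to `min_rate` (the busiest observed hour), and a greedy scheduler fills the quietest hours first. The function names `hourly_traffic`, `crawl_rates`, and `schedule`, the requests-per-minute unit, and the linear policy are all illustrative assumptions, not the patent's method.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def hourly_traffic(visits: Iterable[datetime]) -> Dict[int, int]:
    """Count logged browser visits per hour of day (0-23); unvisited hours count as 0."""
    counts = Counter(ts.hour for ts in visits)
    return {hour: counts.get(hour, 0) for hour in range(24)}


def crawl_rates(traffic: Dict[int, int], max_rate: float, min_rate: float) -> Dict[int, float]:
    """Map each hour to a crawl rate (hypothetical unit: requests/minute).

    Hypothetical linear policy: the busiest hour gets min_rate, an hour with
    no observed visits gets max_rate, and other hours are interpolated.
    """
    peak = max(traffic.values())
    if peak == 0:
        # No history at all: treat every hour as idle.
        return {hour: max_rate for hour in traffic}
    return {
        hour: min_rate + (max_rate - min_rate) * (1 - count / peak)
        for hour, count in traffic.items()
    }


def schedule(rates: Dict[int, float], requests_needed: int) -> List[Tuple[int, int]]:
    """Greedily assign crawl requests to hours, highest allowed rate (quietest) first.

    Each hour can absorb rate * 60 requests. Ties keep hour order because
    Python's sort is stable even with reverse=True.
    """
    plan: List[Tuple[int, int]] = []
    remaining = requests_needed
    for hour in sorted(rates, key=rates.get, reverse=True):
        if remaining <= 0:
            break
        take = min(int(rates[hour] * 60), remaining)
        if take > 0:
            plan.append((hour, take))
            remaining -= take
    return plan
```

    For example, with logged visits at 09:00 (three), 10:00 (two), and 14:00 and 20:00 (one each), `crawl_rates(traffic, max_rate=10.0, min_rate=1.0)` gives hour 9 the minimum of 1 request/minute, hour 10 about 4, hours 14 and 20 about 7, and every unvisited hour the full 10; `schedule(rates, 1500)` then places the crawl in hours 0, 1 and 2. A real politeness manager would likely also bucket by day of week and account for each site's capacity, neither of which this sketch models.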