Updating a search index using reported browser history data
    1.
    Granted invention patent
    Legal status: in force

    Publication number: US09424356B2

    Publication date: 2016-08-23

    Application number: US12964092

    Filing date: 2010-12-09

    IPC classification: G06F17/30

    Abstract: Methods, systems, and computer-readable media are provided for updating a search index with new uniform resource locators (URLs) and with spiking URLs, that is, URLs attracting increased user interest. History data, reported by browser applications residing on users' computers and indicating the URLs those users accessed, is parsed to identify new or previously unknown URLs. The history data also reveals URLs with increased interest, based on the number of recent hits compared with an average number of hits determined over time. Postings of new URLs by authors on social networking sites, together with quality ratings of those authors, may also be used to identify and filter new URLs. The search index is then updated with the new and spiking URL data. As a result, the lag between the posting of a new URL (or a spike in interest in a URL) and the inclusion of that data in a search index is greatly reduced.
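    The abstract does not say exactly how new or spiking URLs are detected. As a minimal Python sketch, assume each browser report is a list of (URL, timestamp) pairs, the index is a plain set of URLs, and a URL counts as "spiking" when its hits in the latest time window exceed a multiple of its average hits per earlier window. The names `SearchIndex`, `spike_factor`, `min_recent_hits`, and `min_quality`, and the 0-to-1 author-quality scale, are hypothetical and not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SearchIndex:
    """Hypothetical stand-in for a search index: known URLs plus a spiking set."""
    urls: set = field(default_factory=set)
    spiking: set = field(default_factory=set)


def find_new_urls(history, index):
    """Return URLs in reported (url, timestamp) history that the index lacks."""
    return {url for url, _timestamp in history} - index.urls


def find_spiking_urls(hits_per_window, spike_factor=3.0, min_recent_hits=10):
    """Flag URLs whose hits in the latest window exceed spike_factor times
    their average hits per window over all earlier windows.

    hits_per_window: Counters of url -> hits, oldest first (assumed layout).
    """
    *past, recent = hits_per_window
    spiking = set()
    for url, recent_hits in recent.items():
        if recent_hits < min_recent_hits:
            continue  # too few hits to call a spike
        # Counter returns 0 for URLs missing from an earlier window.
        average = sum(window[url] for window in past) / len(past) if past else 0.0
        if recent_hits > spike_factor * average:
            spiking.add(url)
    return spiking


def filter_by_author_quality(postings, author_quality, min_quality=0.5):
    """Keep URLs from social-site (url, author) postings whose author has a
    hypothetical quality rating of at least min_quality (unknown authors: 0)."""
    return {url for url, author in postings
            if author_quality.get(author, 0.0) >= min_quality}


def update_index(index, new_urls, spiking_urls):
    """Add newly discovered URLs and record the current spiking set."""
    index.urls |= new_urls
    index.spiking = set(spiking_urls)


# Usage with made-up example URLs and authors.
index = SearchIndex(urls={"https://a.example/", "https://b.example/"})
history = [("https://a.example/", 1), ("https://c.example/", 2)]
postings = [("https://d.example/", "alice"), ("https://e.example/", "bob")]
ratings = {"alice": 0.9, "bob": 0.2}
new = find_new_urls(history, index) | (
    filter_by_author_quality(postings, ratings) - index.urls)
windows = [Counter({"https://a.example/": 4}),
           Counter({"https://a.example/": 6}),
           Counter({"https://a.example/": 40})]
update_index(index, new, find_spiking_urls(windows))
```

    In this example, `https://a.example/` averages 5 hits per earlier window and receives 40 in the latest one, so it is flagged (40 > 3.0 × 5.0), while the posting by the low-rated author is filtered out. The `min_recent_hits` floor is one simple way to keep rarely visited URLs from being flagged on noise.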


    Adaptive crawl rates based on publication frequency
    2.
    Granted invention patent
    Legal status: in force

    Publication number: US08255385B1

    Publication date: 2012-08-28

    Application number: US13053772

    Filing date: 2011-03-22

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/3089

    Abstract: Methods and systems are provided for determining an adaptive crawl rate for a Web crawler based on historical publication data from a Web source. The publication frequency of the Web source is determined over a specified period of time, and an adaptive crawl rate is calculated from that frequency. The Web crawler is then deployed at the calculated adaptive crawl rate.
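    The abstract states only that the crawl rate is calculated from the publication frequency observed over a specified period. The Python sketch below assumes the frequency is measured as posts per hour within a window ending now, and that the crawl rate is that frequency multiplied by a hypothetical `oversample` factor, clamped to hypothetical `min_rate` and `max_rate` bounds; none of these parameters come from the patent.

```python
from datetime import datetime, timedelta


def publication_frequency(pub_times, now, period):
    """Publications per hour seen in the window (now - period, now]."""
    window_start = now - period
    count = sum(1 for t in pub_times if window_start < t <= now)
    return count / (period.total_seconds() / 3600)


def adaptive_crawl_rate(frequency, oversample=2.0, min_rate=1 / 24, max_rate=4.0):
    """Crawls per hour: a hypothetical multiple of the publication frequency,
    clamped so quiet sources are still crawled about once a day and busy
    sources are crawled at most four times an hour."""
    return min(max(oversample * frequency, min_rate), max_rate)


def next_crawl_time(last_crawl, rate):
    """When to deploy the crawler next, given a rate in crawls per hour."""
    return last_crawl + timedelta(hours=1 / rate)


# Usage: 12 posts in the last 24 hours -> 0.5 posts/hour -> 1 crawl/hour.
now = datetime(2011, 3, 22, 12, 0)
posts = [now - timedelta(hours=2 * i) for i in range(12)]
freq = publication_frequency(posts, now, timedelta(hours=24))
rate = adaptive_crawl_rate(freq)
print(freq, rate, next_crawl_time(now, rate))
```

    Clamping is one design choice among several: without a floor, a source that stopped publishing would never be revisited, and without a ceiling a burst of posts could make the crawler request the source excessively often.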
