-
Publication No.: US09424356B2
Publication Date: 2016-08-23
Application No.: US12964092
Filing Date: 2010-12-09
Applicants: Walter Sun, Junaid Ahmed, Yipeng Li, Peter Bailey, Nikhil Dandekar, Sasi Parthasarathy, Xin Chen, Xiao Zhang
Inventors: Walter Sun, Junaid Ahmed, Yipeng Li, Peter Bailey, Nikhil Dandekar, Sasi Parthasarathy, Xin Chen, Xiao Zhang
IPC Classification: G06F17/30
CPC Classification: G06F17/30312, G06F17/3053, G06F17/30613, G06F17/30864, G06F17/30867, G06F17/30876
Abstract: Methods, systems, and computer-readable media are provided for updating a search index with new uniform resource locators (URLs) and with spiking URLs, i.e., URLs experiencing increased user interest. History data provided by browser applications residing on users' computers, which indicates the URLs accessed by those users, is parsed to identify new/previously unknown URLs. The history data also indicates URLs in which there is increased interest, based on a number of recent hits compared to an average number of hits determined over time. Author postings of new URLs to social networking sites, together with a quality rating of the authors, may also be used to identify and filter new URLs. Search indexes are updated with the new and spiking URL data. As such, the lag time between the posting of a new URL (or a spike in interest in a URL) and the inclusion of this data in a search index is greatly decreased.
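The abstract describes two signals taken from browser history (URLs not yet in the index, and URLs whose recent hits are high relative to their average over time) plus an author-quality filter for social postings. The following is a minimal Python sketch of those three ideas, not the patented method. It assumes history arrives as (URL, day) records, treats "recent" as a single day, and uses a 28-day baseline; every function name and threshold (`spike_ratio`, `min_recent_hits`, `min_rating`) is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    """One browser-history entry (assumed shape): a visited URL and the day of the visit."""
    url: str
    day: int  # day index, e.g. days since some epoch (assumed granularity)


def find_new_urls(history, indexed_urls):
    """URLs that appear in browser history but are not yet in the search index."""
    return {rec.url for rec in history} - set(indexed_urls)


def find_spiking_urls(history, today, baseline_days=28, spike_ratio=3.0, min_recent_hits=10):
    """Flag URLs whose hits on `today` are at least spike_ratio times their
    average daily hits over the preceding baseline_days days.
    All thresholds here are hypothetical."""
    recent = Counter(rec.url for rec in history if rec.day == today)
    baseline = Counter(
        rec.url for rec in history if today - baseline_days <= rec.day < today
    )
    spiking = set()
    for url, hits in recent.items():
        if hits < min_recent_hits:
            continue  # too little traffic to judge a spike
        avg = baseline[url] / baseline_days  # Counter returns 0 for unseen URLs
        # A URL with no baseline traffic counts as spiking once it clears min_recent_hits.
        if avg == 0 or hits / avg >= spike_ratio:
            spiking.add(url)
    return spiking


def filter_posts_by_author_quality(posts, author_rating, min_rating=0.5):
    """Keep URLs from (author, url) social postings whose author rating is at least min_rating."""
    return {url for author, url in posts if author_rating.get(author, 0.0) >= min_rating}
```

Comparing against a per-day average (rather than a raw hit count) keeps consistently popular URLs from being flagged every day; only a jump relative to a URL's own history counts as a spike.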
-
Publication No.: US08255385B1
Publication Date: 2012-08-28
Application No.: US13053772
Filing Date: 2011-03-22
Applicants: Walter Sun, Yipeng Li, Xiao Zhang, Junaid Ahmed
Inventors: Walter Sun, Yipeng Li, Xiao Zhang, Junaid Ahmed
CPC Classification: G06F17/3089
Abstract: Methods and systems for determining an adaptive crawl rate for a Web crawler based on historical publication data from a Web source are provided. A frequency of publication of the Web source is determined over a specified period of time, and an adaptive crawl rate is calculated using the frequency of publication. The Web crawler is then deployed at the calculated adaptive crawl rate.
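The abstract does not say how publication frequency is turned into a crawl rate. The following minimal Python sketch makes one assumption explicit: the crawler visits the source a fixed multiple (`oversample`) of times per expected new item, and the resulting interval is clamped between hypothetical minimum and maximum bounds. The function names, the 7-day window, and all parameters are illustrative, not taken from the patent.

```python
from datetime import datetime, timedelta


def publication_frequency(pub_times, window_end, window=timedelta(days=7)):
    """Items published per hour in the window [window_end - window, window_end)."""
    start = window_end - window
    count = sum(1 for t in pub_times if start <= t < window_end)
    return count / (window.total_seconds() / 3600)


def adaptive_crawl_interval(freq_per_hour, oversample=2.0,
                            min_interval=timedelta(minutes=5),
                            max_interval=timedelta(days=1)):
    """Time between crawls: `oversample` crawls per expected new item,
    clamped to [min_interval, max_interval] (hypothetical bounds)."""
    if freq_per_hour <= 0:
        return max_interval  # nothing published in the window: back off to the slowest rate
    interval = timedelta(hours=1 / (freq_per_hour * oversample))
    return max(min_interval, min(max_interval, interval))


if __name__ == "__main__":
    end = datetime(2011, 3, 22)
    # 84 items spread over the last 168 hours -> 0.5 items/hour
    times = [end - timedelta(hours=2 * i + 1) for i in range(84)]
    freq = publication_frequency(times, end)
    print(freq, adaptive_crawl_interval(freq))
```

The clamp keeps a very prolific source from being crawled more often than the lower bound allows, and keeps a dormant source from being dropped entirely.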
-