-
1.
Publication number: US08799261B2
Publication date: 2014-08-05
Application number: US12343009
Filing date: 2008-12-23
Applicant: Batya Kenig, Constantin Radchenko, Eitan Shapiro
Inventor: Batya Kenig, Constantin Radchenko, Eitan Shapiro
CPC classification: G06F17/30864
Abstract: A method for incremental crawling of content stored on a plurality of content providers using aggregation is provided. The method comprises receiving a request to crawl content on one or more associated content providers; retrieving one or more first references to content on a first content provider; retrieving one or more second references to content on one or more second content providers during the same request; aggregating the first and second references; and returning the aggregated first and second references. This is done while taking into consideration an opaque timestamp object, which is managed in a distributed manner. The opaque timestamp is filled in by the content providers but stored on the crawler side between crawling sessions.
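The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `ContentProvider`, `Crawler`, `changed_since`, and the version-counter token format are all assumptions made for this example. The point it shows is that each provider fills in its own opaque token, while the crawler only stores the tokens between sessions and aggregates the references returned during one request.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ContentProvider:
    """Hypothetical provider holding (reference, version) pairs."""
    name: str
    items: List[Tuple[str, int]] = field(default_factory=list)

    def changed_since(self, token: Optional[bytes]) -> Tuple[List[str], bytes]:
        # Only the provider understands the token. Here it is assumed to
        # encode the highest version already reported to the crawler.
        last_seen = int(token.decode()) if token else 0
        refs = [ref for ref, version in self.items if version > last_seen]
        newest = max([last_seen] + [version for _, version in self.items])
        return refs, str(newest).encode()


class Crawler:
    """Hypothetical crawler that stores tokens but never interprets them."""

    def __init__(self, providers: List[ContentProvider]) -> None:
        self.providers = providers
        self.tokens: Dict[str, bytes] = {}  # kept between crawling sessions

    def crawl(self) -> List[str]:
        aggregated: List[str] = []
        for provider in self.providers:
            # Hand back the token this provider issued last time, if any.
            refs, token = provider.changed_since(self.tokens.get(provider.name))
            aggregated.extend(refs)              # aggregate across providers
            self.tokens[provider.name] = token   # store the opaque token as-is
        return aggregated


wiki = ContentProvider("wiki", [("wiki/a", 1), ("wiki/b", 2)])
files = ContentProvider("files", [("files/x", 1)])
crawler = Crawler([wiki, files])

first = crawler.crawl()    # full crawl: no tokens stored yet
second = crawler.crawl()   # incremental crawl: nothing has changed
wiki.items.append(("wiki/c", 3))
third = crawler.crawl()    # incremental crawl: only the new reference
```

Because the crawler treats each token as raw bytes, a provider could switch to a different change-tracking scheme without any change on the crawler side, which is one plausible reading of why the abstract calls the timestamp opaque.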