Data-discriminate search engine updates
    1.
    Granted invention patent (status: in force)

    Publication No.: US08838571B2

    Publication date: 2014-09-16

    Application No.: US12825301

    Filing date: 2010-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for data-discriminate search engine updates. In accordance with a first crawling session frequency associated with a first update type, a search engine index is updated by recording an update of the first update type to a first set of data. In accordance with a second crawling session frequency associated with a second update type, the search engine index is updated by recording an update of the second update type to a second set of data. The first crawling session frequency differs from the second crawling session frequency.
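The idea of tying each update type to its own crawling session frequency can be illustrated with a small scheduler. This is only a sketch under stated assumptions, not the patented implementation: the update type labels ("content", "security"), the interval values, and the names `UpdateTypeSchedule`, `Index`, and `run_due_sessions` are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateTypeSchedule:
    """Crawl schedule for one update type (labels are hypothetical)."""
    update_type: str
    interval_s: int                      # time between crawling sessions
    last_run_s: float = float("-inf")    # never run yet

    def due(self, now_s):
        return now_s - self.last_run_s >= self.interval_s


@dataclass
class Index:
    """Toy search engine index keyed by (document id, update type)."""
    records: dict = field(default_factory=dict)

    def record(self, doc_id, update_type, value):
        self.records[(doc_id, update_type)] = value


def run_due_sessions(schedules, pending, index, now_s):
    """Run a crawling session for each update type that is due at now_s.

    `pending` maps update type -> {doc_id: new value}. Updates of a type
    that is not yet due stay in `pending` until its next session.
    """
    ran = []
    for schedule in schedules:
        if not schedule.due(now_s):
            continue
        for doc_id, value in pending.pop(schedule.update_type, {}).items():
            index.record(doc_id, schedule.update_type, value)
        schedule.last_run_s = now_s
        ran.append(schedule.update_type)
    return ran
```

With one schedule at 3600 seconds and another at 60 seconds, a call at `now_s=60` after an initial session at `now_s=0` records only the updates of the more frequent type; the others wait for their own session.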

    Incremental crawling of multiple content providers using aggregation
    2.
    Granted invention patent (status: in force)

    Publication No.: US08799261B2

    Publication date: 2014-08-05

    Application No.: US12343009

    Filing date: 2008-12-23

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method is provided for incremental crawling of content stored on a plurality of content providers using aggregation. The method comprises receiving a request to crawl content on one or more associated content providers; retrieving one or more first references to content on a first content provider; retrieving one or more second references to content on one or more second content providers during the same request; aggregating the first and second references; and returning the aggregated first and second references. The method takes into account an opaque timestamp object that is managed in a distributed manner: the content providers fill in the opaque timestamp, but it is stored on the crawler side between crawling sessions.
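The steps in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the patented method: the classes `ContentProvider` and `Crawler`, the method `changes_since`, and the token format are hypothetical. The point it shows is that the crawler stores each provider's token between sessions without interpreting it, while only the provider knows what the token means.

```python
class ContentProvider:
    """Toy provider that keeps a log of changed content references."""

    def __init__(self, name):
        self.name = name
        self._log = []

    def add(self, ref):
        self._log.append(ref)

    def changes_since(self, token):
        # Only the provider interprets the token; here it happens to be
        # a position in the change log, encoded as a string.
        start = 0 if token is None else int(token)
        return self._log[start:], str(len(self._log))


class Crawler:
    """Aggregates references from several providers in one request."""

    def __init__(self, providers):
        self.providers = providers
        # Opaque tokens, one per provider, kept between crawling sessions.
        self.timestamp = {}

    def crawl(self):
        aggregated = []
        for provider in self.providers:
            refs, token = provider.changes_since(self.timestamp.get(provider.name))
            aggregated.extend((provider.name, ref) for ref in refs)
            self.timestamp[provider.name] = token  # stored, never parsed
        return aggregated
```

Because each provider fills in its own token, a provider could switch to a different token format without any change on the crawler side.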

    Search engine with privacy protection
    3.
    Granted invention patent (status: in force)

    Publication No.: US09224007B2

    Publication date: 2015-12-29

    Application No.: US12559720

    Filing date: 2009-09-15

    IPC classification: G06F17/30 G06F21/62 G06F21/84

    Abstract: A search engine system with privacy protection includes a data indexer configured to create an index of data; a search engine configured to search the index in response to a query and to create a search result set including excerpts from the data; and a privacy protector configured to identify, within at least one excerpt of the search result set, at least one data entity that meets at least one predefined entity extraction criterion, redact the search result set by removing the data entity from the excerpt, and present the redacted search result set on a computer output device.
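The privacy protector stage described here might look roughly like the sketch below. It rests on several assumptions that are not in the abstract: the extraction criteria are modeled as two simple regular expressions (email addresses and numbers shaped like `123-45-6789`), the removed entity is replaced by a `[REDACTED]` marker rather than deleted outright, and the names `CRITERIA`, `redact_excerpt`, and `redact_results` are hypothetical.

```python
import re

# Hypothetical entity extraction criteria; a real system would likely
# use richer entity recognition than these two patterns.
CRITERIA = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "id_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_excerpt(excerpt, criteria=CRITERIA, mask="[REDACTED]"):
    """Replace every entity matching a criterion with the mask."""
    for pattern in criteria.values():
        excerpt = pattern.sub(mask, excerpt)
    return excerpt


def redact_results(results):
    """Return a copy of the result set with each excerpt redacted."""
    return [{**result, "excerpt": redact_excerpt(result["excerpt"])}
            for result in results]
```

Redaction is applied to the result set just before presentation, so the underlying index is left unchanged in this sketch.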
