Library citation integration
    31.
    Granted invention patent
    Library citation integration (status: in force)

    Publication number: US07526475B1

    Publication date: 2009-04-28

    Application number: US11432039

    Filing date: 2006-05-10

    IPC classification: G06F7/00 G06F17/00

    Abstract: An online search system generates an index of documents using index information received from a library. Some documents have restricted access; some documents may not be available online. The search system provides links to documents in the library as well as other sites based on a search, and may include link resolvers received from the library. The search system provides access links to the link resolvers if an identifier, such as a user identification or IP address, matches an affiliation list from the library.
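
The affiliation check described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary layout, the field names (`user_ids`, `ip_ranges`, `resolver_template`) and the resolver URL are all hypothetical, chosen only to show a resolver link being added when a user ID or IP address matches a library's affiliation list.

```python
import ipaddress


def is_affiliated(user_id, client_ip, affiliation):
    """Return True if the user ID or the client IP matches the affiliation list."""
    if user_id is not None and user_id in affiliation["user_ids"]:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in affiliation["ip_ranges"])


def build_links(result, user_id, client_ip, libraries):
    """Always return the public link; add a resolver link per matching library."""
    links = [result["public_url"]]
    for lib in libraries:
        if is_affiliated(user_id, client_ip, lib["affiliation"]):
            links.append(lib["resolver_template"].format(doc_id=result["doc_id"]))
    return links


# Hypothetical sample data (documentation-only addresses and domains).
libraries = [{
    "affiliation": {"user_ids": {"alice"}, "ip_ranges": ["192.0.2.0/24"]},
    "resolver_template": "https://resolver.example.edu/?id={doc_id}",
}]
result = {"doc_id": "doc42", "public_url": "https://publisher.example.com/doc42"}
```

A request from inside `192.0.2.0/24`, or from the user `alice` on any address, would receive both links; any other request would receive only the public one.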

    Assigning document identification tags
    33.
    Granted invention patent
    Assigning document identification tags (status: in force)

    Publication number: US09411889B2

    Publication date: 2016-08-09

    Application number: US13419349

    Filing date: 2012-03-13

    Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
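
One way to picture how assigned tags can leave a result list "approximately ordered" by a query-independent score is a tiered ID space. The Python sketch below is an assumption made for illustration, not the method claimed in the patent: the class name, the score thresholds and the tier size are all hypothetical.

```python
class TieredDocIdAssigner:
    """Assign IDs so that a smaller ID implies an equal or higher score tier."""

    def __init__(self, boundaries, tier_size):
        # boundaries: descending score thresholds; tier 0 holds the highest scores.
        self.boundaries = boundaries
        self.tier_size = tier_size
        self.next_offset = [0] * (len(boundaries) + 1)

    def tier_for(self, score):
        for tier, threshold in enumerate(self.boundaries):
            if score >= threshold:
                return tier
        return len(self.boundaries)

    def assign(self, score):
        """Return the next free ID inside the tier that matches the score."""
        tier = self.tier_for(score)
        offset = self.next_offset[tier]
        if offset >= self.tier_size:
            raise RuntimeError(f"tier {tier} is full")
        self.next_offset[tier] = offset + 1
        return tier * self.tier_size + offset


assigner = TieredDocIdAssigner(boundaries=[0.8, 0.3], tier_size=1000)
ids = [assigner.assign(score) for score in (0.9, 0.1, 0.5, 0.95)]
```

Because IDs are handed out as documents arrive, a new document can be indexed immediately, and sorting a posting list by ID orders it by tier. Ordering inside a tier follows arrival, which is why the result is only approximately sorted by score.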

    Generating equivalence classes and rules for associating content with document identifiers
    34.
    Granted invention patent
    Generating equivalence classes and rules for associating content with document identifiers (status: in force)

    Publication number: US09026566B2

    Publication date: 2015-05-05

    Application number: US12725381

    Filing date: 2010-03-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A system of reducing the possibility of crawling duplicate document identifiers partitions a plurality of document identifiers into multiple clusters, each cluster having a cluster name and a set of document parameters. The system generates an equivalence rule for each cluster of document identifiers, the rule specifying which document parameters associated with the cluster are content-relevant. Next, the system groups each cluster of document identifiers into one or more equivalence classes in accordance with its associated equivalence rule, each equivalence class including one or more document identifiers that correspond to a document content and having a representative document identifier identifying the document content.
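
The grouping step in the abstract can be sketched in Python under some loud assumptions: document identifiers are URLs, a cluster name is the host plus path, and the equivalence rule (the set of content-relevant parameters) is already given rather than generated. The function names and sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def cluster_name(url):
    """Assumed cluster name: host plus path, ignoring the query string."""
    parts = urlsplit(url)
    return parts.netloc + parts.path


def equivalence_classes(urls, relevant_params):
    """Group URLs that agree on every content-relevant parameter of their cluster.

    relevant_params maps a cluster name to the set of parameter names that are
    content-relevant; a cluster without a rule treats every parameter as relevant.
    Returns a dict mapping a representative URL to the members of its class.
    """
    classes = defaultdict(list)
    for url in urls:
        name = cluster_name(url)
        rule = relevant_params.get(name)
        params = parse_qsl(urlsplit(url).query)
        key = tuple(sorted((k, v) for k, v in params if rule is None or k in rule))
        classes[(name, key)].append(url)
    # Assumed choice: the lexicographically smallest URL represents the class.
    return {min(members): members for members in classes.values()}


urls = [
    "http://example.com/item?id=1&session=abc",
    "http://example.com/item?id=1&session=xyz",
    "http://example.com/item?id=2&session=abc",
]
rules = {"example.com/item": {"id"}}
groups = equivalence_classes(urls, rules)
```

With `session` treated as irrelevant, the first two URLs collapse into one class, so a crawler following this sketch would fetch only the representative of each class.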

    Systems and methods for personalizing aggregated news content
    35.
    Granted invention patent
    Systems and methods for personalizing aggregated news content (status: in force)

    Publication number: US08676837B2

    Publication date: 2014-03-18

    Application number: US10748663

    Filing date: 2003-12-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: A system customizes a news document associated with a user of a news aggregation service. The system includes multiple news source servers that store news content and a remote news aggregation server. The news aggregation server creates a customized news document based on one or more personalized search queries received from a user. The news aggregation server fetches the news content from the multiple news source servers, aggregates the news content, and searches the aggregated news content based on the one or more personalized search queries. The news aggregation server provides selected news content to the customized news document based on results of the search.
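
The fetch, aggregate and search sequence can be sketched as follows. This is a deliberately naive Python illustration: the in-memory `sources` dictionary stands in for the news source servers, and the substring match stands in for a real search; neither is taken from the patent.

```python
def customized_news(sources, queries):
    """Aggregate articles from all sources and keep those matching any saved query.

    An article matches a query when every query term occurs (as a substring,
    case-insensitively) in its title or body.
    """
    aggregated = [article for articles in sources.values() for article in articles]
    selected = []
    for article in aggregated:
        text = (article["title"] + " " + article["body"]).lower()
        if any(all(term in text for term in query.lower().split()) for query in queries):
            selected.append(article)
    return selected


# Hypothetical sample data.
sources = {
    "source-a": [{"title": "Local election results", "body": "Turnout was high."}],
    "source-b": [{"title": "Chess final", "body": "The match ended in a draw."}],
}
page = customized_news(sources, ["election turnout", "rowing"])
```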

    Document search in affiliated libraries
    36.
    Granted invention patent
    Document search in affiliated libraries (status: in force)

    Publication number: US08473487B1

    Publication date: 2013-06-25

    Application number: US12419872

    Filing date: 2009-04-07

    IPC classification: G06F17/30

    Abstract: An online search system generates an index of documents using index information received from a library. Some documents have restricted access; some documents may not be available online. The search system provides links to documents in the library as well as other sites based on a search, and may include link resolvers received from the library. The search system provides access links to the link resolvers if an identifier, such as a user identification or IP address, matches an affiliation list from the library.

    Search Engine Cache Control
    37.
    Invention patent application
    Search Engine Cache Control (status: in force)

    Publication number: US20110035372A1

    Publication date: 2011-02-10

    Application number: US12905922

    Filing date: 2010-10-15

    IPC classification: G06F17/30

    CPC classification: G06F12/0875

    Abstract: A search query containing one or more terms is received from a client system. In response to receiving the search query, one or more snippets obtained in response to a prior execution of said search query are requested from a cache. For a respective snippet received from the cache, it is determined whether the respective snippet is a current version. In response to a determination that the respective snippet is not the current version, the current version of the respective snippet is obtained from a corresponding document in which one or more terms from said search query are located and the snippet stored in the cache is updated using the obtained current version. Search query results including the respective snippet are transmitted to the client.
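
The check-then-refresh flow for cached snippets can be sketched in Python. The abstract does not say how currency is determined, so this sketch assumes each document carries a version number that is stored alongside the cached snippet; `make_snippet`, `snippet_for` and the dictionary layouts are hypothetical.

```python
def make_snippet(text, terms, width=30):
    """Return the text around the first query term found, or the start of the text."""
    lower = text.lower()
    for term in terms:
        pos = lower.find(term.lower())
        if pos != -1:
            start = max(0, pos - width)
            return text[start:pos + len(term) + width]
    return text[:2 * width]


def snippet_for(query, doc_id, documents, cache):
    """Serve the cached snippet if it is current; otherwise rebuild and update it.

    documents maps doc_id to (version, text).
    cache maps (query, doc_id) to (version, snippet).
    """
    version, text = documents[doc_id]
    cached = cache.get((query, doc_id))
    if cached is not None and cached[0] == version:
        return cached[1]
    snippet = make_snippet(text, query.split())
    cache[(query, doc_id)] = (version, snippet)
    return snippet


documents = {"d1": (1, "Caching keeps search fast")}
cache = {}
first = snippet_for("search", "d1", documents, cache)

# The document changes, so the cached snippet is no longer the current version.
documents["d1"] = (2, "Fresh search results matter")
second = snippet_for("search", "d1", documents, cache)
```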

    System for automatically managing duplicate documents when crawling dynamic documents
    38.
    Granted invention patent
    System for automatically managing duplicate documents when crawling dynamic documents (status: in force)

    Publication number: US07680773B1

    Publication date: 2010-03-16

    Application number: US11097687

    Filing date: 2005-03-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A system of reducing the possibility of crawling duplicate document identifiers partitions a plurality of document identifiers into multiple clusters, each cluster having a cluster name and a set of document parameters. The system generates an equivalence rule for each cluster of document identifiers, the rule specifying which document parameters associated with the cluster are content-relevant. Next, the system groups each cluster of document identifiers into one or more equivalence classes in accordance with its associated equivalence rule, each equivalence class including one or more document identifiers that correspond to a document content and having a representative document identifier identifying the document content.

    Systems and methods for automatic repair and replacement of networked machines
    39.
    Granted invention patent
    Systems and methods for automatic repair and replacement of networked machines (status: in force)

    Publication number: US07302608B1

    Publication date: 2007-11-27

    Application number: US10816594

    Filing date: 2004-03-31

    IPC classification: G06F11/00

    CPC classification: G06F11/2041 G06F11/2023

    Abstract: Systems and methods for automatic repair and replacement of computing machines are disclosed. The system may generally include a database including configuration information for the available replacement machines and a failed machine, a machine assignment module to assign a replacement machine based on a comparison of the configuration information for the failed machine and the available replacement machines, and a configuration module for generating configuration data for replacement of the failed machine with the replacement machine in the computer network. The machine assignment module may compare certain configuration parameters such as processor speed, disk drive size, and/or amount of RAM, between the failed machine and the available replacement machines. A replacement module may copy data from another copy of the failed machine in the computer network into the replacement machine. An installation module may install the configuration data in, e.g., dependent machines, and restart the dependent machines.
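
The comparison performed by the machine assignment module can be pictured with a short Python sketch. The abstract names the parameters that are compared but not the selection policy, so the policy here (pick the smallest spare that meets or exceeds the failed machine on every parameter) is an assumption, as are the `Machine` fields and sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    cpu_mhz: int
    disk_gb: int
    ram_gb: int


def pick_replacement(failed, spares):
    """Return the least over-provisioned adequate spare, or None if none qualifies."""
    adequate = [
        m for m in spares
        if m.cpu_mhz >= failed.cpu_mhz
        and m.disk_gb >= failed.disk_gb
        and m.ram_gb >= failed.ram_gb
    ]
    if not adequate:
        return None
    # Assumed tie-break order: surplus CPU first, then RAM, then disk.
    return min(
        adequate,
        key=lambda m: (
            m.cpu_mhz - failed.cpu_mhz,
            m.ram_gb - failed.ram_gb,
            m.disk_gb - failed.disk_gb,
        ),
    )


failed = Machine("failed-01", cpu_mhz=2000, disk_gb=500, ram_gb=8)
spares = [
    Machine("spare-a", cpu_mhz=1800, disk_gb=500, ram_gb=8),
    Machine("spare-b", cpu_mhz=3000, disk_gb=1000, ram_gb=32),
    Machine("spare-c", cpu_mhz=2000, disk_gb=500, ram_gb=16),
]
chosen = pick_replacement(failed, spares)
```

Here `spare-a` is rejected for its slower processor, and `spare-c` is preferred over `spare-b` because it wastes less capacity.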

    Document reuse in a search engine crawler
    40.
    Granted invention patent
    Document reuse in a search engine crawler (status: in force)

    Publication number: US08707312B1

    Publication date: 2014-04-22

    Application number: US10882955

    Filing date: 2004-06-30

    IPC classification: G06F9/46

    CPC classification: G06F17/30864

    Abstract: A search engine crawler includes a scheduler for determining which documents to download from their respective host servers. Some documents, known to be stable based on one or more records from prior crawls, are reused from a document repository. A reuse flag is set in a scheduler record that also contains a document identifier, the reuse flag indicating whether the document should be retrieved from a first database, such as the World Wide Web, or a second database, such as a document repository. A set of such scheduler records is used during a crawl by the search engine crawler to determine which database to use when retrieving the documents identified in the scheduler records.
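
The role of the reuse flag can be sketched in Python. The stability test used here (the content checksum was identical in the last three crawls) is an assumption, since the abstract only says stability is based on records from prior crawls; `SchedulerRecord`, `schedule` and `fetch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SchedulerRecord:
    doc_id: str
    reuse: bool  # True: read from the repository; False: download from the web


def schedule(history, stable_after=3):
    """Build scheduler records from per-document checksum histories (oldest first)."""
    records = []
    for doc_id, checksums in history.items():
        recent = checksums[-stable_after:]
        stable = len(recent) == stable_after and len(set(recent)) == 1
        records.append(SchedulerRecord(doc_id, reuse=stable))
    return records


def fetch(record, repository, download):
    """Retrieve a document from the source selected by the record's reuse flag."""
    if record.reuse and record.doc_id in repository:
        return repository[record.doc_id]
    return download(record.doc_id)


history = {"a": ["x", "x", "x"], "b": ["x", "y", "y"]}
records = schedule(history)
repository = {"a": "cached a", "b": "cached b"}
contents = [fetch(r, repository, lambda doc_id: "fresh " + doc_id) for r in records]
```

Document `a` is served from the repository, while `b` changed recently and is downloaded again even though a stored copy exists.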
