System and method for ranking nodes in a network
    11.
    Type: patent application
    Status: in force

    Publication number: US20050256860A1

    Publication date: 2005-11-17

    Application number: US10847164

    Filing date: 2004-05-15

    IPC classification: G06F17/30

    Abstract: A dangling web page processing system ranks dangling web pages on the web. The system ranks dangling web pages of high quality that cannot be crawled by a crawler. In addition, the system adjusts ranks to penalize dangling web pages that return errors when links on the dangling web pages are crawled. By providing a rank for dangling web pages, the present system allows the concentration of crawling resources on those dangling web pages that have the highest rank in the uncrawled region. The system operates locally to the dangling web pages, providing efficient determination of ranks for the dangling web pages. The system explicitly discriminates against web pages on the basis of whether they point to penalty pages, i.e., pages that return an error when a link is followed. By incorporating more fine-grained information such as this into ranking, the system can improve the quality of individual search results and better manage resources for crawling.
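The abstract gives no formulas, so the following is only a toy sketch of the two ideas it names: estimating a rank for uncrawled (dangling) pages from the crawled pages that link to them, and lowering the rank of a page whose links lead to error (penalty) pages. The function names, the damping factor, and the linear penalty are all assumptions made for illustration, not the patented method.

```python
from collections import defaultdict


def rank_dangling(crawled_rank, out_links, damping=0.85):
    """Estimate ranks for dangling pages from the crawled pages linking to them.

    crawled_rank: crawled page -> known rank (assumed to be given).
    out_links:    crawled page -> list of link targets.
    A target counts as dangling when it is not in crawled_rank.
    """
    scores = defaultdict(float)
    for page, rank in crawled_rank.items():
        targets = out_links.get(page, [])
        for target in targets:
            if target not in crawled_rank:
                # Each page splits its damped rank evenly over its links.
                scores[target] += damping * rank / len(targets)
    return dict(scores)


def penalize(rank, links, error_pages, strength=1.0):
    """Scale a rank down by the fraction of links that return errors (assumed rule)."""
    if not links:
        return rank
    bad = sum(1 for target in links if target in error_pages)
    return rank * (1.0 - strength * bad / len(links))


crawled = {"a": 0.6, "b": 0.4}
links = {"a": ["b", "x"], "b": ["x", "y"]}
dangling = rank_dangling(crawled, links)
# Crawl the highest-ranked dangling pages first.
frontier = sorted(dangling, key=dangling.get, reverse=True)
```

Sorting the dangling pages by the estimated rank is what lets a crawler spend its budget on the most promising part of the uncrawled region.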

    Efficient multifaceted search in information retrieval systems
    13.
    Type: granted patent
    Status: lapsed

    Publication number: US07496568B2

    Publication date: 2009-02-24

    Application number: US11564915

    Filing date: 2006-11-30

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. Constructing the inverted index includes generating a full path token and an associated full path token posting list. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.
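A minimal sketch of the indexing scheme the abstract describes, under stated assumptions: facet annotations are slash-separated paths, every path prefix is indexed as a token, and a separate full path token is marked with a trailing `$`. The path syntax, the `$` marker, and all function names are hypothetical; the abstract specifies none of them.

```python
from collections import defaultdict

FULL_PATH_MARK = "$"  # hypothetical suffix that distinguishes a full path token


def facet_tokens(path):
    """Return every path prefix of a facet annotation plus its full path token."""
    parts = path.split("/")
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return prefixes + [path + FULL_PATH_MARK]


def build_index(docs):
    """docs: doc id -> list of facet paths. Returns token -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, paths in docs.items():
        for path in paths:
            for token in facet_tokens(path):
                index[token].add(doc_id)
    return index


def run_query(index, constraints):
    """Look up the posting list of each constraint token and intersect them."""
    postings = [index.get(token, set()) for token in constraints]
    return set.intersection(*postings) if postings else set()


docs = {
    1: ["color/red", "brand/acme"],
    2: ["color/red/dark", "brand/acme"],
    3: ["color/blue", "brand/zeta"],
}
index = build_index(docs)
```

Indexing the prefixes lets a constraint such as `color/red` match documents filed anywhere below that category, while the full path token `color/red$` matches only documents annotated with exactly that path.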

    System and method for ranking nodes in a network
    15.
    Type: granted patent
    Status: in force

    Publication number: US07251654B2

    Publication date: 2007-07-31

    Application number: US10847164

    Filing date: 2004-05-15

    IPC classification: G06F17/30 G06F17/00

    Abstract: A dangling web page processing system ranks dangling web pages on the web. The system ranks dangling web pages of high quality that cannot be crawled by a crawler. In addition, the system adjusts ranks to penalize dangling web pages that return errors when links on the dangling web pages are crawled. By providing a rank for dangling web pages, the present system allows the concentration of crawling resources on those dangling web pages that have the highest rank in the uncrawled region. The system operates locally to the dangling web pages, providing efficient determination of ranks for the dangling web pages. The system explicitly discriminates against web pages on the basis of whether they point to penalty pages, i.e., pages that return an error when a link is followed. By incorporating more fine-grained information such as this into ranking, the system can improve the quality of individual search results and better manage resources for crawling.

    Method and framework to support indexing and searching taxonomies in large scale full text indexes
    16.
    Type: granted patent
    Status: in force

    Publication number: US08600997B2

    Publication date: 2013-12-03

    Application number: US11241687

    Filing date: 2005-09-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30734

    Abstract: A system and method of indexing a plurality of entities located in a taxonomy, the entities comprising sets of terms, comprises receiving terms in an index structure; building a posting list for an entity with respect to the locations of the set of terms defining the entity and data associated with the respective terms; and indexing a name of a group comprising the entities within this group at the location of the entities with the data of the group comprising the name of the respective entity at each location. The building of the posting list comprises storing the location of the term and data associated with the term in an entry in the posting list for the term. The method comprises indexing aliases of the name of the group comprising the term, and using an inverted list index to associate data with each occurrence of an index term.
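One possible reading of the abstract, sketched under assumptions: each posting entry is a (location, data) pair, an entity is indexed wherever one of its defining terms occurs, and the name of the group containing the entity is indexed at the same location with the entity name as its data. The `entity:` and `group:` key prefixes, the single-token terms, and the function name are hypothetical, and alias handling is left out.

```python
from collections import defaultdict


def index_tokens(tokens, entities, groups):
    """Build posting lists of (location, data) entries.

    tokens:   the document as a list of terms.
    entities: entity name -> set of terms that define it.
    groups:   group name -> list of member entity names.
    """
    groups_of = defaultdict(list)
    for group, members in groups.items():
        for entity in members:
            groups_of[entity].append(group)

    index = defaultdict(list)
    for location, term in enumerate(tokens):
        index[term].append((location, None))  # plain term, no extra data
        for entity, terms in entities.items():
            if term in terms:
                # Entity posting: where it occurred and which term matched.
                index["entity:" + entity].append((location, term))
                # Group posting at the same location, carrying the entity name.
                for group in groups_of[entity]:
                    index["group:" + group].append((location, entity))
    return index


tokens = ["flights", "from", "paris", "to", "nyc"]
entities = {"Paris": {"paris"}, "New York": {"nyc", "new-york"}}
groups = {"city": ["Paris", "New York"]}
index = index_tokens(tokens, entities, groups)
```

With this layout a search for the group `city` finds every location where any member entity occurs, and the data stored in each entry says which entity it was.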

    Index server architecture using tiered and sharded phrase posting lists
    19.
    Type: granted patent
    Status: in force

    Publication number: US08090723B2

    Publication date: 2012-01-03

    Application number: US12716008

    Filing date: 2010-03-02

    IPC classification: G06F7/00

    Abstract: An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
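The abstract says only that posting lists "can be tiered into groups, and sharded into partitions", so the sketch below picks its own rules: a list goes into a tier by its length, and its document ids are split across shards by `doc_id % num_shards`. Both rules, and every name in the code, are assumptions for illustration.

```python
def assign_tier(posting_list, tier_limits):
    """Return the first tier whose length limit the list fits (assumed criterion)."""
    for tier, limit in enumerate(tier_limits):
        if len(posting_list) <= limit:
            return tier
    return len(tier_limits)  # anything longer lands in the last tier


def shard(posting_list, num_shards):
    """Partition doc ids across shards by doc_id modulo num_shards (assumed rule)."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in posting_list:
        shards[doc_id % num_shards].append(doc_id)
    return shards


def plan_layout(index, tier_limits, shards_per_tier):
    """Map each phrase to (tier, shards); shards_per_tier needs len(tier_limits) + 1 entries."""
    plan = {}
    for phrase, posting_list in index.items():
        tier = assign_tier(posting_list, tier_limits)
        plan[phrase] = (tier, shard(posting_list, shards_per_tier[tier]))
    return plan


index = {"new york": [1, 2, 3, 4, 5, 6], "blue moon": [2, 5]}
plan = plan_layout(index, tier_limits=[3], shards_per_tier=[1, 2])
```

Here the short list for "blue moon" stays whole in tier 0, while the longer list for "new york" goes to tier 1 and is split into two shards that different index servers could hold.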

    PRODUCTIVE DISTRIBUTION FOR RESULT OPTIMIZATION WITHIN A HIERARCHICAL ARCHITECTURE
    20.
    Type: patent application
    Status: under examination (published)

    Publication number: US20100318516A1

    Publication date: 2010-12-16

    Application number: US12609788

    Filing date: 2009-10-30

    IPC classification: G06F17/30 G06F15/18 G06N5/02

    CPC classification: G06F16/2471 G06F16/2246

    Abstract: A producer node may be included in a hierarchical, tree-shaped processing architecture, the architecture including at least one distributor node configured to distribute queries within the architecture, including distribution to the producer node and at least one other producer node within a predefined subset of producer nodes. The distributor node may be further configured to receive results from the producer node and results from the at least one other producer node and to output compiled results therefrom. The producer node may include a query pre-processor configured to process a query received from the distributor node to obtain a query representation using query features compatible with searching a producer index associated with the producer node to thereby obtain the results from the producer node, and a query classifier configured to input the query representation and output a prediction, based thereon, as to whether processing of the query by the at least one other producer node within the predefined subset of producer nodes will cause results of the at least one other producer node to be included within the compiled results.
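A toy sketch of the two producer-node components the abstract names, a query pre-processor and a query classifier. The chosen features (term count and smallest local posting-list length), the logistic model, and the hand-set weights are all assumptions; the abstract does not say which features or classifier are used.

```python
import math


def preprocess(query, local_index):
    """Build a query representation from features of the producer's own index."""
    terms = query.lower().split()
    hits = [len(local_index.get(term, ())) for term in terms]
    return {"num_terms": len(terms), "min_hits": min(hits) if hits else 0}


def predict_others_contribute(features, weights, bias, threshold=0.5):
    """Predict whether other producer nodes would add to the compiled results.

    Returns (decision, probability) from a logistic model with given weights.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold, probability


# Hypothetical local index and hand-set weights, for illustration only.
local_index = {"jaguar": [1, 2, 3], "speed": [2]}
weights = {"num_terms": 0.2, "min_hits": -0.8}

features = preprocess("Jaguar speed", local_index)
decision, probability = predict_others_contribute(features, weights, bias=0.5)
```

A distributor could use such a prediction to skip forwarding a query to the other producer nodes when their results are unlikely to reach the compiled results.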
