Index partition maintenance over monotonically addressed document sequences
    1.
    Granted patent
    Index partition maintenance over monotonically addressed document sequences (status: in force)

    Publication No.: US08738673B2

    Publication date: 2014-05-27

    Application No.: US12875615

    Filing date: 2010-09-03

    IPC classification: G06F17/30

    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document into a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-ID, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-ID and the field value.
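
The epoch mechanism in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `Epoch` and `VirtualIndex`, the `cut_off` method, and the modulo placement function are all assumptions made for illustration. The sketch only shows how monotonically increasing document IDs let a cut-off ID separate the old partition set from the new one.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Epoch:
    start_doc_id: int       # first assigned-doc-ID belonging to this epoch
    partitions: List[str]   # physical partitions that make up this epoch


class VirtualIndex:
    """Hypothetical sketch: epochs are delimited by a cut-off document ID."""

    def __init__(self, partitions: List[str]) -> None:
        self.next_doc_id = 0
        self.epochs = [Epoch(0, list(partitions))]

    def cut_off(self, partitions: List[str]) -> None:
        # Every ID assigned from now on falls into the new epoch.
        self.epochs.append(Epoch(self.next_doc_id, list(partitions)))

    def epoch_for(self, doc_id: int) -> Epoch:
        # IDs are monotonic, so the epoch is found by binary search on start IDs.
        starts = [e.start_doc_id for e in self.epochs]
        return self.epochs[bisect_right(starts, doc_id) - 1]

    def insert(self) -> Tuple[int, str]:
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        epoch = self.epoch_for(doc_id)
        # Assumed placement function: doc ID modulo the epoch's partition count.
        return doc_id, epoch.partitions[doc_id % len(epoch.partitions)]


index = VirtualIndex(["p0", "p1"])
first = [index.insert() for _ in range(3)]
index.cut_off(["p2", "p3", "p4"])
after_cut = index.insert()
```

Because the epoch is recoverable from the document ID alone, a lookup for an older document still resolves to the first partition set after the cut-off. A field-based placement function could replace the modulo step without changing the epoch logic.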

    INDEX PARTITION MAINTENANCE OVER MONOTONICALLY ADDRESSED DOCUMENT SEQUENCES
    4.
    Patent application
    INDEX PARTITION MAINTENANCE OVER MONOTONICALLY ADDRESSED DOCUMENT SEQUENCES (status: in force)

    Publication No.: US20120059823A1

    Publication date: 2012-03-08

    Application No.: US12875615

    Filing date: 2010-09-03

    IPC classification: G06F17/30

    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document into a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-ID, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-ID and the field value.

    Efficient multifaceted search in information retrieval systems
    6.
    Granted patent
    Efficient multifaceted search in information retrieval systems (status: lapsed)

    Publication No.: US07496568B2

    Publication date: 2009-02-24

    Application No.: US11564915

    Filing date: 2006-11-30

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. Constructing the inverted index includes generating a full path token and an associated full path token posting list. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.
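
The indexing and query steps described in this abstract can be sketched in a few lines of Python. This is an illustrative reading, not the patented implementation: the `FacetIndex` class, the `path_prefixes` helper, and the `/`-separated path syntax are assumptions. Each facet annotation is indexed under its full path and under every path prefix, and a query intersects the posting lists of its constraints.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def path_prefixes(facet_path: str) -> List[str]:
    """Return every prefix of a facet path, ending with the full path token."""
    parts = facet_path.strip("/").split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


class FacetIndex:
    """Hypothetical inverted index keyed by facet tokens and their prefixes."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, facet_paths: Iterable[str]) -> None:
        for path in facet_paths:
            for token in path_prefixes(path):
                self.postings[token].add(doc_id)

    def query(self, constraints: List[str]) -> List[int]:
        # One posting list per constraint; the result is their intersection.
        lists = [self.postings.get(c, set()) for c in constraints]
        if not lists:
            return []
        return sorted(set.intersection(*lists))


facets = FacetIndex()
facets.add(1, ["geo/europe/france", "topic/sports"])
facets.add(2, ["geo/europe/italy", "topic/sports"])
facets.add(3, ["geo/asia/japan", "topic/politics"])
europe_sports = facets.query(["geo/europe", "topic/sports"])
```

Indexing the prefixes up front means a constraint on an inner category such as `geo/europe` is answered by a single posting-list lookup instead of a walk over the category tree at query time.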

    EFFICIENT MULTIFACETED SEARCH IN INFORMATION RETRIEVAL SYSTEMS
    8.
    Patent application
    EFFICIENT MULTIFACETED SEARCH IN INFORMATION RETRIEVAL SYSTEMS (status: lapsed)

    Publication No.: US20080133473A1

    Publication date: 2008-06-05

    Application No.: US11564915

    Filing date: 2006-11-30

    IPC classification: G06F7/06

    Abstract: A method for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. Constructing the inverted index includes generating a full path token and an associated full path token posting list. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.

    SUPPORTING SUB-DOCUMENT UPDATES AND QUERIES IN AN INVERTED INDEX
    9.
    Patent application
    SUPPORTING SUB-DOCUMENT UPDATES AND QUERIES IN AN INVERTED INDEX (status: under examination, published)

    Publication No.: US20090228528A1

    Publication date: 2009-09-10

    Application No.: US12043858

    Filing date: 2008-03-06

    IPC classification: G06F17/30

    CPC classification: G06F16/319

    Abstract: A system, method, and computer program product for updating a partitioned index of a dataset. A document is indexed by separating it into indexable sections, such that different ones of the indexable sections may be contained in different partitions of the partitioned index. The partitioned index is updated using an updated version of the document by updating only those sections of the index corresponding to sections of the document that have been updated in the updated version.
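
A rough Python sketch of section-level updating follows. It is one possible reading of the abstract, not the method claimed in the application: the `SectionedIndex` class, the hash-based routing of sections to partitions, and the use of content digests to detect changed sections are all assumptions. For brevity each partition stores raw section text rather than real posting lists, and removed sections are not handled.

```python
import hashlib
from typing import Dict, List, Tuple

SectionKey = Tuple[str, str]  # (document ID, section name)


class SectionedIndex:
    """Hypothetical partitioned index that re-indexes only changed sections."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[Dict[SectionKey, str]] = [
            {} for _ in range(num_partitions)
        ]
        self.digests: Dict[SectionKey, str] = {}

    def _partition_for(self, key: SectionKey) -> int:
        # Assumed routing: stable hash of the section key modulo partition count.
        raw = f"{key[0]}:{key[1]}".encode()
        return int.from_bytes(hashlib.sha256(raw).digest()[:4], "big") % len(
            self.partitions
        )

    def update(self, doc_id: str, sections: Dict[str, str]) -> List[str]:
        """Index a document version; return the section names that were rewritten."""
        changed = []
        for name, text in sections.items():
            key = (doc_id, name)
            digest = hashlib.sha256(text.encode()).hexdigest()
            if self.digests.get(key) != digest:
                self.partitions[self._partition_for(key)][key] = text
                self.digests[key] = digest
                changed.append(name)
        return changed


sections_index = SectionedIndex(num_partitions=4)
initial = sections_index.update("d1", {"title": "Report", "body": "Draft text"})
revised = sections_index.update("d1", {"title": "Report", "body": "Final text"})
unchanged = sections_index.update("d1", {"title": "Report", "body": "Final text"})
```

In the second call only the `body` section is rewritten, so any partition holding the unchanged `title` section is not touched.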
