Virtual cursors for XML joins
    12.
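    Granted invention patent
    Virtual cursors for XML joins (legal status: in force)

    Publication number: US07685138B2

    Publication date: 2010-03-23

    Application number: US11270784

    Filing date: 2005-11-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30935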

    Abstract: A system, method, and computer program product to improve XML query processing efficiency with virtual cursors. Structural joins are a fundamental operation in XML query processing, and substantial work exists on index-based algorithms for executing them. Two well-known index features, path indices and ancestor information, are combined in a novel way to replace at least some of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Virtual cursors can be easily incorporated into existing structural join algorithms. By eliminating index I/O and the processing cost of handling physical inverted lists, virtual cursors can improve the performance of holistic path queries by an order of magnitude or more.
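
    The derivation of a virtual cursor position can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes Dewey-style identifiers (every prefix of an element's ID names one of its ancestors) as the ancestor information, and a hypothetical `PATH_INDEX` that maps a path ID to the root-to-element tag sequence. `Posting` and `virtual_positions` are names invented for this sketch.

```python
from typing import Dict, List, NamedTuple, Tuple

Dewey = Tuple[int, ...]


class Posting(NamedTuple):
    """One entry read by a physical cursor from an inverted list."""
    dewey: Dewey   # ancestor information: each prefix identifies an ancestor
    path_id: int   # key into the path index


# Hypothetical path index: path id -> tags from the root down to the element.
# Assumption: the tag sequence has the same length as the Dewey ID.
PATH_INDEX: Dict[int, Tuple[str, ...]] = {
    7: ("book", "chapter", "section", "title"),
}


def virtual_positions(posting: Posting, tag: str) -> List[Dewey]:
    """Positions a virtual cursor for `tag` can take, derived from a single
    physical posting without reading the inverted list for `tag`."""
    tags = PATH_INDEX[posting.path_id]
    # Only proper ancestors are considered; the ancestor at depth d is the
    # Dewey prefix of length d + 1.
    return [posting.dewey[: d + 1]
            for d, t in enumerate(tags[:-1]) if t == tag]


# The physical cursor sits on a <title> element; the cursors for <chapter>
# and <section> are virtual and computed from it.
title = Posting(dewey=(1, 3, 2, 1), path_id=7)
chapter_positions = virtual_positions(title, "chapter")
section_positions = virtual_positions(title, "section")
```

    In this sketch only the list for the leaf tag is ever read; the ancestor positions are computed, which is the source of the I/O savings the abstract describes.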

    Method and system for filtering of information entities
    14.
    Granted invention patent
    Method and system for filtering of information entities (legal status: lapsed)

    Publication number: US06996572B1

    Publication date: 2006-02-07

    Application number: US08947221

    Filing date: 1997-10-08

    IPC classification: G06F17/00

    Abstract: A system and method are provided for eliciting interesting structure from a collection of entities or resources with explicit and/or implicit, static and/or dynamic relations, called “affinities,” between them. Interesting structure includes (1) notions of quality, authority, or definitiveness of information, (2) notions of relevance to a user's information need, (3) notions of similarity among the plurality of resources retrieved from a universe of resources by a query process, and (4) notions of similarity among the usages of resources by different users/servers. Similarities between entities are computed, based on similarities between the affinity values for the entities. That is, where the affinity values for two entities resemble each other, the two entities have a high degree of similarity. Using the similarities, the entities are ranked, clustered, etc., based on a significance derived from the similarities. The ranking, clustering, etc., makes up the interesting structure which is sought.
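
    One way to picture "similarity computed from affinity values" is the sketch below. The abstract does not name a similarity measure or a significance formula, so both are assumptions here: cosine similarity between affinity vectors, and a significance score equal to the sum of an entity's similarities to all other entities. The entity names and vectors are made up for illustration.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two affinity vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(affinities: Dict[str, List[float]]) -> List[str]:
    """Rank entities by an assumed significance: total similarity to the others.
    Ties are broken by entity name so the order is deterministic."""
    score = {
        entity: sum(cosine(vec, other)
                    for name, other in affinities.items() if name != entity)
        for entity, vec in affinities.items()
    }
    return sorted(score, key=lambda entity: (-score[entity], entity))


# Each vector holds one entity's affinity values toward three other resources.
affinities = {
    "page_a": [1.0, 1.0, 0.0],
    "page_b": [1.0, 1.0, 0.0],
    "page_c": [0.0, 0.0, 1.0],
}
ranking = rank_by_similarity(affinities)
```

    Here `page_a` and `page_b` have matching affinity values and therefore rank above `page_c`, whose affinities resemble neither of them.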

    System and method for hybrid hash join using over-partitioning to respond to database query
    16.
    Granted invention patent
    System and method for hybrid hash join using over-partitioning to respond to database query (legal status: lapsed)

    Publication number: US06226639B1

    Publication date: 2001-05-01

    Application number: US09158741

    Filing date: 1998-09-22

    IPC classification: G06F17/30

    Abstract: A system and method for joining a build table to a probe table in response to a query for data includes over-partitioning the build table into “N” build partitions using a uniform hash function and writing the build partitions into main memory of a database computer. When the main memory becomes full, one or more partitions are selected as victim partitions to be written to disk storage, and the process continues until all build table rows or tuples have either been written into main memory or spilled to disk. Then, a packing algorithm is used to initially designate never-spilled partitions as “winners” and spilled partitions as “losers”, and then to randomly select one or more winners for prospective swapping with one or more losers. The I/O savings associated with each prospective swap are determined and, if any savings would be realized, the winners are designated as losers and the losers are designated as winners. The swap determination can be made multiple times, e.g., 256, after which losers are moved entirely to disk and winners are moved entirely to memory. At the end of the swapping, probe table rows associated with winner partitions are joined to rows in the winner build partitions while probe table rows associated with loser partitions are spilled to disk. Then, the loser build partitions are written to main memory for joining with corresponding probe table partitions, to undertake the requested join of the build table and probe table in an I/O- and memory-efficient manner.
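
    The partition, spill, and join phases can be sketched as below. This is a simplified illustration under stated assumptions, not the patented method: "disk" is simulated with ordinary lists, the victim is chosen as the largest resident partition, and the winner/loser swap (packing) step is omitted entirely. The names `hybrid_hash_join`, `index_rows`, `n_partitions`, and `memory_rows` are invented for the sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Row = Tuple[Hashable, str]  # (join key, payload)


def index_rows(rows: List[Row]) -> Dict[Hashable, List[str]]:
    """Build an in-memory hash table: join key -> payloads."""
    table: Dict[Hashable, List[str]] = defaultdict(list)
    for key, payload in rows:
        table[key].append(payload)
    return table


def hybrid_hash_join(build: List[Row], probe: List[Row],
                     n_partitions: int = 8, memory_rows: int = 4):
    """Return (joined rows, set of spilled partition numbers)."""
    def part(key: Hashable) -> int:
        return hash(key) % n_partitions

    resident: Dict[int, List[Row]] = {}                     # partitions in memory
    build_disk: Dict[int, List[Row]] = defaultdict(list)    # simulated disk
    spilled: Set[int] = set()
    resident_rows = 0

    # Build phase: over-partition, spilling a victim whenever memory is full.
    for row in build:
        p = part(row[0])
        if p in spilled:
            build_disk[p].append(row)
            continue
        resident.setdefault(p, []).append(row)
        resident_rows += 1
        while resident_rows > memory_rows:
            # Assumption: the victim is the largest resident partition.
            victim = max(resident, key=lambda q: len(resident[q]))
            rows = resident.pop(victim)
            resident_rows -= len(rows)
            build_disk[victim].extend(rows)
            spilled.add(victim)

    # Probe phase: join against resident partitions, spill the rest.
    results: List[Tuple[Hashable, str, str]] = []
    tables = {p: index_rows(rows) for p, rows in resident.items()}
    probe_disk: Dict[int, List[Row]] = defaultdict(list)
    for key, payload in probe:
        p = part(key)
        if p in spilled:
            probe_disk[p].append((key, payload))
        else:
            for b in tables.get(p, {}).get(key, []):
                results.append((key, b, payload))

    # Final phase: bring each spilled build partition back and join it.
    for p in sorted(spilled):
        table = index_rows(build_disk[p])
        for key, payload in probe_disk[p]:
            for b in table.get(key, []):
                results.append((key, b, payload))
    return results, spilled
```

    With a small `memory_rows` budget some partitions spill, yet the result matches a plain nested-loop join; the packing step described in the abstract would additionally decide which partitions are worth keeping in memory.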

    Index partition maintenance over monotonically addressed document sequences
    18.
    Granted invention patent
    Index partition maintenance over monotonically addressed document sequences (legal status: in force)

    Publication number: US08738673B2

    Publication date: 2014-05-27

    Application number: US12875615

    Filing date: 2010-09-03

    IPC classification: G06F17/30

    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document to a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-ID, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-ID and the field value.
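
    The epoch cut-off and placement idea can be sketched as below. This is an illustrative reading of the abstract, not the claimed implementation: `EpochRouter`, `Epoch`, `cut_off`, and the modulo/CRC placement functions are assumptions, and only two of the three placement options (by assigned-doc-ID, or by a field value) are shown.

```python
import bisect
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Epoch:
    first_doc_id: int        # cut-off: the epoch owns doc IDs from here on
    partitions: List[str]    # physical partitions that make up the epoch


class EpochRouter:
    def __init__(self, partitions: List[str]) -> None:
        self.epochs = [Epoch(0, list(partitions))]
        self.next_doc_id = 0  # assigned-doc-IDs grow monotonically

    def cut_off(self, new_partitions: List[str]) -> None:
        """Close the current epoch; later documents use a new partition set."""
        self.epochs.append(Epoch(self.next_doc_id, list(new_partitions)))

    def locate(self, doc_id: int, field_value: Optional[str] = None) -> str:
        """Find the epoch by doc ID, then pick a partition inside it."""
        starts = [e.first_doc_id for e in self.epochs]
        epoch = self.epochs[bisect.bisect_right(starts, doc_id) - 1]
        # Assumed placement functions: doc ID modulo, or CRC of a field value.
        key = doc_id if field_value is None else zlib.crc32(field_value.encode())
        return epoch.partitions[key % len(epoch.partitions)]

    def insert(self, field_value: Optional[str] = None) -> Tuple[int, str]:
        """Assign the next doc ID and return (doc ID, target partition)."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        return doc_id, self.locate(doc_id, field_value)


router = EpochRouter(["p0", "p1"])
first_epoch = [router.insert() for _ in range(4)]
router.cut_off(["p2", "p3", "p4"])   # e.g. after adding nodes to the cluster
after_cut_off = router.insert()
```

    Because doc IDs only increase, an old document can still be located from its ID alone: IDs below the cut-off resolve to the first partition set, later IDs to the second.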

    Generating and using a dynamic bloom filter
    19.
    Granted invention patent
    Generating and using a dynamic bloom filter (legal status: lapsed)

    Publication number: US08209368B2

    Publication date: 2012-06-26

    Application number: US12134148

    Filing date: 2008-06-05

    IPC classification: G06F17/10

    CPC classification: G06F12/0864

    Abstract: A dynamic Bloom filter comprises a cascaded set of Bloom filters. The system estimates or guesses a cardinality of input items, selects a number of hash functions based on the desired false positive rate, and allocates memory for an initial Bloom filter based on the estimated cardinality and desired false positive rate. The system inserts items into the initial Bloom filter and counts the bits set as they are inserted. If the number of bits set in the current Bloom filter reaches a predetermined target, the system declares the current Bloom filter full. The system recursively generates additional Bloom filters as needed for items remaining after the initial Bloom filter is filled; items are checked to eliminate duplicates. Each of the set of Bloom filters is individually queried to identify a positive or negative in response to a query. When the system is configured such that the false positive rate of each successive Bloom filter is decreased by one half, the system guarantees a false positive rate of at most twice the desired false positive rate.
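
    A cascade of this kind can be sketched as below. The sizing formulas are the textbook ones for a Bloom filter (k = ceil(log2(1/p)) hash functions, m = n·k / ln 2 bits), and the "full" target of half the bits set is an assumption of this sketch, as are the class names and the SHA-256 double-hashing scheme; the patent may use different choices.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    def __init__(self, capacity: int, fp_rate: float) -> None:
        # Textbook sizing for n = capacity items at false positive rate p.
        self.k = max(1, math.ceil(-math.log2(fp_rate)))
        self.m = math.ceil(capacity * self.k / math.log(2))
        self.bits = bytearray((self.m + 7) // 8)
        self.bits_set = 0
        self.target = self.m // 2  # assumption: full once half the bits are set

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                self.bits[byte] |= mask
                self.bits_set += 1  # count bits as they are set

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

    @property
    def full(self) -> bool:
        return self.bits_set >= self.target


class DynamicBloomFilter:
    def __init__(self, estimated_items: int, fp_rate: float) -> None:
        self.capacity = estimated_items
        self.filters = [BloomFilter(estimated_items, fp_rate)]
        self.next_rate = fp_rate / 2  # each later filter halves the rate

    def add(self, item: str) -> None:
        if item in self:              # skip items that look like duplicates
            return
        if self.filters[-1].full:
            self.filters.append(BloomFilter(self.capacity, self.next_rate))
            self.next_rate /= 2
        self.filters[-1].add(item)

    def __contains__(self, item: str) -> bool:
        # Every filter in the cascade is queried individually.
        return any(item in f for f in self.filters)


dbf = DynamicBloomFilter(estimated_items=100, fp_rate=0.01)
for i in range(500):                  # five times the estimated cardinality
    dbf.add(f"item-{i}")
```

    Halving the rate for each new filter is what bounds the total: the per-filter rates p, p/2, p/4, ... sum to less than 2p, so a query that checks every filter has a false positive rate of at most twice the desired rate, as the abstract states.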

    INDEX PARTITION MAINTENANCE OVER MONOTONICALLY ADDRESSED DOCUMENT SEQUENCES
    20.
    Invention patent application
    INDEX PARTITION MAINTENANCE OVER MONOTONICALLY ADDRESSED DOCUMENT SEQUENCES (legal status: in force)

    Publication number: US20120059823A1

    Publication date: 2012-03-08

    Application number: US12875615

    Filing date: 2010-09-03

    IPC classification: G06F17/30

    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document to a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-ID, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-ID and the field value.
