Techniques for data assignment from an external distributed file system to a database management system
    1.
    Granted patent
    Legal status: In force

    Publication No.: US08713057B2

    Publication date: 2014-04-29

    Application No.: US13340335

    Filing date: 2011-12-29

    IPC classification: G06F17/30

    Abstract: Techniques for data assignment from an external distributed file system (DFS) to a database management system (DBMS) are provided. Data blocks from the DFS are represented as first nodes, and access module processors (AMPs) of the DBMS are represented as second nodes. A graph is produced with the first and second nodes. The first nodes are assigned to the second nodes based on an evaluation of the graph, integrating the DFS with the DBMS.
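
The abstract does not say how the graph is evaluated, so the following is only a minimal sketch under stated assumptions. It builds a bipartite graph in which an edge means an AMP runs on a host that stores a replica of the block, then assigns blocks greedily to the least-loaded local AMP. The inputs `block_locations` and `amp_hosts`, the function name, and the greedy rule are all hypothetical; the patent itself may use a different evaluation, such as a matching or flow computation.

```python
def assign_blocks(block_locations, amp_hosts):
    """Assign each DFS block (first node) to one AMP (second node).

    block_locations: block id -> set of hosts holding a replica (assumed input)
    amp_hosts: AMP id -> host the AMP runs on (assumed input)
    """
    # Bipartite graph: an edge means the AMP can read the block locally.
    edges = {
        block: [amp for amp, host in amp_hosts.items() if host in hosts]
        for block, hosts in block_locations.items()
    }
    load = {amp: 0 for amp in amp_hosts}
    assignment = {}
    # Place blocks with local AMPs first, most constrained first;
    # blocks with no local AMP go last and fall back to a remote read.
    order = sorted(edges, key=lambda b: (len(edges[b]) == 0, len(edges[b]), b))
    for block in order:
        candidates = edges[block] or list(amp_hosts)
        target = min(candidates, key=lambda amp: (load[amp], amp))
        assignment[block] = target
        load[target] += 1
    return assignment


block_locations = {"b1": {"h1", "h2"}, "b2": {"h1"}, "b3": {"h2"}, "b4": {"h3"}}
amp_hosts = {"amp0": "h1", "amp1": "h2"}
assignment = assign_blocks(block_locations, amp_hosts)
```

In this example every block with a local replica is read locally, and the one block without a local AMP (`b4`) is sent to whichever AMP has the lighter load.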

    TECHNIQUES FOR ACCESSING A PARALLEL DATABASE SYSTEM VIA EXTERNAL PROGRAMS USING VERTICAL AND/OR HORIZONTAL PARTITIONING
    2.
    Patent application
    Legal status: In force

    Publication No.: US20130173594A1

    Publication date: 2013-07-04

    Application No.: US13340324

    Filing date: 2011-12-29

    IPC classification: G06F17/30

    CPC classification: G06F17/30424 G06F17/30584

    Abstract: Techniques for accessing a parallel database system via an external program using vertical and/or horizontal partitioning are provided. A program external to a database management system (DBMS) configures external mappers, each of which is to process a specific portion of the query results on the specific access module processors (AMPs) of the DBMS that are to house those results. The external program submits the query to the DBMS and directs the DBMS to organize the query results in a vertical or horizontal manner. Each external mapper then accesses its portion of the query results on its designated AMP or set of AMPs, so that the mappers process the query results in parallel.
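
The mapper-to-AMP arrangement described above can be sketched roughly as follows. This is an illustration only: the AMPs are simulated with a dictionary, the round-robin plan is an assumption, and the names `plan_mappers` and `run_mappers` are hypothetical. The sketch deliberately does not model the vertical or horizontal organization of the results, which the abstract does not define.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_mappers(amp_ids, num_mappers):
    """Give each mapper a designated AMP or set of AMPs (round-robin, assumed)."""
    plan = [[] for _ in range(num_mappers)]
    for i, amp in enumerate(amp_ids):
        plan[i % num_mappers].append(amp)
    return plan


def run_mappers(results_by_amp, plan, map_fn):
    """Run one mapper per plan entry in parallel over its own AMPs' rows."""
    def mapper(amps):
        return [map_fn(row) for amp in amps for row in results_by_amp[amp]]

    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        return list(pool.map(mapper, plan))


# Simulated query results, already housed on four AMPs.
results_by_amp = {0: [1, 2], 1: [3], 2: [4, 5], 3: [6]}
plan = plan_mappers(sorted(results_by_amp), num_mappers=2)
outputs = run_mappers(results_by_amp, plan, lambda value: value * 10)
```

Each mapper touches only the AMPs in its own plan entry, so no two mappers read the same portion of the results.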

    SYSTEM, METHOD, AND COMPUTER-READABLE MEDIUM FOR ELIMINATING UNNECESSARY SELF-JOINS IN A DATABASE SYSTEM
    3.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20100121836A1

    Publication date: 2010-05-13

    Application No.: US12268491

    Filing date: 2008-11-11

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F16/24544

    Abstract: A system, method, and computer-readable medium for optimizing query performance in a database system are provided. In one embodiment, join predicates of a self outer join are evaluated. If each join predicate is respectively based on a common join attribute, and each join attribute has a NOT NULL constraint applied thereto, the self outer join may be rewritten as a self inner join. In another embodiment, if NOT NULL and unique constraints are applied to each join attribute of an inner join featuring join predicates each respectively based on a common join attribute, the inner join may advantageously be removed, thereby resulting in a select operation.
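
The two rewrite rules can be sketched as a small decision function, followed by a check in SQLite that the rewritten query returns the same rows. The function name, its parameters, and the `emp` table are hypothetical, and chaining the two embodiments (outer to inner, then inner to select) is an assumption of this sketch rather than something the abstract states.

```python
import sqlite3


def rewrite_self_join(join_type, join_columns, not_null, unique):
    """Decide how a self-join with predicates t1.c = t2.c can be simplified.

    join_type: "outer" or "inner"; join_columns: the common join attributes;
    not_null / unique: sets of columns carrying those constraints.
    """
    if not join_columns:
        return join_type
    all_not_null = all(c in not_null for c in join_columns)
    all_unique = all(c in unique for c in join_columns)
    if join_type == "outer" and all_not_null:
        join_type = "inner"   # first embodiment: outer join becomes inner join
    if join_type == "inner" and all_not_null and all_unique:
        return "select"       # second embodiment: the join is removed
    return join_type


# Equivalence check on a table whose join column is NOT NULL and UNIQUE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER NOT NULL UNIQUE, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
joined = conn.execute(
    "SELECT e1.id, e2.name FROM emp e1 "
    "LEFT OUTER JOIN emp e2 ON e1.id = e2.id ORDER BY e1.id"
).fetchall()
plain = conn.execute("SELECT id, name FROM emp ORDER BY id").fetchall()
conn.close()
```

Because every `id` is present and distinct, each row can only match itself, which is why the join adds nothing over a plain select.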

    SYSTEM, METHOD, AND COMPUTER-READABLE MEDIUM FOR REDUCING ROW REDISTRIBUTION COSTS FOR PARALLEL JOIN OPERATIONS
    4.
    Patent application
    Legal status: In force

    Publication No.: US20100049722A1

    Publication date: 2010-02-25

    Application No.: US12193814

    Filing date: 2008-08-19

    IPC classification: G06F17/30

    CPC classification: G06F17/30466 G06F17/30445

    Abstract: A system, method, and computer-readable medium for optimizing execution of a join operation in a parallel processing system are provided. A plurality of processing nodes that have at least one row of one or more tables involved in a join operation are identified. For each of the processing nodes, respective counts of rows that would be redistributed to each of the processing nodes based on join attributes of the rows are determined. A redistribution matrix is calculated from the row counts of each of the processing nodes. An optimized redistribution matrix is generated from the redistribution matrix, wherein the optimized redistribution matrix minimizes the number of rows to be redistributed among the nodes to execute the join operation.
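
One way to read the optimization step is as an assignment problem: pick which node should receive each hash bucket so that as many rows as possible are already in place. The sketch below takes that reading, which is an assumption; the abstract does not specify how the optimized matrix is derived. The function name and the brute-force search over permutations are illustrative and only practical for a handful of nodes.

```python
from itertools import permutations


def optimize_redistribution(matrix):
    """matrix[i][j]: rows on node i whose join attribute hashes to bucket j.

    Returns (bucket_to_node, rows_moved) for the bucket-to-node mapping that
    leaves the most rows in place. Brute force, for illustration only.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)

    def rows_staying(perm):
        # perm[j] is the node that will own bucket j.
        return sum(matrix[perm[j]][j] for j in range(n))

    best = max(permutations(range(n)), key=rows_staying)
    return list(best), total - rows_staying(best)


matrix = [
    [1, 8, 1],
    [7, 1, 2],
    [2, 1, 7],
]
default_moved = sum(sum(row) for row in matrix) - sum(matrix[i][i] for i in range(3))
mapping, moved = optimize_redistribution(matrix)
```

With the default mapping (bucket `j` to node `j`) 21 of the 30 rows move; swapping the owners of buckets 0 and 1 brings that down to 8.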

    TECHNIQUES FOR DATA ASSIGNMENT FROM AN EXTERNAL DISTRIBUTED FILE SYSTEM TO A DATABASE MANAGEMENT SYSTEM
    5.
    Patent application
    Legal status: In force

    Publication No.: US20130173666A1

    Publication date: 2013-07-04

    Application No.: US13340335

    Filing date: 2011-12-29

    IPC classification: G06F17/30

    Abstract: Techniques for data assignment from an external distributed file system (DFS) to a database management system (DBMS) are provided. Data blocks from the DFS are represented as first nodes, and access module processors (AMPs) of the DBMS are represented as second nodes. A graph is produced with the first and second nodes. The first nodes are assigned to the second nodes based on an evaluation of the graph, integrating the DFS with the DBMS.

    System, method, and computer-readable medium for optimizing processing of distinct and aggregation queries on skewed data in a database system
    6.
    Granted patent
    Legal status: In force

    Publication No.: US08234268B2

    Publication date: 2012-07-31

    Application No.: US12277343

    Filing date: 2008-11-25

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30445

    Abstract: A system, method, and computer-readable medium for optimization of query processing in a parallel processing system are provided. Skewed values and non-skewed values are treated differently to improve upon conventional DISTINCT and aggregation query processing. Skewed attribute values on which a DISTINCT selection or GROUP BY aggregation is applied are allocated entries in a hash table. A processing module may then consult the hash table to determine whether a skewed attribute value has already been encountered during query processing, which precludes repetitive redistribution of rows with highly skewed attribute values on which the DISTINCT selection or GROUP BY aggregation is applied.
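
A rough sketch of the DISTINCT case follows. It simulates the processing modules sequentially, assumes the set of skewed values is known in advance, and uses integer values so that Python's `hash` is deterministic; the function name and parameters are hypothetical.

```python
def distinct_with_skew(rows_per_module, skewed_values):
    """Simulate DISTINCT over rows spread across processing modules.

    Returns (distinct_values, rows_sent), where rows_sent counts the rows
    redistributed between modules.
    """
    num_modules = len(rows_per_module)
    inboxes = [set() for _ in range(num_modules)]
    rows_sent = 0
    for rows in rows_per_module:
        # Per-module hash table: one entry per skewed value, False until seen.
        seen = {value: False for value in skewed_values}
        for value in rows:
            if value in seen:
                if seen[value]:
                    continue          # already sent once: skip the resend
                seen[value] = True
            inboxes[hash(value) % num_modules].add(value)
            rows_sent += 1
    distinct = set().union(*inboxes)
    return distinct, rows_sent


rows_per_module = [[7, 7, 7, 1], [7, 7, 2, 2]]
distinct, rows_sent = distinct_with_skew(rows_per_module, skewed_values={7})
_, rows_sent_naive = distinct_with_skew(rows_per_module, skewed_values=set())
```

Here the value 7 is redistributed once per module instead of five times in total, so 5 rows are sent rather than 8, while the DISTINCT result is unchanged.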

    SYSTEM, METHOD, AND COMPUTER-READABLE MEDIUM FOR DYNAMIC DETECTION AND MANAGEMENT OF DATA SKEW IN PARALLEL JOIN OPERATIONS
    7.
    Patent application
    Legal status: In force

    Publication No.: US20100332458A1

    Publication date: 2010-12-30

    Application No.: US12494366

    Filing date: 2009-06-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30498

    Abstract: A system, method, and computer-readable medium for dynamic detection and management of data skew in parallel join operations are provided. Rows allocated to processing modules involved in a join operation are redistributed among the processing modules by a hash redistribution of the join attributes. A processing module detects when it has received an excessive number of redistributed rows having a skewed value on the join attribute and notifies the other processing modules of the skewed value. Processing modules then terminate redistribution of rows having a join attribute value matching the skewed value and either store such rows locally or duplicate the rows. The processing module that has received an excessive number of redistributed rows removes rows having a skewed value of the join attribute from a redistribution spool allocated thereto and duplicates the rows to each of the processing modules. The join operation is completed by performing a local join at each processing module and merging the results of the local join operations.
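
The following is a simplified, sequential simulation of that behaviour, not the patented procedure. It assumes table R carries the skew, that a value counts as skewed once a module has received more than `threshold` rows for it, and that notification is instantaneous. Rows are reduced to their integer join keys, and all names are hypothetical. It also leaves the hot module's already-received R rows in place rather than moving them, since duplicating the matching S rows everywhere is enough to keep the join correct.

```python
from collections import Counter


def skew_aware_join(r_parts, s_parts, threshold):
    """r_parts / s_parts: per-module lists of join keys for tables R and S.

    Returns the number of joined pairs produced by each module's local join.
    """
    n = len(r_parts)
    received = [Counter() for _ in range(n)]  # R rows received, per key
    skewed = set()                            # values flagged to all modules
    r_spool = [[] for _ in range(n)]
    for src, rows in enumerate(r_parts):
        for key in rows:
            if key in skewed:
                r_spool[src].append(key)      # stop redistributing: keep local
                continue
            dst = hash(key) % n
            r_spool[dst].append(key)
            received[dst][key] += 1
            if received[dst][key] > threshold:
                skewed.add(key)               # hot module notifies the others
    s_spool = [[] for _ in range(n)]
    for rows in s_parts:
        for key in rows:
            # S rows matching a skewed value are duplicated to every module.
            targets = range(n) if key in skewed else [hash(key) % n]
            for dst in targets:
                s_spool[dst].append(key)
    result = []
    for r_rows, s_rows in zip(r_spool, s_spool):
        s_count = Counter(s_rows)
        result.append(sum(s_count[key] for key in r_rows))  # local join
    return result


r_parts = [[5, 5, 5, 5, 1], [5, 5, 5, 2]]
s_parts = [[5, 1], [2, 3]]
pairs_per_module = skew_aware_join(r_parts, s_parts, threshold=2)
```

The merged result has the same number of pairs as an ordinary hash join (nine here), but once key 5 is flagged its remaining R rows are no longer shipped between modules.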

    SYSTEM, METHOD, AND COMPUTER-READABLE MEDIUM FOR OPTIMIZING THE PERFORMANCE OF OUTER JOINS
    8.
    Patent application
    Legal status: In force

    Publication No.: US20100082600A1

    Publication date: 2010-04-01

    Application No.: US12235652

    Filing date: 2008-09-23

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: A system, method, and computer-readable medium for optimizing the performance of outer joins in a parallel processing system are provided. Predicates involving only attributes of a left table of a left outer join are pushed down to the outer relation for left outer joins having join predicates involving left table attributes and/or predicates involving attributes of both the right and left tables. In such an instance, the rows of the left table may be partitioned into two sub-relations according to the predicate involving only attributes of the left table. Rows of the left table are allocated to a first sub-relation if they satisfy the predicate involving only attributes of the left table, and to a second sub-relation if they fail to satisfy it. Accordingly, only rows of the first sub-relation are required to be left outer joined with the right table. Advantageously, this reduces the number of rows that must be redistributed and joined. The disclosed embodiments may be similarly applied for optimization of right outer joins. Further, embodiments for optimizing full outer joins are disclosed.
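
The partitioning idea can be sketched on in-memory lists. Rows of the left table that fail the left-only predicate can never match, so they are null-extended directly and only the remaining rows are joined. The function and parameter names are hypothetical, and the nested-loop join stands in for whatever join method the system would actually use.

```python
def left_outer_join(left, right, left_pred, join_pred):
    """Left outer join for ON join_pred(l, r) AND left_pred(l)."""
    passing = [l for l in left if left_pred(l)]      # first sub-relation
    failing = [l for l in left if not left_pred(l)]  # second sub-relation
    result = []
    for l in passing:                                # only these rows are joined
        matches = [r for r in right if join_pred(l, r)]
        if matches:
            result.extend((l, r) for r in matches)
        else:
            result.append((l, None))
    # Rows failing the left-only predicate are padded with NULLs, no join needed.
    result.extend((l, None) for l in failing)
    return result


left = [1, 2, 3, 4]
right = [2, 4, 4, 5]
# Equivalent to: left LEFT OUTER JOIN right ON left = right AND left > 2
joined = left_outer_join(left, right, lambda l: l > 2, lambda l, r: l == r)
```

Only two of the four left rows take part in the join; the other two go straight to the output with a null right side.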

    Techniques for external application-directed data partitioning in data exporting from a database management system
    9.
    Granted patent
    Legal status: In force

    Publication No.: US08938444B2

    Publication date: 2015-01-20

    Application No.: US13340357

    Filing date: 2011-12-29

    IPC classification: G06F17/30

    CPC classification: G06F17/30584

    Abstract: Techniques for external application-directed data partitioning in data exported from a parallel database management system (DBMS) are provided. An external application sends a query, a total number of requested access module processors (AMPs), and an application-defined data partitioning expression to the DBMS. The DBMS executes the query with the results vertically partitioned on the identified number of AMPs. Individual external mappers access their assigned AMPs, requesting the specific partitions they have been assigned in order to process the query results.
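
The exchange between the application, the DBMS, and the mappers can be sketched as below. The AMPs are simulated with dictionaries keyed by partition, the round-robin placement of rows on AMPs is an assumption, and `export_partitioned` and `mapper_fetch` are hypothetical names rather than any real DBMS interface.

```python
def export_partitioned(rows, num_amps, partition_expr):
    """Store query results on num_amps AMPs, grouped on each AMP by the
    application-defined partitioning expression."""
    amps = [dict() for _ in range(num_amps)]
    for i, row in enumerate(rows):
        amp = amps[i % num_amps]  # row placement across AMPs (assumed)
        amp.setdefault(partition_expr(row), []).append(row)
    return amps


def mapper_fetch(amps, assigned_amps, assigned_partitions):
    """An external mapper asks its assigned AMPs for its assigned partitions."""
    return [
        row
        for amp_id in assigned_amps
        for partition in assigned_partitions
        for row in amps[amp_id].get(partition, [])
    ]


rows = list(range(10))  # stand-in for query results
amps = export_partitioned(rows, num_amps=2, partition_expr=lambda r: r % 3)
mapper_a = mapper_fetch(amps, assigned_amps=[0, 1], assigned_partitions=[0])
mapper_b = mapper_fetch(amps, assigned_amps=[0, 1], assigned_partitions=[1, 2])
```

Between them the two mappers read every row exactly once, each seeing only the partitions it asked for.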

    System, method, and computer-readable medium for optimizing processing of queries featuring maximum or minimum equality conditions in a parallel processing system
    10.
    Granted patent
    Legal status: In force

    Publication No.: US08234292B2

    Publication date: 2012-07-31

    Application No.: US12332602

    Filing date: 2008-12-11

    IPC classification: G06F17/30

    Abstract: A system, method, and computer-readable medium for optimized processing of queries that feature maximum or minimum equality conditions are provided. A table on which the query is applied is scanned a single time. Rows of the table distributed to respective processing modules are scanned by the processing modules. Each processing module maintains identification of any rows distributed to it that have attribute values equal to the maximum or minimum attribute value locally identified by that processing module. Subsequently, a global aggregation mechanism is invoked to compute the query result without requiring an additional rescan of the table. Further, the disclosed mechanisms may be extended to compute top-N queries featuring maximum or minimum equality conditions.
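
For a query of the form `SELECT * FROM t WHERE x = (SELECT MAX(x) FROM t)`, the single-scan approach can be sketched as follows. The processing modules are simulated as lists, and the function names and the `key` parameter are hypothetical; the minimum case is symmetric.

```python
def local_scan(rows, key):
    """One pass over a module's rows: track the local maximum and its rows."""
    best, keep = None, []
    for row in rows:
        value = key(row)
        if best is None or value > best:
            best, keep = value, [row]   # new local maximum: restart the list
        elif value == best:
            keep.append(row)
    return best, keep


def global_max_rows(partitions, key):
    """Global aggregation: combine local results without rescanning any rows."""
    local_results = [local_scan(rows, key) for rows in partitions]
    candidates = [best for best, _ in local_results if best is not None]
    if not candidates:
        return []
    global_best = max(candidates)
    return [row for best, keep in local_results if best == global_best for row in keep]


partitions = [[("a", 3), ("b", 9)], [("c", 9), ("d", 1)], []]
max_rows = global_max_rows(partitions, key=lambda row: row[1])
```

Each module reads its rows once, and the global step only compares the per-module maxima and concatenates the rows already retained.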
