摘要:
Techniques for data assignment from an external distributed file system (DFS) to a database management system (DBMS) are provided. Data blocks from the DFS are represented as first nodes and access module processors of the DBMS are represented as second nodes. A graph is produced with the first and second nodes. Assignments are made for the first nodes to the second nodes based on evaluation of the graph to integrate the DFS with the DBMS.
摘要:
Techniques for accessing a parallel database system via an external program using vertical and/or horizontal partitioning are provided. An external program to a database management system (DBMS) configures external mappers to process a specific portion of query results on specific access module processors of the DBMS that are to house query results. The query is submitted by the external program to the DBMS and the DBMS is directed to organize the query results in a vertical or horizontal manner. Each external mapper accesses its portion of the query results for processing in parallel on its designated AMP or set of AMPS to process the query results.
摘要:
A system, method, and computer-readable medium for optimizing query performance in a database system are provided. In one embodiment, join predicates of a self outer join are evaluated. If each join predicate is respectively based on a common join attribute, and each join attribute has a not null constraint applied thereto, the self outer join may be re-written as a self inner join. In another embodiment, if not null and unique constraints are applied to each join attribute of an inner join featuring join predicates each respectively based on a common join attribute, the inner join may advantageously removed thereby resulting in a select operation.
摘要:
A system, method, and computer-readable medium for optimizing execution of a join operation in a parallel processing system are provided. A plurality of processing nodes that have at least one row of one or more tables involved in a join operation are identified. For each of the processing nodes, respective counts of rows that would be redistributed to each of the processing nodes based on join attributes of the rows are determined. A redistribution matrix is calculated from the counts of rows of each of the processing nodes. An optimized redistribution matrix is generated from the redistribution matrix, wherein the optimized redistribution matrix provides a minimization of rows to be redistributed among the nodes to execute the join operation.
摘要:
Techniques for data assignment from an external distributed file system (DFS) to a database management system (DBMS) are provided. Data blocks from the DFS are represented as first nodes and access module processors of the DBMS are represented as second nodes. A graph is produced with the first and second nodes. Assignments are made for the first nodes to the second nodes based on evaluation of the graph to integrate the DFS with the DBMS.
摘要:
A system, method, and computer-readable medium for optimization of query processing in a parallel processing system are provided. Skewed values and non-skewed values are treated differently to improve upon conventional DISTINCT and aggregation query processing. Skewed attribute values on which a DISTINCT selection or group by aggregation is applied are allocated entries in a hash table. In this manner, a processing module may consult the hash table to determine if a skewed attribute value has been encountered during the query processing in a manner that precludes repetitive redistribution of rows with highly skewed attribute values on which a DISTINCT selection or group by aggregation is applied.
摘要:
A system, method, and computer-readable medium for dynamic detection and management of data skew in parallel join operations are provided. Rows allocated to processing modules involved in a join operation are redistributed among the processing modules by a hash redistribution of the join attributes. Receipt by a processing module of an excessive number of redistributed rows having a skewed value on the join attribute is detected by a processing module which notifies other processing modules of the skewed value. Processing modules then terminate redistribution of rows having a join attribute value matching the skewed value and either store such rows locally or duplicate the rows. The processing module that has received an excessive number of redistributed rows removes rows having a skewed value of the join attribute from a redistribution spool allocated thereto and duplicates the rows to each of the processing modules. The join operation is completed by performing a local join at each processing module and merging the results of the local join operations.
摘要:
A system, method, and computer-readable medium for optimizing the performance of outer joins in a parallel processing system are provided. Predicates involving only attributes of a left table of a left outer join are pushed down to the outer relation for left outer joins having join predicates involving left table attributes and/or predicates involving attributes of both the right and left table. In such an instance, the rows of the left table may be partitioned into two sub-relations according to the predicate involving only attributes of the left table. Rows of the left table are allocated to a first sub-relation if the rows satisfy the predicate involving only attributes of the left table and rows of the left table are allocated to a second sub-relation if the rows fail to satisfy the predicate involving only attributes of the left table. Accordingly, only rows of the first sub-relation are required to be left outer joined with the right table. Advantageously, a reduction in the requisite number of rows to be redistributed and joined is facilitated. The disclosed embodiments may be similarly applied for optimization of right outer joins. Further, embodiments for optimizing full outer joins are disclosed.
摘要:
Techniques for external application-directed data partitioning in data exported from a parallel database management system (DBMS) are provided. An external application sends a query, a total number of requested access module processors (AMPs), and an application-defined data partitioning expression to the DBMS. The DBMS executes the query with the results vertical partitioned on the identified number of AMPs. Individual external mappers access their assigned AMPs asking for specific partitions that they are assigned to process the query results.
摘要:
A system, method, and computer-readable medium for optimized processing of queries that feature maximum or minimum equality conditions are provided. A table on which the query is applied is scanned a single time. Rows of the table distributed to respective processing modules are scanned by the processing modules. Each processing module maintains identification of any rows distributed to the respective processing module that have attribute values that equal the maximum or minimum attribute value locally identified by the processing module. Subsequently, a global aggregation mechanism is invoked to compute the query result without requiring an additional rescan of the table. Further, the disclosed mechanisms may be extended to compute top N queries featuring maximum or minimum equality conditions.