Abstract:
A computer-implemented method of relocating data in a distributed database comprises: creating, by one or more processors, a second table in the distributed database, the second table including all columns from a first table; copying, by the one or more processors, a first set of tuples from the first table to the second table; modifying, by the one or more processors, during the copying of the first set of tuples, data of the first table according to a modification; after the copying of the first set of tuples, modifying, by the one or more processors, data of the second table according to the modification; and switching, by the one or more processors, the second table for the first table in a catalog of the distributed database.
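The sequence in this abstract (copy, capture concurrent modifications, replay, catalog switch) can be illustrated with a minimal single-process sketch. Everything here is a hypothetical stand-in rather than the claimed implementation: tables are plain dictionaries keyed by row id, the catalog is a dictionary from table name to table, and `pending_changes` simulates writes that arrive while the copy is running.

```python
def relocate_table(catalog, name, pending_changes):
    """Copy a table while recording concurrent writes, replay them, then swap.

    catalog: dict mapping table name -> table (dict of row id -> row dict).
    pending_changes: list of (row_id, row) upserts that arrive during the copy
    (a hypothetical stand-in for concurrent transactions).
    """
    first = catalog[name]
    second = {}   # the second table; rows carry all columns of the first
    delta = []    # modifications applied to the first table during the copy

    snapshot = list(first.items())  # the first set of tuples to copy
    for row_id, row in snapshot:
        second[row_id] = dict(row)
        # Simulated concurrent writer: modify the first table mid-copy
        # and remember the modification for later replay.
        if pending_changes:
            changed_id, changed_row = pending_changes.pop(0)
            first[changed_id] = changed_row
            delta.append((changed_id, changed_row))

    # After the copy, apply the same modifications to the second table.
    for changed_id, changed_row in delta:
        second[changed_id] = dict(changed_row)

    # Switch the second table for the first in the catalog.
    catalog[name] = second
    return second


catalog = {"orders": {1: {"v": "a"}, 2: {"v": "b"}}}
old_table = catalog["orders"]
changes = [(1, {"v": "a2"}), (3, {"v": "c"})]
new_table = relocate_table(catalog, "orders", changes)
```

The sketch handles only upserts; deletes and changes arriving after the copy loop ends would need additional handling that the abstract does not describe.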
Abstract:
A massively parallel processing shared nothing relational database management system includes a plurality of storages assigned to a plurality of compute nodes. The system comprises a non-transitory memory having instructions and one or more processors in communication with the memory. The one or more processors execute the instructions to store a first set of data in a first set of storages in the plurality of storages. The first set of data is hashed into a repartitioned set of data. The first set of storages is reassigned to a second set of compute nodes in the plurality of compute nodes. The repartitioned set of data is distributed to the second set of compute nodes, and a database operation is performed on the repartitioned set of data by the second set of compute nodes.
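A rough sketch of the two steps named in this abstract, hashing rows into a repartitioned set and reassigning storages to a different set of compute nodes, might look as follows. The CRC32 hash, the round-robin assignment policy, and all function names are assumptions made for illustration; the abstract specifies none of them.

```python
import zlib


def stable_hash(key):
    """Deterministic hash of a key (CRC32 is an assumed choice)."""
    return zlib.crc32(str(key).encode("utf-8"))


def repartition(rows, key, num_partitions):
    """Hash each row on `key` into one of `num_partitions` buckets."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[stable_hash(row[key]) % num_partitions].append(row)
    return partitions


def assign_storages(storages, nodes):
    """Assign storages to compute nodes round-robin (assumed policy)."""
    return {s: nodes[i % len(nodes)] for i, s in enumerate(storages)}


rows = [{"id": i, "amount": i * 10} for i in range(10)]
parts = repartition(rows, "id", 4)

storages = ["s0", "s1", "s2", "s3"]
before = assign_storages(storages, ["n0", "n1"])             # first set of nodes
after = assign_storages(storages, ["n2", "n3", "n4", "n5"])  # second set of nodes
```

Because the storages are reassigned rather than copied, growing from two nodes to four changes only the mapping; the repartitioned rows then decide which node operates on which bucket.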
Abstract:
A system and method provide a hybrid distribution mode in a massively parallel processing (MPP) database, preventing storage imbalance caused by data skew. Key values of the database are identified as outliers if the records with those keys cause skew. In hybrid mode, records having the outlier key values are distributed using a random distribution scheme, and all other records are distributed using a hash distribution scheme. The threshold skew amount is configurable for the system. Record lookups, insertions, deletions, and updates are processed according to a query plan optimized for the distribution mode of the records referenced in a database query.
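The hybrid scheme can be sketched in a few lines. The skew test used here (a key is an outlier when its share of all records exceeds a configurable threshold) is an assumption, since the abstract does not say how skew is measured; the function names and the CRC32 hash are likewise hypothetical.

```python
import random
import zlib
from collections import Counter


def find_outlier_keys(keys, threshold):
    """Return keys whose share of all records exceeds `threshold` (assumed skew test)."""
    counts = Counter(keys)
    total = len(keys)
    return {k for k, c in counts.items() if c / total > threshold}


def choose_node(key, outliers, num_nodes, rng=random):
    """Random placement for outlier keys, hash placement for all other keys."""
    if key in outliers:
        return rng.randrange(num_nodes)
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


keys = ["hot"] * 8 + ["a", "b"]
outliers = find_outlier_keys(keys, threshold=0.5)
placement = [choose_node(k, outliers, 4, random.Random(0)) for k in keys]
```

One consequence of this design is that a lookup on a hashed key can target a single node, while a lookup on an outlier key has to consult every node, which is presumably why the query plan depends on the distribution mode.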
Abstract:
A method includes dividing a dataset into partitions by hashing a specified key, selecting a set of distributed file system nodes as a primary node group for storage of the partitions, and causing a primary copy of the partitions to be stored on the primary node group by a distributed storage system file server such that the location of each partition is known by hashing of the specified key.
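The point that a partition's location is known by hashing the key, with no directory lookup, can be shown with a small sketch. The SHA-256 hash, the modulo mapping from partitions to nodes, and the names `partition_id` and `node_for_key` are assumptions for illustration only.

```python
import hashlib


def partition_id(key, num_partitions):
    """Map a key to a partition using a deterministic hash (SHA-256 assumed)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def node_for_key(key, primary_group, num_partitions):
    """Locate the primary-group node holding the partition for `key`."""
    return primary_group[partition_id(key, num_partitions) % len(primary_group)]


primary_group = ["node-a", "node-b", "node-c"]  # hypothetical node names
writer_view = node_for_key("customer-42", primary_group, 12)
reader_view = node_for_key("customer-42", primary_group, 12)
```

A writer and a reader that share the key, the partition count, and the node group compute the same location independently.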
Abstract:
A system and method for improved online transaction processing (OLTP) in a sharded database is provided. Overhead associated with a global transaction manager (GTM) is reduced and scalability improved by determining whether incoming queries are single-shard transactions or multi-shard transactions. For multi-shard transactions, a distributed transaction ID (DXID) is requested from the GTM and then forwarded with the query to one or more data nodes. For single-shard transactions, the query is sent to a data node without requesting a DXID from the GTM.
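The routing decision can be sketched as follows. The `GTM` class and `route` function are hypothetical stand-ins: the GTM is reduced to a counter so that the number of round trips is visible, and a query is represented only by the list of shards it touches.

```python
class GTM:
    """Toy global transaction manager that hands out increasing DXIDs."""

    def __init__(self):
        self.next_dxid = 1
        self.requests = 0  # number of round trips made to the GTM

    def new_dxid(self):
        self.requests += 1
        dxid = self.next_dxid
        self.next_dxid += 1
        return dxid


def route(query_shards, gtm):
    """Return (shard, dxid) pairs; dxid is None for single-shard transactions."""
    shards = sorted(set(query_shards))
    if len(shards) == 1:
        # Single-shard transaction: skip the GTM entirely.
        return [(shards[0], None)]
    # Multi-shard transaction: request one DXID and forward it to every shard.
    dxid = gtm.new_dxid()
    return [(shard, dxid) for shard in shards]


gtm = GTM()
single = route([2], gtm)
requests_after_single = gtm.requests
multi = route([1, 3], gtm)
```

In this sketch only the multi-shard query costs a GTM request, which is the source of the overhead reduction the abstract describes.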
Abstract:
The disclosure relates to technology for query compilation in a database management system. In response to receiving at least one database query, a first execution time of code for the query is estimated without applying a code generation method. For each of one or more code generation methods, a compilation cost and a second execution time of the code as modified by that method are estimated. A cost savings is calculated for each code generation method as the first execution time, less the second execution time for that method, less the compilation cost of that method. Whichever option has the highest cost savings, either one of the code generation methods or no code generation, is then selected.
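The selection rule in this abstract reduces to a short computation: savings = first execution time - second execution time - compilation cost. The sketch below treats "no code generation" as an option with zero savings, which follows from the formula; the function name, the method names, and the numbers are made up for illustration.

```python
def select_codegen(base_time, methods):
    """Pick the option with the highest estimated cost savings.

    base_time: estimated execution time without code generation.
    methods: dict of method name -> (compile_cost, exec_time_with_method).
    Returns (chosen_name, savings); "none" means no code generation.
    """
    best_name, best_savings = "none", 0.0
    for name, (compile_cost, exec_time) in methods.items():
        savings = base_time - exec_time - compile_cost
        if savings > best_savings:
            best_name, best_savings = name, savings
    return best_name, best_savings


# Hypothetical estimates, in arbitrary time units.
long_query = select_codegen(10.0, {"expr_jit": (2.0, 6.0), "full_jit": (7.0, 2.0)})
short_query = select_codegen(1.0, {"full_jit": (7.0, 0.5)})
```

For the long query, `expr_jit` saves 10 - 6 - 2 = 2 units and `full_jit` saves 10 - 2 - 7 = 1 unit, so `expr_jit` is chosen. For the short query, compilation costs more than it saves, so no code generation is selected.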