Redistributing table data in a database cluster

    Publication number: US11151111B2

    Publication date: 2021-10-19

    Application number: US15827660

    Application date: 2017-11-30

    Abstract: A computer-implemented method of relocating data in a distributed database comprises: creating, by one or more processors, a second table in the distributed database, the second table including all columns from a first table; copying, by the one or more processors, a first set of tuples from the first table to the second table; modifying, by the one or more processors, during the copying of the first set of tuples, data of the first table according to a modification; after the copying of the first set of tuples, modifying, by the one or more processors, data of the second table according to the modification; and switching, by the one or more processors, the second table for the first table in a catalog of the distributed database.
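The sequence in this abstract (create a second table, copy tuples, apply concurrent modifications to the first table, replay them on the second table, then switch the catalog entry) can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the `Database` class, the `redistribute` and `apply_change` helpers, the `_new` name suffix, and the change-log representation are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy in-memory stand-in for a distributed database (hypothetical)."""
    tables: dict = field(default_factory=dict)   # physical name -> {key: row}
    catalog: dict = field(default_factory=dict)  # logical name -> physical name


def apply_change(table, op, key, row=None):
    """Apply one modification ("upsert" or "delete") to a table."""
    if op == "upsert":
        table[key] = dict(row)
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def redistribute(db, logical_name, concurrent_changes):
    """Relocate a table by copy, replay, and catalog switch.

    concurrent_changes is a list of (op, key, row) tuples that are assumed
    to arrive while the bulk copy is in progress.
    """
    old_name = db.catalog[logical_name]
    new_name = old_name + "_new"  # assumed naming scheme
    old = db.tables[old_name]

    # 1. Create the second table (same columns as the first).
    new = db.tables[new_name] = {}

    # 2. Take the first set of tuples to copy.
    snapshot = [(key, dict(row)) for key, row in old.items()]

    # 3. During the copy, modifications are applied to the first table
    #    and remembered so they can be replayed later.
    change_log = []
    for change in concurrent_changes:
        apply_change(old, *change)
        change_log.append(change)

    for key, row in snapshot:
        new[key] = row

    # 4. After the copy, apply the same modifications to the second table.
    for change in change_log:
        apply_change(new, *change)

    # 5. Switch the second table for the first in the catalog.
    db.catalog[logical_name] = new_name
    del db.tables[old_name]
    return new_name


db = Database(
    tables={"t_v1": {1: {"a": 1}, 2: {"a": 2}}},
    catalog={"t": "t_v1"},
)
redistribute(db, "t", [
    ("upsert", 3, {"a": 3}),
    ("delete", 1, None),
    ("upsert", 2, {"a": 20}),
])
```

In this sketch, readers and writers keep using the logical name `t` throughout, and only the final catalog update changes which physical table that name resolves to.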

    REDISTRIBUTING TABLE DATA IN A DATABASE CLUSTER

    Publication number: US20190163773A1

    Publication date: 2019-05-30

    Application number: US15827660

    Application date: 2017-11-30

    Abstract: A computer-implemented method of relocating data in a distributed database comprises: creating, by one or more processors, a second table in the distributed database, the second table including all columns from a first table; copying, by the one or more processors, a first set of tuples from the first table to the second table; modifying, by the one or more processors, during the copying of the first set of tuples, data of the first table according to a modification; after the copying of the first set of tuples, modifying, by the one or more processors, data of the second table according to the modification; and switching, by the one or more processors, the second table for the first table in a catalog of the distributed database.

    Hybrid data distribution in a massively parallel processing architecture

    Publication number: US10303654B2

    Publication date: 2019-05-28

    Application number: US14629107

    Application date: 2015-02-23

    Abstract: System and method for hybrid distribution mode in massively parallel processing (MPP) database preventing storage imbalance issues caused by data skew. Key values of the database are identified as outliers if records of those keys cause database skew. In hybrid mode, records having the outlier key values are distributed using a random distribution scheme. Other records are distributed using a hash distribution scheme. A threshold skew amount is configurable for the system. Record lookups, insertions, deletions, and updates are processed according to a query plan optimized for the distribution mode of the records referenced in a database query.
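A minimal sketch of the hybrid mode described in this abstract: keys whose share of records exceeds a configurable skew threshold are treated as outliers and placed randomly, while all other keys are placed by hash. The function names, the use of `zlib.crc32` as the hash, and the definition of skew as a fraction of all records are assumptions made for illustration. The `nodes_to_query` helper reflects only the general observation that a randomly placed key may be on any node; it is not the patent's query planner.

```python
import random
import zlib
from collections import Counter


def find_outlier_keys(keys, skew_threshold):
    """Return keys whose share of all records exceeds skew_threshold."""
    counts = Counter(keys)
    total = len(keys)
    return {k for k, c in counts.items() if c / total > skew_threshold}


def hash_node(key, num_nodes):
    """Deterministic hash placement (crc32 is an assumed choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def place_record(key, num_nodes, outliers, rng):
    """Random placement for outlier keys, hash placement otherwise."""
    if key in outliers:
        return rng.randrange(num_nodes)
    return hash_node(key, num_nodes)


def nodes_to_query(key, num_nodes, outliers):
    """Nodes a lookup must visit for the given key."""
    if key in outliers:
        return list(range(num_nodes))  # record may be on any node
    return [hash_node(key, num_nodes)]


def distribute(keys, num_nodes, skew_threshold, seed=0):
    """Place every record and return (outlier keys, records per node)."""
    rng = random.Random(seed)
    outliers = find_outlier_keys(keys, skew_threshold)
    load = Counter()
    for key in keys:
        load[place_record(key, num_nodes, outliers, rng)] += 1
    return outliers, load


# One hot key accounts for 90% of the records.
keys = ["hot"] * 900 + [f"k{i}" for i in range(100)]
outliers, load = distribute(keys, num_nodes=4, skew_threshold=0.2)
```

With pure hash distribution all 900 `hot` records would land on a single node. In the sketch they are spread across the four nodes, at the cost of a lookup for `hot` having to visit every node.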

    DATA PLACEMENT CONTROL FOR DISTRIBUTED COMPUTING ENVIRONMENT

    Status: invention application, under examination (published)

    Publication number: US20170031988A1

    Publication date: 2017-02-02

    Application number: US14813668

    Application date: 2015-07-30

    CPC classification number: G06F17/30466 G06F17/30486

    Abstract: A method includes dividing a dataset into partitions by hashing a specified key, selecting a set of distributed file system nodes as a primary node group for storage of the partitions, and causing a primary copy of the partitions to be stored on the primary node group by a distributed storage system file server such that the location of each partition is known by hashing of the specified key.
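The property claimed here, that a partition's location can be recovered just by hashing the key, can be illustrated as follows. The round-robin mapping of partitions to the primary node group, the `crc32` hash, and every name in the code are assumptions for the sake of the example; the abstract does not say how partitions are assigned to nodes within the group.

```python
import zlib


def partition_of(key, num_partitions):
    """Partition id obtained by hashing the specified key."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, primary_nodes):
    """Map each partition to a node of the primary group (round-robin, assumed)."""
    return {p: primary_nodes[p % len(primary_nodes)]
            for p in range(num_partitions)}


def locate(key, num_partitions, placement):
    """Find the node holding the primary copy without any directory lookup."""
    return placement[partition_of(key, num_partitions)]


def store(records, key_field, num_partitions, placement):
    """Store the primary copy of each record on the node its key hashes to."""
    nodes = {}
    for rec in records:
        node = locate(rec[key_field], num_partitions, placement)
        nodes.setdefault(node, []).append(rec)
    return nodes


placement = assign_partitions(8, ["n1", "n2", "n3"])
records = [{"id": i} for i in range(50)]
stored = store(records, "id", 8, placement)
```

Because `locate` depends only on the key, the partition count, and the fixed placement map, any client that knows those three things can find a record's primary copy without asking a metadata service per record.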

    Dynamic computation node grouping with cost based optimization for massively parallel processing

    Publication number: US10649996B2

    Publication date: 2020-05-12

    Application number: US15374158

    Application date: 2016-12-09

    Abstract: A massively parallel processing shared nothing relational database management system includes a plurality of storages assigned to a plurality of compute nodes. The system comprises a non-transitory memory having instructions and one or more processors in communication with the memory. The one or more processors execute the instructions to store a set of data in a first set of storages in the plurality of storages. The first set of data is hashed into a repartitioned set of data. The first set of storages is reassigned to a second set of compute nodes in the plurality of compute nodes. The repartitioned set of data is distributed to the second set of compute nodes and a database operation is performed on the repartitioned set of data by the second set of compute nodes.

    Adaptive code generation with a cost model for JIT compiled execution in a database system

    Publication number: US09934051B1

    Publication date: 2018-04-03

    Application number: US15489568

    Application date: 2017-04-17

    CPC classification number: G06F9/4552 G06F8/443 G06F17/30474

    Abstract: The disclosure relates to technology for query compilation in a database management system. A first execution time of code for at least one database query without applying a code generation method is estimated and in response to receiving the at least one database query, and for one or more code generation methods, a compilation cost and a second execution time of the code as modified by the code generation methods is estimated. A cost savings for each of the one or more code generation methods is calculated, where the cost savings is calculated as the first execution time less the second execution time of the code generation method, less the compilation cost of the code generation method. One of the code generation methods or the no code generation method with the highest cost savings is then selected.
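The selection rule in this abstract reduces to a small calculation: for each code generation method, savings = (execution time without code generation) - (execution time with the method) - (compilation cost of the method), and the option with the highest savings wins. The sketch below assumes that the no-code-generation option has savings of exactly zero, which follows from applying the same formula to it with no compilation cost. The `CodegenEstimate` type and the method names `expr_jit` and `full_jit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodegenEstimate:
    """Estimated costs for one code generation method (hypothetical type)."""
    name: str
    compile_cost: float  # estimated time to compile the generated code
    exec_time: float     # estimated execution time of the generated code


def choose_codegen(baseline_exec_time, estimates):
    """Pick the option with the highest cost savings.

    Returns (name, savings). The name is None when running without code
    generation is the best choice, which is treated as zero savings.
    """
    best_name, best_savings = None, 0.0
    for est in estimates:
        savings = baseline_exec_time - est.exec_time - est.compile_cost
        if savings > best_savings:
            best_name, best_savings = est.name, savings
    return best_name, best_savings


long_query = choose_codegen(100.0, [
    CodegenEstimate("expr_jit", compile_cost=20.0, exec_time=60.0),
    CodegenEstimate("full_jit", compile_cost=50.0, exec_time=40.0),
])
short_query = choose_codegen(10.0, [
    CodegenEstimate("full_jit", compile_cost=50.0, exec_time=4.0),
])
```

For the long query, `expr_jit` saves 100 - 60 - 20 = 20 while `full_jit` saves only 100 - 40 - 50 = 10, so the cheaper-to-compile method is chosen even though it runs slower. For the short query the compilation cost outweighs any gain, so no code generation is selected.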
