Hybrid data distribution in a massively parallel processing architecture

    Publication No.: US10303654B2

    Publication date: 2019-05-28

    Application No.: US14629107

    Filing date: 2015-02-23

    Abstract: A system and method for a hybrid distribution mode in a massively parallel processing (MPP) database that prevents storage imbalance caused by data skew. Key values of the database are identified as outliers if the records with those keys cause database skew. In hybrid mode, records having the outlier key values are distributed using a random distribution scheme. Other records are distributed using a hash distribution scheme. A threshold skew amount is configurable for the system. Record lookups, insertions, deletions, and updates are processed according to a query plan optimized for the distribution mode of the records referenced in a database query.
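
The hybrid scheme in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the outlier rule (a key is an outlier when its record count exceeds `skew_threshold` times the fair per-node share), the CRC32 hash, and all function names are assumptions made for illustration.

```python
import random
import zlib
from collections import Counter


def find_outlier_keys(keys, num_nodes, skew_threshold):
    """Assumed rule: a key is an outlier if its record count exceeds
    skew_threshold times the fair per-node share of all records."""
    counts = Counter(keys)
    fair_share = len(keys) / num_nodes
    return {k for k, c in counts.items() if c > skew_threshold * fair_share}


def assign_node(key, num_nodes, outliers, rng):
    """Random placement for outlier keys, hash placement for all others."""
    if key in outliers:
        return rng.randrange(num_nodes)
    return zlib.crc32(str(key).encode()) % num_nodes


def nodes_to_query(key, num_nodes, outliers):
    """A lookup on an outlier key must visit every node; a hashed key needs one."""
    if key in outliers:
        return list(range(num_nodes))
    return [zlib.crc32(str(key).encode()) % num_nodes]


def distribute(keys, num_nodes, skew_threshold=0.5, seed=0):
    """Return the outlier set and the number of records placed on each node."""
    rng = random.Random(seed)
    outliers = find_outlier_keys(keys, num_nodes, skew_threshold)
    load = [0] * num_nodes
    for key in keys:
        load[assign_node(key, num_nodes, outliers, rng)] += 1
    return outliers, load
```

With 900 records sharing one key and 100 records with distinct keys on four nodes, pure hashing would place all 900 skewed records on a single node, while the sketch spreads them across all four. The price is that a lookup on the skewed key has to consult every node, which is why the abstract ties the query plan to the distribution mode.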

    BEST-EFFORTS DATABASE FUNCTIONS
    23.
    Patent application

    Publication No.: US20180203895A1

    Publication date: 2018-07-19

    Application No.: US15408130

    Filing date: 2017-01-17

    CPC classification number: G06F16/24524

    Abstract: A computer-implemented method and system at a network switch use one or more processors to perform a pre-defined database function on query data contained in data messages received at the network switch, producing result data. In a first mode of operation, the pre-defined database function is performed on the query data to a state of full completion, generating complete result data and no skipped query data. In a second mode of operation, it is performed to a state of partial completion, generating partially complete result data and skipped query data. Further, the method and system perform one or more network switch functions to route the complete result data, and/or the partially complete result data and skipped query data, to one or more destination nodes. In addition, an application programming interface (API) is used to define the database function.
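
The two modes of operation can be sketched with a DISTINCT function, chosen here only as an example of a pre-defined database function. The `capacity` parameter (a bound on the switch's working memory) and the function name are assumptions, not details from the abstract.

```python
def best_effort_distinct(values, capacity=None):
    """Deduplicate values using at most `capacity` remembered entries.

    capacity=None models the first mode (full completion, nothing skipped).
    A finite capacity models the second mode: values that do not fit are
    returned as skipped query data for a destination node to finish.
    """
    seen = set()
    result, skipped = [], []
    for value in values:
        if value in seen:
            continue  # duplicate already handled at the switch
        if capacity is None or len(seen) < capacity:
            seen.add(value)
            result.append(value)
        else:
            skipped.append(value)  # forwarded unprocessed
    return result, skipped
```

A destination node that receives both lists can complete the operation by deduplicating the skipped values and merging them with the partial result.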

    System and method for massively parallel processor database

    Publication No.: US09959332B2

    Publication date: 2018-05-01

    Application No.: US14601679

    Filing date: 2015-01-21

    CPC classification number: G06F17/30575

    Abstract: In one embodiment, a method includes determining a number of initial servers in a massively parallel processing (MPP) database cluster and determining an initial bucket configuration of the MPP database cluster, where the initial bucket configuration has a number of initial buckets. The method also includes adding a number of additional servers to the MPP database cluster to produce a number of updated servers, where the updated servers include the initial servers and the additional servers and creating an updated bucket configuration in accordance with the number of initial servers, the initial bucket configuration, and the number of additional servers, where the updated bucket configuration has a number of updated buckets. Additionally, the method includes redistributing data of the MPP cluster in accordance with the updated bucket configuration.
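
One way to picture the redistribution step is the Python sketch below. It is a simplification under stated assumptions: the number of buckets stays fixed (the abstract allows the updated configuration to have a different number of buckets), buckets are treated as equally sized, and the function name and data layout are hypothetical.

```python
def rebalance_buckets(bucket_map, initial_servers, additional_servers):
    """Return a new bucket -> server map that evens out bucket counts
    across all servers while moving as few buckets as possible."""
    servers = list(initial_servers) + list(additional_servers)
    owned = {s: [] for s in servers}
    for bucket, server in sorted(bucket_map.items()):
        owned[server].append(bucket)

    # Each server's quota: an even share, with any remainder given to the first servers.
    base, extra = divmod(len(bucket_map), len(servers))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    # Collect buckets from servers above quota, then hand them to servers below quota.
    surplus = []
    for s in servers:
        while len(owned[s]) > quota[s]:
            surplus.append(owned[s].pop())
    for s in servers:
        while len(owned[s]) < quota[s]:
            owned[s].append(surplus.pop())

    return {bucket: s for s, buckets in owned.items() for bucket in buckets}
```

For 12 buckets on three servers, adding a fourth server moves exactly three buckets (one from each initial server), rather than rehashing every record.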

    Apparatus and Method for Managing Storage of a Primary Database and a Replica Database

    Publication No.: US20170097972A1

    Publication date: 2017-04-06

    Application No.: US14872811

    Filing date: 2015-10-01

    CPC classification number: G06F16/27 G06F16/258

    Abstract: System and method embodiments are provided for using different storage formats for a primary database and its replicas in a database managed replication (DMR) system. As such, the advantages of both formats can be combined with suitable design complexity and implementation. In an embodiment, data is arranged in a sequence of rows and stored in a first storage format at the primary database. The data arranged in the sequence of rows is also stored in a second storage format at the replica database. The sequence of rows is determined according to the first storage format or the second storage format. The first storage format is a row store (RS) and the second storage format is a column store (CS), or vice versa. In an embodiment, the sequence of rows is determined to improve compression efficiency at the CS.
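
The last sentence of this abstract, choosing the row sequence to improve compression at the column store, can be illustrated with run-length encoding (RLE). RLE and the sample data are assumptions for illustration; the abstract does not name a compression scheme.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row tuples (row store) into a list of columns (column store)."""
    return [list(col) for col in zip(*rows)]


def rle(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def total_runs(rows):
    """Total number of RLE runs across all columns; fewer runs means better compression."""
    return sum(len(rle(col)) for col in to_columns(rows))


rows = [("DE", "b"), ("US", "a"), ("DE", "a"), ("US", "b"), ("DE", "b"), ("US", "a")]
unsorted_runs = total_runs(rows)        # rows in arrival order
sorted_runs = total_runs(sorted(rows))  # same rows, sequence chosen for the column store
```

Both stores hold the same rows in the same sequence, but sorting groups equal values together, so the column store needs fewer runs (6 instead of 10 for this sample).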

    System and method for adaptive vector size selection for vectorized query execution
    26.
    Granted patent (status: in force)

    Publication No.: US09436732B2

    Publication date: 2016-09-06

    Application No.: US13798680

    Filing date: 2013-03-13

    CPC classification number: G06F17/30463

    Abstract: System and method embodiments are provided for adaptive vector size selection for vectorized query execution. The adaptive vector size selection is implemented in two stages. In a query planning stage, a suitable vector size is estimated for a query by a query planner. The planning stage includes analyzing a query plan tree, segmenting the tree into different segments, and assigning an initial vector size to each segment of the query execution plan. In a subsequent query execution stage, an execution engine monitors hardware performance indicators, and adjusts the vector size according to the monitored hardware performance indicators. Adjusting the vector size includes trying different vector sizes and observing related processor counters to increase or decrease the vector size, wherein the vector size is increased to improve hardware performance according to the processor counters, and wherein the vector size is decreased when the processor counters indicate a decrease in hardware performance.
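
The execution-stage adjustment can be sketched as a simple hill climb. This is an illustration under assumptions: real processor counters are replaced by a caller-supplied `cost_per_row` function, candidate sizes are obtained by doubling or halving, and `simulated_cost` is a made-up cost curve, not measured data.

```python
def tune_vector_size(initial_size, cost_per_row, min_size=64, max_size=65536, max_steps=20):
    """Start from the planner's estimate and try doubling or halving the
    vector size, keeping a change only when the observed cost improves."""
    size = initial_size
    best = cost_per_row(size)
    for _ in range(max_steps):
        improved = False
        for candidate in (size * 2, size // 2):
            if min_size <= candidate <= max_size:
                cost = cost_per_row(candidate)
                if cost < best:
                    size, best, improved = candidate, cost, True
                    break
        if not improved:
            break  # neither neighbour is better: keep the current size
    return size


def simulated_cost(size):
    """Hypothetical stand-in for hardware counters: per-batch overhead falls
    with larger vectors while cache pressure rises."""
    return 1000 / size + size / 1024
```

With this cost curve the tuner settles on 1024 whether the planner's initial estimate was too small (256) or too large (8192).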

    System and Method for Massively Parallel Processor Database
    27.
    Patent application (status: in force)

    Publication No.: US20160210340A1

    Publication date: 2016-07-21

    Application No.: US14601679

    Filing date: 2015-01-21

    CPC classification number: G06F17/30575

    Abstract: In one embodiment, a method includes determining a number of initial servers in a massively parallel processing (MPP) database cluster and determining an initial bucket configuration of the MPP database cluster, where the initial bucket configuration has a number of initial buckets. The method also includes adding a number of additional servers to the MPP database cluster to produce a number of updated servers, where the updated servers include the initial servers and the additional servers and creating an updated bucket configuration in accordance with the number of initial servers, the initial bucket configuration, and the number of additional servers, where the updated bucket configuration has a number of updated buckets. Additionally, the method includes redistributing data of the MPP cluster in accordance with the updated bucket configuration.

    System and method for massively parallel processing database
    28.
    Granted patent (status: in force)

    Publication No.: US09348865B2

    Publication date: 2016-05-24

    Application No.: US14243461

    Filing date: 2014-04-02

    CPC classification number: G06F17/30445 G06F17/30477

    Abstract: In one embodiment, a method for managing database resources includes selecting a first query from a queue of queries and transmitting, by a global resource manager to a portion of a plurality of data nodes, a plurality of reserve resource messages. The method also includes receiving, by the global resource manager from the portion of the plurality of data nodes, a plurality of acknowledgement messages and transmitting, by the global resource manager to a coordinator node, an execute query message when the plurality of acknowledgement messages are positive acknowledgements.
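
The reserve, acknowledge, and execute exchange can be sketched as follows. Message passing is modelled as direct method calls, memory is the only resource, and releasing earlier reservations after a negative acknowledgement is an assumption that the abstract does not state. All class and field names are hypothetical.

```python
from collections import deque


class DataNode:
    """A data node that grants or refuses memory reservations."""

    def __init__(self, name, free_memory):
        self.name = name
        self.free_memory = free_memory

    def reserve(self, amount):
        """Handle a reserve-resource message; True is a positive acknowledgement."""
        if amount <= self.free_memory:
            self.free_memory -= amount
            return True
        return False

    def release(self, amount):
        self.free_memory += amount


def schedule_next(queue, nodes):
    """Global resource manager step: try to reserve resources for the first
    queued query on every node it touches. Return the query id to hand to the
    coordinator for execution, or None if any node refused."""
    if not queue:
        return None
    query = queue[0]
    acked = []
    for node_name in query["nodes"]:
        node = nodes[node_name]
        if node.reserve(query["memory"]):
            acked.append(node)
        else:
            for granted in acked:  # assumed rollback of partial reservations
                granted.release(query["memory"])
            return None
    queue.popleft()
    return query["id"]
```

A query is dispatched only when every involved data node has acknowledged positively, so a query never starts on some nodes while waiting for memory on others.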

    System and Method for Massively Parallel Processing Database
    29.
    Patent application (status: in force)

    Publication No.: US20150286678A1

    Publication date: 2015-10-08

    Application No.: US14243461

    Filing date: 2014-04-02

    CPC classification number: G06F17/30445 G06F17/30477

    Abstract: In one embodiment, a method for managing database resources includes selecting a first query from a queue of queries and transmitting, by a global resource manager to a portion of a plurality of data nodes, a plurality of reserve resource messages. The method also includes receiving, by the global resource manager from the portion of the plurality of data nodes, a plurality of acknowledgement messages and transmitting, by the global resource manager to a coordinator node, an execute query message when the plurality of acknowledgement messages are positive acknowledgements.

    Method for Two-Stage Query Optimization in Massively Parallel Processing Database Clusters
    30.
    Patent application (status: in force)

    Publication No.: US20140188841A1

    Publication date: 2014-07-03

    Application No.: US13730872

    Filing date: 2012-12-29

    CPC classification number: G06F17/30445 G06F17/30483

    Abstract: Queries may be processed more efficiently in a massively parallel processing (MPP) database by locally optimizing the global execution plan. The global execution plan and a semantic tree may be provided to MPP data nodes by an MPP coordinator. The MPP data nodes may then use the global execution plan and the semantic tree to generate a local execution plan. Thereafter, the MPP data nodes may select either the global execution plan or the local execution plan in accordance with a cost evaluation.
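
The second, node-local stage can be sketched as below. The sketch is heavily simplified and rests on assumptions: a plan is reduced to a single table access method, the semantic tree is omitted, and the cost formulas and the factor 4.0 for index access are invented for illustration.

```python
def estimate_cost(plan, local_stats):
    """Toy cost model using statistics known only to this data node."""
    stats = local_stats[plan["table"]]
    if plan["access"] == "seq_scan":
        return stats["rows"] * 1.0
    # Index scan: touches only matching rows, at a higher assumed per-row cost.
    return stats["rows"] * stats["selectivity"] * 4.0


def local_optimize(global_plan, local_stats):
    """Generate a local plan by re-choosing the access method with local statistics."""
    candidates = [dict(global_plan, access=a) for a in ("seq_scan", "index_scan")]
    return min(candidates, key=lambda p: estimate_cost(p, local_stats))


def choose_plan(global_plan, local_stats):
    """Keep the coordinator's global plan unless the local plan is strictly cheaper."""
    local_plan = local_optimize(global_plan, local_stats)
    if estimate_cost(local_plan, local_stats) < estimate_cost(global_plan, local_stats):
        return local_plan
    return global_plan
```

Two data nodes that receive the same global plan can therefore end up with different plans: a node where the predicate matches few rows keeps the index scan, while a node where it matches most rows switches to a sequential scan.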
