Partition aware evaluation of top-N queries

    Publication No.: US10706055B2

    Publication date: 2020-07-07

    Application No.: US15092483

    Filing date: 2016-04-06

    Abstract: Techniques are described for executing an analytical query with a top-N clause. In an embodiment, a stream of tuples is received by each of the processing units from a data source identified in the query. The processing unit uses a portion of a received tuple to identify the partition that the tuple is assigned to. For each partition, the processing unit maintains a top-N data store that stores the N received tuples that match the criteria of top N tuples according to the query. The received tuple is compared to the N stored tuples to determine whether to store the received tuple and discard an already stored tuple, or to discard the received tuple. After all the tuples have been similarly processed by the processing units, the top-N data stores for each partition are merged, yielding the top N tuples for each partition to return as a result of the query.
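The per-partition top-N store and the final merge described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of a min-heap as the top-N data store, and the "larger sort key ranks higher" rule are all assumptions made for the example.

```python
import heapq
from collections import defaultdict


def local_top_n(tuples, n, partition_of, sort_key):
    """One processing unit: keep at most n best tuples per partition.

    partition_of and sort_key are hypothetical callables that extract the
    partition and the ranking value from a tuple.
    """
    stores = defaultdict(list)  # partition -> min-heap of (key, seq, tuple)
    if n <= 0:
        return stores
    for seq, t in enumerate(tuples):
        heap = stores[partition_of(t)]
        entry = (sort_key(t), seq, t)  # seq breaks ties without comparing tuples
        if len(heap) < n:
            heapq.heappush(heap, entry)      # store still has room
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)   # evict the weakest stored tuple
        # otherwise the received tuple is discarded
    return stores


def merge_top_n(all_stores, n):
    """Merge the stores of every processing unit into one top-N per partition."""
    merged = defaultdict(list)
    for stores in all_stores:
        for part, heap in stores.items():
            merged[part].extend(heap)
    return {
        part: [t for _, _, t in heapq.nlargest(n, entries, key=lambda e: e[0])]
        for part, entries in merged.items()
    }
```

Each unit holds at most N tuples per partition, so the merge step only has to look at N tuples per partition per unit rather than at the full input.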

    Method for failure-resilient data placement in a distributed query processing system

    Publication No.: US09842148B2

    Publication date: 2017-12-12

    Application No.: US14704825

    Filing date: 2015-05-05

    CPC classification number: G06F17/30545

    Abstract: Herein is described a data placement scheme for a distributed query processing system that achieves load balance amongst the nodes of the system. To identify a node on which to place particular data, a supervisor node performs a placement algorithm over the particular data's identifier, where the placement algorithm utilizes two or more hash functions. The supervisor node runs the placement algorithm until a destination node is identified that is available to store the data, or the supervisor node has run the placement algorithm an established number of times. If no available node is identified using the placement algorithm, then an available destination node is identified for the particular data and information identifying the data and the selected destination node is included in an exception map. Most data may be located by any node in the system based on the node performing the placement algorithm for the required data.
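A rough sketch of the placement loop with an exception map is shown below. The abstract does not say how the two or more hash functions are combined, so the double-hashing probe used here is an assumption, as are the names `probe`, `place`, `locate`, the default `max_attempts`, and the choice of the lowest-numbered available node as the fallback.

```python
import hashlib


def _h(data_id: str, salt: str) -> int:
    """Salted SHA-256 digest, standing in for one of the hash functions."""
    digest = hashlib.sha256(f"{salt}:{data_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def probe(data_id: str, attempt: int, num_nodes: int) -> int:
    """Candidate node for a given attempt (double hashing, an assumption)."""
    start = _h(data_id, "h1") % num_nodes
    step = 1 + _h(data_id, "h2") % max(num_nodes - 1, 1)
    return (start + attempt * step) % num_nodes


def place(data_id, available, num_nodes, exception_map, max_attempts=3):
    """Supervisor side: pick a destination node for data_id."""
    if not available:
        raise RuntimeError("no node is available")
    for attempt in range(max_attempts):
        node = probe(data_id, attempt, num_nodes)
        if node in available:
            return node
    # The placement algorithm failed within the attempt budget:
    # pick any available node and record the exception.
    node = min(available)
    exception_map[data_id] = node
    return node


def locate(data_id, available, num_nodes, exception_map, max_attempts=3):
    """Any node: find data_id, assuming the same availability view as at placement."""
    if data_id in exception_map:
        return exception_map[data_id]
    for attempt in range(max_attempts):
        node = probe(data_id, attempt, num_nodes)
        if node in available:
            return node
    return None
```

Because `locate` recomputes the same probes, only the identifiers that fell through to the fallback need an entry in the exception map.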

    Tail-based top-N query evaluation

    Publication No.: US10394811B2

    Publication date: 2019-08-27

    Application No.: US15608830

    Filing date: 2017-05-30

    Abstract: Techniques are described for executing a query with a top-N clause to select a first N-number of rows in a data source arranged at least according to a first key and a second key of the data source using a first sort order respectively specified for the first key and a second sort order respectively specified for the second key by the query. The data source may include one or more tiles that include at least a portion of the first key and the second key. To execute the query, in an embodiment, a DBMS determines, in a first vector of first key values that are in a first tile, row identifiers identifying entries of the first vector that contain values equal to a tail value that follows a particular top number of the first key values. The DBMS may select, from a second vector of values of the second key in the first tile, second key values identified based on the determined row identifiers of the first vector. In an embodiment, the DBMS generates a result set of the query that includes at least a value from the second key values selected from the second vector based on the determined first row identifiers.
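One possible reading of the tail-value idea, for a single tile held as two column vectors, is sketched below. It assumes ascending order on both keys and treats the tail value as the first-key value at the cut-off position; the function name `top_n_rows` and the plain-list representation of the vectors are illustrative only.

```python
def top_n_rows(key1, key2, n):
    """Row ids of the first n rows of one tile, ordered by (key1, key2) ascending.

    key1 and key2 are equal-length column vectors (plain lists here).
    """
    by_both_keys = lambda r: (key1[r], key2[r])
    if n <= 0 or not key1:
        return []
    if n >= len(key1):
        return sorted(range(len(key1)), key=by_both_keys)

    # Tail value: the first-key value at the cut-off position.
    tail = sorted(key1)[n - 1]

    # Rows strictly better than the tail value are in the result for sure.
    sure = [r for r, v in enumerate(key1) if v < tail]
    # Rows equal to the tail value compete for the remaining slots.
    tied = [r for r, v in enumerate(key1) if v == tail]

    # Only the tied row ids need second-key values to decide who is selected.
    tied.sort(key=lambda r: key2[r])
    chosen = sure + tied[: n - len(sure)]
    return sorted(chosen, key=by_both_keys)
```

For example, with `key1 = [3, 1, 2, 2, 2, 5]`, `key2 = [0, 9, 7, 1, 4, 0]` and `n = 3`, the tail value is 2, row 1 is selected outright, and the second key picks rows 3 and 4 from the three tied rows.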

    Tail-based top-N query evaluation

    Publication No.: US11194801B2

    Publication date: 2021-12-07

    Application No.: US16446636

    Filing date: 2019-06-20

    Abstract: Techniques are described for executing a query with a top-N clause to select a first N-number of rows in a data source arranged at least according to a first key and a second key of the data source using a first sort order respectively specified for the first key and a second sort order respectively specified for the second key by the query. The data source may include one or more tiles that include at least a portion of the first key and the second key. To execute the query, in an embodiment, a DBMS determines, in a first vector of first key values that are in a first tile, row identifiers identifying entries of the first vector that contain values equal to a tail value that follows a particular top number of the first key values. The DBMS may select, from a second vector of values of the second key in the first tile, second key values identified based on the determined row identifiers of the first vector. In an embodiment, the DBMS generates a result set of the query that includes at least a value from the second key values selected from the second vector based on the determined first row identifiers.

    Dynamic operation scheduling for distributed data processing

    Publication No.: US10956417B2

    Publication date: 2021-03-23

    Application No.: US15581984

    Filing date: 2017-04-28

    Abstract: Techniques are provided for scheduling data operations for a given query based upon a query-cost model that analyzes the cost of scheduling data operations based upon their operation cost and the type of resources needed for the operation. In an embodiment, a database server receives a set of operations for a query. The database server determines a set of leaf operation nodes from the set of data operations, where the set of leaf operation nodes includes operation nodes that do not depend on the execution of other nodes within the set of data operations. The database server compares operation costs between the leaf operation nodes to determine which leaf operation node to insert into a scheduled order set. The database server inserts the leaf operation node into the scheduled order set. Then the database server iteratively determines new leaf operation nodes and performs cost analysis on remaining leaf operation nodes to generate a set of scheduled data operations.
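The iterative leaf-selection loop in this abstract can be sketched as below. The abstract does not state which leaf wins the cost comparison, so picking the cheapest leaf first is an assumption, and the resource-type part of the cost model is left out. The names `schedule`, `costs`, and `deps` are hypothetical.

```python
def schedule(costs, deps):
    """Order operations so that every operation follows its dependencies.

    costs: dict mapping operation name -> numeric cost
    deps:  dict mapping operation name -> set of operations it depends on
    """
    scheduled = []
    done = set()
    remaining = set(costs)
    while remaining:
        # Leaf operations: everything they depend on is already scheduled.
        leaves = [op for op in remaining if deps.get(op, set()) <= done]
        if not leaves:
            raise ValueError("dependency cycle or unknown dependency")
        # Assumed policy: cheapest leaf first, name as a deterministic tie-break.
        pick = min(leaves, key=lambda op: (costs[op], op))
        scheduled.append(pick)
        done.add(pick)
        remaining.remove(pick)
    return scheduled
```

With `costs = {"scan_a": 5, "scan_b": 2, "join": 4, "agg": 1}` and `deps = {"join": {"scan_a", "scan_b"}, "agg": {"join"}}`, the cheap `agg` still comes last because it only becomes a leaf once `join` has been scheduled.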

    DYNAMIC OPERATION SCHEDULING FOR DISTRIBUTED DATA PROCESSING

    Publication No.: US20180314733A1

    Publication date: 2018-11-01

    Application No.: US15581984

    Filing date: 2017-04-28

    Abstract: Techniques are provided for scheduling data operations for a given query based upon a query-cost model that analyzes the cost of scheduling data operations based upon their operation cost and the type of resources needed for the operation. In an embodiment, a database server receives a set of operations for a query. The database server determines a set of leaf operation nodes from the set of data operations, where the set of leaf operation nodes includes operation nodes that do not depend on the execution of other nodes within the set of data operations. The database server compares operation costs between the leaf operation nodes to determine which leaf operation node to insert into a scheduled order set. The database server inserts the leaf operation node into the scheduled order set. Then the database server iteratively determines new leaf operation nodes and performs cost analysis on remaining leaf operation nodes to generate a set of scheduled data operations.

    METHOD FOR FAILURE-RESILIENT DATA PLACEMENT IN A DISTRIBUTED QUERY PROCESSING SYSTEM
    Invention application (granted)

    Publication No.: US20160328456A1

    Publication date: 2016-11-10

    Application No.: US14704825

    Filing date: 2015-05-05

    CPC classification number: G06F17/30545

    Abstract: Herein is described a data placement scheme for a distributed query processing system that achieves load balance amongst the nodes of the system. To identify a node on which to place particular data, a supervisor node performs a placement algorithm over the particular data's identifier, where the placement algorithm utilizes two or more hash functions. The supervisor node runs the placement algorithm until a destination node is identified that is available to store the data, or the supervisor node has run the placement algorithm an established number of times. If no available node is identified using the placement algorithm, then an available destination node is identified for the particular data and information identifying the data and the selected destination node is included in an exception map. Most data may be located by any node in the system based on the node performing the placement algorithm for the required data.

