Dynamic grouping of in-memory data processing operations

    Publication No.: US10366124B2

    Publication Date: 2019-07-30

    Application No.: US15616777

    Filing Date: 2017-06-07

    Abstract: Techniques are described herein for grouping of operations in local memory of a processing unit. The techniques involve adding a first operation for a first leaf operator of a query execution plan to a first pipelined group. The query execution plan includes a set of leaf operators and a set of non-leaf operators. Each leaf operator has a respective parent non-leaf operator, and each non-leaf operator has one or more child operators from among the set of leaf operators or others of the set of non-leaf operators. The techniques further involve determining a memory requirement of executing the first operation for the first leaf operator and executing a second operation for the respective parent non-leaf operator of the first leaf operator. The output of the first operation is input to the second operation. The techniques further involve determining whether the memory requirement is satisfied by an amount of local memory. If the memory requirement is satisfied by the amount of local memory, the second operation for the respective parent non-leaf operator is added to the first pipelined group. The techniques further involve assigning the first pipelined group to a first thread and the first thread executing the first pipelined group. Executing the first pipelined group involves: storing first output of the first operation in the local memory of the first thread; using the first output as input for the second operation; storing second output of the second operation in the local memory; and moving the second output from the local memory to a tier of memory different from the local memory relative to the first thread.
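
    As a rough illustration of the grouping decision described in this abstract, the Python sketch below greedily pulls parent operations into a pipelined group while their combined memory estimate still fits within a thread's local memory budget. The Operator class, the memory budget, and the per-operation memory estimates are assumptions made for the example, not the patent's actual data structures.

```python
# Hypothetical sketch of the grouping decision described above. Operator names,
# fields, and the memory model are illustrative assumptions, not the patent's API.

LOCAL_MEMORY_BYTES = 256 * 1024  # assumed per-thread local memory budget

class Operator:
    def __init__(self, name, mem_bytes, parent=None):
        self.name = name
        self.mem_bytes = mem_bytes   # estimated working-set size of this operation
        self.parent = parent         # the operator that consumes this operator's output

def build_pipelined_group(leaf, local_memory=LOCAL_MEMORY_BYTES):
    """Start a pipelined group at a leaf operator and keep adding parent
    operations while the combined memory requirement fits in local memory."""
    group = [leaf]
    required = leaf.mem_bytes
    op = leaf.parent
    while op is not None and required + op.mem_bytes <= local_memory:
        group.append(op)            # parent operation joins the same pipelined group
        required += op.mem_bytes
        op = op.parent
    return group                    # the group is then assigned to a single thread

# Example: a scan -> filter -> aggregate chain
scan = Operator("scan", 64 * 1024)
filt = Operator("filter", 32 * 1024)
agg = Operator("aggregate", 128 * 1024)
scan.parent = filt
filt.parent = agg
print([op.name for op in build_pipelined_group(scan)])  # ['scan', 'filter', 'aggregate']
```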

    DISTRIBUTED RELATIONAL DICTIONARIES
    Invention Application

    Publication No.: US20190205446A1

    Publication Date: 2019-07-04

    Application No.: US15861212

    Filing Date: 2018-01-03

    Abstract: Techniques related to distributed relational dictionaries are disclosed. In some embodiments, one or more non-transitory storage media store a sequence of instructions which, when executed by one or more computing devices, cause performance of a method. The method involves generating, by a query optimizer at a distributed database system (DDS), a query execution plan (QEP) for generating a code dictionary and a column of encoded database data. The QEP specifies a sequence of operations for generating the code dictionary. The code dictionary is a database table. The method further involves receiving, at the DDS, a column of unencoded database data from a data source that is external to the DDS. The DDS generates the code dictionary according to the QEP. Furthermore, based on joining the column of unencoded database data with the code dictionary, the DDS generates the column of encoded database data according to the QEP.
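
    The dictionary-and-join idea can be pictured with a small Python sketch: build a code dictionary over the distinct values of an unencoded column, then join the column against that dictionary to produce the encoded column. The helper names and the use of a plain dict in place of a dictionary table are assumptions for illustration, not the system's actual relational operators.

```python
# Illustrative sketch of dictionary encoding via a join, as described above.
# A Python dict stands in for the code dictionary table; this is an assumption
# for clarity, not the distributed system's actual implementation.

def build_code_dictionary(unencoded_column):
    """Map each distinct value in the column to a dense integer code."""
    distinct_values = sorted(set(unencoded_column))
    return {value: code for code, value in enumerate(distinct_values)}

def encode_column(unencoded_column, code_dictionary):
    """'Join' the unencoded column with the dictionary to emit codes."""
    return [code_dictionary[value] for value in unencoded_column]

column = ["DE", "US", "US", "FR", "DE"]
dictionary = build_code_dictionary(column)   # {'DE': 0, 'FR': 1, 'US': 2}
print(encode_column(column, dictionary))     # [0, 2, 2, 1, 0]
```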

    Partition aware evaluation of top-N queries

    Publication No.: US10706055B2

    Publication Date: 2020-07-07

    Application No.: US15092483

    Filing Date: 2016-04-06

    Abstract: Techniques are described for executing an analytical query with a top-N clause. In an embodiment, a stream of tuples is received by each of the processing units from a data source identified in the query. The processing unit uses a portion of a received tuple to identify the partition that the tuple is assigned to. For each partition, the processing unit maintains a top-N data store that stores an N number of received tuples that match the criteria for the top N tuples according to the query. The received tuple is compared to the N number of tuples to determine whether to store the received tuple and discard an already stored tuple, or to discard the received tuple. After all the tuples have been similarly processed by the processing units, the top-N data stores for each partition are merged, yielding the top N tuples for each partition to return as a result of the query.
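
    A minimal Python sketch of the per-partition top-N bookkeeping described above, using one bounded heap per partition. The tuple layout and the key and score accessors are assumptions for the example; a real engine would additionally merge the per-worker stores at the end.

```python
# Hedged sketch of per-partition top-N maintenance: keep at most n tuples per
# partition in a bounded min-heap. Tuple layout and accessors are assumptions.
import heapq
from collections import defaultdict

def top_n_per_partition(tuples, n, key_of, score_of):
    """Return, for each partition, its n highest-scoring tuples."""
    stores = defaultdict(list)              # partition key -> min-heap of (score, tuple)
    for t in tuples:
        heap = stores[key_of(t)]
        item = (score_of(t), t)
        if len(heap) < n:
            heapq.heappush(heap, item)      # store the received tuple
        elif item > heap[0]:
            heapq.heapreplace(heap, item)   # store it and discard the smallest stored tuple
        # otherwise discard the received tuple
    return {p: [t for _, t in sorted(h, reverse=True)] for p, h in stores.items()}

rows = [("east", 10), ("east", 40), ("west", 5), ("east", 25), ("west", 30)]
print(top_n_per_partition(rows, 2, key_of=lambda r: r[0], score_of=lambda r: r[1]))
# {'east': [('east', 40), ('east', 25)], 'west': [('west', 30), ('west', 5)]}
```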

    LIMITED MEMORY AND STATISTICS RESILIENT HASH JOIN EXECUTION

    Publication No.: US20190303482A1

    Publication Date: 2019-10-03

    Application No.: US15944473

    Filing Date: 2018-04-03

    Abstract: Techniques are described for building and probing a hash table where the size of an input partition is larger than the cache size of a receiving processor. A processor receives a payload array and generates a hash table in cache that includes a hash bucket array. Each hash bucket element contains an identifier that defines a location of a build key array element in the payload array. For a particular build key array element, the processor determines a hash bucket element that corresponds to the payload array. The processor copies the identifier for the particular build key array element into the hash bucket element. If additional build key array elements cannot be inserted into the hash table in the cache, then the processor generates a second hash table for the remaining build key array elements in local volatile memory. When probing, the processor probes both hash tables, in the cache and in local volatile memory, for identifiers in hash bucket elements that are used to locate matching build key array elements.
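
    To make the two-table build phase concrete, here is a hedged Python sketch in which one small dict stands in for the cache-resident hash table and a second dict stands in for the overflow table in local volatile memory. The capacity constant, the use of dicts instead of hash-bucket arrays, and the assumption of unique build keys are simplifications for the example, not the patented layout.

```python
# Hedged sketch of the two-level build phase: a bounded "cache" table plus an
# overflow table in local memory. Dicts, the capacity constant, and unique
# build keys are simplifying assumptions for illustration only.

CACHE_CAPACITY = 4   # assumed number of entries the cache-resident table can hold

def build(build_keys):
    """Return (cache_table, overflow_table), each mapping build key -> row identifier."""
    cache_table, overflow_table = {}, {}
    for row_id, key in enumerate(build_keys):    # row_id plays the role of the identifier
        if len(cache_table) < CACHE_CAPACITY:
            cache_table[key] = row_id            # insert into the cache-resident table
        else:
            overflow_table[key] = row_id         # spill remaining keys to local memory
    return cache_table, overflow_table

cache_table, overflow_table = build([10, 20, 30, 40, 50, 60])
print(cache_table)      # {10: 0, 20: 1, 30: 2, 40: 3}
print(overflow_table)   # {50: 4, 60: 5}
```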

    Limited memory and statistics resilient hash join execution

    Publication No.: US10810207B2

    Publication Date: 2020-10-20

    Application No.: US15944473

    Filing Date: 2018-04-03

    Abstract: A processor receives a payload array and generates a hash table in a cache that includes a hash bucket array. Each hash bucket element contains an identifier that defines a location of a build key array element in the payload array. For a particular build key array element, the processor determines a hash bucket element that corresponds to the payload array. The processor copies the identifier for the particular build key array element into the hash bucket element. If additional build key array elements cannot be inserted into the hash table in the cache, then the processor generates a second hash table for the remaining build key array elements in local volatile memory. When probing, the processor probes both hash tables, in the cache and in local volatile memory, for identifiers in hash bucket elements that are used to locate matching build key array elements.
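
    Complementing the build-phase sketch shown after the published application of this patent above, the hedged snippet below illustrates only the probe phase: each probe key is looked up in both the cache-resident table and the overflow table in local memory. The dict-based tables and sample data are assumptions for illustration.

```python
# Hedged sketch of the probe phase over the two hash tables described above.
# Dicts stand in for the cache-resident and local-memory tables (an assumption).

def probe(probe_keys, cache_table, overflow_table):
    """Yield (probe_key, build_row_id) matches found in either hash table."""
    for key in probe_keys:
        if key in cache_table:
            yield key, cache_table[key]        # matched in the cache-resident table
        elif key in overflow_table:
            yield key, overflow_table[key]     # matched in the local-memory table

cache_table = {10: 0, 20: 1, 30: 2, 40: 3}     # entries that fit in the cache
overflow_table = {50: 4, 60: 5}                # entries spilled to local volatile memory
print(list(probe([20, 50, 70], cache_table, overflow_table)))   # [(20, 1), (50, 4)]
```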

    DYNAMIC GROUPING OF IN-MEMORY DATA PROCESSING OPERATIONS

    Publication No.: US20180357331A1

    Publication Date: 2018-12-13

    Application No.: US15616777

    Filing Date: 2017-06-07

    CPC classification number: G06F16/90335 G06F9/48

    Abstract: Techniques are described herein for grouping of operations in local memory of a processing unit. The techniques involve adding a first operation for a first leaf operator of a query execution plan to a first pipelined group. The query execution plan includes a set of leaf operators and a set of non-leaf operators. Each leaf operator has a respective parent non-leaf operator, and each non-leaf operator has one or more child operators from among the set of leaf operators or others of the set of non-leaf operators. The techniques further involve determining a memory requirement of executing the first operation for the first leaf operator and executing a second operation for the respective parent non-leaf operator of the first leaf operator. The output of the first operation is input to the second operation. The techniques further involve determining whether the memory requirement is satisfied by an amount of local memory. If the memory requirement is satisfied by the amount of local memory, the second operation for the respective parent non-leaf operator is added to the first pipelined group. The techniques further involve assigning the first pipelined group to a first thread and the first thread executing the first pipelined group. Executing the first pipelined group involves: storing first output of the first operation in the local memory of the first thread; using the first output as input for the second operation; storing second output of the second operation in the local memory; and moving the second output from the local memory to a tier of memory different from the local memory relative to the first thread.
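
    As a companion to the grouping sketch shown after the granted version of this patent above, the snippet below illustrates executing a pipelined group on a single thread: each operation's output is kept in a stand-in for thread-local memory, and the group's final output is moved to a different memory tier. The callable operations and list-based "tiers" are illustrative assumptions only.

```python
# Hedged sketch of executing a pipelined group on one thread, as described above.
# Callable operations and list-based memory "tiers" are assumptions for clarity.

def execute_pipelined_group(group, input_rows):
    """Run each operation in order, passing output to the next via local memory."""
    local_memory = []                   # stands in for the thread's local memory
    current = input_rows
    for operation in group:
        current = operation(current)    # e.g. a filter or an aggregation step
        local_memory = current          # store this operation's output locally
    shared_memory = list(local_memory)  # move the final output to a different tier
    return shared_memory

group = [
    lambda rows: [r for r in rows if r % 2 == 0],   # first operation: filter
    lambda rows: [sum(rows)],                       # second operation: aggregate
]
print(execute_pipelined_group(group, [1, 2, 3, 4, 5, 6]))   # [12]
```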

    Disk drive failure prediction with neural networks

    Publication No.: US11579951B2

    Publication Date: 2023-02-14

    Application No.: US16144912

    Filing Date: 2018-09-27

    Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the recurrent neural network (RNN) long short-term memory (LSTM) model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor readings. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio and to use only samples leading up to a last-valid-time sample, in order to honor a pre-specified heads-up-period alert requirement.
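
    The preprocessing constraint described above (use only samples that end a heads-up period before the last valid reading) can be sketched as follows. The window length, parameter names, and record layout are assumptions for the example; the actual framework's parameter values and the LSTM model itself are not shown.

```python
# Hedged sketch of the sequence-selection rule described above: keep a fixed-length
# window of sensor readings that ends a heads-up period before the last valid sample.
# Names, window length, and the record layout are assumptions for illustration.

def build_feature_sequence(readings, sequence_length, heads_up_period):
    """Return one training sequence for a drive, or None if there is too little history.

    readings: per-interval sensor attribute vectors for one drive, oldest first.
    """
    cutoff = len(readings) - heads_up_period      # last usable sample index (exclusive)
    if cutoff < sequence_length:
        return None                               # not enough history before the alert horizon
    return readings[cutoff - sequence_length:cutoff]

history = [[0.1, 3.0], [0.2, 3.1], [0.4, 3.3], [0.9, 3.9], [1.5, 4.4]]
print(build_feature_sequence(history, sequence_length=3, heads_up_period=1))
# [[0.2, 3.1], [0.4, 3.3], [0.9, 3.9]]
```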

    Distributed relational dictionaries

    Publication No.: US10810195B2

    Publication Date: 2020-10-20

    Application No.: US15861212

    Filing Date: 2018-01-03

    Abstract: Techniques related to distributed relational dictionaries are disclosed. In some embodiments, one or more non-transitory storage media store a sequence of instructions which, when executed by one or more computing devices, cause performance of a method. The method involves generating, by a query optimizer at a distributed database system (DDS), a query execution plan (QEP) for generating a code dictionary and a column of encoded database data. The QEP specifies a sequence of operations for generating the code dictionary. The code dictionary is a database table. The method further involves receiving, at the DDS, a column of unencoded database data from a data source that is external to the DDS. The DDS generates the code dictionary according to the QEP. Furthermore, based on joining the column of unencoded database data with the code dictionary, the DDS generates the column of encoded database data according to the QEP.
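
    As a complement to the single-node encoding sketch shown after the corresponding published application above, this hedged snippet illustrates one way a distributed system could derive a single global code dictionary from the distinct values seen on different nodes before the encoding join. The node layout and the sort-based code assignment are assumptions for illustration, not the system's actual plan operators.

```python
# Hedged sketch: union per-node distinct values into one global code dictionary.
# The node layout and sort-based code assignment are illustrative assumptions.

def global_dictionary(per_node_columns):
    """Union the distinct values seen on each node, then assign dense codes."""
    distinct_values = set()
    for column in per_node_columns:
        distinct_values.update(column)       # each node contributes its local distincts
    return {value: code for code, value in enumerate(sorted(distinct_values))}

node_a = ["US", "DE", "US"]
node_b = ["FR", "DE"]
dictionary = global_dictionary([node_a, node_b])
print(dictionary)                            # {'DE': 0, 'FR': 1, 'US': 2}
print([dictionary[v] for v in node_a])       # encoded column on node A: [2, 0, 2]
```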

    Efficient partitioning of relational data

    Publication No.: US10592531B2

    Publication Date: 2020-03-17

    Application No.: US15438521

    Filing Date: 2017-02-21

    Abstract: Techniques are described for non-power-of-two partitioning of a data set, as well as for the generation and selection of partition schemes for the data set. In an embodiment, one or more iterations of a partition scheme are for a non-power-of-two number of partitions. Extended hash partitioning may be used to partition a data set into a non-power-of-two number of partitions by determining the partition identifier of each tuple of the data set using the extended hash partitioning algorithm. In an embodiment, multiple partition schemes are generated for multiple data sets, based on properties of the data sets and/or the availability of computing resources for the partition operation or for the operation subsequent to the partition operation. The generated partition schemes may use non-power-of-two partitioning for one or more iterations of a generated partition scheme. The optimal partition scheme may be selected from the generated partition schemes based on optimization policies.
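
    The abstract does not spell out the extended hash partitioning algorithm itself, so the sketch below only illustrates the stated goal: assigning every tuple a partition identifier when the number of partitions is not a power of two. The hash function and partition count are arbitrary choices made for the example.

```python
# Hedged illustration only: map each tuple's key to one of a non-power-of-two
# number of partitions. This is not the extended hash partitioning algorithm,
# whose details are not given in the abstract.
import zlib

def partition_id(key, num_partitions):
    """Return the partition identifier for a tuple's partitioning key."""
    h = zlib.crc32(repr(key).encode())   # stable hash of the partitioning key
    return h % num_partitions

rows = [("alice", 1), ("bob", 2), ("carol", 3), ("dave", 4)]
for row in rows:
    print(row, "->", partition_id(row[0], 3))   # 3 partitions: not a power of two
```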
