Matrix multiplication at memory bandwidth

    Publication number: US10521225B2

    Publication date: 2019-12-31

    Application number: US15638168

    Filing date: 2017-06-29

    Abstract: Techniques related to matrix multiplication at memory bandwidth are disclosed. Computing device(s) perform multiplication of a first matrix with a second matrix to generate a third matrix. A first register stores contiguous element values of the first matrix. Furthermore, a second register stores a first set of contiguous element values of the second matrix, and a third register stores a second set of contiguous element values of the second matrix. The first set and the second set correspond to a first row and a second row, respectively, of the second matrix. The first row and the second row are contiguous rows. A single instruction is executed to cause at least a partial computation of contiguous element values of the third matrix. The single instruction causes multiplication of element values stored in the first register with element values stored in the second and third registers and grouped accumulation of the products.
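The grouped multiply-accumulate in this abstract can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented circuit: registers are plain lists, the hypothetical function `grouped_fma` stands in for the single instruction, and the grouping is assumed to pair two contiguous elements of a row of the first matrix with the same columns of two contiguous rows of the second matrix.

```python
from typing import List

Matrix = List[List[int]]


def grouped_fma(acc: List[int], a_pair: List[int],
                row_k: List[int], row_k1: List[int]) -> List[int]:
    """Hypothetical model of the single instruction.

    a_pair: contiguous elements A[i][k], A[i][k+1] (first register).
    row_k, row_k1: contiguous elements of rows k and k+1 of B
    (second and third registers). Lane j accumulates the grouped sum
    A[i][k]*B[k][j] + A[i][k+1]*B[k+1][j].
    """
    a0, a1 = a_pair
    return [c + a0 * b0 + a1 * b1 for c, b0, b1 in zip(acc, row_k, row_k1)]


def matmul_grouped(A: Matrix, B: Matrix) -> Matrix:
    """Compute C = A x B, consuming two contiguous rows of B per step."""
    inner, cols = len(B), len(B[0])
    if inner % 2:
        raise ValueError("sketch assumes an even inner dimension")
    C = []
    for row in A:
        acc = [0] * cols  # contiguous elements of one row of C
        for k in range(0, inner, 2):
            acc = grouped_fma(acc, row[k:k + 2], B[k], B[k + 1])
        C.append(acc)
    return C


print(matmul_grouped([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Each call to `grouped_fma` consumes two rows of the second matrix at once, so an output row needs half as many accumulation passes as a one-row-at-a-time loop.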

    Memory management for sparse matrix multiplication

    Publication number: US10452744B2

    Publication date: 2019-10-22

    Application number: US15470377

    Filing date: 2017-03-27

    Abstract: Techniques related to memory management for sparse matrix multiplication are disclosed. Computing device(s) may perform a method for multiplying a row of a first sparse matrix with a second sparse matrix to generate a product matrix row. A compressed representation of the second sparse matrix is stored in main memory. The compressed representation comprises a values array that stores non-zero value(s). Tile(s) corresponding to row(s) of the second sparse matrix are loaded into scratchpad memory. The tile(s) comprise set(s) of non-zero value(s) of the values array. A particular partition of an uncompressed representation of the product matrix row is generated in the scratchpad memory. The particular partition corresponds to a partition of the second sparse matrix comprising non-zero value(s) included in the tile(s). When a particular tile is determined to comprise non-zero value(s) that are required to generate the particular partition, the particular tile is loaded into the scratchpad memory.
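The tile-skipping idea can be sketched in Python, assuming the compressed representation is CSR (a values array plus column-index and row-pointer arrays), that a "tile" is the slice of those arrays belonging to one row of the second matrix, and that the scratchpad is simulated by a local list. `csr_from_dense` and `product_row_partition` are hypothetical helpers; the patent's actual tile layout may differ.

```python
from typing import Dict, List, Tuple

CSR = Tuple[List[float], List[int], List[int]]  # (values, col_idx, row_ptr)


def csr_from_dense(M: List[List[float]]) -> CSR:
    """Build a CSR representation that stores only non-zero values."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr = [0]
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def product_row_partition(a_row: Dict[int, float], b: CSR,
                          lo: int, hi: int) -> Tuple[List[float], List[int]]:
    """Compute columns [lo, hi) of the product row a_row x B.

    a_row maps column k of the first matrix's row to its non-zero value.
    Returns the uncompressed partition and the indices of loaded tiles.
    """
    values, col_idx, row_ptr = b
    partition = [0.0] * (hi - lo)  # simulated scratchpad buffer
    loaded: List[int] = []
    for k, a_val in sorted(a_row.items()):
        start, end = row_ptr[k], row_ptr[k + 1]
        # Load the tile for row k only if it holds a needed non-zero.
        if not any(lo <= col_idx[p] < hi for p in range(start, end)):
            continue
        tile = list(zip(col_idx[start:end], values[start:end]))
        loaded.append(k)
        for j, b_val in tile:
            if lo <= j < hi:
                partition[j - lo] += a_val * b_val
    return partition, loaded


B = csr_from_dense([[1, 0, 0, 2], [0, 0, 3, 0], [4, 0, 0, 0]])
print(product_row_partition({0: 1.0, 1: 2.0, 2: 1.0}, B, 0, 2))
# ([5.0, 0.0], [0, 2])
```

Row 1 of the second matrix is never loaded for the partition covering columns 0 and 1, because its only non-zero lies in column 2.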

    GRADIENT-BASED AUTO-TUNING FOR MACHINE LEARNING AND DEEP LEARNING MODELS

    Publication number: US20190095818A1

    Publication date: 2019-03-28

    Application number: US15885515

    Filing date: 2018-01-31

    Abstract: Herein, horizontally scalable techniques efficiently configure machine learning algorithms for optimal accuracy and without informed inputs. In an embodiment, for each particular hyperparameter, and for each epoch, a computer processes the particular hyperparameter. An epoch explores one hyperparameter based on hyperparameter tuples. A respective score is calculated from each tuple. The tuple contains a distinct combination of values, each of which is contained in a value range of a distinct hyperparameter. Across the tuples, the values that belong to the particular hyperparameter are distinct, while the values that belong to other hyperparameters are held constant. The value range of the particular hyperparameter is narrowed based on an intersection point of a first line based on the scores and a second line based on the scores. A machine learning algorithm is optimally configured from repeatedly narrowed value ranges of hyperparameters. The configured algorithm is invoked to obtain a result.
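One epoch of the range-narrowing step might look like the sketch below. The choice of lines is an assumption: here the first line passes through the two lowest sampled values and the second through the two highest, so for a score curve that rises and then falls, their intersection gives a rough estimate of the peak. `narrow_range`, the four-point sampling and the `shrink` factor are all hypothetical.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def line_intersection(p0: Point, p1: Point, q0: Point, q1: Point) -> Optional[float]:
    """x-coordinate where line p0-p1 meets line q0-q1 (None if parallel)."""
    m1 = (p1[1] - p0[1]) / (p1[0] - p0[0])
    m2 = (q1[1] - q0[1]) / (q1[0] - q0[0])
    if m1 == m2:
        return None
    b1 = p0[1] - m1 * p0[0]
    b2 = q0[1] - m2 * q0[0]
    return (b2 - b1) / (m1 - m2)


def narrow_range(score: Callable[[float], float], lo: float, hi: float,
                 shrink: float = 0.5) -> Tuple[float, float]:
    """One epoch for one hyperparameter; other hyperparameters stay fixed."""
    xs = [lo + (hi - lo) * i / 3 for i in range(4)]  # distinct values
    pts = [(x, score(x)) for x in xs]
    x_star = line_intersection(pts[0], pts[1], pts[2], pts[3])
    if x_star is None or not lo <= x_star <= hi:
        x_star = max(pts, key=lambda p: p[1])[0]  # fall back to best sample
    half = (hi - lo) * shrink / 2
    return max(lo, x_star - half), min(hi, x_star + half)
```

For example, `narrow_range(lambda x: -(x - 4.5) ** 2, 0.0, 9.0)` returns `(2.25, 6.75)`. The intersection is exact for this symmetric curve but only approximate in general.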

    ALGORITHM-SPECIFIC NEURAL NETWORK ARCHITECTURES FOR AUTOMATIC MACHINE LEARNING MODEL SELECTION

    Publication number: US20190095756A1

    Publication date: 2019-03-28

    Application number: US15884163

    Filing date: 2018-01-30

    Abstract: Techniques are provided for selection of machine learning algorithms based on performance predictions by trained algorithm-specific regressors. In an embodiment, a computer derives meta-feature values from an inference dataset by, for each meta-feature, deriving a respective meta-feature value from the inference dataset. For each trainable algorithm and each regression meta-model that is respectively associated with the algorithm, a respective score is calculated by invoking the meta-model based on a respective subset of meta-feature values and/or hyperparameter values of a respective subset of hyperparameters of the algorithm. The algorithm(s) are selected based on the respective scores. Based on the inference dataset, the selected algorithm(s) may be invoked to obtain a result. In an embodiment, the trained regressors are distinctly configured artificial neural networks. In an embodiment, the trained regressors are contained within algorithm-specific ensembles. Techniques are also provided for optimal training of regressors and/or ensembles.
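A toy version of the selection step is shown below. In the abstract the regressors are trained (in one embodiment, as neural networks); here they are hypothetical hand-written linear functions of three simple meta-features, and the hyperparameter inputs are omitted.

```python
from typing import Callable, Dict, List, Sequence

MetaFeatures = Dict[str, float]
Regressor = Callable[[MetaFeatures], float]


def meta_features(X: Sequence[Sequence[float]]) -> MetaFeatures:
    """Derive simple meta-feature values from the inference dataset."""
    n_rows = len(X)
    n_cols = len(X[0]) if n_rows else 0
    return {
        "n_rows": float(n_rows),
        "n_cols": float(n_cols),
        "rows_per_col": n_rows / n_cols if n_cols else 0.0,
    }


def select_algorithms(X: Sequence[Sequence[float]],
                      regressors: Dict[str, Regressor], k: int = 1) -> List[str]:
    """Score each algorithm with its own meta-model and keep the top k."""
    mf = meta_features(X)
    scores = {name: model(mf) for name, model in regressors.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical stand-ins for trained, algorithm-specific regressors.
regressors: Dict[str, Regressor] = {
    "linear_model": lambda mf: 0.9 - 0.01 * mf["n_cols"],
    "decision_tree": lambda mf: 0.6 + 0.2 * mf["rows_per_col"],
}
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
print(select_algorithms(X, regressors))  # ['decision_tree']
```

Keeping one meta-model per algorithm, as the abstract describes, lets each predictor use only the meta-features relevant to its own algorithm.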

    Massively parallel and in-memory execution of grouping and aggregation in a heterogeneous system

    Publication number: US10204140B2

    Publication date: 2019-02-12

    Application number: US13831122

    Filing date: 2013-03-14

    Abstract: A system and method for processing a group and aggregate query on a relation are disclosed. A database system determines whether assistance of a heterogeneous system (HS) of compute nodes is beneficial in performing the query. Assuming that the relation has been partitioned and loaded into the HS, the database system determines, in a compile phase, whether the HS has the functional capabilities to assist, and whether the cost and benefit favor performing the operation with the assistance of the HS. If the cost and benefit favor using the assistance of the HS, then the system enters the execution phase. The database system starts, in the execution phase, an optimal number of parallel processes to produce and consume the results from the compute nodes of the HS. After any needed transaction consistency checks, the results of the query are returned by the database system.
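The produce/consume pattern for a group-and-aggregate query can be illustrated with a two-phase aggregation: each simulated compute node computes partial (sum, count) pairs for its partition, and the database side merges them. Using a thread pool as a stand-in for the heterogeneous system is an assumption, and the compile-phase cost check is not modelled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]                 # (group key, value)
Partial = Dict[str, Tuple[float, int]]  # key -> (sum, count)


def partial_aggregate(partition: List[Row]) -> Partial:
    """Runs on one simulated compute node over its partition."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, value in partition:
        sums[key] += value
        counts[key] += 1
    return {key: (sums[key], counts[key]) for key in sums}


def merge(partials: Iterable[Partial]) -> Partial:
    """Consumer side: combine the partial results from all nodes."""
    total: Partial = {}
    for part in partials:
        for key, (s, c) in part.items():
            prev_s, prev_c = total.get(key, (0.0, 0))
            total[key] = (prev_s + s, prev_c + c)
    return total


def group_sum_avg(partitions: List[List[Row]],
                  workers: int = 4) -> Dict[str, Tuple[float, float]]:
    """SELECT key, SUM(v), AVG(v) ... GROUP BY key, computed in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return {key: (s, s / c) for key, (s, c) in sorted(merge(partials).items())}


partitions = [[("a", 1.0), ("b", 2.0)], [("a", 3.0)], [("b", 4.0), ("b", 6.0)]]
print(group_sum_avg(partitions))  # {'a': (4.0, 2.0), 'b': (12.0, 4.0)}
```

Nodes return (sum, count) rather than averages because averages of partitions cannot be merged correctly, while sums and counts can.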

    DYNAMIC GROUPING OF IN-MEMORY DATA PROCESSING OPERATIONS

    Publication number: US20180357331A1

    Publication date: 2018-12-13

    Application number: US15616777

    Filing date: 2017-06-07

    CPC classification number: G06F16/90335 G06F9/48

    Abstract: Techniques are described herein for grouping operations in the local memory of a processing unit. The techniques involve adding a first operation for a first leaf operator of a query execution plan to a first pipelined group. The query execution plan includes a set of leaf operators and a set of non-leaf operators. Each leaf operator in the set of leaf operators has a respective parent non-leaf operator, and each non-leaf operator has one or more child operators drawn from the set of leaf operators or from the other non-leaf operators. The techniques further involve determining a memory requirement of executing the first operation for the first leaf operator and executing a second operation for the respective parent non-leaf operator of the first leaf operator. The output of the first operation is input to the second operation. The techniques further involve determining whether the memory requirement is satisfied by an amount of local memory. If it is determined that the memory requirement is satisfied by the amount of local memory, the second operation for the respective parent non-leaf operator is added to the first pipelined group. The techniques further involve assigning the first pipelined group to a first thread, which executes the first pipelined group. Executing the first pipelined group involves: storing first output of the first operation in the local memory of the first thread; using the first output as input for the second operation; storing second output of the second operation in the local memory; and moving the second output from the local memory to a tier of memory, relative to the first thread, that is different from the local memory.
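The grouping rule can be sketched for the simple case of a single leaf-to-root chain of operators. The output sizes, the `Operator` class and the greedy policy of starting a new group whenever a child-parent pair does not fit in local memory are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operator:
    name: str
    output_bytes: int                    # estimated size of this operator's output
    parent: Optional["Operator"] = None  # consumer of this operator's output


def pipelined_groups(leaf: Operator, local_memory: int) -> List[List[str]]:
    """Greedily grow pipelined groups upward from a leaf operator."""
    groups = [[leaf.name]]
    child, op = leaf, leaf.parent
    while op is not None:
        # The child's output (op's input) and op's output must both fit locally.
        requirement = child.output_bytes + op.output_bytes
        if requirement <= local_memory:
            groups[-1].append(op.name)
        else:
            # Does not fit: the child's output would go to another memory tier,
            # and op starts a new group (each group could run on its own thread).
            groups.append([op.name])
        child, op = op, op.parent
    return groups


aggregate = Operator("aggregate", 50)
filt = Operator("filter", 200, parent=aggregate)
scan = Operator("scan", 400, parent=filt)
print(pipelined_groups(scan, local_memory=512))
# [['scan'], ['filter', 'aggregate']]
```

With 512 bytes of local memory, scan and filter together need 600 bytes, so filter opens a new group; filter and aggregate need only 250 bytes and stay pipelined.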

    CONSISTENT QUERY EXECUTION FOR BIG DATA ANALYTICS IN A HYBRID DATABASE

    Publication number: US20180349458A1

    Publication date: 2018-12-06

    Application number: US15610171

    Filing date: 2017-05-31

    CPC classification number: G06F16/273 G06F16/2365 G06F16/2379 G06F16/2455

    Abstract: Techniques are described for efficient query processing and data change propagation to a secondary database system. The secondary database system may execute queries received at a primary database system. Database changes made at the primary system are copied to the secondary system. The primary system receives a query to be executed on either the primary system or the secondary system. The primary system determines whether to send the query to the secondary system based upon whether data objects stored within the secondary system have pending changes that need to be applied to the data objects. The pending changes are stored within in-memory journals within the primary system. The primary system scans for the pending changes to the data objects and sends the pending changes to the secondary system. The secondary system then receives and applies the pending changes to the data objects within the secondary system. Upon applying the pending changes, the secondary system executes the query.
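A minimal model of the routing decision is below. The in-memory journal is a per-table list of pending (key, value) updates, and the rule that the primary answers the query itself when too many changes are pending is a hypothetical policy; the classes and the `max_pending` threshold are not from the patent.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int]  # hypothetical (row key, new value) update


class Secondary:
    """Secondary system holding a copy of the primary's data objects."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, int]] = {}

    def apply(self, table: str, changes: List[Change]) -> None:
        rows = self.tables.setdefault(table, {})
        for key, value in changes:
            rows[key] = value

    def sum_table(self, table: str) -> int:
        return sum(self.tables.get(table, {}).values())


class Primary:
    def __init__(self, secondary: Secondary, max_pending: int = 100) -> None:
        self.tables: Dict[str, Dict[str, int]] = {}
        self.journals: Dict[str, List[Change]] = {}  # in-memory journals
        self.secondary = secondary
        self.max_pending = max_pending  # hypothetical routing threshold

    def update(self, table: str, key: str, value: int) -> None:
        self.tables.setdefault(table, {})[key] = value
        self.journals.setdefault(table, []).append((key, value))

    def query_sum(self, table: str) -> Tuple[str, int]:
        """Route SELECT SUM(...) FROM table; return (executor, result)."""
        pending = self.journals.get(table, [])
        if len(pending) > self.max_pending:
            # Too many pending changes: execute on the primary instead.
            return "primary", sum(self.tables.get(table, {}).values())
        if pending:
            # Ship pending changes first so the secondary's answer is consistent.
            self.secondary.apply(table, pending)
            self.journals[table] = []
        return "secondary", self.secondary.sum_table(table)


s = Secondary()
p = Primary(s, max_pending=2)
p.update("t", "a", 1)
p.update("t", "b", 2)
print(p.query_sum("t"))  # ('secondary', 3)
```

The key property is ordering: pending changes for the queried objects are applied on the secondary before it executes the query, so it never answers from stale data.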

    Run length encoding aware direct memory access filtering engine for scratchpad enabled multicore processors

    Publication number: US10055358B2

    Publication date: 2018-08-21

    Application number: US15074248

    Filing date: 2016-03-18

    Abstract: Techniques are described herein for efficient movement of data from a source memory to a destination memory. In an embodiment, in response to a particular memory location being pushed into a first register within a first register space, a first set of electronic circuits accesses a descriptor stored at the particular memory location. The descriptor indicates a width of a column of tabular data, a number of rows of tabular data, and one or more tabular data manipulation operations to perform on the column of tabular data. The descriptor also indicates a source memory location for accessing the tabular data and a destination memory location for storing a data manipulation result from performing the one or more data manipulation operations on the tabular data. Based on the descriptor, the first set of electronic circuits determines control information indicating that the one or more data manipulation operations are to be performed on the tabular data, and transmits the control information, using a hardware data channel, to a second set of electronic circuits to perform the one or more operations. Based on the control information, the second set of electronic circuits retrieves the tabular data from the source memory location and applies the one or more data manipulation operations to generate the data manipulation result. The second set of electronic circuits causes the data manipulation result to be stored at the destination memory location.
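The descriptor-driven flow can be simulated in software. Everything here is an assumption for illustration: memory is a `bytearray`, descriptors are kept in a dict keyed by their address, the hardware data channel is a `queue.Queue`, and the operation names in `OPS` are hypothetical. Run-length decoding is not modelled.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical operations the data-movement unit can apply to a column.
OPS: Dict[str, Callable[[List[int]], List[int]]] = {
    "drop_zeros": lambda vs: [v for v in vs if v != 0],
    "double": lambda vs: [2 * v for v in vs],
}


@dataclass
class Descriptor:
    column_width: int            # bytes per column value
    num_rows: int
    operations: Tuple[str, ...]  # names of data manipulation operations
    src: int                     # source address in simulated memory
    dst: int                     # destination address


def read_column(mem: bytearray, addr: int, width: int, rows: int) -> List[int]:
    return [int.from_bytes(mem[addr + i * width:addr + (i + 1) * width], "little")
            for i in range(rows)]


def write_column(mem: bytearray, addr: int, width: int, values: List[int]) -> None:
    for i, v in enumerate(values):
        mem[addr + i * width:addr + (i + 1) * width] = v.to_bytes(width, "little")


def descriptor_unit(addr_register: queue.Queue, descriptors: Dict[int, Descriptor],
                    channel: queue.Queue) -> None:
    """First circuit set: turn a pushed descriptor address into control info."""
    d = descriptors[addr_register.get()]
    channel.put((d.src, d.dst, d.column_width, d.num_rows, d.operations))


def data_movement_unit(channel: queue.Queue, mem: bytearray) -> int:
    """Second circuit set: fetch the column, apply operations, store result."""
    src, dst, width, rows, ops = channel.get()
    values = read_column(mem, src, width, rows)
    for op in ops:
        values = OPS[op](values)
    write_column(mem, dst, width, values)
    return len(values)  # number of result rows written


mem = bytearray(64)
write_column(mem, 0, 2, [3, 0, 7, 0])
descriptors = {100: Descriptor(2, 4, ("drop_zeros",), src=0, dst=32)}
addr_register: queue.Queue = queue.Queue()
channel: queue.Queue = queue.Queue()
addr_register.put(100)  # "push" the descriptor's address
descriptor_unit(addr_register, descriptors, channel)
n = data_movement_unit(channel, mem)
print(read_column(mem, 32, 2, n))  # [3, 7]
```

Splitting descriptor parsing from data movement mirrors the two circuit sets in the abstract: the first only interprets the descriptor, and the second only touches the data.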

    Version control based on a dual-range validity model

    Publication number: US09811560B2

    Publication date: 2017-11-07

    Application number: US14824920

    Filing date: 2015-08-12

    CPC classification number: G06F17/30448 G06F17/30345 G06F17/30353

    Abstract: Techniques related to version control based on a dual-range validity model are disclosed. In an embodiment, an online analytical processing (OLAP) server stores a plurality of version records describing versions of a data item. A version record may describe any open transactions for a version of the data item. The version record may specify a commit timestamp for the data item at a database and a valid timestamp at least as great as the commit timestamp. The commit timestamp and the valid timestamp may specify a validity range. The version record may also specify an expiration timestamp, which along with the valid timestamp may specify an unresolved range. The OLAP server may also identify a valid version of the data item for a query timestamp that corresponds to a query for particular data in the data item and that falls within either the validity range or the unresolved range.
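The two ranges can be expressed as a small data structure. This sketch assumes the validity range is `[commit_ts, valid_ts]` and the unresolved range runs from just after `valid_ts` up to, but excluding, `expiration_ts`, with a missing expiration meaning open-ended; the lookup order is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VersionRecord:
    value: str
    commit_ts: int
    valid_ts: int                        # known valid up to here (>= commit_ts)
    expiration_ts: Optional[int] = None  # None: no newer version known yet

    def __post_init__(self) -> None:
        if self.valid_ts < self.commit_ts:
            raise ValueError("valid_ts must be >= commit_ts")

    def in_validity_range(self, ts: int) -> bool:
        return self.commit_ts <= ts <= self.valid_ts

    def in_unresolved_range(self, ts: int) -> bool:
        end = self.expiration_ts if self.expiration_ts is not None else float("inf")
        return self.valid_ts < ts < end


def visible_version(versions: List[VersionRecord],
                    query_ts: int) -> Optional[Tuple[VersionRecord, str]]:
    """Return the version for query_ts and which range matched, if any."""
    for v in versions:
        if v.in_validity_range(query_ts):
            return v, "valid"
    for v in versions:
        if v.in_unresolved_range(query_ts):
            return v, "unresolved"
    return None


versions = [VersionRecord("A", 10, 15, expiration_ts=20), VersionRecord("B", 20, 25)]
for ts in (12, 17, 30, 5):
    match = visible_version(versions, ts)
    print(ts, None if match is None else (match[0].value, match[1]))
# 12 ('A', 'valid')
# 17 ('A', 'unresolved')
# 30 ('B', 'unresolved')
# 5 None
```

A query that lands in an unresolved range may still need confirmation (for example, that no open transaction has since committed) before the version is treated as final.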
