Compiler-level general matrix multiplication configuration optimization

    Publication number: US11842178B2

    Publication date: 2023-12-12

    Application number: US17182753

    Filing date: 2021-02-23

    CPC classification number: G06F8/443 G06F7/16 G06F8/447

    Abstract: A system and method are provided for optimizing general matrix multiplication (GEMM) on target hardware. The matrices to be multiplied are split into tiles, and a tiling configuration search problem is formulated that explores a configuration search space to identify an optimal tiling configuration, i.e., one that minimizes the running time of multiplying matrices A (m×k) and B (k×n) on the target hardware. Each configuration state is defined as a function of the matrix parameters m, k, and n and of the number of nested loops for each of the dimensions m, k, and n. The optimal tiling configuration for the target hardware is obtained by implementing a Greedy Best-First-Search (GBFS) algorithm or a Neighborhood Actor Advantage Critic (N-A2C) algorithm that optimizes the running time of the matrix multiplication on the target hardware; the target hardware is then configured and the computations are run accordingly.
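
The greedy best-first search named in the abstract can be illustrated with a minimal sketch. This is not the patented method: it assumes a simplified two-level tiling where a configuration is just a tuple of tile sizes (tm, tk, tn), and `toy_cost` is a hypothetical analytic stand-in for a running time that would really be measured on the target hardware. The names `neighbors`, `toy_cost`, `gbfs`, and the `budget` parameter are illustrative assumptions.

```python
import heapq


def neighbors(cfg, dims):
    """Yield configurations that double or halve one tile size.

    Only tile sizes that evenly divide the matching dimension are kept.
    """
    for axis in range(3):
        for new in (cfg[axis] * 2, cfg[axis] // 2):
            if new >= 1 and dims[axis] % new == 0:
                yield cfg[:axis] + (new,) + cfg[axis + 1:]


def toy_cost(cfg, dims, cache_elems=4096):
    """Hypothetical cost model (assumption): NOT a hardware measurement."""
    m, k, n = dims
    tm, tk, tn = cfg
    tiles = (m // tm) * (k // tk) * (n // tn)
    footprint = tm * tk + tk * tn + tm * tn  # elements touched per tile
    penalty = 4.0 if footprint > cache_elems else 1.0
    # Each tile pays for its multiply-adds plus a fixed loop overhead.
    return tiles * (tm * tk * tn * penalty + 500.0)


def gbfs(dims, cost, start=(1, 1, 1), budget=200):
    """Greedy best-first search: always expand the cheapest known state."""
    best_cfg, best_cost = start, cost(start, dims)
    frontier = [(best_cost, start)]  # min-heap ordered by cost
    visited = {start}
    evaluated = 1
    while frontier and evaluated < budget:
        _, cfg = heapq.heappop(frontier)
        for nxt in neighbors(cfg, dims):
            if nxt in visited:
                continue
            visited.add(nxt)
            nxt_cost = cost(nxt, dims)
            evaluated += 1
            if nxt_cost < best_cost:
                best_cfg, best_cost = nxt, nxt_cost
            heapq.heappush(frontier, (nxt_cost, nxt))
    return best_cfg, best_cost


if __name__ == "__main__":
    print(gbfs((64, 64, 64), toy_cost))
```

In a real setting the cost callback would compile and time a kernel for each candidate, so the evaluation budget, rather than the search bookkeeping, would dominate the total tuning time.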

    Accelerator controller for inserting template microcode instructions into a microcode buffer to accelerate matrix operations

    Publication number: US11836488B2

    Publication date: 2023-12-05

    Application number: US17758129

    Filing date: 2020-01-13

    CPC classification number: G06F9/267 G06F7/16 G06F9/268

    Abstract: A method is provided for a controller to execute a program comprising a sequence of functions on an accelerator with a pipelined architecture comprising a microcode buffer. The method comprises: executing a function of the program as a sequence of operations, wherein the sequence of operations is represented by a sequence of templates; determining whether a template is non-colliding with previously inserted templates in the microcode buffer; determining whether data in local memory will be referenced before all previously inserted templates have taken effect; determining whether registers will be referenced before all previously inserted templates in the microcode buffer have taken effect; and, when it is determined that the template fits, that resources are available, that local data memory accesses will not collide, and that register accesses will not collide, creating a sequence of microcode instructions in the template and inserting the template into the microcode buffer.

Adaptive sort accelerator sharing first level processor cache

    Publication number: US20200073634A1

    Publication date: 2020-03-05

    Application number: US16118592

    Filing date: 2018-08-31

    Abstract: A computer processor includes a processor cache that obtains, from a memory unit, tree data indicative of key values that are pre-sorted in the memory unit. A hardware adaptive merge sort accelerator generates a tournament tree based on the key values and performs a partial tournament sort that compares a selected key value to a plurality of participating key values to define a sorting path. The hardware adaptive merge sort accelerator also determines an overall winning key value of the partial tournament and a runner-up key value located on the sorting path that is the next lowest key value among the participating key values. The remaining key values are compared to the runner-up key value to sort at least one of the remaining key values in sequential order with respect to the overall winning key value and the runner-up key value.
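
For readers unfamiliar with tournament trees, the following is a minimal software sketch of merging pre-sorted runs with a plain winner tree. It illustrates only the general idea of a sorting path from a leaf to the root; it does not model the hardware accelerator, the partial tournament, or the runner-up optimization described in the abstract. The function name `tournament_merge` is an illustrative assumption.

```python
def tournament_merge(runs):
    """Merge pre-sorted lists by replaying a winner tree over the run heads."""
    k = len(runs)
    pos = [0] * k  # index of the next unread key in each run

    size = 2  # number of leaves: a power of two, at least 2
    while size < k:
        size *= 2
    # tree[size + i] is the leaf for run i; internal nodes store the index
    # of the winning run of their subtree, or None if the subtree is empty.
    tree = [None] * (2 * size)
    for i in range(k):
        tree[size + i] = i

    def head(r):
        """Current key of run r, or None if r is missing or exhausted."""
        if r is None or pos[r] >= len(runs[r]):
            return None
        return runs[r][pos[r]]

    def play(a, b):
        """Return the run index with the smaller head (ties go to a)."""
        ka, kb = head(a), head(b)
        if ka is None:
            return b if kb is not None else None
        if kb is None:
            return a
        return a if ka <= kb else b

    # Initial tournament, played bottom-up.
    for node in range(size - 1, 0, -1):
        tree[node] = play(tree[2 * node], tree[2 * node + 1])

    out = []
    while tree[1] is not None:
        winner = tree[1]
        out.append(runs[winner][pos[winner]])
        pos[winner] += 1
        # Replay only the path from the winner's leaf up to the root.
        node = (size + winner) // 2
        while node >= 1:
            tree[node] = play(tree[2 * node], tree[2 * node + 1])
            node //= 2
    return out
```

After each extracted key only one leaf-to-root path is replayed, so each output key costs a number of comparisons proportional to the tree height instead of a full scan of all run heads.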

    System and method for resource reconciliation in an enterprise management system

    Publication number: US10534577B2

    Publication date: 2020-01-14

    Application number: US14851899

    Filing date: 2015-09-11

    Abstract: A method to reconcile multiple instances of a single computer resource identified by resource discovery operations includes: (1) accessing information describing one or more resources; (2) identifying, via the accessed information, at least one resource that has been detected or discovered by at least two of the discovery operations; and (3) merging attributes associated with the identified resource from each of the at least two discovery operations into a single, reconciled resource object. Illustrative “resources” include, but are not limited to, computer systems, components of computer systems, data storage systems, switches, routers, memory, software applications (e.g., accounting and database applications), operating systems and business services (e.g., order entry or change management and tracking services).
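
The three steps in this abstract (access the discovered information, identify resources seen by at least two discovery operations, merge their attributes) can be sketched as follows. This is a minimal illustration, not the patented system: the record layout, the `serial` identity attribute, and the "first discovery wins on conflict" rule are all assumptions made for the example.

```python
from collections import defaultdict


def reconcile(discovered, identity_key="serial"):
    """Merge records that describe the same resource into one object.

    `discovered` is a list of (source_name, attributes_dict) pairs, one
    pair per resource instance reported by a discovery operation.
    """
    # Steps 1 and 2: group the instances by a shared identifying attribute.
    groups = defaultdict(list)
    for source, record in discovered:
        groups[record[identity_key]].append((source, record))

    # Step 3: merge the attributes of each group into one reconciled object.
    reconciled = {}
    for ident, hits in groups.items():
        merged = {}
        for _, record in hits:
            for attr, value in record.items():
                # Assumption: the first discovery to report a value wins.
                merged.setdefault(attr, value)
        merged["sources"] = sorted({source for source, _ in hits})
        reconciled[ident] = merged
    return reconciled


if __name__ == "__main__":
    found = [
        ("network_scan", {"serial": "A1", "ip": "10.0.0.5"}),
        ("agent", {"serial": "A1", "os": "Linux", "ip": "10.0.0.9"}),
        ("agent", {"serial": "B2", "os": "Windows"}),
    ]
    for ident, obj in reconcile(found).items():
        print(ident, obj)
```

A production reconciliation engine would more plausibly use configurable identification rules and per-attribute precedence between discovery sources rather than a single key and a fixed conflict rule.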

    Nondecreasing sequence determining device, method and program

    Publication number: US10333697B2

    Publication date: 2019-06-25

    Application number: US15516175

    Filing date: 2015-10-05

    Abstract: Whether a nondecreasing sequence exists is determined efficiently. A sorting part sorts the elements of a set P_i in ascending order to generate vectors t_{i,i+1} and b_{i,i+1}. A merging part generates vectors t_{0,m} and b_{0,m} by repeating the process of merging vectors (t_{i,j}, b_{i,j}) and (t_{j,k}, b_{j,k}) to generate (t_{i,k}, b_{i,k}). A stable-sorting part generates a vector e by coupling and stably sorting vectors b_{i,j} and t_{j,k}. A searching part searches for sets of (λ, x, y) in which e[λ] is b_{i,j}[x] and e[λ+1] is t_{j,k}[y] and generates a set X including all x and a set Y including all y. An extracting part sorts t_{i,j}[x] (x∈X) in ascending order to generate a vector t_{i,k} and sorts b_{j,k}[y] (y∈Y) in ascending order to generate a vector b_{i,k}. If the length of the vector t_{0,m} is 0, a determining part outputs a determination result indicating the absence of a nondecreasing sequence.
