-
Publication No.: US11842178B2
Publication Date: 2023-12-12
Application No.: US17182753
Filing Date: 2021-02-23
Applicant: Huawei Technologies Co., Ltd.
Inventor: Hui Zang, Huaqing Zhang, Xiaolin Cheng
Abstract: A system and method are provided for optimizing general matrix multiplication (GEMM) on target hardware by splitting the matrices to be multiplied into tiles and formulating a tiling configuration search problem. The search explores a configuration search space to identify an optimal tiling configuration that minimizes the running time on the target hardware for multiplication of matrices A (m×k) and B (k×n), for respective configuration states, as a function of the matrix parameters m, k, and n and the numbers of nested loops for each of the dimensions m, k, and n. The optimal tiling configuration for the target hardware is obtained by implementing a Greedy Best-First-Search (GBFS) algorithm or a Neighborhood Actor Advantage Critic (N-A2C) algorithm that optimizes the running time for multiplication of the matrices on the target hardware; the target hardware is then configured and computations are run accordingly.
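The tiling search described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the configuration here is simply a triple of tile sizes (tm, tk, tn), neighbors are reached by doubling or halving one tile size, and `toy_cost` is a hypothetical stand-in for the running time that would be measured on the target hardware. All function names are assumptions made for illustration.

```python
import heapq

def tiled_matmul(A, B, tm, tk, tn):
    """Multiply A (m x k) by B (k x n) tile by tile; tile sizes are tm, tk, tn."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tm):
        for j0 in range(0, n, tn):
            for p0 in range(0, k, tk):
                # Multiply one tile of A by one tile of B and accumulate into C.
                for i in range(i0, min(i0 + tm, m)):
                    for p in range(p0, min(p0 + tk, k)):
                        a = A[i][p]
                        for j in range(j0, min(j0 + tn, n)):
                            C[i][j] += a * B[p][j]
    return C

def toy_cost(cfg):
    """Hypothetical cost model; a real search would time the kernel on hardware."""
    tm, tk, tn = cfg
    return abs(tm - 8) + abs(tk - 4) + abs(tn - 16)

def neighbors(cfg, limits):
    """Configurations reached by doubling or halving exactly one tile size."""
    for axis in range(3):
        for new in (cfg[axis] * 2, cfg[axis] // 2):
            if 1 <= new <= limits[axis]:
                yield cfg[:axis] + (new,) + cfg[axis + 1:]

def greedy_best_first(start, limits, cost, budget=50):
    """Always expand the cheapest configuration seen so far, up to `budget` expansions."""
    frontier = [(cost(start), start)]
    seen = {start}
    best_cost, best_cfg = frontier[0]
    while frontier and budget > 0:
        c, cfg = heapq.heappop(frontier)
        budget -= 1
        if c < best_cost:
            best_cost, best_cfg = c, cfg
        for nxt in neighbors(cfg, limits):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (cost(nxt), nxt))
    return best_cfg, best_cost

best_cfg, best_cost = greedy_best_first((1, 1, 1), (32, 32, 32), toy_cost)
```

In practice `toy_cost` would be replaced by a function that runs `tiled_matmul` (or a compiled kernel) with the given configuration and returns the measured time; the search loop itself would not need to change.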
-
Publication No.: US11836488B2
Publication Date: 2023-12-05
Application No.: US17758129
Filing Date: 2020-01-13
Applicant: Telefonaktiebolaget LM Ericsson (publ)
Inventor: Anders Wesslén, Michael Breschel
Abstract: A method for a controller to execute a program comprising a sequence of functions on an accelerator with a pipelined architecture comprising a microcode buffer. The method comprises executing a function of the program as a sequence of operations, wherein the sequence of operations is represented by a sequence of templates, determining whether the template is non-colliding with previously inserted templates in the microcode buffer, determining whether data in local memory will be referenced before all previously inserted templates have taken effect, determining whether registers will be referenced before all previously inserted templates in the microcode buffer have taken effect, when it is determined that the template fits, that resources are available, that local data memory accesses will not collide, and that register accesses will not collide: creating a sequence of microcode instructions in the template, and inserting the template into the microcode buffer.
-
Publication No.: US11238360B2
Publication Date: 2022-02-01
Application No.: US15894358
Filing Date: 2018-02-12
Applicant: International Business Machines Corporation
Inventor: Lev Samuel Bishop, Jay M. Gambetta
Abstract: The technology is generally directed towards a pulse generation component that outputs a control pulse with a timing delay. A qubit state decision component uses an analog kernel to perform a linear filtering operation on (e.g., multiplies and integrates) a qubit signal to obtain a result corresponding to a qubit state, and compares the result to a threshold value to determine a measurement outcome result corresponding to the qubit state. A conditional gate component conditionally gates the control pulse based on the measurement outcome result.
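The decision step in this abstract (multiply and integrate a signal against a kernel, compare with a threshold, then gate a pulse) can be sketched digitally as follows. The patent describes analog components; this is only a discrete-time illustration, and the function names, sample values, and the convention that outcome 1 lets the pulse through are all assumptions.

```python
def measurement_outcome(signal, kernel, threshold):
    """Linear filtering: multiply sample by sample, sum (integrate), then threshold."""
    integrated = sum(s * k for s, k in zip(signal, kernel))
    return 1 if integrated > threshold else 0

def gate_pulse(pulse, outcome, pass_on=1):
    """Pass the control pulse only when the outcome matches `pass_on`; otherwise emit zeros."""
    if outcome == pass_on:
        return list(pulse)
    return [0.0] * len(pulse)

# Hypothetical sampled qubit readout signal and a flat averaging kernel.
signal = [0.2, 0.9, 1.1, 0.8]
kernel = [0.25, 0.25, 0.25, 0.25]
outcome = measurement_outcome(signal, kernel, threshold=0.5)
pulse_out = gate_pulse([1.0, 1.0, 0.5], outcome)
```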
-
Publication No.: US11042715B2
Publication Date: 2021-06-22
Application No.: US16381613
Filing Date: 2019-04-11
Applicant: International Business Machines Corporation
Abstract: A system can include a memristive crossbar array, which can include row lines and column lines intersecting the row lines. Resistive memory elements can be coupled between the row lines and the column lines at the junctions formed by the row and column lines. The resistive memory elements represent the values of the matrix. The system can further include an analogue circuit. The system can be configured to perform an exponentiation of the values of the vector in accordance with a first exponent. The crossbar array can be configured to apply the resulting values of the vector to the resistive elements thereby generating currents. The analogue circuit can be configured to perform an exponentiation of the generated currents in accordance with a second exponent.
-
Publication No.: US11023473B2
Publication Date: 2021-06-01
Application No.: US16017817
Filing Date: 2018-06-25
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ying Shan, Jian Jiao, Jie Zhu, Jianchang Mao
IPC: G06F16/2457, G06F7/16, G06N3/04, G06N3/063, G06N3/08, G06F16/2455
Abstract: A computational search method for retrieving computer information related to a query includes transforming a plurality of candidate answers to candidate answer recurrent binary embedding (RBE) embeddings using a trained RBE model. A query is transformed to a query RBE embedding using the trained RBE model. The query RBE embedding is compared to each candidate answer RBE embedding of a plurality of candidate answer RBE embeddings using a similarity function. The candidate answers are sorted based on the comparisons made using the similarity function, and a plurality of the top candidate answers is returned.
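The compare, sort, and return steps of this abstract can be sketched as below. The trained RBE model is not reproduced: the binary embeddings are hypothetical integers treated as bit vectors, and the similarity function is assumed to be the count of matching bits (embedding width minus Hamming distance), which is one common choice for binary codes rather than something the abstract specifies.

```python
def hamming_similarity(a, b, n_bits):
    """Number of bit positions where two n_bits-wide binary embeddings agree."""
    return n_bits - bin(a ^ b).count("1")

def top_candidates(query_code, candidate_codes, n_bits, top_k):
    """Rank candidates by similarity to the query and return the top_k names."""
    ranked = sorted(
        candidate_codes.items(),
        key=lambda item: hamming_similarity(query_code, item[1], n_bits),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical 8-bit embeddings standing in for the output of a trained model.
query = 0b10110010
candidates = {
    "answer_a": 0b10110011,  # differs from the query in 1 bit
    "answer_b": 0b01001101,  # differs from the query in all 8 bits
    "answer_c": 0b10000010,  # differs from the query in 2 bits
}
top_two = top_candidates(query, candidates, n_bits=8, top_k=2)
```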
-
Publication No.: US10691412B2
Publication Date: 2020-06-23
Application No.: US16118582
Filing Date: 2018-08-31
Applicant: International Business Machines Corporation
Inventor: Christian Jacobi, Aditya Puranik, Martin Recktenwald, Christian Zoellin
Abstract: A computer processor includes a memory unit, a processor cache and a hardware merge sort accelerator. The memory unit stores key values to be sequentially sorted. The processor cache obtains tree data from the memory unit indicating the key values. The hardware merge sort accelerator is configured to generate a master tournament tree based on the key values and perform a tournament sort that determines a first winning key value based on the master tournament tree. The hardware merge sort accelerator further speculates a second winning key value based on the master tournament tree. The speculated second winning key value is a next sequential winning key value of the tournament sort.
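For readers unfamiliar with tournament sort, the following is a minimal software sketch of a winner tree. It does not model the hardware accelerator or the speculation of the second winning key described in the abstract; the function name and tree layout are assumptions chosen for illustration.

```python
def tournament_sort(keys):
    """Sort keys in ascending order by repeatedly extracting the winner of a winner tree."""
    n = len(keys)
    if n == 0:
        return []
    size = 1
    while size < n:          # number of leaves, rounded up to a power of two
        size *= 2
    # tree[1] is the root; leaves live at tree[size .. 2*size-1] and hold key indices.
    tree = [None] * (2 * size)
    for i in range(n):
        tree[size + i] = i

    def winner(a, b):
        """Return the index of the smaller key; None marks an empty slot."""
        if a is None:
            return b
        if b is None:
            return a
        return a if keys[a] <= keys[b] else b

    # Play the initial tournament bottom-up.
    for node in range(size - 1, 0, -1):
        tree[node] = winner(tree[2 * node], tree[2 * node + 1])

    out = []
    for _ in range(n):
        w = tree[1]                  # overall winner (smallest remaining key)
        out.append(keys[w])
        node = size + w
        tree[node] = None            # retire the winner's leaf
        node //= 2
        while node >= 1:             # replay only the matches on the winner's path
            tree[node] = winner(tree[2 * node], tree[2 * node + 1])
            node //= 2
    return out

sorted_keys = tournament_sort([5, 3, 8, 1, 9, 2, 3])
```

After each extraction only the matches along the path from the retired leaf to the root are replayed, so each subsequent winner costs on the order of log n comparisons.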
-
Publication No.: US20200073634A1
Publication Date: 2020-03-05
Application No.: US16118592
Filing Date: 2018-08-31
Applicant: International Business Machines Corporation
Inventor: Christian Jacobi, Aditya Puranik, Martin Recktenwald, Christian Zoellin
Abstract: A computer processor includes a processor cache that obtains, from a memory unit, tree data indicative of key values that are pre-sorted in the memory unit. A hardware adaptive merge sort accelerator generates a tournament tree based on the key values, and performs a partial tournament sort that compares a selected key value to a plurality of participating key values to define a sorting path. The hardware adaptive merge sort accelerator also determines an overall winning key value of the partial tournament and a runner-up key value located on the sorting path that is a next lowest key value among the participating key values. The remaining key values are compared to the runner-up key value to sort at least one of the remaining key values in sequential order with respect to the overall winning key value and the runner-up key value.
-
Publication No.: US10534577B2
Publication Date: 2020-01-14
Application No.: US14851899
Filing Date: 2015-09-11
Applicant: BMC SOFTWARE, INC.
Inventor: Narayan Kumar, Douglas Mueller, Richard Mayfield
IPC: G06F7/32, G06F7/20, G06F7/14, G06F7/36, G06F16/22, G06F16/2457, H04L12/24, G06F7/16, H04L29/12
Abstract: A method to reconcile multiple instances of a single computer resource identified by resource discovery operations includes: (1) accessing information describing one or more resources; (2) identifying, via the accessed information, at least one resource that has been detected or discovered by at least two of the discovery operations; and (3) merging attributes associated with the identified resource from each of the at least two discovery operations into a single, reconciled resource object. Illustrative “resources” include, but are not limited to, computer systems, components of computer systems, data storage systems, switches, routers, memory, software applications (e.g., accounting and database applications), operating systems and business services (e.g., order entry or change management and tracking services).
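The three numbered steps of this abstract can be sketched as follows. The sketch assumes that every discovered resource carries an `id` field by which instances are matched, and that on conflicting attribute values the first discovery source wins; neither assumption comes from the abstract, and the data and function name are hypothetical.

```python
from collections import defaultdict

def reconcile(discoveries):
    """Merge resources seen by at least two discovery operations into one object each."""
    # Step 1: access the discovered information, grouped by resource id and source.
    seen = defaultdict(dict)
    for source, resources in discoveries.items():
        for resource in resources:
            seen[resource["id"]][source] = resource

    reconciled = {}
    for resource_id, by_source in seen.items():
        # Step 2: keep only resources found by two or more discovery operations.
        if len(by_source) < 2:
            continue
        # Step 3: merge attributes; earlier sources take precedence (an assumption).
        merged = {}
        for attributes in by_source.values():
            for key, value in attributes.items():
                merged.setdefault(key, value)
        reconciled[resource_id] = merged
    return reconciled

# Hypothetical output of two discovery operations.
discoveries = {
    "network_scan": [{"id": "host-1", "ip": "10.0.0.5"}, {"id": "host-2", "ip": "10.0.0.6"}],
    "agent_report": [{"id": "host-1", "os": "Linux", "ip": "10.0.0.99"}],
}
result = reconcile(discoveries)
```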
-
Publication No.: US10339201B1
Publication Date: 2019-07-02
Application No.: US16102431
Filing Date: 2018-08-13
Applicant: Altera Corporation
Inventor: Andrew Chaang Ling, Davor Capalija, Tomasz Sebastian Czajkowski, Andrei Mihai Hagiescu Miriste
Abstract: Systems and methods for calculating a dot product using digital signal processing units that are organized into a dot product processing unit for dot product processing using multipliers and adders of the digital signal processing units.
-
Publication No.: US10333697B2
Publication Date: 2019-06-25
Application No.: US15516175
Filing Date: 2015-10-05
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Koki Hamada, Dai Ikarashi, Naoto Kiribuchi
Abstract: Determination as to whether a nondecreasing sequence exists or not is efficiently made. A sorting part sorts elements of a set $P_i$ in ascending order to generate vectors $t_{i,i+1}$ and $b_{i,i+1}$. A merging part generates vectors $t_{0,m}$ and $b_{0,m}$ by repeating the process of merging vectors $(t_{i,j}, b_{i,j})$ and $(t_{j,k}, b_{j,k})$ to generate $(t_{i,k}, b_{i,k})$. A stable-sorting part generates a vector $e$ by coupling and stably sorting vectors $b_{i,j}$ and $t_{j,k}$. A searching part searches for sets of $(\lambda, x, y)$ in which $e[\lambda]$ is $b_{i,j}[x]$ and $e[\lambda+1]$ is $t_{j,k}[y]$ and generates a set $X$ including all $x$ and a set $Y$ including all $y$. An extracting part sorts $t_{i,j}[x]$ ($x \in X$) in ascending order to generate a vector $t_{i,k}$ and sorts $b_{j,k}[y]$ ($y \in Y$) in ascending order to generate a vector $b_{i,k}$. If the length of a vector $t_{0,m}$ is 0, a determining part outputs a result of determination that indicates the absence of a nondecreasing sequence.
-