Patent search ap:("Intel Corporation") AND inv:"Chen Koren" Page 1

1.

Granted invention patent
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements (Active)

Publication number: US11847185B2

Publication date: 2023-12-19

Application number: US17485055

Application date: 2021-09-24

Applicant: Intel Corporation

Inventor: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F17/16, G06F9/38, G06F9/30

CPC classification number: G06F17/16, G06F9/3001, G06F9/3016, G06F9/30101, G06F9/3802

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix. The processor is to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and to broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, with each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
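The bitmask matching described in this abstract can be illustrated in software. The following is a minimal Python sketch, not the patented hardware: the function names `nz_bitmask` and `sparse_matmul_accumulate` are hypothetical, matrices are assumed to be plain lists of lists, and the PE grid and the two-elements-at-a-time broadcast are not modeled. It only shows how ANDing a row bitmask with a column bitmask selects the positions where both operands are non-zero.

```python
def nz_bitmask(values):
    """Return an int whose bit k is set when values[k] is non-zero."""
    mask = 0
    for k, v in enumerate(values):
        if v != 0:
            mask |= 1 << k
    return mask


def sparse_matmul_accumulate(a, b, c):
    """Compute c += a @ b in place, multiplying only matching NZ positions.

    a is M x K, b is K x N, c is M x N (lists of lists; an assumption).
    """
    inner = len(b)
    # Gather the columns of b so each one can get its own bitmask.
    cols = [[b[k][j] for k in range(inner)] for j in range(len(b[0]))]
    row_masks = [nz_bitmask(row) for row in a]
    col_masks = [nz_bitmask(col) for col in cols]
    for i, row in enumerate(a):
        for j, col in enumerate(cols):
            # Bits set in both masks mark positions where both are non-zero.
            match = row_masks[i] & col_masks[j]
            while match:
                k = (match & -match).bit_length() - 1  # index of lowest set bit
                c[i][j] += row[k] * col[k]
                match &= match - 1  # clear the lowest set bit
    return c
```

Positions where either operand is zero never reach the multiply step, which is the source of the saving when the matrices are sparse.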

2.

Granted invention patent
Concurrent compute and ECC for in-memory matrix vector operations (Active)

Publication number: US11513893B2

Publication date: 2022-11-29

Application number: US17128414

Application date: 2020-12-21

Applicant: Intel Corporation

Inventor: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah

IPC: G06F11/00, G06F11/10, G06N3/08

Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.
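The control flow in this abstract (compute first, correct only if ECC reports an error) can be sketched as follows. This is an illustrative Python model under loud assumptions: ECC is replaced by a stand-in bitwise majority vote over three stored copies, the computation is a single multiply by a weight, and the names `ecc_check` and `compute_with_concurrent_ecc` are hypothetical. In hardware the two steps would run in parallel; here they simply run one after the other.

```python
def ecc_check(copies):
    """Stand-in ECC (assumption): bitwise majority vote over three copies.

    Returns (error_detected, corrected_word), where the error flag refers
    to the first copy, which is the one the compute path reads.
    """
    a, b, c = copies
    corrected = (a & b) | (a & c) | (b & c)
    return corrected != a, corrected


def compute_with_concurrent_ecc(stored, weight):
    """Multiply the raw word by weight without waiting for the ECC result."""
    raw = stored[0]
    speculative = raw * weight            # starts before ECC has finished
    error, corrected = ecc_check(stored)  # concurrent in hardware
    if not error:
        return speculative                # common case: output immediately
    # Error case: hold the output and patch it. Because the computation is
    # linear, the fix is the weight times the difference between the words.
    return speculative + (corrected - raw) * weight
```

The correction step relies on the computation being linear in the data word, which holds for the multiply-accumulate used here; other computations would need a different correction or a recompute.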

3.

Invention application
CONCURRENT COMPUTE AND ECC FOR IN-MEMORY MATRIX VECTOR OPERATIONS (Active)

Publication number: US20210109809A1

Publication date: 2021-04-15

Application number: US17128414

Application date: 2020-12-21

Applicant: Intel Corporation

Inventor: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah

IPC: G06F11/10, G06N3/08

Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.

4.

Granted invention patent
Ultra-deep compute static random access memory with high compute throughput and multi-directional data propagation (Active)

Publication number: US11450672B2

Publication date: 2022-09-20

Application number: US16859600

Application date: 2020-04-27

Applicant: Intel Corporation

Inventor: Charles Augustine, Somnath Paul, Muhammad M. Khellah, Chen Koren

IPC: G11C17/16, H01L27/11, G11C11/418, G11C11/419, G11C11/412

Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right sections of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth of a baseline SRAM array.

5.

Granted invention patent
Apparatus and method for a masked multiply instruction to support neural network pruning operations (Active)

Publication number: US10929503B2

Publication date: 2021-02-23

Application number: US16230814

Application date: 2018-12-21

Applicant: Intel Corporation

Inventor: Omid Azizi, Chen Koren, Nitin Garegrat

IPC: G06F17/16, G06N3/02, G06F9/30

Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with one or more of the A-matrix elements to generate a second partial result.
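The per-element selection in this abstract can be sketched in a few lines of Python. The abstract does not say where the alternate B-matrix element comes from, so this sketch assumes it is supplied by the caller in a parallel list `alt_b_col`; that parameter, the function name `masked_dot`, and the choice of 1 as the "first value" of the mask are all hypothetical.

```python
def masked_dot(a_row, b_col, mask, alt_b_col):
    """Dot product in which each B element is chosen by its mask value.

    mask[k] == 1 (assumed "first value"): use b_col[k].
    any other value (assumed "second value"): use alt_b_col[k] instead.
    """
    acc = 0
    for a, b, m, alt in zip(a_row, b_col, mask, alt_b_col):
        chosen = b if m == 1 else alt  # select the B operand per mask value
        acc += a * chosen              # partial result accumulated into acc
    return acc
```

For example, `masked_dot([1, 2, 3], [4, 5, 6], [1, 0, 1], [0, 0, 0])` skips the contribution of the second B element because its mask bit selects the zero alternate.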

6.

Granted invention patent
Accelerator for processing data (Active)

Publication number: US10509846B2

Publication date: 2019-12-17

Application number: US15840552

Application date: 2017-12-13

Applicant: Intel Corporation

Inventor: Chen Koren, Dan Baum

IPC: G06F17/16, G06F7/544, G06F7/523, G06N3/04, G06N3/063, G06N3/08, G06F9/30, G06N3/02

Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing, compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.
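The index matching of the sparse mode can be approximated in software with a two-pointer merge. This Python sketch assumes each compressed data set is a list of `(index, value)` pairs sorted by index; that representation and the function names `dense_dot` and `sparse_dot` are assumptions for illustration, not details taken from the patent.

```python
def dense_dot(row, col):
    """Dense mode: every row element is multiplied by its column element."""
    return sum(r * c for r, c in zip(row, col))


def sparse_dot(row, col):
    """Sparse mode: multiply only entries whose indexes match.

    row and col are lists of (index, value) pairs sorted by index.
    """
    i = j = 0
    total = 0
    while i < len(row) and j < len(col):
        r_idx, r_val = row[i]
        c_idx, c_val = col[j]
        if r_idx == c_idx:
            total += r_val * c_val  # indexes match: send both to the multiplier
            i += 1
            j += 1
        elif r_idx < c_idx:
            i += 1  # column entry is held for a later comparison
        else:
            j += 1  # row entry is held for a later comparison
    return total
```

Both functions return the same result for the same underlying vectors; the sparse form simply never visits positions that are zero on either side.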

7.

Invention application
ACCELERATOR FOR PROCESSING DATA (Under examination, published)

Publication number: US20190042538A1

Publication date: 2019-02-07

Application number: US15840552

Application date: 2017-12-13

Applicant: Intel Corporation

Inventor: Chen Koren, Dan Baum

IPC: G06F17/16, G06F7/523, G06N3/02, G06F9/30

Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing, compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.

8.

Granted invention patent
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements (Active)

Publication number: US12287843B2

Publication date: 2025-04-29

Application number: US18502291

Application date: 2023-11-06

Applicant: Intel Corporation

Inventor: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F9/30, G06F9/38, G06F17/16

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix. The processor is to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and to broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, with each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
