Concurrent compute and ECC for in-memory matrix vector operations

    公开(公告)号:US11513893B2

    公开(公告)日:2022-11-29

    申请号:US17128414

    申请日:2020-12-21

    Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.

    CONCURRENT COMPUTE AND ECC FOR IN-MEMORY MATRIX VECTOR OPERATIONS

    公开(公告)号:US20210109809A1

    公开(公告)日:2021-04-15

    申请号:US17128414

    申请日:2020-12-21

    Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.

    Apparatus and method for a masked multiply instruction to support neural network pruning operations

    公开(公告)号:US10929503B2

    公开(公告)日:2021-02-23

    申请号:US16230814

    申请日:2018-12-21

    Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with a one or more of the A-matrix elements to generate a second partial result.

    Accelerator for processing data
    6.
    发明授权

    公开(公告)号:US10509846B2

    公开(公告)日:2019-12-17

    申请号:US15840552

    申请日:2017-12-13

    Inventor: Chen Koren Dan Baum

    Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.

    ACCELERATOR FOR PROCESSING DATA
    7.
    发明申请

    公开(公告)号:US20190042538A1

    公开(公告)日:2019-02-07

    申请号:US15840552

    申请日:2017-12-13

    Inventor: Chen Koren Dan Baum

    Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.

Patent Agency Ranking