Hardware accelerator for IM2COL operation

    Publication number: US11783163B2

    Publication date: 2023-10-10

    Application number: US16901542

    Application date: 2020-06-15

    Applicant: Arm Limited

    CPC classification number: G06N3/04 G06F9/30105 G06F17/16 G06N3/08

    Abstract: The present disclosure advantageously provides a matrix expansion unit that includes an input data selector, a first register set, a second register set, and an output data selector. The input data selector is configured to receive first matrix data in a columnwise format. The first register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a first shift loop. The second register set is coupled to the data selector, and includes a plurality of data selectors and a plurality of registers arranged in a second shift loop. The output data selector is coupled to the first register set and the second register set, and is configured to output second matrix data in a rowwise format.
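
For orientation, the IM2COL transform named in the title can be sketched in software. This is only an illustration of the data rearrangement, not the claimed register and shift-loop hardware; the function name `im2col`, the stride of 1, and the absence of padding are assumptions made for the example.

```python
def im2col(image, kh, kw):
    """Expand a 2-D image (a list of rows) into a matrix whose rows are
    flattened kh x kw patches. Assumes stride 1 and no padding."""
    h, w = len(image), len(image[0])
    rows = []
    for r in range(h - kh + 1):          # top-left row of each patch
        for c in range(w - kw + 1):      # top-left column of each patch
            patch = [image[r + i][c + j] for i in range(kh) for j in range(kw)]
            rows.append(patch)
    return rows


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
patches = im2col(image, 2, 2)
for row in patches:
    print(row)
```

Each output row can then be multiplied by a flattened kernel, which turns a convolution into a plain matrix multiplication.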

    Time Domain Unrolling Sparse Matrix Multiplication System and Method

    Publication number: US20220035890A1

    Publication date: 2022-02-03

    Application number: US17103676

    Application date: 2020-11-24

    Applicant: Arm Limited

    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix including, for each element i,j of the output matrix, calculate a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Or, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix and to generate the output matrix including, for each element i,j of the output matrix, calculate a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.
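
A minimal sketch of bitmap-guided multiplication follows. The abstract does not specify the compression layout, so the example assumes a simple one: the bitmap marks the nonzero positions of the first matrix and the compressed matrix keeps only the nonzero values of each row, in order. The names `compress` and `sparse_matmul` are hypothetical.

```python
def compress(dense):
    """Return (values, bitmap): per-row nonzero values and a 0/1 mask
    marking where they sat in the dense matrix (assumed layout)."""
    bitmap = [[1 if v != 0 else 0 for v in row] for row in dense]
    values = [[v for v in row if v != 0] for row in dense]
    return values, bitmap


def sparse_matmul(values, bitmap, b):
    """Multiply the compressed first matrix by dense matrix b,
    skipping every position the bitmap marks as zero."""
    n_cols = len(b[0])
    out = []
    for vals_row, bits_row in zip(values, bitmap):
        out_row = [0] * n_cols
        k = 0  # index of the next stored nonzero in this row
        for col, bit in enumerate(bits_row):
            if bit:
                a = vals_row[k]
                k += 1
                for j in range(n_cols):
                    out_row[j] += a * b[col][j]
        out.append(out_row)
    return out


a = [[0, 2, 0],
     [1, 0, 3]]
b = [[1, 2],
     [3, 4],
     [5, 6]]
values, bitmap = compress(a)
result = sparse_matmul(values, bitmap, b)
print(result)
```

Element i,j of the result is the dot product of row i of the first matrix and column j of the second, with the bitmap selecting which products are actually computed.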

    MIXED-PRECISION COMPUTATION UNIT
    Type: Invention application

    Publication number: US20210089889A1

    Publication date: 2021-03-25

    Application number: US16836117

    Application date: 2020-03-31

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.

    SYSTOLIC CONVOLUTIONAL NEURAL NETWORK
    Type: Invention application

    Publication number: US20190311243A1

    Publication date: 2019-10-10

    Application number: US15945952

    Application date: 2018-04-05

    Applicant: Arm Limited

    Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
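
The dataflow can be approximated in software as an output-stationary MAC grid fed by transposed buffers. This is a simplified sketch: it ignores the cycle-by-cycle skew of a real systolic array, and the names `transpose_buffer` and `systolic_matmul` are hypothetical, not taken from the patent.

```python
def transpose_buffer(vectors):
    """Vectors go in along one dimension and come out along the other,
    i.e. entry k of the result holds component k of every input vector."""
    return [list(col) for col in zip(*vectors)]


def systolic_matmul(kernel_vectors, feature_vectors):
    """Each cell (i, j) owns one output value and accumulates the product
    of kernel i and feature j, one component per step."""
    kernel_components = transpose_buffer(kernel_vectors)
    feature_components = transpose_buffer(feature_vectors)
    m, n = len(kernel_vectors), len(feature_vectors)
    acc = [[0] * n for _ in range(m)]  # one accumulator per MAC cell
    for k_vec, f_vec in zip(kernel_components, feature_components):
        for i in range(m):
            for j in range(n):
                acc[i][j] += k_vec[i] * f_vec[j]  # multiply and accumulate
    return acc


kernels = [[1, 2], [3, 4]]    # two kernel weight vectors
features = [[5, 6], [7, 8]]   # two feature vectors
outputs = systolic_matmul(kernels, features)
print(outputs)
```

After the last step, cell (i, j) holds the dot product of kernel vector i and feature vector j.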

    Time domain unrolling sparse matrix multiplication system and method

    Publication number: US11928176B2

    Publication date: 2024-03-12

    Application number: US17103676

    Application date: 2020-11-24

    Applicant: Arm Limited

    CPC classification number: G06F17/16 G06F7/5443 G06F15/80 G06F9/3893

    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Or, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix and to generate the output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.

    Refactoring MAC operations
    Type: Invention grant

    Publication number: US11922169B2

    Publication date: 2024-03-05

    Application number: US17674503

    Application date: 2022-02-17

    Applicant: Arm Limited

    Abstract: A method and apparatus for performing refactored multiply-and-accumulate operations is provided. A summing array includes a plurality of non-volatile memory elements arranged in columns. Each non-volatile memory element in the summing array is programmed to a high resistance state or a low resistance state based on weights of a neural network. The summing array is configured to generate a summed signal for each column based, at least in part, on a plurality of input signals. A multiplying array is coupled to the summing array, and includes a plurality of non-volatile memory elements. Each non-volatile memory element in the multiplying array is programmed to a different conductance level based on the weights of the neural network. The multiplying array is configured to generate an output signal based, at least in part, on the summed signals from the summing array.
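
One way to read the two-stage arrangement is as a regrouping of the usual sum of products: inputs that share a weight value are added first, and each partial sum is then multiplied once by that value. The sketch below shows only this arithmetic identity under that assumed reading; it does not model resistance states or conductance levels, and the function names are hypothetical.

```python
def direct_mac(inputs, weights):
    """Conventional multiply-and-accumulate: one multiply per input."""
    return sum(x * w for x, w in zip(inputs, weights))


def refactored_mac(inputs, weights):
    """Sum first, multiply second (assumed interpretation of the abstract)."""
    levels = sorted(set(weights))  # distinct weight values
    # Summing stage: a binary choice per input decides whether it
    # contributes to the column for a given weight level.
    summed = [sum(x for x, w in zip(inputs, weights) if w == level)
              for level in levels]
    # Multiplying stage: one multiply per distinct weight level.
    return sum(level * s for level, s in zip(levels, summed))


inputs = [1, 2, 3, 4]
weights = [2, 5, 2, 5]
print(direct_mac(inputs, weights), refactored_mac(inputs, weights))
```

With four inputs and two distinct weight values, the refactored form needs two multiplications instead of four while producing the same total.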
