Multi-Dimensional Data Path Architecture

    Publication No.: US20220382690A1

    Publication Date: 2022-12-01

    Application No.: US17334960

    Filing Date: 2021-05-31

    Applicant: Arm Limited

    Abstract: Various implementations described herein are directed to a device having a multi-layered logic structure with a first logic layer and a second logic layer arranged vertically in a stacked configuration. The device may have a memory array that provides data, and also, the device may have an inter-layer data bus that vertically couples the memory array to the multi-layered logic structure. The inter-layer data bus may provide multiple data paths to the first logic layer and the second logic layer for reuse of the data provided by the memory array.

    Apparatus and method for matrix operations

    Publication No.: US11379556B2

    Publication Date: 2022-07-05

    Application No.: US16417937

    Filing Date: 2019-05-21

    Applicant: Arm Limited

    Abstract: There is provided a data processing apparatus to perform an operation on a first matrix and a second matrix. The data processing apparatus includes receiver circuitry to receive elements of the first matrix, elements of the second matrix, and correspondence data to indicate where the elements of the first matrix are located in the first matrix. Determination circuitry performs, using the correspondence data, a determination of whether, for a given element of the first matrix in column i of the first matrix, a given element of the second matrix occurs in row i of the second matrix. Aggregation circuitry calculates an aggregation between a given row in the first matrix and a given column in the second matrix and includes: functional circuitry to perform, in dependence on the determination, a function on the given element of the first matrix and the given element of the second matrix to produce a partial result.
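The determination-and-aggregation scheme above can be sketched in plain Python. This is an illustrative model, not the patented circuitry: `a_elems`/`a_cols` stand in for the received elements of the first matrix and their correspondence data (column indices), and `b_col` is a hypothetical dictionary holding a column of the second matrix keyed by row index.

```python
def sparse_row_col_dot(a_elems, a_cols, b_col):
    """Aggregate one row of the first matrix against one column of the
    second. a_elems: stored row values; a_cols: correspondence data
    giving each value's column index i; b_col: dict mapping row index
    i -> value of the second matrix in that row of the column."""
    total = 0
    for val, i in zip(a_elems, a_cols):
        # Determination: for this element in column i of the first
        # matrix, does an element occur in row i of the second matrix?
        if i in b_col:
            total += val * b_col[i]  # function producing a partial result
    return total
```

For example, a sparse row with values 2 and 3 in columns 0 and 2, aggregated against a column whose rows 0 and 1 hold 4 and 5, multiplies only the matching (column 0, row 0) pair.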

    Activation Compression Method for Deep Learning Acceleration

    Publication No.: US20220164663A1

    Publication Date: 2022-05-26

    Application No.: US17157319

    Filing Date: 2021-01-25

    Applicant: Arm Limited

    Abstract: A system and method for multiplying matrices, and method for training a convolutional neural network (CNN), are provided. The system includes a processor and a matrix multiply accelerator (MMA). The processor is configured to generate, based on an input tensor, a number of basic block matrices, each basic block matrix including a number of elements; for each basic block matrix: prune, based on a sparsity value, the elements of the basic block matrix, generate a mask for the basic block matrix, each mask including a number of bits, each bit corresponding to a different element of the basic block matrix, and compress the basic block matrix to generate a compressed basic block matrix having fewer elements than the basic block matrix. The MMA is configured to multiply, based on the masks, the compressed basic block matrices and a weight matrix to generate an output matrix.
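The prune/mask/compress steps on a basic block matrix can be sketched as follows. This is a minimal NumPy illustration, assuming magnitude-based pruning to a fixed element count; the actual sparsity criterion and block shape are implementation details of the patent.

```python
import numpy as np

def prune_and_compress(block, keep):
    """Prune a basic block matrix to its `keep` largest-magnitude
    elements, generate a one-bit-per-element mask, and compress the
    block to the surviving elements (in original order)."""
    flat = block.flatten()
    order = np.argsort(-np.abs(flat))       # indices by descending magnitude
    mask = np.zeros(flat.size, dtype=np.uint8)
    mask[order[:keep]] = 1                  # bit = 1 where element is kept
    compressed = flat[mask.astype(bool)]    # fewer elements than the block
    return compressed, mask
```

The MMA would then consume `compressed` together with `mask` to align the surviving elements against the weight matrix during multiplication.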

    Hardware Accelerator For IM2COL Operation

    Publication No.: US20210390367A1

    Publication Date: 2021-12-16

    Application No.: US16901542

    Filing Date: 2020-06-15

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a matrix expansion unit that includes an input data selector, a first register set, a second register set, and an output data selector. The input data selector is configured to receive first matrix data in a columnwise format. The first register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a first shift loop. The second register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a second shift loop. The output data selector is coupled to the first register set and the second register set, and is configured to output second matrix data in a rowwise format.
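For context, the im2col operation that this unit accelerates can be expressed in software as follows. This is a reference sketch of the transform itself, not of the register-loop hardware: it expands an image so that each k x k convolution patch becomes one column, turning convolution into matrix multiplication.

```python
import numpy as np

def im2col(image, k):
    """Expand an H x W image so each k x k sliding patch becomes one
    column of the output (shape k*k x num_patches), allowing a
    convolution to be computed as a single matrix multiplication."""
    h, w = image.shape
    cols = []
    for r in range(h - k + 1):          # top-left corner of each patch
        for c in range(w - k + 1):
            cols.append(image[r:r + k, c:c + k].flatten())
    return np.array(cols).T
```

The hardware unit streams the same data through its two register shift loops instead of materializing every patch, which avoids the memory blow-up of a software im2col.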

    Pipelined Accumulator

    Publication No.: US20210374508A1

    Publication Date: 2021-12-02

    Application No.: US16885704

    Filing Date: 2020-05-28

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a pipelined accumulator that includes a data selector configured to receive a sequence of operands to be summed, an input register coupled to the data selector, an output register, coupled to the data selector, configured to store a sequence of partial sums and output a final sum, and a multi-stage add module coupled to the input register and the output register. The multi-stage add module is configured to store a sequence of partial sums and a final sum in a redundant format, and perform back-to-back accumulation into the output register.
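The "redundant format" that enables back-to-back accumulation is typically a carry-save representation: the running total is kept as a (sum, carry) pair so each step needs only a short 3:2 compression rather than a full carry-propagating add. The sketch below is a hypothetical bit-level model of that idea, not the patented add module.

```python
def carry_save_accumulate(operands, width=32):
    """Accumulate integers keeping the partial sum in a redundant
    (sum, carry) pair; carries propagate only in the final add."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for x in operands:
        # 3:2 compressor: XOR gives the bitwise sum, majority gives the
        # carries, shifted left one place. No long carry chain per step.
        s, c = (s ^ c ^ x) & mask, (((s & c) | (s & x) | (c & x)) << 1) & mask
    return (s + c) & mask  # single carry-propagate add for the final sum
```

Because each iteration is carry-free, a hardware version can accept a new operand every cycle, which is what permits back-to-back accumulation into the output register.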

    Matrix multiplication system and method

    Publication No.: US11120101B2

    Publication Date: 2021-09-14

    Application No.: US16585265

    Filing Date: 2019-09-27

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a system and method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.
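The bitmap-guided dot product can be sketched as below. This is an illustrative model, assuming the compressed row/column store only nonzero values in original order and the bitmaps mark which original positions held a value; a multiply happens only where both bitmaps have a 1.

```python
def bitmap_dot(a_vals, a_bits, b_vals, b_bits):
    """Dot product of a compressed row (a_vals) and compressed column
    (b_vals) using their bitmaps. a_bits/b_bits have one bit per
    original position; a 1 means a value was stored for that position."""
    ai = bi = 0
    total = 0
    for abit, bbit in zip(a_bits, b_bits):
        if abit and bbit:
            total += a_vals[ai] * b_vals[bi]  # both positions nonzero
        ai += abit  # advance pointer only where a value was stored
        bi += bbit
    return total
```

Multiplications against zero are skipped entirely rather than computed and discarded, which is the efficiency claim of the method.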

    Hybrid filter banks for artificial neural networks

    Publication No.: US12067373B2

    Publication Date: 2024-08-20

    Application No.: US16836110

    Filing Date: 2020-03-31

    Applicant: Arm Limited

    CPC classification number: G06F7/483 G06F7/5443 G06N3/04 G06N3/063 G06N3/08

    Abstract: The present disclosure advantageously provides a system including a memory, a processor, and a circuitry to execute one or more mixed precision layers of an artificial neural network (ANN), each mixed precision layer including high-precision weight filters and low precision weight filters. The circuitry is configured to perform one or more calculations on an input feature map having a plurality of input channels (cin) using the high precision weight filters to create a high precision output feature map having a first number of output channels (k), perform one or more calculations on the input feature map using the low precision weight filters to create a low precision output feature map having a second number of output channels (cout−k), and concatenate the high precision output feature map and the low precision output feature map to create a unified output feature map having a plurality of output channels (cout).
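The split-and-concatenate structure of a mixed precision layer can be sketched as follows. This is a simplified 1-D NumPy illustration under stated assumptions: `w_high` holds k high-precision filters, `w_low` holds the remaining cout − k filters, and rounding stands in for whatever low-precision quantization the hardware applies.

```python
import numpy as np

def hybrid_layer(x, w_high, w_low):
    """Mixed precision layer on an input feature vector x (cin,).
    w_high: (k, cin) high-precision filters; w_low: (cout - k, cin)
    filters run through a stand-in quantizer. Output: (cout,)."""
    y_high = w_high @ x                    # high precision path -> k channels
    y_low = np.round(w_low) @ x            # hypothetical low precision path
    return np.concatenate([y_high, y_low])  # unified cout-channel output
```

Real feature maps are tensors with spatial dimensions and the low-precision path would use true reduced-width arithmetic, but the channel split and concatenation follow the same pattern.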
