Nibble Block Format
    2.
    发明申请

    公开(公告)号:US20230076138A1

    公开(公告)日:2023-03-09

    申请号:US17470470

    申请日:2021-09-09

    Applicant: Arm Limited

    Abstract: A matrix multiplication system and method are provided. The system includes a memory that stores one or more weight tensors, a processor and a matrix multiply accelerator (MMA). The processor converts each weight tensor into an encoded block set that is stored in the memory. Each encoded block set includes a number of encoded blocks, and each encoded block includes a data field and an index field. The MMA converts each encoded block set into a reconstructed weight tensor, and convolves each reconstructed weight tensor and an input data tensor to generate an output data matrix.

    Artificial Neural Network Optical Hardware Accelerator

    公开(公告)号:US20210287078A1

    公开(公告)日:2021-09-16

    申请号:US16818302

    申请日:2020-03-13

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides an Optical Hardware Accelerator (OHA) for an Artificial Neural Network (ANN) that includes a communication bus interface, a memory, a controller, and an optical computing engine (OCE). The OCE is configured to execute an ANN model with ANN weights. Each ANN weight includes a quantized phase shift value θi and a phase shift value ϕi. The OCE includes a digital-to-optical (D/O) converter configured to generate input optical signals based on the input data, an optical neural network (ONN) configured to generate output optical signals based on the input optical signals, and an optical-to-digital (O/D) converter configured to generate the output data based on the output optical signals. The ONN includes a plurality of optical units (OUs), and each OU includes an optical multiply and accumulate (OMAC) module.

    Bit Sparse Neural Network Optimization
    4.
    发明公开

    公开(公告)号:US20240013052A1

    公开(公告)日:2024-01-11

    申请号:US17861824

    申请日:2022-07-11

    Applicant: Arm Limited

    CPC classification number: G06N3/082

    Abstract: A method, system and apparatus provide bit-sparse neural network optimization. Rather than quantizing and pruning weight and activation elements at the word level, weight and activation elements are pruned at the bit level, which reduces the density of effective “set” bits in weight and activation data, which, advantageously, reduces the power consumption of the neural network inference process by reducing the degree of bit-level switching during inference.

    Artificial neural network optical hardware accelerator

    公开(公告)号:US11526743B2

    公开(公告)日:2022-12-13

    申请号:US16818302

    申请日:2020-03-13

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides an Optical Hardware Accelerator (OHA) for an Artificial Neural Network (ANN) that includes a communication bus interface, a memory, a controller, and an optical computing engine (OCE). The OCE is configured to execute an ANN model with ANN weights. Each ANN weight includes a quantized phase shift value θi and a phase shift value ϕi. The OCE includes a digital-to-optical (D/O) converter configured to generate input optical signals based on the input data, an optical neural network (ONN) configured to generate output optical signals based on the input optical signals, and an optical-to-digital (O/D) converter configured to generate the output data based on the output optical signals. The ONN includes a plurality of optical units (OUs), and each OU includes an optical multiply and accumulate (OMAC) module.

    Memory for an artificial neural network accelerator

    公开(公告)号:US11526305B2

    公开(公告)日:2022-12-13

    申请号:US17103629

    申请日:2020-11-24

    Applicant: Arm Limited

    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of read word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each read word selector has a plurality of input ports and an output port, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the read word selectors of the first bank and the second bank, and configured to select a combination of read word selectors from at least one of the first bank and the second bank based on a bank select signal.

    Pipelined accumulator
    7.
    发明授权

    公开(公告)号:US11501151B2

    公开(公告)日:2022-11-15

    申请号:US16885704

    申请日:2020-05-28

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a pipelined accumulator that includes a data selector configured to receive a sequence of operands to be summed, an input register coupled to the data selector, an output register, coupled to the data selector, configured to store a sequence of partial sums and output a final sum, and a multi-stage add module coupled to the input register and the output register. The multi-stage add module is configured to store a sequence of partial sums and a final sum in a redundant format, and perform back-to-back accumulation into the output register.

    Matrix multiplication system, apparatus and method

    公开(公告)号:US11194549B2

    公开(公告)日:2021-12-07

    申请号:US16663887

    申请日:2019-10-25

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the ith row of the first matrix with a corresponding column vector formed from the jth column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.

    Matrix Multiplication System and Method

    公开(公告)号:US20210097130A1

    公开(公告)日:2021-04-01

    申请号:US16585265

    申请日:2019-09-27

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a system method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.

    Memory for an artificial neural network accelerator

    公开(公告)号:US12086453B2

    公开(公告)日:2024-09-10

    申请号:US17103632

    申请日:2020-11-24

    Applicant: Arm Limited

    CPC classification number: G06F3/0655 G06F3/0604 G06F3/0679 G06N3/063 G11C11/54

    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of write word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each write word selector has an input port and a plurality of output ports, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the write word selectors of the first bank and the second bank, and configured to select a combination of write word selectors from at least one of the first bank and the second bank based on a bank select signal.

Patent Agency Ranking