Mixed-precision computation unit
    31.
    发明授权

    公开(公告)号:US11561767B2

    公开(公告)日:2023-01-24

    申请号:US16836117

    申请日:2020-03-31

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.

    Artificial neural network optical hardware accelerator

    公开(公告)号:US11526743B2

    公开(公告)日:2022-12-13

    申请号:US16818302

    申请日:2020-03-13

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides an Optical Hardware Accelerator (OHA) for an Artificial Neural Network (ANN) that includes a communication bus interface, a memory, a controller, and an optical computing engine (OCE). The OCE is configured to execute an ANN model with ANN weights. Each ANN weight includes a quantized phase shift value θi and a phase shift value ϕi. The OCE includes a digital-to-optical (D/O) converter configured to generate input optical signals based on the input data, an optical neural network (ONN) configured to generate output optical signals based on the input optical signals, and an optical-to-digital (O/D) converter configured to generate the output data based on the output optical signals. The ONN includes a plurality of optical units (OUs), and each OU includes an optical multiply and accumulate (OMAC) module.

    Systolic convolutional neural network

    公开(公告)号:US11188814B2

    公开(公告)日:2021-11-30

    申请号:US15945952

    申请日:2018-04-05

    Applicant: Arm Limited

    Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may shifted out of the transposing buffer along the second dimension, providing efficient dataflow.

    PROCESSOR FOR SPARSE MATRIX COMPUTATION
    34.
    发明申请

    公开(公告)号:US20200326938A1

    公开(公告)日:2020-10-15

    申请号:US16381349

    申请日:2019-04-11

    Applicant: Arm Limited

    Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.

Patent Agency Ranking