-
公开(公告)号:US11561767B2
公开(公告)日:2023-01-24
申请号:US16836117
申请日:2020-03-31
Applicant: Arm Limited
Inventor: Dibakar Gope , Jesse Garrett Beu , Paul Nicholas Whatmough , Matthew Mattina
Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.
-
公开(公告)号:US11526743B2
公开(公告)日:2022-12-13
申请号:US16818302
申请日:2020-03-13
Applicant: Arm Limited
Inventor: Zhi-Gang Liu , Matthew Mattina , John Fremont Brown, III
Abstract: The present disclosure advantageously provides an Optical Hardware Accelerator (OHA) for an Artificial Neural Network (ANN) that includes a communication bus interface, a memory, a controller, and an optical computing engine (OCE). The OCE is configured to execute an ANN model with ANN weights. Each ANN weight includes a quantized phase shift value θi and a phase shift value ϕi. The OCE includes a digital-to-optical (D/O) converter configured to generate input optical signals based on the input data, an optical neural network (ONN) configured to generate output optical signals based on the input optical signals, and an optical-to-digital (O/D) converter configured to generate the output data based on the output optical signals. The ONN includes a plurality of optical units (OUs), and each OU includes an optical multiply and accumulate (OMAC) module.
-
公开(公告)号:US11188814B2
公开(公告)日:2021-11-30
申请号:US15945952
申请日:2018-04-05
Applicant: Arm Limited
Inventor: Paul Nicholas Whatmough , Ian Rudolf Bratt , Matthew Mattina
Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
-
公开(公告)号:US20200326938A1
公开(公告)日:2020-10-15
申请号:US16381349
申请日:2019-04-11
Applicant: Arm Limited
Inventor: Zhigang Liu , Matthew Mattina , Paul Nicholas Whatmough , Jesse Garrett Beu
Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.
-
-
-