-
Publication No.: US11586890B2
Publication Date: 2023-02-21
Application No.: US16720380
Filing Date: 2019-12-19
Applicant: Arm Limited
Inventor: Paul Nicholas Whatmough, Chuteng Zhou
Abstract: The present disclosure advantageously provides a hardware accelerator for an artificial neural network (ANN), including a communication bus interface, a memory, a controller, and at least one processing engine (PE). The communication bus interface is configured to receive a plurality of finetuned weights associated with the ANN, receive input data, and transmit output data. The memory is configured to store the plurality of finetuned weights, the input data and the output data. The PE is configured to receive the input data, execute an ANN model using a plurality of fixed weights associated with the ANN and the plurality of finetuned weights, and generate the output data. Each finetuned weight corresponds to a fixed weight.
-
Publication No.: US20230026113A1
Publication Date: 2023-01-26
Application No.: US17382108
Filing Date: 2021-07-21
Applicant: Arm Limited
Inventor: Paul Nicholas Whatmough, Zhi-Gang Liu, Matthew Mattina
Abstract: Example methods, devices and/or circuits to be implemented in a processing device to perform neural network-based computing operations. According to an embodiment, an accumulation of weighted activation input values may be computed on accumulation cycles at least in part by multiplying and/or scaling accumulated activation input values by an associated neural network weight.
-
Publication No.: US20220351032A1
Publication Date: 2022-11-03
Application No.: US17242721
Filing Date: 2021-04-28
Applicant: Arm Limited
Inventor: Teyuh Alice Chou, Mudit Bhargava, Supreet Jeloka, Fernando Garcia Redondo, Paul Nicholas Whatmough
Abstract: A compute-in-memory (CIM) array module and a method for performing dynamic saturation detection for a CIM array are provided. The CIM array module includes a CIM array, saturation detection units (SDUs) and a controller. The CIM array includes selectable row signal lines, column signal lines and cells. Each cell is located at an intersection of a selectable row signal line and a column signal line, and each cell has a programmable conductance. The SDUs are selectively coupled to at least one column signal line, and each SDU is configured to, for each column signal line, generate an analog signal, and identify the column signal line as a saturated column signal line when a voltage of the analog signal is greater than a saturation threshold voltage, or a current of the analog signal is greater than a saturation threshold current.
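The thresholding step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented circuit. It assumes an idealized array in which each column current is the sum, over the selected rows, of cell conductance times row voltage. The function names, the unit-free values and the single shared threshold are hypothetical.

```python
def column_currents(conductance, row_voltages):
    """Idealized column currents: sum over rows of G[r][c] * V[r] (assumed model)."""
    n_rows = len(conductance)
    n_cols = len(conductance[0])
    return [
        sum(conductance[r][c] * row_voltages[r] for r in range(n_rows))
        for c in range(n_cols)
    ]


def saturated_columns(currents, threshold_current):
    """Return the indices of columns whose current exceeds the threshold."""
    return [c for c, current in enumerate(currents) if current > threshold_current]


# Hypothetical 2x2 array: rows are selectable row lines, columns are column lines.
conductance = [[1.0, 0.5],
               [2.0, 0.25]]
row_voltages = [1.0, 1.0]  # both rows selected

currents = column_currents(conductance, row_voltages)
print(currents)                          # [3.0, 0.75]
print(saturated_columns(currents, 2.0))  # [0]
```

A voltage-based check would follow the same pattern with a saturation threshold voltage in place of the threshold current.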
-
Publication No.: US20220164137A1
Publication Date: 2022-05-26
Application No.: US17103629
Filing Date: 2020-11-24
Applicant: Arm Limited
Inventor: Mudit Bhargava, Paul Nicholas Whatmough, Supreet Jeloka, Zhi-Gang Liu
Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of read word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each read word selector has a plurality of input ports and an output port, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the read word selectors of the first bank and the second bank, and configured to select a combination of read word selectors from at least one of the first bank and the second bank based on a bank select signal.
-
Publication No.: US20220035890A1
Publication Date: 2022-02-03
Application No.: US17103676
Filing Date: 2020-11-24
Applicant: Arm Limited
Inventor: Zhi-Gang Liu, Paul Nicholas Whatmough, Matthew Mattina
Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix, including, for each element i,j of the output matrix, calculating a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Alternatively, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix to generate the output matrix, including, for each element i,j of the output matrix, calculating a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.
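The bitmap-guided multiplication described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented design. It assumes the first matrix is compressed by dropping its zero entries row by row, and that the bitmap holds a 1 wherever the original entry was nonzero. The names `compress` and `bitmap_matmul` are hypothetical.

```python
def compress(matrix):
    """Drop zeros from each row; return (values, bitmap). Assumed storage format."""
    values = [[x for x in row if x != 0] for row in matrix]
    bitmap = [[1 if x != 0 else 0 for x in row] for row in matrix]
    return values, bitmap


def bitmap_matmul(values, bitmap, second):
    """Multiply a bitmap-compressed first matrix by a dense second matrix."""
    n_rows, n_inner, n_cols = len(bitmap), len(second), len(second[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            k = 0    # position in the compressed ith row
            acc = 0  # dot product for element i,j
            for p in range(n_inner):
                if bitmap[i][p]:  # only stored (nonzero) entries contribute
                    acc += values[i][k] * second[p][j]
                    k += 1
            out[i][j] = acc
    return out


first = [[1, 0, 2],
         [0, 0, 3]]
second = [[1, 2],
          [3, 4],
          [5, 6]]

values, bitmap = compress(first)
print(values)                                 # [[1, 2], [3]]
print(bitmap_matmul(values, bitmap, second))  # [[11, 14], [15, 18]]
```

The bitmap lets the inner loop skip positions whose first-matrix entry was zero, so only stored values are multiplied.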
-
Publication No.: US20210124560A1
Publication Date: 2021-04-29
Application No.: US16663887
Filing Date: 2019-10-25
Applicant: Arm Limited
Inventor: Zhi-Gang Liu, Paul Nicholas Whatmough
Abstract: The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the ith row of the first matrix with a corresponding column vector formed from the jth column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.
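The per-element computation described in this abstract can be illustrated with a short Python sketch. It is not the patented hardware. It assumes each row and column is split into fixed-length vectors, that each pair of vectors yields intermediate products, and that these are accumulated into one scalar. The names `vmac_dot` and `vmac_matmul` and the `vector_len` parameter are hypothetical.

```python
def vmac_dot(row, col, vector_len=4):
    """Dot product computed one vector-sized chunk at a time, then accumulated."""
    if len(row) != len(col):
        raise ValueError("row and column must have the same length")
    acc = 0
    for start in range(0, len(row), vector_len):
        row_vec = row[start:start + vector_len]
        col_vec = col[start:start + vector_len]
        # Intermediate products for this pair of vectors, summed into the scalar.
        acc += sum(r * c for r, c in zip(row_vec, col_vec))
    return acc


def vmac_matmul(first, second, vector_len=4):
    """Output element i,j is vmac_dot(ith row of first, jth column of second)."""
    columns = list(zip(*second))  # columns of the second matrix
    return [[vmac_dot(row, col, vector_len) for col in columns] for row in first]


first = [[1, 2, 3, 4, 5],
         [0, 1, 0, 1, 0]]
second = [[1, 0],
          [1, 0],
          [1, 1],
          [1, 0],
          [1, 2]]

print(vmac_matmul(first, second, vector_len=2))  # [[15, 13], [2, 0]]
```

In hardware, each output element would be handled by its own VMAC unit in parallel; the nested list comprehension here only models the arithmetic.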
-
Publication No.: US20210089889A1
Publication Date: 2021-03-25
Application No.: US16836117
Filing Date: 2020-03-31
Applicant: Arm Limited
Inventor: Dibakar Gope, Jesse Garrett Beu, Paul Nicholas Whatmough, Matthew Mattina
IPC: G06N3/08
Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.
-
Publication No.: US10747845B2
Publication Date: 2020-08-18
Application No.: US16118818
Filing Date: 2018-08-31
Applicant: Arm Limited
Inventor: Paul Nicholas Whatmough, Matthew Mattina, Zhigang Liu
Abstract: A system, apparatus and method for exposing input data operands and input weight operands to elements of a two-dimensional array so that two pairs of operands are exposed to each element of the array.
-
Publication No.: US10708600B2
Publication Date: 2020-07-07
Application No.: US15875464
Filing Date: 2018-01-19
Applicant: Arm Limited
Inventor: Yuhao Zhu, Paul Nicholas Whatmough
IPC: H04N19/167, H04N19/43, H04N21/2343, G06N3/02, H04N21/234
Abstract: A method of processing a video is provided. The method includes detecting a region of interest in a detection frame of the video. The method includes estimating a motion of the region of interest between the detection frame and an estimation frame of the video subsequent to the detection frame. The estimating is based on tracking of a characteristic of the detected region of interest into at least one portion of the estimation frame. The method includes, based on the estimated motion, estimating a location of the region of interest in the estimation frame. An apparatus for processing a video is also provided. A related non-transitory computer-readable storage medium comprising a set of computer-readable instructions is also provided.
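The location-estimation step described in this abstract can be sketched in Python. This is an illustration only, not the claimed method. It assumes the tracked characteristic is a set of feature points inside the region of interest, and that the region's motion is approximated by the mean displacement of those points between the two frames. The function name `estimate_roi` and the `(x, y, width, height)` box layout are hypothetical.

```python
def estimate_roi(box, points_detection, points_estimation):
    """Shift a region-of-interest box by the mean displacement of tracked points.

    box: (x, y, width, height) in the detection frame.
    points_detection / points_estimation: matching (x, y) positions of the same
    tracked points in the detection frame and the later estimation frame.
    """
    if not points_detection or len(points_detection) != len(points_estimation):
        raise ValueError("need the same non-zero number of points in both frames")
    n = len(points_detection)
    # Estimated motion: average displacement of the tracked points (assumption).
    dx = sum(after[0] - before[0]
             for before, after in zip(points_detection, points_estimation)) / n
    dy = sum(after[1] - before[1]
             for before, after in zip(points_detection, points_estimation)) / n
    x, y, width, height = box
    return (x + dx, y + dy, width, height)


box = (10, 20, 5, 5)
points_detection = [(11, 21), (13, 23)]
points_estimation = [(13, 22), (15, 24)]

print(estimate_roi(box, points_detection, points_estimation))  # (12.0, 21.0, 5, 5)
```

A full detector would only need to run on the detection frame; the estimation frame reuses the shifted box.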
-
Publication No.: US20200082544A1
Publication Date: 2020-03-12
Application No.: US16127007
Filing Date: 2018-09-10
Applicant: Arm Limited
Inventor: Yuhao Zhu, Paul Nicholas Whatmough
Abstract: A data processing apparatus detects motion between frames in a sequence of frames. The data processing apparatus then selects and/or tracks a region of interest in the sequence of frames based on the detected motion. An artificial neural network is then implemented to process image data for the selected region of interest in an attempt to classify an object within the region of interest. The data processing apparatus can provide an efficient way of performing computer vision processing.