Mixed-precision computation unit
    31.
    Granted patent

    Publication No.: US11561767B2

    Publication date: 2023-01-24

    Application No.: US16836117

    Filing date: 2020-03-31

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.

    Memory for an artificial neural network accelerator

    Publication No.: US11526305B2

    Publication date: 2022-12-13

    Application No.: US17103629

    Filing date: 2020-11-24

    Applicant: Arm Limited

    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of read word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each read word selector has a plurality of input ports and an output port, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the read word selectors of the first bank and the second bank, and configured to select a combination of read word selectors from at least one of the first bank and the second bank based on a bank select signal.
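
The read path described in this abstract can be modelled in a few lines. This is a minimal behavioural sketch, not the patented circuit: the function names, the list-based layout and the per-position form of the bank select signal are assumptions made for illustration.

```python
from typing import List, Sequence

Word = bytes            # one word holds several bytes
WordLine = List[Word]   # one word line holds several words
Bank = List[WordLine]   # one bank holds at least two word lines


def read_word_selectors(bank: Bank, word_line: int,
                        byte_select: Sequence[int]) -> List[int]:
    """Each read word selector picks one byte of its word on the selected word line."""
    line = bank[word_line]
    return [word[sel] for word, sel in zip(line, byte_select)]


def bank_select(outputs_per_bank: Sequence[List[int]],
                bank_sel: Sequence[int]) -> List[int]:
    """For each word position, forward the selector output of the chosen bank.

    Assumption: the bank select signal names one bank per word position.
    """
    return [outputs_per_bank[b][pos] for pos, b in enumerate(bank_sel)]


# Two banks, each with two word lines of two 4-byte words (hypothetical contents).
bank0 = [[bytes([0, 1, 2, 3]), bytes([4, 5, 6, 7])],
         [bytes([8, 9, 10, 11]), bytes([12, 13, 14, 15])]]
bank1 = [[bytes([100, 101, 102, 103]), bytes([104, 105, 106, 107])],
         [bytes([108, 109, 110, 111]), bytes([112, 113, 114, 115])]]

out0 = read_word_selectors(bank0, word_line=1, byte_select=[2, 0])
out1 = read_word_selectors(bank1, word_line=0, byte_select=[3, 1])
mixed = bank_select([out0, out1], bank_sel=[0, 1])
print(out0, out1, mixed)
```

Here the final read combines the first selector of bank 0 with the second selector of bank 1, which is one way to read "a combination of read word selectors from at least one of the first bank and the second bank".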

    Non-Volatile Memory Accelerator for Artificial Neural Networks

    Publication No.: US20220101085A1

    Publication date: 2022-03-31

    Application No.: US17036490

    Filing date: 2020-09-29

    Applicant: Arm Limited

    Abstract: A non-volatile memory (NVM) crossbar for an artificial neural network (ANN) accelerator is provided. The NVM crossbar includes row signal lines configured to receive input analog voltage signals, multiply-and-accumulate (MAC) column signal lines, a correction column signal line, a MAC cell disposed at each row signal line and MAC column signal line intersection, and a correction cell disposed at each row signal line and correction column signal line intersection. Each MAC cell includes one or more programmable NVM elements programmed to an ANN unipolar weight, and each correction cell includes one or more programmable NVM elements. Each MAC column signal line generates a MAC signal based on the input analog voltage signals and the respective MAC cells, and the correction column signal line generates a correction signal based on the input analog voltage signals and the correction cells. Each MAC signal is corrected based on the correction signal.
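
One plausible reading of the correction column is sketched below. The abstract does not say how the correction cells are programmed or how the correction is applied, so the scheme here is an assumption: signed weights are shifted by a constant `offset` to make them unipolar, the correction cells hold that same offset, and the correction signal is subtracted from each MAC signal. All names are hypothetical, and integer arithmetic stands in for analog voltages and conductances.

```python
def crossbar_mac(inputs, signed_weights, offset):
    """Return corrected MAC outputs, one per MAC column.

    inputs:          one value per row signal line
    signed_weights:  rows x columns matrix of signed ANN weights
    offset:          shift that makes every stored weight non-negative (assumed scheme)
    """
    # Unipolar weights as they would be programmed into the MAC cells.
    unipolar = [[w + offset for w in row] for row in signed_weights]
    if any(g < 0 for row in unipolar for g in row):
        raise ValueError("offset too small to make all weights unipolar")

    n_cols = len(signed_weights[0])
    # Each MAC column line sums input * stored weight over all rows.
    mac = [sum(v * row[c] for v, row in zip(inputs, unipolar))
           for c in range(n_cols)]
    # The correction column line sums input * offset over all rows.
    correction = sum(v * offset for v in inputs)
    # Correct each MAC signal with the shared correction signal.
    return [m - correction for m in mac]


inputs = [1, 2, 3]
weights = [[1, -2],
           [0, 3],
           [-1, 1]]
result = crossbar_mac(inputs, weights, offset=3)
print(result)
```

Under this assumption the corrected outputs equal the signed dot products of the inputs with each weight column, even though only non-negative values are stored.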

    Systolic convolutional neural network

    Publication No.: US11188814B2

    Publication date: 2021-11-30

    Application No.: US15945952

    Filing date: 2018-04-05

    Applicant: Arm Limited

    Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
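
The dataflow of an array in which each MAC cell owns one output value can be illustrated with a cycle-by-cycle simulation. This is a generic output-stationary systolic sketch, not the circuit claimed in the patent: the skewed arrival time `k = t - i - j` and the function name are assumptions.

```python
def systolic_matmul(weights, features):
    """Simulate an R x C grid of MAC cells computing weights (R x K) times features (K x C).

    Cell (i, j) accumulates one output value. Operands are assumed to enter
    skewed, so the k-th operand pair reaches cell (i, j) at cycle t = i + j + k.
    """
    rows, depth, cols = len(weights), len(features), len(features[0])
    acc = [[0] * cols for _ in range(rows)]
    last_cycle = rows + cols + depth - 3   # cycle at which the far corner finishes
    for t in range(last_cycle + 1):
        for i in range(rows):
            for j in range(cols):
                k = t - i - j              # index of the operand pair arriving now
                if 0 <= k < depth:
                    acc[i][j] += weights[i][k] * features[k][j]
    return acc


product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

Weights move along one dimension of the grid and feature components along the other, so every cell performs at most one multiply-accumulate per cycle and never needs to fetch operands from memory itself.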

    PROCESSOR FOR SPARSE MATRIX COMPUTATION
    35.
    Patent application

    Publication No.: US20200326938A1

    Publication date: 2020-10-15

    Application No.: US16381349

    Filing date: 2019-04-11

    Applicant: Arm Limited

    Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.
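
The idea of generating a second instruction stream that embeds non-zero values and precomputed memory locations can be sketched as follows. The `FMA` opcode, the tuple encoding, the flat row-major memory layout and the function names are all hypothetical; the abstract does not specify an instruction format.

```python
def generate_instructions(a, b_base, b_cols, c_base):
    """Emit one instruction per (non-zero element of A, output column) pair.

    Each instruction carries the explicit non-zero value and the memory
    locations of the B operand and the C destination, so zeros in A cost nothing.
    """
    program = []
    for i, row in enumerate(a):
        for k, value in enumerate(row):
            if value == 0:
                continue                       # skipped entirely at run time
            for j in range(b_cols):
                dst = c_base + i * b_cols + j  # address of C[i][j]
                src = b_base + k * b_cols + j  # address of B[k][j]
                program.append(("FMA", dst, value, src))
    return program


def execute(program, memory):
    """Run the generated instructions against a flat memory list."""
    for op, dst, immediate, src in program:
        assert op == "FMA"
        memory[dst] += immediate * memory[src]


a = [[2, 0],
     [0, 3]]                            # sparse first matrix
memory = [1, 2, 3, 4, 0, 0, 0, 0]       # B at addresses 0..3, C at addresses 4..7
program = generate_instructions(a, b_base=0, b_cols=2, c_base=4)
execute(program, memory)
print(len(program), memory[4:])
```

A dense 2 x 2 product needs eight multiply-accumulates; the generated program here contains only four because half of the first matrix is zero.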

    Clock frequency reduction for an electronic device

    Publication No.: US10579126B2

    Publication date: 2020-03-03

    Application No.: US15308658

    Filing date: 2015-03-13

    Applicant: ARM LIMITED

    Abstract: An electronic device (20) has a clock path (24) for propagating a clock signal and a clock propagating element (26) on the clock path. An analogue element (30) coupled to the clock path (24) varies, in dependence on an analogue level of a first signal (32), a switching delay for the clock propagating element (26) to trigger a transition of the clock signal. The first signal is a digitally sampled signal. This provides a mechanism for providing a fast reduction in clock frequency even if the first signal is a metastable signal, which is useful for avoiding errors caused by voltage drops.

    Circuit delay monitoring apparatus and method
    37.
    Granted patent (in force)

    Publication No.: US09432009B2

    Publication date: 2016-08-30

    Application No.: US14081900

    Filing date: 2013-11-15

    Applicant: ARM Limited

    CPC classification number: H03K5/135

    Abstract: A circuit delay monitoring apparatus has a ring oscillator with a plurality of delay elements, a signal transition being propagated through the delay elements of the ring oscillator, and a plurality N of sampling points being distributed around the ring oscillator. Selection circuitry selects, in dependence on the indication of the current location of the signal transition generated by the fine sampling circuitry, one of the M transition counter circuits whose associated location is greater than said predetermined amount from the current location of the signal transition. Output generation circuitry then generates a count indication for a reference time period dependent on a sampled count value of the transition counter circuit selected by the selection circuitry, the indication of the current location of the signal transition within the ring oscillator, and reference count data relating to the start of the reference time period.

    Mixed-signal artificial neural network accelerator

    Publication No.: US12093808B2

    Publication date: 2024-09-17

    Application No.: US17116623

    Filing date: 2020-12-09

    Applicant: Arm Limited

    CPC classification number: G06N3/063 G06F7/5443 G06F17/15 G06F2207/4824

    Abstract: An artificial neural network (ANN) accelerator is provided. The ANN accelerator includes digital controlled oscillators (DCOs), digital-to-time converters (DTCs) and a mixed-signal multiply-and-accumulate (MAC) array. Each DCO generates a first analog operand signal based on a first digital data value, and transmits the first analog operand signal along a respective column signal line. Each DTC generates a second analog operand signal based on a second digital data value, and transmits the second analog operand signal along a respective row signal line. The mixed-signal MAC array is coupled to the row and column signal lines, and includes mixed-signal MAC units. Each mixed-signal MAC unit includes an integrated clock gate (ICG) that generates a digital product signal based on the first and second analog operand signals, and a counter circuit that increments or decrements a count value stored in a register based on the digital product signal.
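
A behavioural model helps show why gating an oscillator with a timed pulse yields a product. The encoding below is an assumption, since the abstract only speaks of "analog operand signals": the DCO is taken to emit `a` clock edges per unit of time, the DTC is taken to hold the gate open for `b` units of time, and the counter counts the edges that pass the gate. Function and parameter names are hypothetical.

```python
from fractions import Fraction


def mac_unit_step(count, a, b, sign=1):
    """Add sign * a * b to count by counting gated clock edges.

    a: first digital value, modelled as DCO frequency (edges per time unit)
    b: second digital value, modelled as DTC pulse width (time units)
    sign: +1 to increment the counter, -1 to decrement it
    """
    if a == 0 or b == 0:
        return count                    # no edges, or the gate never opens
    period = Fraction(1, a)             # time between DCO edges
    gate_width = Fraction(b)            # how long the clock gate stays open
    t = period                          # time of the first edge
    while t <= gate_width:
        count += sign                   # one edge passed through the gate
        t += period
    return count


# Accumulate a small dot product, with the middle term carrying a negative sign.
count = 0
for a, b, sign in [(3, 4, 1), (2, 1, -1), (5, 2, 1)]:
    count = mac_unit_step(count, a, b, sign)
print(count)
```

Exact fractions are used for edge times so that the simulated count is not disturbed by floating-point rounding at the gate boundary.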

    Memory for an artificial neural network accelerator

    Publication No.: US12086453B2

    Publication date: 2024-09-10

    Application No.: US17103632

    Filing date: 2020-11-24

    Applicant: Arm Limited

    CPC classification number: G06F3/0655 G06F3/0604 G06F3/0679 G06N3/063 G11C11/54

    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of write word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each write word selector has an input port and a plurality of output ports, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the write word selectors of the first bank and the second bank, and configured to select a combination of write word selectors from at least one of the first bank and the second bank based on a bank select signal.

    Hardware accelerator for IM2COL operation

    Publication No.: US11783163B2

    Publication date: 2023-10-10

    Application No.: US16901542

    Filing date: 2020-06-15

    Applicant: Arm Limited

    CPC classification number: G06N3/04 G06F9/30105 G06F17/16 G06N3/08

    Abstract: The present disclosure advantageously provides a matrix expansion unit that includes an input data selector, a first register set, a second register set, and an output data selector. The input data selector is configured to receive first matrix data in a columnwise format. The first register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a first shift loop. The second register set is coupled to the data selector, and includes a plurality of data selectors and a plurality of registers arranged in a second shift loop. The output data selector is coupled to the first register set and the second register set, and is configured to output second matrix data in a rowwise format.
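
For readers unfamiliar with the operation named in the title, the software form of IM2COL is sketched below. It shows only the matrix expansion that such a unit would produce, not the register shift loops of the patented hardware. A single channel, stride 1 and no padding are assumed, and the function name is hypothetical.

```python
def im2col(image, kernel_h, kernel_w):
    """Expand a 2-D image so that each output row holds one kernel-sized patch.

    After this expansion, convolution with a flattened kernel becomes a
    plain matrix-vector (or matrix-matrix) product.
    """
    height, width = len(image), len(image[0])
    rows = []
    for r in range(height - kernel_h + 1):
        for c in range(width - kernel_w + 1):
            patch = [image[r + dr][c + dc]
                     for dr in range(kernel_h)
                     for dc in range(kernel_w)]
            rows.append(patch)
    return rows


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
expanded = im2col(image, 2, 2)
for row in expanded:
    print(row)
```

Because neighbouring patches overlap, most input elements appear several times in the expanded matrix, which is the data duplication a hardware expansion unit can perform on the fly instead of storing it in memory.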