Hybrid filter banks for artificial neural networks

    Publication No.: US12067373B2

    Publication Date: 2024-08-20

    Application No.: US16836110

    Application Date: 2020-03-31

    Applicant: Arm Limited

    CPC classification number: G06F7/483 G06F7/5443 G06N3/04 G06N3/063 G06N3/08

    Abstract: The present disclosure advantageously provides a system including a memory, a processor, and circuitry to execute one or more mixed-precision layers of an artificial neural network (ANN), each mixed-precision layer including high-precision weight filters and low-precision weight filters. The circuitry is configured to perform one or more calculations on an input feature map having a plurality of input channels (cin) using the high-precision weight filters to create a high-precision output feature map having a first number of output channels (k), perform one or more calculations on the input feature map using the low-precision weight filters to create a low-precision output feature map having a second number of output channels (cout−k), and concatenate the high-precision output feature map and the low-precision output feature map to create a unified output feature map having a plurality of output channels (cout).
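
    As a rough illustration of the channel split described above, the sketch below applies a 1×1 convolution with k full-precision filters and cout−k int8-quantized filters to the same input feature map and concatenates the results along the channel axis. It is a minimal sketch under those assumptions (NumPy, symmetric int8 weight quantization, illustrative names such as hybrid_layer), not the patented hardware datapath.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization of a weight tensor; returns weights and a scale."""
    scale = max(float(np.max(np.abs(w))) / 127.0, 1e-8)
    return np.round(w / scale).astype(np.int8), scale

def hybrid_layer(x, w_high, w_low):
    """x: (cin, H, W); w_high: (k, cin); w_low: (cout-k, cin) -> y: (cout, H, W)."""
    cin, H, W = x.shape
    x_flat = x.reshape(cin, -1)                          # (cin, H*W)

    # High-precision path: k filters kept in full fp32.
    y_high = w_high @ x_flat                             # (k, H*W)

    # Low-precision path: cout-k filters with int8 weights, rescaled after the MAC.
    w_q, scale = quantize_int8(w_low)
    y_low = (w_q.astype(np.float32) @ x_flat) * scale    # (cout-k, H*W)

    # Concatenate the two output feature maps along the channel axis.
    y = np.concatenate([y_high, y_low], axis=0)          # (cout, H*W)
    return y.reshape(-1, H, W)

x = np.random.randn(8, 4, 4).astype(np.float32)          # cin = 8 input channels
w_high = np.random.randn(3, 8).astype(np.float32)        # k = 3 high-precision filters
w_low = np.random.randn(5, 8).astype(np.float32)         # cout - k = 5 low-precision filters
print(hybrid_layer(x, w_high, w_low).shape)              # (8, 4, 4): cout = 8 channels
```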

    Apparatus and method for providing coherence data for use when implementing a cache coherency protocol

    Publication No.: US11934307B2

    Publication Date: 2024-03-19

    Application No.: US17905566

    Application Date: 2021-01-18

    Applicant: Arm Limited

    CPC classification number: G06F12/0292 G06F12/0831 G06F12/0871

    Abstract: An apparatus and method are provided for receiving a request from a plurality of processing units, where multiple of those processing units have associated cache storage. A snoop unit is used to implement a cache coherency protocol when a request is received that identifies a cacheable memory address. The snoop unit has snoop filter storage comprising a plurality of snoop filter tables organized in a hierarchical arrangement. The snoop filter tables comprise a primary snoop filter table at a highest level in the hierarchy, and each snoop filter table at a lower level in the hierarchy forms a backup snoop filter table for an adjacent snoop filter table at a higher level in the hierarchy. Each snoop filter table is arranged as a multi-way set associative storage structure, and each backup snoop filter table has a different number of sets than are provided in the adjacent snoop filter table.
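
    The hierarchical arrangement can be pictured with a small behavioural model. The sketch below is a minimal, assumption-laden model (two levels, trivial set-index hashing, oldest-entry eviction, and made-up sizes of 8 and 6 sets) in which an entry evicted from the primary table is demoted to a backup table with a different number of sets; it is not the claimed micro-architecture.

```python
from collections import OrderedDict

class SnoopFilterTable:
    """Multi-way set associative table mapping cache-line addresses to sharer sets."""
    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.sets = [OrderedDict() for _ in range(num_sets)]     # addr -> {cpu ids}

    def lookup(self, addr):
        return self.sets[addr % self.num_sets].get(addr)

    def insert(self, addr, sharers):
        s = self.sets[addr % self.num_sets]
        victim = None
        if addr not in s and len(s) >= self.num_ways:
            victim = s.popitem(last=False)                       # evict the oldest entry
        s[addr] = sharers
        return victim                                            # (addr, sharers) or None

class HierarchicalSnoopFilter:
    """A primary table whose victims fall back to a table with a different set count."""
    def __init__(self):
        self.primary = SnoopFilterTable(num_sets=8, num_ways=4)
        self.backup = SnoopFilterTable(num_sets=6, num_ways=4)   # different number of sets

    def record_sharer(self, addr, cpu):
        # For brevity, an entry promoted back to the primary is not removed from the backup.
        sharers = self.lookup(addr) or set()
        sharers.add(cpu)
        victim = self.primary.insert(addr, sharers)
        if victim is not None:
            self.backup.insert(*victim)                          # demote the evicted entry

    def lookup(self, addr):
        return self.primary.lookup(addr) or self.backup.lookup(addr)

sf = HierarchicalSnoopFilter()
sf.record_sharer(0x40, cpu=0)
sf.record_sharer(0x40, cpu=1)
print(sf.lookup(0x40))                                           # {0, 1}
```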

    Skip predictor for pre-trained recurrent neural networks

    Publication No.: US11663814B2

    Publication Date: 2023-05-30

    Application No.: US16855681

    Application Date: 2020-04-22

    Applicant: Arm Limited

    CPC classification number: G06N3/082 G06F17/18 G06K9/6267 G06N3/0472

    Abstract: The present disclosure advantageously provides a system and a method for skipping recurrent neural network (RNN) state updates using a skip predictor. Sequential input data are received and divided into sequences of input data values, each input data value being associated with a different time step for a pre-trained RNN model. At each time step, the hidden state vector for a prior time step is received from the pre-trained RNN model, and a determination is made, based on the input data value and the hidden state vector for at least one prior time step, whether or not to provide the input data value associated with the time step to the pre-trained RNN model for processing. When the input data value is not provided, the pre-trained RNN model does not update its hidden state vector. Importantly, the skip predictor is trained without retraining the pre-trained RNN model.
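
    The gating idea can be illustrated outside any framework. Below is a minimal sketch assuming a plain tanh RNN cell and a logistic skip predictor with randomly initialized (untrained) weights; names such as rnn_cell and skip_predictor are illustrative, and a real predictor would be trained while the RNN weights stay frozen.

```python
import numpy as np

def rnn_cell(x, h, Wx, Wh, b):
    """One step of a pre-trained RNN: h_next = tanh(Wx @ x + Wh @ h + b)."""
    return np.tanh(Wx @ x + Wh @ h + b)

def skip_predictor(x, h, Wp, bp, threshold=0.5):
    """Decide, from the current input and previous hidden state, whether to skip."""
    score = 1.0 / (1.0 + np.exp(-(Wp @ np.concatenate([x, h]) + bp)))
    return score < threshold                       # True -> skip the RNN state update

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
Wx, Wh, b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
Wp, bp = rng.normal(size=d_in + d_h), 0.0          # stand-in for the trained predictor

h, skipped = np.zeros(d_h), 0
for x in rng.normal(size=(10, d_in)):              # a sequence of 10 time steps
    if skip_predictor(x, h, Wp, bp):
        skipped += 1                               # hidden state is left unchanged
        continue
    h = rnn_cell(x, h, Wx, Wh, b)                  # only non-skipped inputs update h
print(f"skipped {skipped} of 10 updates")
```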

    Mixed-precision computation unit
    Invention Grant

    Publication No.: US11561767B2

    Publication Date: 2023-01-24

    Application No.: US16836117

    Application Date: 2020-03-31

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a mixed-precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high-precision mode, a low-precision add mode and a low-precision multiply mode.
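
    The operating modes can be mimicked with a small behavioural model. The sketch below assumes "high precision" means a 32-bit multiply-accumulate and "low precision" means two packed signed 8-bit lanes; the actual widths, packing and wiring of the multiplier, adders and accumulator are not taken from the patent.

```python
import numpy as np

def lanes_int8(v):
    """Split a 16-bit word into two signed 8-bit lanes (low lane first)."""
    return np.array([v & 0xFF, (v >> 8) & 0xFF], dtype=np.uint8).astype(np.int8)

def mpc_step(mode, a, b, acc):
    """One step of the unit; the mode control signal selects the datapath."""
    if mode == "high_precision":
        # Multiplier + first adder used as a single wide multiply-accumulate.
        return acc + np.int32(a) * np.int32(b)
    if mode == "low_precision_multiply":
        # Both 8-bit lanes of a and b are multiplied and accumulated.
        a0, a1 = lanes_int8(a)
        b0, b1 = lanes_int8(b)
        return acc + np.int32(a0) * np.int32(b0) + np.int32(a1) * np.int32(b1)
    if mode == "low_precision_add":
        # Second adder path: sum the two 8-bit lanes of a into the accumulator.
        a0, a1 = lanes_int8(a)
        return acc + np.int32(a0) + np.int32(a1)
    raise ValueError(f"unknown mode: {mode}")

acc = np.int32(0)
acc = mpc_step("high_precision", 300, 7, acc)                   # 2100
acc = mpc_step("low_precision_multiply", 0x0102, 0x0304, acc)   # + (2*4 + 1*3) = 2111
print(acc)
```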

    Counting elements in data items in a data processing apparatus

    Publication No.: US11042375B2

    Publication Date: 2021-06-22

    Application No.: US15665781

    Application Date: 2017-08-01

    Applicant: ARM Limited

    Abstract: An apparatus and method of operating the apparatus are provided for performing a count operation. Instruction decoder circuitry is responsive to a count instruction specifying an input data item to generate control signals to control the data processing circuitry to perform a count operation. The count operation determines a count value indicative of a number of input elements of a subset of elements in the specified input data item which have a value which matches a reference value in a reference element in a reference data item. A plurality of count operations may be performed to determine a count data item corresponding to the input data item. A register scatter storage instruction, a gather index generation instruction, and respective apparatuses responsive to them, as well as simulator implementations, are also provided.
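
    The semantics of the count operation can be stated compactly in software. The sketch below assumes 8-bit elements, a predicate mask selecting the subset, and one reference element per count operation; instruction encodings and vector lengths are not modelled.

```python
import numpy as np

def count_matches(input_item, predicate, reference_value):
    """Count the active elements of the input data item equal to the reference value."""
    input_item = np.asarray(input_item, dtype=np.uint8)
    predicate = np.asarray(predicate, dtype=bool)
    return int(np.count_nonzero(predicate & (input_item == reference_value)))

def count_data_item(input_item, reference_item):
    """One count operation per reference element yields the count data item."""
    all_active = np.ones(len(input_item), dtype=bool)
    return [count_matches(input_item, all_active, r) for r in reference_item]

data = [3, 7, 3, 1, 3, 7, 0, 3]          # input data item (8-bit elements)
refs = [3, 7]                            # reference data item
print(count_data_item(data, refs))       # [4, 2]: four elements match 3, two match 7
```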

    Processor for sparse matrix computation
    Invention Application

    Publication No.: US20200326938A1

    Publication Date: 2020-10-15

    Application No.: US16381349

    Application Date: 2019-04-11

    Applicant: Arm Limited

    Abstract: A data processor receives a first set of processor instructions for combining a first matrix with a second matrix to produce a third matrix and generates a second set of processor instructions therefrom by identifying values of non-zero elements of the first matrix stored in a memory of the data processor and determining memory locations of elements of the second matrix. An instruction of the second set of processor instructions includes a determined memory location and/or an explicit value of an identified non-zero element. The second set of processor instructions is executed by the data processor. The second set of processor instructions may be generated by just-in-time compilation of the first set of processor instructions and may include instructions of a custom instruction set architecture.
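
    The specialization step can be imitated in plain Python: the sketch below generates source code in which each non-zero of the first matrix appears as an explicit constant and selects a precomputed row of the second matrix, then executes that generated code. Generated Python stands in for the custom instruction set; this is an illustration of the just-in-time idea, not the processor's actual instruction stream.

```python
import numpy as np

def jit_sparse_matmul(A):
    """Generate code that hard-codes A's non-zero values and the B rows they select."""
    rows, cols = A.shape
    lines = ["def matmul(B, C):"]
    for i in range(rows):
        for k in range(cols):
            v = A[i, k]
            if v != 0.0:
                # Embed the explicit non-zero value and the location (row of B) it uses.
                lines.append(f"    C[{i}, :] += {float(v)!r} * B[{k}, :]")
    lines.append("    return C")
    namespace = {}
    exec("\n".join(lines), namespace)      # "second set of processor instructions"
    return namespace["matmul"]

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0]])
B = np.arange(12, dtype=float).reshape(3, 4)
matmul = jit_sparse_matmul(A)
C = matmul(B, np.zeros((2, 4)))
print(np.allclose(C, A @ B))               # True: specialized code matches dense A @ B
```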

    Hints in a data processing apparatus

    Publication No.: US10572259B2

    Publication Date: 2020-02-25

    Application No.: US15876430

    Application Date: 2018-01-22

    Applicant: Arm Limited

    Abstract: An apparatus and method of operating a data processing apparatus are provided. The data processing circuitry is responsive to a hint instruction to assert at least one performance-modifying control signal when subsequently generating control signals for other data processing instructions. This causes the data processing functional hardware, which performs the data processing operations defined by the data processing instructions, to operate in a modified manner, although the data processing results produced do not change in dependence on whether the at least one performance-modifying control signal is asserted.
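
    The behavioural contract can be caricatured in a few lines: a hint may change how later work is carried out but never what it computes. The model below is a loose, illustrative analogy (a flag switching between two equivalent traversal strategies), not the hardware mechanism claimed.

```python
import numpy as np

class HintedProcessor:
    def __init__(self):
        self.hint_asserted = False     # the performance modifying control signal

    def hint(self):
        """Hint instruction: assert the control signal for subsequent instructions."""
        self.hint_asserted = True

    def sum_elements(self, data):
        """Data processing instruction: the result never depends on the hint."""
        if self.hint_asserted:
            # Modified manner: e.g. a blocked traversal that may be friendlier to caches.
            return sum(int(np.sum(data[i:i + 4])) for i in range(0, len(data), 4))
        return int(np.sum(data))       # default manner

p = HintedProcessor()
data = np.arange(16)
baseline = p.sum_elements(data)
p.hint()
print(baseline == p.sum_elements(data))   # True: identical result either way
```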
