Data processing method and apparatus for prefetching
    Invention grant (In force)

    Publication No.: US09037835B1

    Publication Date: 2015-05-19

    Application No.: US14061842

    Filing Date: 2013-10-24

    Applicant: ARM LIMITED

    Abstract: A data processing device includes processing circuitry 20 for executing a first memory access instruction to a first address of a memory device 40 and a second memory access instruction to a second address of the memory device 40, the first address being different from the second address. The data processing device also includes prefetching circuitry 30 for prefetching data from the memory device 40 based on a stride length 70 and instruction analysis circuitry 50 for determining a difference between the first address and the second address. Stride refining circuitry 60 is also provided to refine the stride length based on factors of the stride length and factors of the difference calculated by the instruction analysis circuitry 50.
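The refinement step described in the abstract can be sketched in a few lines. This is a minimal illustration only: the rule of taking the greatest common divisor (which captures the shared factors of the current stride and the observed address difference) and all function names are assumptions, not taken from the claims.

```python
from math import gcd

def refine_stride(stride: int, first_addr: int, second_addr: int) -> int:
    """Refine a prefetch stride from two observed access addresses.

    Hypothetical sketch: the common factors of the current stride and the
    observed address difference are captured by their greatest common
    divisor, which becomes the new, finer-grained stride.
    """
    difference = abs(second_addr - first_addr)
    if difference == 0:
        return stride  # identical addresses carry no stride information
    return gcd(stride, difference)

# Example: a trained stride of 64 bytes meets accesses 48 bytes apart,
# so the stride is refined to their common factor, 16.
print(refine_stride(64, 0x1000, 0x1030))  # -> 16
```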


ADAPTIVE PREFETCHING IN A DATA PROCESSING APPARATUS
    Invention application (Under examination, published)

    Publication No.: US20150134933A1

    Publication Date: 2015-05-14

    Application No.: US14080139

    Filing Date: 2013-11-14

    Applicant: ARM Limited

    Abstract: A data processing apparatus and method of data processing are disclosed. An instruction execution unit executes a sequence of program instructions, wherein execution of at least some of the program instructions initiates memory access requests to retrieve data values from a memory. A prefetch unit prefetches data values from the memory for storage in a cache unit before they are requested by the instruction execution unit. The prefetch unit is configured to perform a miss response comprising increasing a number of the future data values which it prefetches, when a memory access request specifies a pending data value which is already subject to prefetching but is not yet stored in the cache unit. The prefetch unit is also configured, in response to an inhibition condition being met, to temporarily inhibit the miss response for an inhibition period.
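The miss response and inhibition period described above can be sketched as follows. The class name, the linear distance-growth policy, and the time-based inhibition window are illustrative assumptions, not details from the patent claims.

```python
class AdaptivePrefetcher:
    """Sketch of a miss response that grows the prefetch distance,
    plus a temporary inhibition of that response (names are made up)."""

    def __init__(self, distance: int = 1, max_distance: int = 8,
                 inhibition_period: float = 1e-3):
        self.distance = distance            # how many future values to prefetch
        self.max_distance = max_distance
        self.inhibition_period = inhibition_period
        self.inhibited_until = 0.0

    def on_access(self, pending: bool, in_cache: bool, now: float) -> None:
        # Miss response: the request specifies a value already subject to
        # prefetching but not yet stored in the cache.
        if pending and not in_cache:
            if now >= self.inhibited_until:
                self.distance = min(self.distance + 1, self.max_distance)
            # else: inhibition condition met earlier; skip the increase

    def inhibit(self, now: float) -> None:
        # Temporarily suppress further distance increases.
        self.inhibited_until = now + self.inhibition_period
```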


    Hybrid Memory Artificial Neural Network Hardware Accelerator

    Publication No.: US20210295137A1

    Publication Date: 2021-09-23

    Application No.: US16822640

    Filing Date: 2020-03-18

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a hybrid memory artificial neural network hardware accelerator that includes a communication bus interface, a static memory, a non-refreshed dynamic memory, a controller and a computing engine. The static memory stores at least a portion of an ANN model. The ANN model includes an input layer, one or more hidden layers and an output layer, ANN basis weights, input data and output data. The non-refreshed dynamic memory is configured to store ANN custom weights for the input, hidden and output layers, and output data. For each layer or layer portion, the computing engine generates the ANN custom weights based on the ANN basis weights, stores the ANN custom weights in the non-refreshed dynamic memory, executes the layer or layer portion, based on inputs and the ANN custom weights, to generate layer output data, and stores the layer output data.
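The per-layer flow described in the abstract (derive custom weights from stored basis weights, stage them in the non-refreshed dynamic memory, execute the layer, keep the output) can be sketched in software. Every name here is illustrative; the real design is a hardware accelerator, not Python.

```python
def run_hybrid_ann(layers, basis_weights, derive_custom, x):
    """Sketch of the per-layer flow described above (names are made up).

    basis_weights: held in the static memory (SRAM)
    dynamic_store: short-lived custom weights in the non-refreshed dynamic
                   memory; each entry is used while still fresh, so no
                   refresh cycles are needed
    """
    dynamic_store = {}
    for i, layer in enumerate(layers):
        # Generate this layer's custom weights from the stored basis weights
        dynamic_store[i] = derive_custom(basis_weights[i])
        # Execute the layer on its inputs and freshly written custom weights
        x = layer(x, dynamic_store[i])
    return x

# Toy usage: two "layers" that scale their input by a derived weight.
scale = lambda x, w: [w * v for v in x]
out = run_hybrid_ann(
    layers=[scale, scale],
    basis_weights=[2, 3],
    derive_custom=lambda b: b + 1,   # custom weight = basis + 1 (made up)
    x=[1.0, 2.0],
)
print(out)  # -> [12.0, 24.0]
```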

    Skip predictor for pre-trained recurrent neural networks

    Publication No.: US11663814B2

    Publication Date: 2023-05-30

    Application No.: US16855681

    Filing Date: 2020-04-22

    Applicant: Arm Limited

    CPC classification number: G06N3/082 G06F17/18 G06K9/6267 G06N3/0472

    Abstract: The present disclosure advantageously provides a system and a method for skipping recurrent neural network (RNN) state updates using a skip predictor. Sequential input data are received and divided into sequences of input data values, each input data value being associated with a different time step for a pre-trained RNN model. At each time step, the hidden state vector for a prior time step is received from the pre-trained RNN model, and a determination, based on the input data value and the hidden state vector for at least one prior time step, is made whether to provide or not provide the input data value associated with the time step to the pre-trained RNN model for processing. When the input data value is not provided, the pre-trained RNN model does not update its hidden state vector. Importantly, the skip predictor is trained without retraining the pre-trained RNN model.
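The control flow above (consult the skip predictor each time step; on a skip, leave the hidden state untouched) can be sketched as follows. The function signatures and the toy predictor are assumptions for illustration, not the trained predictor the patent describes.

```python
def run_rnn_with_skips(rnn_step, skip_predictor, inputs, h0):
    """Sketch of skipping RNN state updates with a skip predictor.

    rnn_step(x, h) -> new hidden state (the frozen, pre-trained RNN cell)
    skip_predictor(x, h) -> True to skip this time step
    The RNN cell itself is never retrained; steps are merely skipped.
    """
    h = h0
    processed = 0
    for x in inputs:
        if skip_predictor(x, h):
            continue          # hidden state is left unchanged
        h = rnn_step(x, h)    # normal state update
        processed += 1
    return h, processed

# Toy usage: skip inputs that barely differ from the current state.
step = lambda x, h: 0.5 * h + 0.5 * x
skip = lambda x, h: abs(x - h) < 0.1
h, n = run_rnn_with_skips(step, skip, [0.0, 0.05, 1.0, 1.0], h0=0.0)
```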

    Hardware accelerator for natural language processing applications

    Publication No.: US11507841B2

    Publication Date: 2022-11-22

    Application No.: US16786096

    Filing Date: 2020-02-10

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a hardware accelerator for a natural language processing application including a first memory, a second memory, and a computing engine (CE). The first memory is configured to store a configurable NLM and a set of NLM fixed weights. The second memory is configured to store an ANN model, a set of ANN weights, a set of NLM delta weights, input data and output data. The set of NLM delta weights may be smaller than the set of NLM fixed weights, and each NLM delta weight corresponds to an NLM fixed weight. The CE is configured to execute the NLM, based on the input data, the set of NLM fixed weights and the set of NLM delta weights, to generate intermediate output data, and execute the ANN model, based on the intermediate output data and the set of ANN weights, to generate the output data.
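The delta-weight idea above (a small set of deltas, each adjusting one corresponding fixed weight) can be sketched as follows. The additive combination rule and the dictionary representation are assumptions for illustration.

```python
def effective_weights(fixed, deltas):
    """Sketch of combining NLM fixed weights with a small set of delta
    weights (the additive rule and storage layout are assumptions).

    fixed:  full weight list, stored once in the first memory
    deltas: {index: delta} for the few weights being adapted, stored in
            the second memory alongside the ANN model
    """
    out = list(fixed)
    for i, d in deltas.items():
        out[i] = fixed[i] + d   # each delta adjusts its corresponding fixed weight
    return out

print(effective_weights([1.0, 2.0, 3.0], {1: -0.5}))  # -> [1.0, 1.5, 3.0]
```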

Data processing device and method for interleaved storage of data elements
    Invention grant (In force)

    Publication No.: US09582419B2

    Publication Date: 2017-02-28

    Application No.: US14063161

    Filing Date: 2013-10-25

    Applicant: ARM LIMITED

    Abstract: A data processing device 100 comprises a plurality of storage circuits 130, 160, which store a plurality of data elements of b bits in an interleaved manner. The data processing device also comprises a consumer 110 with a number of lanes 120. The consumer is able to individually access each of the plurality of storage circuits 130, 160 in order to receive into the lanes 120 either a subset of the plurality of data elements or y bits of each of the plurality of data elements. The consumer 110 is also able to execute a common instruction on each of the plurality of lanes 120. The relationship of the bits is such that b is greater than y and is an integer multiple of y. Each of the plurality of storage circuits 130, 160 stores at most y bits of each of the data elements. Furthermore, each of the storage circuits 130, 160 stores at most a proportion y/b of the plurality of data elements. By carrying out the interleaving in this manner, the plurality of storage circuits 130, 160 comprise no more than b/y storage circuits.
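The layout described above, where each of b/y storage circuits holds y bits of every b-bit data element, can be sketched as a bit-slicing function. The slice ordering (circuit c holds bits c*y through c*y + y - 1) is an assumption for illustration.

```python
def interleave(elements, b, y):
    """Split b-bit data elements across b // y storage circuits, each
    holding y bits of every element (a sketch of the layout described)."""
    assert b % y == 0 and b > y
    n_circuits = b // y
    mask = (1 << y) - 1
    # circuit c stores bits [c*y, (c+1)*y) of every element
    return [[(e >> (c * y)) & mask for e in elements]
            for c in range(n_circuits)]

# Toy usage: 16-bit elements split into four 4-bit slices (b=16, y=4).
circuits = interleave([0x1234, 0xABCD], b=16, y=4)
print(circuits)  # -> [[4, 13], [3, 12], [2, 11], [1, 10]]
```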


    Hybrid memory artificial neural network hardware accelerator

    Publication No.: US11468305B2

    Publication Date: 2022-10-11

    Application No.: US16822640

    Filing Date: 2020-03-18

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a hybrid memory artificial neural network hardware accelerator that includes a communication bus interface, a static memory, a non-refreshed dynamic memory, a controller and a computing engine. The static memory stores at least a portion of an ANN model. The ANN model includes an input layer, one or more hidden layers and an output layer, ANN basis weights, input data and output data. The non-refreshed dynamic memory is configured to store ANN custom weights for the input, hidden and output layers, and output data. For each layer or layer portion, the computing engine generates the ANN custom weights based on the ANN basis weights, stores the ANN custom weights in the non-refreshed dynamic memory, executes the layer or layer portion, based on inputs and the ANN custom weights, to generate layer output data, and stores the layer output data.

    Hardware Accelerator for Natural Language Processing Applications

    Publication No.: US20210248008A1

    Publication Date: 2021-08-12

    Application No.: US16786096

    Filing Date: 2020-02-10

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a hardware accelerator for a natural language processing application including a first memory, a second memory, and a computing engine (CE). The first memory is configured to store a configurable NLM and a set of NLM fixed weights. The second memory is configured to store an ANN model, a set of ANN weights, a set of NLM delta weights, input data and output data. The set of NLM delta weights may be smaller than the set of NLM fixed weights, and each NLM delta weight corresponds to an NLM fixed weight. The CE is configured to execute the NLM, based on the input data, the set of NLM fixed weights and the set of NLM delta weights, to generate intermediate output data, and execute the ANN model, based on the intermediate output data and the set of ANN weights, to generate the output data.
