DATA PROCESSOR
    22.
    Type: Patent application

    Publication No.: US20250165292A1

    Publication date: 2025-05-22

    Application No.: US18512615

    Filing date: 2023-11-17

    Applicant: Arm Limited

    Abstract: The present disclosure relates to a data processor for processing data, comprising: a plurality of execution units to execute one or more operations; and a plurality of storage elements to store data for the one or more operations, the data processor being configured to process at least one task, each task to be executed in the form of a directed acyclic graph of operations, wherein each of the operations maps to a corresponding execution unit and each connection between operations in the acyclic graph maps to a corresponding storage element, the data processor further comprising: a plurality of counters; and a control module to control the plurality of counters to: in a first mode, count an operation cycle number associated with each operation of the at least one task, the operation cycle number of an operation being a number of cycles required to complete the operation; and in a second mode, count a unit cycle number associated with one or more execution units, the unit cycle number of an execution unit being an accumulative number of cycles when the execution unit is occupied in use during execution of the at least one task.
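
    The two counter modes can be illustrated with a small Python model. It is a sketch under stated assumptions, not Arm's design: the graph format, the execution-unit names (`dma`, `mac`, `vec`) and the scheduling rule (an operation starts once its producers have finished and its unit is idle) are all hypothetical, and the storage elements that the graph edges map to are not modelled.

```python
from collections import Counter
from enum import Enum


class Mode(Enum):
    OPERATION = 1  # first mode: cycles needed to complete each operation
    UNIT = 2       # second mode: cumulative occupied cycles per execution unit


def build_trace(graph):
    """Schedule a DAG and return a per-cycle list of {unit: operation} snapshots.

    graph maps op name -> (unit, duration, producers) and must list ops in
    topological order. Hypothetical rule: an op starts once all its producers
    have completed and its execution unit is idle.
    """
    done_at, unit_free, busy = {}, {}, []
    for name, (unit, duration, producers) in graph.items():
        start = max([done_at[p] for p in producers] + [unit_free.get(unit, 0)])
        done_at[name] = unit_free[unit] = start + duration
        busy.append((unit, name, start, start + duration))
    trace = [{} for _ in range(max(done_at.values(), default=0))]
    for unit, name, start, end in busy:
        for cycle in range(start, end):
            trace[cycle][unit] = name
    return trace


def read_counters(trace, mode):
    """Increment one counter per busy unit per cycle, keyed according to mode."""
    counters = Counter()
    for snapshot in trace:
        for unit, op in snapshot.items():
            counters[op if mode is Mode.OPERATION else unit] += 1
    return dict(counters)


graph = {
    "load": ("dma", 2, ()),
    "conv": ("mac", 3, ("load",)),
    "relu": ("vec", 1, ("conv",)),
    "pool": ("vec", 2, ("relu",)),
}
trace = build_trace(graph)
print(read_counters(trace, Mode.OPERATION))  # {'load': 2, 'conv': 3, 'relu': 1, 'pool': 2}
print(read_counters(trace, Mode.UNIT))       # {'dma': 2, 'mac': 3, 'vec': 3}
```

    In the first mode the activity of `vec` is split between `relu` and `pool`; in the second it accumulates into a single 3-cycle figure for the unit.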

    DATA STORAGE
    23.
    Type: Published patent application
    Status: Pending (published)

    Publication No.: US20240231661A9

    Publication date: 2024-07-11

    Application No.: US18485419

    Filing date: 2023-10-12

    Applicant: Arm Limited

    CPC classification number: G06F3/064 G06F3/0604 G06F3/0659 G06F3/0673

    Abstract: A processor to obtain mapping data indicative of at least one mapping parameter for a plurality of mapping blocks of a multi-dimensional tensor to be mapped. The at least one mapping parameter is for mapping corresponding elements of each mapping block to the same co-ordinate in at least one selected dimension of the multi-dimensional tensor, such that each mapping block corresponds to the same set of co-ordinates in the at least one selected dimension. A co-ordinate of an element of a block of the multi-dimensional tensor is determined. The element is comprised by a mapping block. A physical address in a storage corresponding to the co-ordinate is determined, based on the co-ordinate. The physical address is utilized in a process comprising an interaction between the block of the multi-dimensional tensor and the storage.
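
    One way to read the mapping step is as modular aliasing along the selected dimension: every mapping block of extent M along that dimension lands on the same M co-ordinates, so the storage only needs room for one block, much like a rolling buffer. The sketch below assumes that reading, a single selected dimension, row-major linearisation and hypothetical names (`MappingParams`, `physical_address`); the actual mapping parameters in the application may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingParams:
    selected_dim: int   # dimension in which all mapping blocks share co-ordinates
    block_extent: int   # size of one mapping block along selected_dim


def map_coordinate(coord, params):
    """Fold a tensor co-ordinate so corresponding elements of every block coincide."""
    mapped = list(coord)
    mapped[params.selected_dim] %= params.block_extent
    return tuple(mapped)


def physical_address(coord, shape, params, base, elem_size):
    """Row-major address of the mapped co-ordinate within a one-block-sized buffer."""
    if len(coord) != len(shape) or not all(0 <= c < s for c, s in zip(coord, shape)):
        raise IndexError(f"co-ordinate {coord} outside tensor of shape {shape}")
    mapped = map_coordinate(coord, params)
    extents = list(shape)
    extents[params.selected_dim] = params.block_extent
    offset = 0
    for index, extent in zip(mapped, extents):
        offset = offset * extent + index
    return base + offset * elem_size


params = MappingParams(selected_dim=0, block_extent=2)
shape = (8, 4)  # 8 rows split into four 2-row mapping blocks
# Rows 1, 3, 5 and 7 all alias to the same storage row.
print(hex(physical_address((5, 3), shape, params, base=0x1000, elem_size=1)))  # 0x1007
print(hex(physical_address((1, 3), shape, params, base=0x1000, elem_size=1)))  # 0x1007
```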

    DATA STORAGE
    24.
    Type: Published patent application
    Status: Pending (published)

    Publication No.: US20240134553A1

    Publication date: 2024-04-25

    Application No.: US18485419

    Filing date: 2023-10-11

    Applicant: Arm Limited

    CPC classification number: G06F3/064 G06F3/0604 G06F3/0659 G06F3/0673

    Abstract: A processor to obtain mapping data indicative of at least one mapping parameter for a plurality of mapping blocks of a multi-dimensional tensor to be mapped. The at least one mapping parameter is for mapping corresponding elements of each mapping block to the same co-ordinate in at least one selected dimension of the multi-dimensional tensor, such that each mapping block corresponds to the same set of co-ordinates in the at least one selected dimension. A co-ordinate of an element of a block of the multi-dimensional tensor is determined. The element is comprised by a mapping block. A physical address in a storage corresponding to the co-ordinate is determined, based on the co-ordinate. The physical address is utilized in a process comprising an interaction between the block of the multi-dimensional tensor and the storage.

    Processor instruction specifying indexed storage region holding control data for swizzle operation

    Publication No.: US11188331B2

    Publication date: 2021-11-30

    Application No.: US16576505

    Filing date: 2019-09-19

    Abstract: A data processing system includes: a processor; a data interface for communication with a control unit, the processor being on one side of the data interface; internal storage accessible by the processor, the internal storage being on the same side of the data interface as the processor; and a register array accessible by the processor and comprising a plurality of registers, each register having a plurality of vector lanes. The storage is arranged to store control data indicating an ordered selection of vector lanes of one or more of the registers. The processor is arranged to, in response to receiving instruction data from a control unit, perform a swizzle operation in which data is selected from one or more source registers in the register array, and transferred to a destination register. The data is selected from vector lanes in accordance with control data stored in the internal storage.
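
    The control-data-driven swizzle can be modelled in a few lines. This is an illustrative sketch only: the register and lane counts, the layout of a control entry as (source register, source lane) pairs, and the slot-indexed control store are assumptions, not details taken from the patent.

```python
class SwizzleProcessor:
    """Toy register array plus an indexed internal store of swizzle control data."""

    def __init__(self, num_regs, lanes):
        self.lanes = lanes
        self.regs = [[0] * lanes for _ in range(num_regs)]
        self.control = {}  # hypothetical: slot index -> ordered lane selection

    def store_control(self, slot, selection):
        """selection[d] = (src_reg, src_lane) feeding destination lane d."""
        if len(selection) != self.lanes:
            raise ValueError("need one selection per destination lane")
        self.control[slot] = list(selection)

    def swizzle(self, dest, slot):
        """Execute a swizzle whose instruction names only dest and a control slot."""
        # Gather every selected lane before writing, so dest may also be a source.
        gathered = [self.regs[r][lane] for r, lane in self.control[slot]]
        self.regs[dest] = gathered


p = SwizzleProcessor(num_regs=4, lanes=4)
p.regs[0] = [10, 11, 12, 13]
p.regs[1] = [20, 21, 22, 23]
p.store_control(0, [(0, 3), (1, 0), (0, 0), (1, 3)])  # mix lanes of r0 and r1
p.store_control(1, [(0, 3), (0, 2), (0, 1), (0, 0)])  # reverse r0
p.swizzle(dest=2, slot=0)
p.swizzle(dest=0, slot=1)
print(p.regs[2], p.regs[0])  # [13, 20, 10, 23] [13, 12, 11, 10]
```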

    NEURAL NETWORK PROCESSING
    27.
    Type: Patent application

    Publication No.: US20210295140A1

    Publication date: 2021-09-23

    Application No.: US16826586

    Filing date: 2020-03-23

    Applicant: Arm Limited

    Abstract: A neural network processor is disclosed that includes a combined convolution and pooling circuit that can perform both convolution and pooling operations. The circuit can perform a convolution operation by a multiply circuit determining products of corresponding input feature map and convolution kernel weight values, and an add circuit accumulating the products determined by the multiply circuit in storage. The circuit can perform an average pooling operation by the add circuit accumulating input feature map data values in the storage, a divisor circuit determining a divisor value, and a division circuit dividing the data value accumulated in the storage by the determined divisor value. The circuit can perform a maximum pooling operation by a maximum circuit determining a maximum value of input feature map data values, and storing the determined maximum value in the storage.
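
    A behavioural model shows how one datapath can be shared across the three operations. It is a sketch, assuming a single accumulator standing in for "the storage", `None` marking padded window positions, and a divisor circuit that counts the non-padded inputs; the abstract does not say how the divisor is determined.

```python
from typing import Optional, Sequence


def combined_conv_pool(mode: str,
                       window: Sequence[Optional[float]],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Model of one output value from a shared convolution/pooling datapath.

    window holds input feature map values; None marks padding (assumption).
    """
    if mode == "conv":
        if weights is None or len(weights) != len(window):
            raise ValueError("convolution needs one weight per window value")
        storage = 0.0
        for x, w in zip(window, weights):
            if x is not None:
                storage += x * w          # multiply circuit feeding the add circuit
        return storage
    valid = [x for x in window if x is not None]
    if not valid:
        raise ValueError("pooling window contains only padding")
    if mode == "avg":
        storage = 0.0
        for x in valid:
            storage += x                  # same add circuit, no multiply
        divisor = len(valid)              # divisor circuit (assumed: count of real inputs)
        return storage / divisor          # division circuit
    if mode == "max":
        storage = valid[0]
        for x in valid[1:]:
            storage = max(storage, x)     # maximum circuit
        return storage
    raise ValueError(f"unknown mode {mode!r}")
```

    With padding excluded, averaging `[None, 2, 4, None]` gives 3.0 rather than 1.5.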

    Apparatus and method for executing a plurality of threads

    Publication No.: US10908916B2

    Publication date: 2021-02-02

    Application No.: US15058389

    Filing date: 2016-03-02

    Applicant: ARM LIMITED

    Abstract: An apparatus and method are provided for executing a plurality of threads. The apparatus has processing circuitry arranged to execute the plurality of threads, with each thread executing a program to perform processing operations on thread data. Each thread has a thread identifier, and the thread data includes a value which is dependent on the thread identifier. Value generator circuitry is provided to perform a computation using the thread identifier of a chosen thread in order to generate the above-mentioned value for that chosen thread, and to make that value available to the processing circuitry for use by the processing circuitry when executing the chosen thread. Such an arrangement can give rise to significant performance benefits when executing the plurality of threads on the apparatus.
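
    As an illustration of the kind of value involved, consider a per-thread memory address derived from the thread identifier, which a program would otherwise compute with several instructions. The 2-D decomposition, the row-pitch formula and the `ValueGenerator` name are hypothetical choices for this sketch; the abstract does not say which computation the circuitry performs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ValueGenerator:
    """Hypothetical generator: maps a linear thread ID to a 2-D image address."""
    base: int        # address of pixel (0, 0)
    width: int       # threads (pixels) per row
    row_pitch: int   # bytes between the starts of consecutive rows
    elem_size: int   # bytes per pixel

    def value_for(self, tid: int) -> int:
        x, y = tid % self.width, tid // self.width
        return self.base + y * self.row_pitch + x * self.elem_size


def run_threads(num_threads: int, gen: ValueGenerator,
                program: Callable[[int, int], int]) -> List[int]:
    """Execute program once per thread, handing it the pre-generated value."""
    return [program(tid, gen.value_for(tid)) for tid in range(num_threads)]


gen = ValueGenerator(base=0x100, width=4, row_pitch=32, elem_size=4)
addresses = run_threads(6, gen, lambda tid, addr: addr)
print([hex(a) for a in addresses])  # ['0x100', '0x104', '0x108', '0x10c', '0x120', '0x124']
```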

    Decoding a complex program instruction corresponding to multiple micro-operations

    Publication No.: US09934037B2

    Publication date: 2018-04-03

    Application No.: US14466183

    Filing date: 2014-08-22

    Applicant: ARM Limited

    Inventor: Rune Holm

    Abstract: A data processing apparatus 2 has processing circuitry 4 which can process multiple parallel threads of processing. A shared instruction decoder 30 decodes program instructions to generate micro-operations to be processed by the processing circuitry 4. The instructions include at least one complex instruction which has multiple micro-operations. Multiple fetch units 8 are provided for fetching the micro-operations generated by the decoder 30 for processing by the processing circuitry 4. Each fetch unit 8 is associated with at least one of the threads. The decoder 30 generates the micro-operations of a complex instruction individually in response to separate decode requests 24 triggered by a fetch unit 8, each decode request 24 identifying which micro-operation of the complex instruction is to be generated by the decoder 30 in response to the decode request 24.
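
    The per-micro-operation decode request can be sketched as follows. The program encoding, the micro-operation names and the (pc, index) request format are assumptions for illustration. The point shown is that, because each request names the micro-operation it wants, the shared decoder keeps no per-thread state, and fetch units can sit at different points inside the same complex instruction.

```python
from dataclasses import dataclass

# Hypothetical program: address -> micro-operations of that instruction.
PROGRAM = {
    0: ["add"],
    1: ["ldm.0", "ldm.1", "ldm.2"],  # a complex instruction with three micro-ops
    2: ["mul"],
}


@dataclass(frozen=True)
class DecodeRequest:
    pc: int         # instruction address
    uop_index: int  # which micro-operation of that instruction to generate


class SharedDecoder:
    """Stateless decoder shared by all fetch units."""

    def __init__(self, program):
        self.program = program

    def decode(self, req):
        uops = self.program[req.pc]
        # Generate only the requested micro-op, and report whether it is the last one.
        return uops[req.uop_index], req.uop_index == len(uops) - 1


class FetchUnit:
    """Tracks one thread's position, including the micro-op index within an instruction."""

    def __init__(self, decoder):
        self.decoder = decoder
        self.pc = 0
        self.uop_index = 0

    def fetch(self):
        uop, last = self.decoder.decode(DecodeRequest(self.pc, self.uop_index))
        if last:
            self.pc, self.uop_index = self.pc + 1, 0
        else:
            self.uop_index += 1
        return uop


decoder = SharedDecoder(PROGRAM)
a, b = FetchUnit(decoder), FetchUnit(decoder)
print([a.fetch(), a.fetch(), b.fetch(), a.fetch(), b.fetch()])
# ['add', 'ldm.0', 'add', 'ldm.1', 'ldm.0']
```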

    Data processing device and method for interleaved storage of data elements
    30.
    Type: Granted patent
    Status: In force

    Publication No.: US09582419B2

    Publication date: 2017-02-28

    Application No.: US14063161

    Filing date: 2013-10-25

    Applicant: ARM LIMITED

    Abstract: A data processing device 100 comprises a plurality of storage circuits 130, 160, which store a plurality of data elements of b bits in an interleaved manner. The data processing device also comprises a consumer 110 with a number of lanes 120. The consumer is able to individually access each of the plurality of storage circuits 130, 160 in order to receive into the lanes 120 either a subset of the plurality of data elements or y bits of each of the plurality of data elements. The consumer 110 is also able to execute a common instruction for each of the plurality of lanes 120. The relationship of the bits is such that b is greater than y and is an integer multiple of y. Each of the plurality of storage circuits 130, 160 stores at most y bits of each of the data elements. Furthermore, each of the storage circuits 130, 160 stores at most y/b of the plurality of data elements. By carrying out the interleaving in this manner, the plurality of storage circuits 130, 160 comprise no more than b/y storage circuits.
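
    The abstract does not give the interleaving pattern. One layout that meets its constraints is a skewed (rotated) arrangement: each b-bit element is split into k = b/y chunks of y bits, and chunk j of element i is placed in circuit (i + j) mod k. Every circuit then holds exactly y bits of each element, that is y/b of the total data, and both access patterns touch each of the k circuits exactly once. The class below is a sketch of that assumed layout, not necessarily the patented one.

```python
class SkewedInterleavedStore:
    """k = b // y storage circuits; chunk j of element i lives in circuit (i + j) % k."""

    def __init__(self, elements, b, y):
        if b <= y or b % y:
            raise ValueError("b must be greater than y and an integer multiple of it")
        self.y, self.k = y, b // y
        mask = (1 << y) - 1
        # circuits[c][i]: the y-bit chunk of element i held by circuit c
        self.circuits = [[0] * len(elements) for _ in range(self.k)]
        for i, value in enumerate(elements):
            for j in range(self.k):
                self.circuits[self.circuit_of(i, j)][i] = (value >> (j * y)) & mask

    def circuit_of(self, i, j):
        return (i + j) % self.k

    def read_element(self, i):
        """Whole element i: one access to each circuit, all at row i."""
        value = 0
        for j in range(self.k):
            value |= self.circuits[self.circuit_of(i, j)][i] << (j * self.y)
        return value

    def read_slice(self, j, first):
        """Chunk j of elements first..first+k-1: again one access per circuit."""
        return [self.circuits[self.circuit_of(i, j)][i]
                for i in range(first, first + self.k)]


store = SkewedInterleavedStore([0x11223344, 0xAABBCCDD, 0x01020304, 0xDEADBEEF, 0x0], b=32, y=8)
print(hex(store.read_element(1)))                # 0xaabbccdd
print([hex(c) for c in store.read_slice(0, 0)])  # ['0x44', '0xdd', '0x4', '0xef']
```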
