NEURAL PROCESSING SYSTEM

    Publication Number: US20210224629A1

    Publication Date: 2021-07-22

    Application Number: US16748375

    Filing Date: 2020-01-21

    Applicant: Arm Limited

    Abstract: A computer-implemented method, performed in a neural processing system comprising control processor circuitry and arithmetic logic circuitry, of performing a convolution between an input feature map (IFM) and convolutional filter data to produce an output feature map (OFM). The method includes obtaining, in the control processor circuitry, dimensional characteristic parameters relating to the dimensions of input work batch data arrays and positional characteristic parameters relating to the positions of feature map content within the input work batches. The method also includes, in the arithmetic logic circuitry, performing convolutions between the input work batches, generated from the IFM based on the dimensional and positional characteristic parameters, and work batch filter data arrays corresponding to the filter, to produce a plurality of output work batch data arrays. The plurality of output work batches are combined to generate the OFM.
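
The work-batch decomposition described above can be sketched in simplified form. This is a minimal illustration, not the patented implementation: it uses a 1-D feature map, and all names (`convolve_1d`, `split_into_batches`, `batched_convolution`) are invented. Overlapping adjacent batches by one filter length minus one stands in for the positional characteristic parameters.

```python
# Hypothetical sketch of the work-batch scheme: a control step derives batch
# dimensions/positions, an arithmetic step convolves each batch
# independently, and the outputs are concatenated into the OFM.

def convolve_1d(data, filt):
    """Valid-mode 1-D convolution (correlation) of a list with a filter."""
    k = len(filt)
    return [sum(data[i + j] * filt[j] for j in range(k))
            for i in range(len(data) - k + 1)]

def split_into_batches(ifm, batch_len, overlap):
    """Control step: compute batch positions and slice the IFM.

    Each batch overlaps the next by `overlap` elements so every output
    position is produced exactly once by exactly one batch.
    """
    assert batch_len > overlap  # otherwise the batches make no progress
    step = batch_len - overlap
    batches, start = [], 0
    while start + overlap < len(ifm):
        batches.append(ifm[start:start + batch_len])
        start += step
    return batches

def batched_convolution(ifm, filt, batch_len):
    """Arithmetic step: convolve each work batch, then combine into the OFM."""
    overlap = len(filt) - 1
    ofm = []
    for batch in split_into_batches(ifm, batch_len, overlap):
        ofm.extend(convolve_1d(batch, filt))
    return ofm
```

Because adjacent batches share `len(filt) - 1` elements, concatenating the per-batch outputs reproduces the convolution of the whole IFM.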

    SHARED RESOURCES IN A DATA PROCESSING APPARATUS FOR EXECUTING A PLURALITY OF THREADS

    Publication Number: US20170286107A1

    Publication Date: 2017-10-05

    Application Number: US15505714

    Filing Date: 2015-07-28

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus (100) executes threads and includes a general program counter (PC) (120) identifying an instruction to be executed for at least a subset of the threads. Each thread has a thread PC (184). The subset of threads has at least one lock parameter (188, 500-504) for tracking exclusive access to shared resources. In response to a first instruction executed for a thread, the processor (160) modifies the at least one lock parameter (188, 500-504) to indicate that the thread has gained exclusive access to the shared resource. In response to a second instruction, the processor modifies the at least one lock parameter (188, 500-504) to indicate that the thread no longer has exclusive access. A selector (110) selects one of the subset of threads based on the at least one lock parameter (188, 500-504) and sets the general PC (120) to the thread PC (184) of the selected thread.
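
The lock-aware selection can be illustrated with a toy scheduler. This is a hedged sketch assuming a single shared resource and a boolean lock parameter per thread; all names are invented, and the real apparatus tracks locks in hardware state (188, 500-504) rather than objects.

```python
# Illustrative sketch of lock-aware thread selection: threads each carry a
# PC and a lock flag; the selector prefers a lock-holding thread so it can
# reach its release before waiters are rescheduled, then sets the general
# PC to the selected thread's PC.

class Thread:
    def __init__(self, tid, pc):
        self.tid = tid
        self.pc = pc              # per-thread program counter
        self.holds_lock = False   # the "lock parameter" of the abstract

def lock_instruction(thread):
    """First instruction: record that the thread gained exclusive access."""
    thread.holds_lock = True

def unlock_instruction(thread):
    """Second instruction: record that exclusive access is relinquished."""
    thread.holds_lock = False

def select_thread(threads):
    """Selector: prefer a lock holder; otherwise pick the lowest PC.

    Favouring the holder guarantees forward progress: it reaches the
    unlock before threads spinning on the lock run again.
    """
    holders = [t for t in threads if t.holds_lock]
    return min(holders or threads, key=lambda t: t.pc)

def schedule_step(threads):
    """Return the value the general PC is set to for this cycle."""
    return select_thread(threads).pc
```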

    PROCESSOR, METHOD AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIA FOR HANDLING DATA

    Publication Number: US20240311947A1

    Publication Date: 2024-09-19

    Application Number: US18184212

    Filing Date: 2023-03-15

    Applicant: Arm Limited

    CPC classification number: G06T1/20 G06T1/60

    Abstract: A processor, method and non-transitory computer-readable storage medium for handling data by obtaining task data describing a task to be executed in the form of a plurality of operations on data, the task data further defining an operation space of said data, and analyzing each of the operations to define transformation data comprising transformation instructions representing a transform into an associated operation-specific local space. When the transformation instructions for reaching the operation-specific local space of an operation produce fewer dimensions than the operation space, one or more operation-specific arguments are stored in a data field corresponding to a dimension not produced by the transformation instructions in the transformation data corresponding to the operation.
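
One way to picture the argument-packing idea is as a fixed-width record with one field per operation-space dimension. Everything below is an assumption for illustration: the record layout, the four-dimensional operation space, and the names `pack_transform` and `OP_SPACE_DIMS` are invented, not from the patent.

```python
# Hedged sketch: a transform maps the task's operation space to a smaller
# operation-specific local space; dimension fields the transform does NOT
# produce are reused to carry operation-specific arguments.

OP_SPACE_DIMS = 4  # dimensionality of the task's operation space (assumed)

def pack_transform(produced_dims, args):
    """Build a transform record with one field per operation-space dimension.

    `produced_dims` maps produced local dimensions to an instruction;
    fields for dimensions not produced carry arguments from `args`.
    """
    record = []
    spare = list(args)
    for dim in range(OP_SPACE_DIMS):
        if dim in produced_dims:
            record.append(("instr", produced_dims[dim]))
        elif spare:
            record.append(("arg", spare.pop(0)))   # reuse the unused slot
        else:
            record.append(("unused", None))
    return record
```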

    EFFICIENT DATA PROCESSING, ARBITRATION AND PRIORITIZATION

    Publication Number: US20240248764A1

    Publication Date: 2024-07-25

    Application Number: US18316602

    Filing Date: 2023-05-12

    Applicant: Arm Limited

    CPC classification number: G06F9/5038 G06F9/505 G06F2209/5021

    Abstract: A memory unit configured for handling task data, the task data describing a task to be executed as a directed acyclic graph of operations, wherein each operation maps to a corresponding execution unit, and wherein each connection between operations in the graph maps to a corresponding storage element of the execution unit. The task data defines an operation space representing the dimensions of a multi-dimensional arrangement of the connected operations to be executed, represented by the data blocks. The memory unit is configured to receive a sequence of processing requests comprising one or more data blocks, each data block assigned a priority value and comprising a block command. The memory unit arbitrates between the data blocks based upon the priority value and block command to prioritize the sequence of processing requests, where the processing requests include writing data to, or reading data from, storage.
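
The arbitration step can be sketched with a priority queue. This is a toy model, not the patented unit: the `MemoryArbiter` class, the convention that lower numbers mean higher priority, and the tie-break on arrival order are all assumptions.

```python
import heapq
import itertools

# Minimal sketch of priority-based arbitration between data blocks: each
# request carries a priority value and a block command (read/write); the
# memory unit services the highest priority first, arrival order breaking
# ties.

class MemoryArbiter:
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonic tie-breaker
        self.storage = {}

    def submit(self, priority, command, address, data=None):
        """Queue a request; lower number = higher priority (assumed)."""
        heapq.heappush(self._heap,
                       (priority, next(self._arrival), command, address, data))

    def service_next(self):
        """Arbitrate: pop and execute the highest-priority request."""
        priority, _, command, address, data = heapq.heappop(self._heap)
        if command == "write":
            self.storage[address] = data
            return None
        return self.storage.get(address)   # "read"
```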

    OPERATION DISTRIBUTION ACROSS MULTIPLE PROCESSING CORES

    Publication Number: US20240248721A1

    Publication Date: 2024-07-25

    Application Number: US18414230

    Filing Date: 2024-01-16

    Applicant: Arm Limited

    CPC classification number: G06F9/3838

    Abstract: A method and apparatus for distributing operations for execution. Input data is received and subdivided into portions, each comprising a first and a second sub-portion. A first operation and a second operation are received, and dependencies between them are identified. For each portion, the first operation is issued for execution on the first sub-portion to produce a first output sub-portion, and its completion is tracked. The first operation is also issued for execution on the second sub-portion to produce a second output sub-portion. Depending upon satisfaction of the dependencies in respect of the first sub-portion, the second operation, to be executed on the first output sub-portion, is either issued, if the dependencies are met, or stalled, if they are not. This is repeated for each subsequent portion.
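
The issue/stall decision can be sketched sequentially. All names are invented, and `op1_complete` stands in for the hardware completion-tracking signal; a real distributor would retry stalled work rather than just collect it.

```python
# Hedged sketch of the issue/stall scheme: the first operation runs on both
# sub-portions of each portion; the second operation on the first output
# sub-portion is issued only if its dependency on the first sub-portion is
# satisfied, otherwise it stalls.

def distribute(portions, op1, op2, op1_complete):
    """Issue op1 on each sub-portion; issue or stall op2 per portion.

    `portions` is a list of (first_sub, second_sub) pairs;
    `op1_complete(sub)` models the tracked signal reporting whether op1's
    result for `sub` has retired.
    """
    issued, stalled = [], []
    for first_sub, second_sub in portions:
        out_first = op1(first_sub)     # first output sub-portion
        op1(second_sub)                # second output sub-portion
        if op1_complete(first_sub):    # dependencies met: issue op2
            issued.append(op2(out_first))
        else:                          # dependencies not met: stall
            stalled.append(first_sub)
    return issued, stalled
```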

    NEURAL PROCESSING SYSTEM

    Publication Number: US20210224630A1

    Publication Date: 2021-07-22

    Application Number: US16797210

    Filing Date: 2020-02-21

    Applicant: Arm Limited

    Abstract: A computer-implemented method, performed in a neural processing system comprising control processor circuitry and arithmetic logic circuitry, of performing a convolution between an input feature map (IFM) and convolutional filter data to produce an output feature map (OFM). The method includes obtaining, in the control processor circuitry, dimensional characteristic parameters relating to the dimensions of input work batch data arrays and positional characteristic parameters relating to the positions of feature map content within the input work batches. The method also includes, in the arithmetic logic circuitry, performing convolutions between the input work batches, generated from the IFM based on the dimensional and positional characteristic parameters, and work batch filter data arrays corresponding to the filter, to produce a plurality of output work batch data arrays. The plurality of output work batches are combined to generate the OFM.

    DATA PROCESSING DEVICE AND METHOD FOR INTERLEAVED STORAGE OF DATA ELEMENTS
    Status: Granted

    Publication Number: US20150121019A1

    Publication Date: 2015-04-30

    Application Number: US14063161

    Filing Date: 2013-10-25

    Applicant: Arm Limited

    Abstract: A data processing device 100 comprises a plurality of storage circuits 130, 160, which store a plurality of data elements of b bits in an interleaved manner. The data processing device also comprises a consumer 110 with a number of lanes 120. The consumer is able to individually access each of the plurality of storage circuits 130, 160 in order to receive into the lanes 120 either a subset of the plurality of data elements or y bits of each of the plurality of data elements. The consumer 110 is also able to execute a common instruction on each of the plurality of lanes 120. The relationship of the bits is such that b is greater than y and is an integer multiple of y. Each of the plurality of storage circuits 130, 160 stores at most y bits of each of the data elements, and each stores at most y/b of the plurality of data elements. By carrying out the interleaving in this manner, the plurality of storage circuits 130, 160 comprise no more than b/y storage circuits.
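
The b/y arithmetic can be made concrete with a small slicing model. The concrete widths (b = 16, y = 4) and all function names are assumptions for illustration only.

```python
# Minimal sketch of the interleaving: each data element is b bits wide,
# each storage circuit holds a y-bit slice of every element, so b//y
# circuits suffice, and a consumer can fetch the same y-bit slice of every
# element from a single circuit.

B_BITS = 16  # element width b (assumed)
Y_BITS = 4   # per-circuit slice width y; b is an integer multiple of y

def interleave(elements):
    """Split each b-bit element into b//y slices of y bits and store
    slice k of every element in storage circuit k."""
    n_circuits = B_BITS // Y_BITS
    mask = (1 << Y_BITS) - 1
    circuits = [[] for _ in range(n_circuits)]
    for value in elements:
        for k in range(n_circuits):
            circuits[k].append((value >> (k * Y_BITS)) & mask)
    return circuits

def read_element(circuits, index):
    """Reassemble element `index` from its y-bit slices."""
    return sum(slice_ << (k * Y_BITS)
               for k, slice_ in enumerate(c[index] for c in circuits))
```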


    TRACKING BUFFER REDUCTION AND REUSE IN A PROCESSOR

    Publication Number: US20240248755A1

    Publication Date: 2024-07-25

    Application Number: US18099595

    Filing Date: 2023-01-20

    Applicant: Arm Limited

    CPC classification number: G06F9/4881 G06F9/3555

    Abstract: A processor comprising a handling unit and a plurality of components, each configured to execute a function. The handling unit can receive a task comprising operations on data in a coordinate space having N dimensions, and receive a data structure describing execution of the task, comprising a partially ordered set of data items, each associated with instructions usable by the plurality of components when executing the task. Each data item is associated with a component among the plurality of components and indicates the dimensions of the coordinate space for which a change of coordinate causes the function of the associated component to execute, as well as the dimensions for which a change of coordinate causes the associated component to store data ready to be used by another component. The handling unit iterates over the coordinate space and executes the task using the partially ordered set of data items.
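
The coordinate-driven triggering can be sketched as a single walk over the N-dimensional space. All names are invented, and the sketch models only the "execute on coordinate change" half of each data item, not the store-for-reuse half.

```python
import itertools

# Hedged sketch: the handling unit iterates the N-dimensional coordinate
# space; each data item names the dimensions whose coordinate change
# triggers its component's function, so components fire at the loop depth
# their trigger dimensions imply.

def run_task(shape, items):
    """Walk the coordinate space defined by `shape`.

    `items` is a list of (component_name, trigger_dims): the component
    executes whenever any coordinate in `trigger_dims` changed from the
    previous step (and once at the very first coordinate).
    Returns the execution log as (name, coordinate) pairs.
    """
    log = []
    prev = None
    for coord in itertools.product(*(range(n) for n in shape)):
        for name, trigger_dims in items:
            if prev is None or any(coord[d] != prev[d] for d in trigger_dims):
                log.append((name, coord))
        prev = coord
    return log
```

A component triggered only by the outermost dimension runs once per outer step, while one triggered by every dimension runs at every coordinate.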

    READING DATA WITHIN A COMPRESSED DATA STREAM

    Publication Number: US20240248754A1

    Publication Date: 2024-07-25

    Application Number: US18099594

    Filing Date: 2023-01-20

    Applicant: Arm Limited

    CPC classification number: G06F9/4881

    Abstract: A processor to generate position data indicative of a position within a compressed data stream, where, previously in executing a task, data of the compressed data stream ending at that position has been read by the processor from storage storing the compressed data stream. After reading the data, the processor reads further data of the compressed data stream from the storage in executing the task, the further data being located beyond the position within the compressed data stream. After reading the further data, the processor reads, based on the position data, a portion of the compressed data stream from the storage in executing the task, starting from the position within the compressed data stream. The processor decompresses the portion of the compressed data stream to generate decompressed data in executing the task.
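
The position-data idea can be shown with a toy codec. This uses a simple run-length encoding rather than any real compression format, and all names are invented: the point is only that the reader records the stream offset it decoded up to and can later resume from that offset without re-decoding the prefix.

```python
# Hedged sketch of checkpointed reading within a compressed stream: decode
# part of the stream, save the position reached ("position data"), read
# further, then re-read from the saved position instead of from the start.

def rle_compress(data):
    """Encode bytes as (run_length, value) pairs, runs capped at 255."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out.extend([j - i, data[i]])
        i = j
    return bytes(out)

def rle_read(stream, pos, n_runs):
    """Decode up to `n_runs` runs starting at byte offset `pos`.

    Returns the decoded bytes and the new offset, which is the position
    data a later read can resume from.
    """
    decoded = bytearray()
    for _ in range(n_runs):
        if pos >= len(stream):
            break
        count, value = stream[pos], stream[pos + 1]
        decoded.extend([value] * count)
        pos += 2
    return bytes(decoded), pos
```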
