IDENTIFICATION OF SUB-GRAPHS FROM A DIRECTED ACYCLIC GRAPH OF OPERATIONS ON INPUT DATA

    Publication No.: US20240370301A1

    Publication Date: 2024-11-07

    Application No.: US18640250

    Filing Date: 2024-04-19

    Applicant: Arm Limited

    Abstract: The present disclosure relates to a system, method and non-transitory computer-readable storage medium for handling data. From a directed acyclic graph, DAG, of operations on input data, a sub-graph of operations is identified and issued as task data to be executed by a processing module, wherein each of the operations in the sub-graph maps to a corresponding execution unit of the processing module of the system and each connection between operations maps to a corresponding storage element of the processing module. The sub-graph is identified such that a simulation of an execution of the operations of the candidate sub-graph, according to a determined processing-unit size of said input data, shows that the processing module can execute the operations of the sub-graph such that the memory constraints of the processing module are met and read-write operations to memory external to the processing module are avoided or reduced.
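
The selection idea in this abstract can be sketched in a few lines: grow a candidate sub-graph and keep it only while a simulated execution stays within the processing module's local storage. This is a minimal Python sketch under assumed simplifications (operations in topological order, one storage element per connection, known buffer sizes); the function and variable names are illustrative, not from the patent.

```python
def simulate_peak_storage(subgraph, edges, buffer_sizes):
    """Simulate executing the sub-graph's operations in order and return
    the peak number of bytes held in local storage elements."""
    live = set()
    peak = 0
    for op in subgraph:
        # connections into op are consumed by this operation
        live -= {e for e in live if e[1] == op}
        # connections out of op hold its results in local storage
        live |= {e for e in edges if e[0] == op}
        peak = max(peak, sum(buffer_sizes[e] for e in live))
    return peak

def identify_subgraph(ops, edges, buffer_sizes, local_capacity):
    """Greedily extend a candidate sub-graph while the simulation shows
    the local memory constraint is still met."""
    candidate = []
    for op in ops:  # ops assumed to be in topological order
        trial = candidate + [op]
        if simulate_peak_storage(trial, edges, buffer_sizes) <= local_capacity:
            candidate = trial
        else:
            break
    return candidate
```

Every operation kept this way reads its inputs from, and writes its outputs to, local storage elements only, which is how external memory traffic is avoided for the identified sub-graph.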

    Neural network processing
    Invention Grant

    Publication No.: US11537860B2

    Publication Date: 2022-12-27

    Application No.: US16826586

    Filing Date: 2020-03-23

    Applicant: Arm Limited

    Abstract: A neural network processor is disclosed that includes a combined convolution and pooling circuit that can perform both convolution and pooling operations. The circuit can perform a convolution operation by a multiply circuit determining products of corresponding input feature map and convolution kernel weight values, and an add circuit accumulating the products determined by the multiply circuit in storage. The circuit can perform an average pooling operation by the add circuit accumulating input feature map data values in the storage, a divisor circuit determining a divisor value, and a division circuit dividing the data value accumulated in the storage by the determined divisor value. The circuit can perform a maximum pooling operation by a maximum circuit determining a maximum value of input feature map data values, and storing the determined maximum value in the storage.
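
The three modes of the combined circuit can be modelled in software. The following is an illustrative Python model of the datapath behaviour described in the abstract, not the actual hardware: one accumulator plays the role of the shared storage, reused by convolution, average pooling and maximum pooling.

```python
def convolve(window, weights):
    """Convolution mode: the multiply circuit forms products of input
    feature map and kernel weight values; the add circuit accumulates
    the products in storage."""
    acc = 0
    for x, w in zip(window, weights):
        acc += x * w  # multiply, then accumulate
    return acc

def average_pool(window):
    """Average pooling mode: the add circuit accumulates input values in
    storage, the divisor circuit determines the divisor, and the
    division circuit divides the accumulated value by it."""
    acc = sum(window)       # accumulated input feature map values
    divisor = len(window)   # divisor circuit output
    return acc / divisor

def max_pool(window):
    """Maximum pooling mode: the maximum circuit keeps the running
    maximum of the input values in storage."""
    m = window[0]
    for x in window[1:]:
        m = max(m, x)
    return m
```

Sharing the accumulate path between convolution and average pooling is what lets one circuit serve both operations.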

    EXECUTING NEURAL NETWORKS ON ELECTRONIC DEVICES

    Publication No.: US20210133542A1

    Publication Date: 2021-05-06

    Application No.: US16670140

    Filing Date: 2019-10-31

    Applicant: Arm Limited

    Abstract: When performing a matrix-vector multiply operation for neural network processing, a set of one or more input vectors to be multiplied by a matrix of data values is scanned to identify data positions of the input vector(s) for which the data value is non-zero in at least one of the input vectors. For each of the data positions identified as having a non-zero value in at least one of the input vectors, the set of data values from the matrix of data values for that data position is fetched from memory and the matrix-vector multiply operation is performed using the data values for the input vectors for the data positions identified as being non-zero and the fetched set(s) of data values from the matrix of data values for those data position(s).
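
The zero-skipping scheme above can be sketched directly: scan the input vectors for non-zero positions, fetch only the matrix data for those positions, then multiply. A minimal Python sketch with assumed (illustrative) names, using lists in place of memory fetches:

```python
def sparse_matvec(matrix, vectors):
    """matrix: list of rows; vectors: input vectors of equal length.
    Returns one output vector per input vector, fetching matrix data
    only for positions that are non-zero in at least one input vector."""
    n = len(vectors[0])
    # scan: data positions non-zero in at least one of the input vectors
    active = [i for i in range(n) if any(v[i] != 0 for v in vectors)]
    # fetch only the matrix values for the active data positions
    fetched = {i: [row[i] for row in matrix] for i in active}
    outputs = []
    for v in vectors:
        out = [0] * len(matrix)
        for i in active:
            if v[i] != 0:
                col = fetched[i]
                for r in range(len(matrix)):
                    out[r] += col[r] * v[i]
        outputs.append(out)
    return outputs
```

The saving comes from the `fetched` step: columns for positions that are zero in every input vector are never read from memory at all.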

    EFFICIENT TASK ALLOCATION
    Invention Publication

    Publication No.: US20240036919A1

    Publication Date: 2024-02-01

    Application No.: US18358995

    Filing Date: 2023-07-26

    Applicant: Arm Limited

    CPC classification number: G06F9/4881 G06T1/20

    Abstract: A method and a processor comprising a command processing unit to receive, from a host processor, a sequence of commands to be executed and to generate, based on the sequence of commands, a plurality of tasks. The processor also comprises a plurality of compute units, each having a first processing module for executing tasks of a first task type, a second processing module for executing tasks of a second task type, different from the first task type, and a local cache shared by at least the first processing module and the second processing module. The command processing unit issues the plurality of tasks to at least one of the plurality of compute units, and at least one of the plurality of compute units processes at least one of the plurality of tasks.
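
The command-to-task flow can be sketched as below. This is a hypothetical Python model of the described allocation, with an invented expansion rule and a simple round-robin issue policy; none of the names come from the patent.

```python
from collections import defaultdict

def generate_tasks(commands):
    """Command processing unit step 1: expand each received command into
    typed tasks (illustrative rule: one task per task type the command
    names)."""
    tasks = []
    for cmd in commands:
        for task_type in cmd["types"]:
            tasks.append({"cmd": cmd["name"], "type": task_type})
    return tasks

def issue_tasks(tasks, num_compute_units):
    """Command processing unit step 2: issue tasks to compute units
    (round-robin here); each compute unit handles both task types via
    its first or second processing module."""
    units = defaultdict(list)
    for i, task in enumerate(tasks):
        units[i % num_compute_units].append(task)
    return dict(units)
```

Because each compute unit contains a module for both task types plus a shared local cache, tasks of either type can be issued to any unit, and tasks that share data benefit from landing on the same unit.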

    BROADCAST HUB FOR MULTI-PROCESSOR ARRANGEMENT

    Publication No.: US20230315677A1

    Publication Date: 2023-10-05

    Application No.: US17709255

    Filing Date: 2022-03-30

    Applicant: Arm Limited

    CPC classification number: G06F15/80

    Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast hubs for multi-processor arrangements. A processing tile may comprise a broadcast hub to obtain a plurality of parameters applicable in a particular operation from at least one of a plurality of processing tiles and initiate distribution of the plurality of parameters to the plurality of processing tiles, wherein the plurality of processing tiles may execute the particular operation based at least in part on the plurality of distributed parameters.
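
The gather-then-distribute behaviour can be modelled in a few lines. This is a toy Python model under assumed interfaces (the `Tile` and `BroadcastHub` classes are invented for illustration and do not appear in the patent):

```python
class Tile:
    """A processing tile holding local parameters and a slot for
    parameters received from the broadcast hub."""
    def __init__(self, local_params):
        self.local_params = local_params
        self.received = None

    def execute(self, op):
        # execute the particular operation on the distributed parameters
        return op(self.received)

class BroadcastHub:
    def __init__(self, tiles):
        self.tiles = tiles

    def gather_and_broadcast(self, source_index):
        # obtain the parameters from at least one of the tiles...
        params = self.tiles[source_index].local_params
        # ...and initiate distribution to all of the processing tiles
        for t in self.tiles:
            t.received = params
        return params
```

After the broadcast, every tile runs the same operation on the same parameter values, which is the point of centralising distribution in a hub rather than having each tile fetch independently.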

    Register-based matrix multiplication with multiple matrices per register

    Publication No.: US11288066B2

    Publication Date: 2022-03-29

    Application No.: US16626701

    Filing Date: 2018-06-08

    Applicant: ARM LIMITED

    Abstract: Techniques for performing matrix multiplication in a data processing apparatus are disclosed, comprising apparatuses, matrix multiply instructions, methods of operating the apparatuses, and virtual machine implementations. Registers, each register for storing at least four data elements, are referenced by a matrix multiply instruction, and in response to the matrix multiply instruction a matrix multiply operation is carried out. First and second matrices of data elements are extracted from first and second source registers, and plural dot product operations, acting on respective rows of the first matrix and respective columns of the second matrix, are performed to generate a square matrix of result data elements, which is applied to a destination register. A higher computation density for a given number of register operands is achieved with respect to vector-by-element techniques.
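
A worked sketch of the smallest case makes the scheme concrete: each "register" holds four data elements, interpreted as a 2x2 matrix in row-major order, and the instruction amounts to four dot products. This Python model is illustrative only; element order and matrix size are assumptions for the example.

```python
def matmul_2x2_registers(src1, src2):
    """src1, src2: four-element 'registers', each a 2x2 row-major matrix.
    Returns the 2x2 product as a four-element destination register."""
    a = [src1[0:2], src1[2:4]]                    # rows of first matrix
    b = [[src2[0], src2[2]], [src2[1], src2[3]]]  # columns of second matrix
    dest = []
    for row in a:
        for col in b:
            # one dot product of a row of the first matrix with a
            # column of the second matrix
            dest.append(row[0] * col[0] + row[1] * col[1])
    return dest
```

Two register reads feed four multiply-accumulate results here, versus one result per element pair in a vector-by-element approach, which is the computation-density advantage the abstract claims.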

    NEURAL NETWORK PROCESSING
    Invention Application

    Publication No.: US20220092409A1

    Publication Date: 2022-03-24

    Application No.: US17030176

    Filing Date: 2020-09-23

    Applicant: Arm Limited

    Abstract: To perform neural network processing to modify an input data array to generate a corresponding output data array using a filter comprising an array of weight data, at least one of the input data array and the filter are subdivided into a plurality of portions, a plurality of neural network processing passes using the portions are performed, and the output generated by each processing pass is combined to provide the output data array.
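
The split-and-combine idea can be demonstrated in one dimension. Below is a small Python sketch under assumed 1-D shapes (the patent covers data arrays generally): the filter's weight array is subdivided into portions, each portion is applied in its own processing pass, and the partial outputs are summed to recover the full output data array.

```python
def conv1d(data, weights):
    """Reference: plain 1-D convolution over valid positions."""
    n = len(data) - len(weights) + 1
    return [sum(data[i + j] * w for j, w in enumerate(weights))
            for i in range(n)]

def conv1d_split(data, weights, portion_size):
    """Subdivide the filter into portions of portion_size weights, run
    one processing pass per portion, and combine (sum) the outputs."""
    n = len(data) - len(weights) + 1
    out = [0] * n
    for start in range(0, len(weights), portion_size):
        portion = weights[start:start + portion_size]
        # one processing pass with a portion of the filter; the
        # portion's offset within the full filter shifts where it
        # reads the input data array
        for i in range(n):
            out[i] += sum(data[i + start + j] * w
                          for j, w in enumerate(portion))
    return out
```

Because convolution is linear in the weights, the per-portion partial sums combine exactly to the unsplit result, so each pass only ever needs a portion-sized slice of the filter in working memory.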
