-
Publication Number: US20250077841A1
Publication Date: 2025-03-06
Application Number: US18458800
Application Date: 2023-08-30
Applicant: Arm Limited
Inventor: Rune Holm, Anton Kachatkou, Benjamin Klimczak, Ruomei Yan, Diego Russo
Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to adapt a neural network structure to a target platform. One or more performance metrics of an execution of the neural network structure, as implemented by one or more target hardware elements, may be observed. A module from a library of modules may be selected to replace one or more elements of the neural network structure based, at least in part, on the observed one or more performance metrics.
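Below is a minimal Python sketch, purely for illustration, of the kind of selection loop the abstract describes: candidate modules from a library are measured on the target, and a replacement is chosen from those meeting a budget. The Module and adapt_to_target names, the use of latency as the metric, and the budget-based policy are assumptions, not the disclosed apparatus.

    # Illustrative sketch only: pick a replacement module from a library based on
    # performance metrics observed on the target hardware. All names are hypothetical.
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Module:
        name: str
        build: Callable[[], object]   # constructs the candidate replacement sub-network

    def adapt_to_target(element: object,
                        library: List[Module],
                        observe_latency_ms: Callable[[object], float],
                        latency_budget_ms: float):
        """Return the library module with the best observed latency that meets the
        budget, or keep the original element if none does."""
        observed: Dict[str, float] = {}
        for module in library:
            # Observe the metric by executing the candidate on the target hardware.
            observed[module.name] = observe_latency_ms(module.build())
        feasible = {name: ms for name, ms in observed.items() if ms <= latency_budget_ms}
        if not feasible:
            return element, observed
        best_name = min(feasible, key=feasible.get)
        best_module = next(m for m in library if m.name == best_name)
        return best_module.build(), observed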
-
Publication Number: US20240370301A1
Publication Date: 2024-11-07
Application Number: US18640250
Application Date: 2024-04-19
Applicant: Arm Limited
Inventor: Elliot Maurice Simons Rosemarine, Rune Holm
IPC: G06F9/50
Abstract: The present disclosure relates to a system, method and non-transitory computer-readable storage medium for handling data. From a directed acyclic graph (DAG) of operations on input data, a sub-graph of operations is identified and issued as task data to be executed by a processing module, wherein each of the operations in the sub-graph maps to a corresponding execution unit of the processing module of the system and wherein each connection between operations maps to a corresponding storage element of the processing module. The sub-graph is identified such that a simulation of an execution of the operations of the candidate sub-graph, according to a determined size of the processing unit of said input data, shows that the processing module can execute the operations of the sub-graph such that memory constraints of the processing module are met and read-write operations to memory external to the processing module are avoided or reduced.
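As an illustration of the memory-fit check described above, the following Python sketch simulates executing a topologically ordered candidate sub-graph and verifies that the live intermediate buffers never exceed a local-memory limit. The graph encoding, the byte-cost model, and the fits_local_memory name are assumptions for illustration, not the claimed method.

    # Illustrative sketch only: check that a candidate sub-graph can run entirely
    # out of the processing module's local storage, so intermediate results need
    # not spill to external memory.
    from typing import Dict, List

    def fits_local_memory(order: List[str],
                          out_bytes: Dict[str, int],
                          consumers: Dict[str, List[str]],
                          local_mem_bytes: int) -> bool:
        """Simulate executing `order` (a topologically sorted candidate sub-graph)
        and check that live intermediate buffers never exceed local memory."""
        in_sub = set(order)
        # Remaining consumers of each op's output *inside* the sub-graph; outputs
        # with no in-sub-graph consumer are conservatively kept live.
        pending = {op: sum(1 for c in consumers.get(op, []) if c in in_sub)
                   for op in order}
        live, current, peak = {}, 0, 0
        for op in order:
            current += out_bytes[op]          # allocate this op's output buffer
            live[op] = True
            peak = max(peak, current)
            # Retire input buffers whose last in-sub-graph consumer just ran.
            for producer in [p for p, cs in consumers.items() if op in cs and p in live]:
                pending[producer] -= 1
                if pending[producer] == 0:
                    current -= out_bytes[producer]
                    del live[producer]
        return peak <= local_mem_bytes

    # Example: a small elementwise chain; peak use is 20 KiB (two 10 KiB buffers live).
    ops = ["conv", "relu", "add"]
    sizes = {"conv": 10_240, "relu": 10_240, "add": 10_240}
    uses = {"conv": ["relu"], "relu": ["add"]}
    print(fits_local_memory(ops, sizes, uses, local_mem_bytes=32_768))   # True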
-
Publication Number: US11537860B2
Publication Date: 2022-12-27
Application Number: US16826586
Application Date: 2020-03-23
Applicant: Arm Limited
Inventor: Rune Holm, John Wakefield Brothers, III
Abstract: A neural network processor is disclosed that includes a combined convolution and pooling circuit that can perform both convolution and pooling operations. The circuit can perform a convolution operation by a multiply circuit determining products of corresponding input feature map and convolution kernel weight values, and an add circuit accumulating the products determined by the multiply circuit in storage. The circuit can perform an average pooling operation by the add circuit accumulating input feature map data values in the storage, a divisor circuit determining a divisor value, and a division circuit dividing the data value accumulated in the storage by the determined divisor value. The circuit can perform a maximum pooling operation by a maximum circuit determining a maximum value of input feature map data values, and storing the determined maximum value in the storage.
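The three operating modes of the combined circuit can be illustrated with a small Python sketch over a flattened window of input-feature-map values; real hardware streams data through shared multiply, add and maximum paths, so this is only a functional analogue, and the window_op name and list-based layout are assumptions.

    # Illustrative sketch only: the convolution, average-pooling and max-pooling
    # modes described in the abstract, expressed over one flattened window.
    from typing import List, Optional

    def window_op(values: List[float],
                  mode: str,
                  weights: Optional[List[float]] = None) -> float:
        if mode == "conv":                       # multiply, then accumulate
            assert weights is not None, "conv mode needs one weight per value"
            acc = 0.0
            for v, w in zip(values, weights):
                acc += v * w                     # products accumulated in "storage"
            return acc
        if mode == "avg_pool":                   # accumulate, then divide by a divisor
            acc = 0.0
            for v in values:
                acc += v
            return acc / len(values)             # divisor determined from the window size
        if mode == "max_pool":                   # keep a running maximum in "storage"
            m = values[0]
            for v in values[1:]:
                m = max(m, v)
            return m
        raise ValueError(f"unknown mode: {mode}")

    # Example usage on a 2x2 window flattened to a list.
    print(window_op([1.0, 2.0, 3.0, 4.0], "conv", weights=[0.5, 0.5, 0.5, 0.5]))  # 5.0
    print(window_op([1.0, 2.0, 3.0, 4.0], "avg_pool"))                            # 2.5
    print(window_op([1.0, 2.0, 3.0, 4.0], "max_pool"))                            # 4.0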
-
Publication Number: US20210133542A1
Publication Date: 2021-05-06
Application Number: US16670140
Application Date: 2019-10-31
Applicant: Arm Limited
Inventor: Rune Holm, John Wakefield Brothers, III
Abstract: When performing a matrix-vector multiply operation for neural network processing, a set of one or more input vectors to be multiplied by a matrix of data values is scanned to identify data positions of the input vector(s) for which the data value is non-zero in at least one of the input vectors. For each of the data positions identified as having a non-zero value in at least one of the input vectors, the set of data values from the matrix of data values for that data position is fetched from memory. The matrix-vector multiply operation is then performed using the data values of the input vectors for the data positions identified as being non-zero and the fetched set(s) of data values from the matrix of data values for those data position(s).
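A rough Python sketch of the zero-skipping idea follows: the input vectors are scanned for positions that are non-zero in at least one vector, and only the matrix data for those positions is "fetched" (indexed) and multiplied. The dense list-of-lists layout and the sparse_matvec name are assumptions for readability, not the claimed hardware behaviour.

    # Illustrative sketch only: skip fetching matrix data for input positions that
    # are zero in every vector of the batch.
    from typing import List

    def sparse_matvec(matrix: List[List[float]],
                      vectors: List[List[float]]) -> List[List[float]]:
        rows, cols = len(matrix), len(matrix[0])
        # 1. Scan the input vectors for positions non-zero in at least one vector.
        nonzero_positions = [j for j in range(cols)
                             if any(v[j] != 0.0 for v in vectors)]
        results = [[0.0] * rows for _ in vectors]
        # 2. Fetch the matrix data only for those positions and accumulate products.
        for j in nonzero_positions:
            column = [matrix[i][j] for i in range(rows)]   # the fetched set of values
            for v_idx, v in enumerate(vectors):
                if v[j] == 0.0:
                    continue
                for i in range(rows):
                    results[v_idx][i] += column[i] * v[j]
        return results

    print(sparse_matvec([[1.0, 2.0], [3.0, 4.0]], [[0.0, 5.0]]))   # [[10.0, 20.0]]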
-
Publication Number: US20240036919A1
Publication Date: 2024-02-01
Application Number: US18358995
Application Date: 2023-07-26
Applicant: Arm Limited
Inventor: Alexander Eugene Chalfin, John Wakefield Brothers, III, Rune Holm, Samuel James Edward Martin
CPC classification number: G06F9/4881, G06T1/20
Abstract: A method and processor comprising a command processing unit to receive, from a host processor, a sequence of commands to be executed, and to generate, based on the sequence of commands, a plurality of tasks. The processor also comprises a plurality of compute units, each having a first processing module for executing tasks of a first task type, a second processing module for executing tasks of a second task type, different from the first task type, and a local cache shared by at least the first processing module and the second processing module. The command processing unit issues the plurality of tasks to at least one of the plurality of compute units, and at least one of the plurality of compute units processes at least one of the plurality of tasks.
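A toy Python sketch of the described arrangement is given below: a command stream is turned into tasks of two types, and the tasks are issued to compute units that each pair two processing modules with a shared local cache. The task types, the round-robin issue policy, and all class and function names are assumptions for illustration.

    # Illustrative sketch only: commands -> tasks of two types -> compute units,
    # each compute unit pairing two processing modules with a shared local cache.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Task:
        task_type: str        # e.g. "neural" (first type) or "graphics" (second type)
        payload: str

    @dataclass
    class ComputeUnit:
        local_cache: Dict[str, bytes] = field(default_factory=dict)  # shared by both modules

        def run(self, task: Task) -> str:
            module = "module_a" if task.task_type == "neural" else "module_b"
            return f"{module} executed {task.payload}"

    def command_processing_unit(commands: List[str],
                                compute_units: List[ComputeUnit]) -> List[str]:
        # Generate a plurality of tasks from the sequence of commands (1:1 here).
        tasks = [Task("neural" if c.startswith("nn:") else "graphics", c) for c in commands]
        # Issue the tasks to the compute units (simple round robin).
        return [compute_units[i % len(compute_units)].run(t) for i, t in enumerate(tasks)]

    print(command_processing_unit(["nn:conv", "draw:tri", "nn:pool"],
                                  [ComputeUnit(), ComputeUnit()]))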
-
Publication Number: US20230315677A1
Publication Date: 2023-10-05
Application Number: US17709255
Application Date: 2022-03-30
Applicant: Arm Limited
Inventor: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III
IPC: G06F15/80
CPC classification number: G06F15/80
Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast hubs for multi-processor arrangements. A processing tile may comprise a broadcast hub to obtain a plurality of parameters applicable in a particular operation from at least one of a plurality of processing tiles and initiate distribution of the plurality of parameters to the plurality of processing tiles, wherein the plurality of processing tiles may execute the particular operation based at least in part on the plurality of distributed parameters.
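The following Python sketch illustrates, under assumed Tile and BroadcastHub abstractions, one tile's parameters being obtained by a hub and distributed to every tile before a shared operation runs; it is a functional analogue only, not the disclosed hardware.

    # Illustrative sketch only: a broadcast hub gathers parameters from one tile
    # and distributes them so all tiles can execute the same operation.
    from typing import Callable, Dict, List, Optional

    class Tile:
        def __init__(self, tile_id: int, params: Optional[Dict[str, float]] = None):
            self.tile_id = tile_id
            self.params: Dict[str, float] = dict(params or {})

        def execute(self, op: Callable[[Dict[str, float]], float]) -> float:
            return op(self.params)

    class BroadcastHub:
        def __init__(self, tiles: List[Tile]):
            self.tiles = tiles

        def broadcast_from(self, source_tile: int) -> None:
            params = dict(self.tiles[source_tile].params)   # obtain parameters from one tile
            for tile in self.tiles:                         # initiate distribution to all tiles
                tile.params.update(params)

    tiles = [Tile(0, {"scale": 0.5}), Tile(1), Tile(2)]
    BroadcastHub(tiles).broadcast_from(0)
    print([t.execute(lambda p: 8.0 * p["scale"]) for t in tiles])   # [4.0, 4.0, 4.0]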
-
Publication Number: US11288066B2
Publication Date: 2022-03-29
Application Number: US16626701
Application Date: 2018-06-08
Applicant: ARM LIMITED
Inventor: David Hennah Mansell, Rune Holm, Ian Michael Caulfield, Jelena Milanovic
Abstract: Techniques for performing matrix multiplication in a data processing apparatus are disclosed, comprising apparatuses, matrix multiply instructions, methods of operating the apparatuses, and virtual machine implementations. Registers, each for storing at least four data elements, are referenced by a matrix multiply instruction, and in response to the matrix multiply instruction a matrix multiply operation is carried out. First and second matrices of data elements are extracted from first and second source registers, and plural dot product operations, acting on respective rows of the first matrix and respective columns of the second matrix, are performed to generate a square matrix of result data elements, which is applied to a destination register. A higher computation density for a given number of register operands is achieved with respect to vector-by-element techniques.
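The arithmetic performed by such an instruction can be sketched in Python, assuming 2x2 matrices stored row-major as four data elements per source register; the register width, element type, and layout are assumptions rather than the architected format.

    # Illustrative sketch only of the instruction's arithmetic: each source register
    # holds a small row-major matrix, and the result register receives the square
    # matrix of row-by-column dot products.
    from typing import List

    def matrix_multiply_2x2(src1: List[int], src2: List[int]) -> List[int]:
        """src1, src2: four data elements each, row-major 2x2. Returns row-major 2x2."""
        assert len(src1) == 4 and len(src2) == 4
        dst = [0] * 4
        for i in range(2):              # row of the first matrix
            for j in range(2):          # column of the second matrix
                dst[2 * i + j] = sum(src1[2 * i + k] * src2[2 * k + j] for k in range(2))
        return dst

    # One instruction performs four dot products rather than repeated vector-by-element steps.
    print(matrix_multiply_2x2([1, 2, 3, 4], [5, 6, 7, 8]))   # [19, 22, 43, 50]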
-
Publication Number: US20220092409A1
Publication Date: 2022-03-24
Application Number: US17030176
Application Date: 2020-09-23
Applicant: Arm Limited
IPC: G06N3/08
Abstract: To perform neural network processing that modifies an input data array to generate a corresponding output data array using a filter comprising an array of weight data, at least one of the input data array and the filter is subdivided into a plurality of portions, a plurality of neural network processing passes using the portions are performed, and the outputs generated by the processing passes are combined to provide the output data array.
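A small Python sketch of the portioning idea, assuming a channel-wise split of a dot-product-style (1x1 convolution) computation: one processing pass runs per portion and the partial outputs are combined by summation. The split granularity and the function names are illustrative assumptions.

    # Illustrative sketch only: split the channels of an input and its filter into
    # portions, run one processing pass per portion, and combine the outputs.
    def conv_pass(inputs, weights):
        """One processing pass: dot product over a portion of the channels."""
        return sum(x * w for x, w in zip(inputs, weights))

    def conv_in_portions(inputs, weights, portion):
        """Split channels into chunks of `portion`, run one pass per chunk, then combine."""
        partials = [conv_pass(inputs[i:i + portion], weights[i:i + portion])
                    for i in range(0, len(inputs), portion)]
        return sum(partials)                                  # combine the per-pass outputs

    x, w = [1, 2, 3, 4], [5, 6, 7, 8]
    assert conv_in_portions(x, w, portion=2) == conv_pass(x, w)   # same result, smaller passes
    print(conv_in_portions(x, w, portion=2))                      # 70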
-
Publication Number: US12001369B2
Publication Date: 2024-06-04
Application Number: US17709293
Application Date: 2022-03-30
Applicant: Arm Limited
Inventor: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III
Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast regions for multi-processor arrangements.
-
Publication Number: US20230315670A1
Publication Date: 2023-10-05
Application Number: US17709293
Application Date: 2022-03-30
Applicant: Arm Limited
Inventor: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III
Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast regions for multi-processor arrangements.
-