-
Publication Number: US20250077841A1
Publication Date: 2025-03-06
Application Number: US18458800
Application Date: 2023-08-30
Applicant: Arm Limited
Inventor: Rune Holm, Anton Kachatkou, Benjamin Klimczak, Ruomei Yan, Diego Russo
Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to adapt a neural network structure to a target platform. One or more performance metrics of an execution of the neural network structure, as implemented by one or more target hardware elements, may be observed. A module from a library of modules may be selected to replace one or more elements of the neural network structure based, at least in part, on the observed one or more performance metrics.
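Below is a minimal Python sketch, purely for illustration, of the kind of selection loop the abstract describes: candidate modules from a library are measured on the target, and a replacement is chosen from those meeting a budget. The Module and adapt_to_target names, the use of latency as the metric, and the budget-based policy are assumptions, not the disclosed apparatus.

    # Illustrative sketch only: pick a replacement module from a library based on
    # performance metrics observed on the target hardware. All names are hypothetical.
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Module:
        name: str
        build: Callable[[], object]   # constructs the candidate replacement sub-network

    def adapt_to_target(element: object,
                        library: List[Module],
                        observe_latency_ms: Callable[[object], float],
                        latency_budget_ms: float):
        """Return the library module with the best observed latency that meets the
        budget, or keep the original element if none does."""
        observed: Dict[str, float] = {}
        for module in library:
            # Observe the metric by executing the candidate on the target hardware.
            observed[module.name] = observe_latency_ms(module.build())
        feasible = {name: ms for name, ms in observed.items() if ms <= latency_budget_ms}
        if not feasible:
            return element, observed
        best_name = min(feasible, key=feasible.get)
        best_module = next(m for m in library if m.name == best_name)
        return best_module.build(), observed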
-
Publication Number: US20240370301A1
Publication Date: 2024-11-07
Application Number: US18640250
Application Date: 2024-04-19
Applicant: Arm Limited
Inventor: Elliot Maurice Simons Rosemarine, Rune Holm
IPC: G06F9/50
Abstract: The present disclosure relates to a system, method and non-transitory computer-readable storage medium for handling data. From a directed acyclic graph (DAG) of operations on input data, a sub-graph of operations is identified and issued as task data to be executed by a processing module, wherein each of the operations in the sub-graph maps to a corresponding execution unit of the processing module of the system and wherein each connection between operations maps to a corresponding storage element of the processing module. The sub-graph is identified such that a simulation of an execution of the operations of the candidate sub-graph, according to a determined size of the processing unit of said input data, shows that the processing module can execute the operations of the sub-graph such that memory constraints of the processing module are met and read-write operations to memory external to the processing module are avoided or reduced.
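As an illustration of the memory-fit check described above, the following Python sketch simulates executing a topologically ordered candidate sub-graph and verifies that the live intermediate buffers never exceed a local-memory limit. The graph encoding, the byte-cost model, and the fits_local_memory name are assumptions for illustration, not the claimed method.

    # Illustrative sketch only: check that a candidate sub-graph can run entirely
    # out of the processing module's local storage, so intermediate results need
    # not spill to external memory.
    from typing import Dict, List

    def fits_local_memory(order: List[str],
                          out_bytes: Dict[str, int],
                          consumers: Dict[str, List[str]],
                          local_mem_bytes: int) -> bool:
        """Simulate executing `order` (a topologically sorted candidate sub-graph)
        and check that live intermediate buffers never exceed local memory."""
        in_sub = set(order)
        # Remaining consumers of each op's output *inside* the sub-graph; outputs
        # with no in-sub-graph consumer are conservatively kept live.
        pending = {op: sum(1 for c in consumers.get(op, []) if c in in_sub)
                   for op in order}
        live, current, peak = {}, 0, 0
        for op in order:
            current += out_bytes[op]          # allocate this op's output buffer
            live[op] = True
            peak = max(peak, current)
            # Retire input buffers whose last in-sub-graph consumer just ran.
            for producer in [p for p, cs in consumers.items() if op in cs and p in live]:
                pending[producer] -= 1
                if pending[producer] == 0:
                    current -= out_bytes[producer]
                    del live[producer]
        return peak <= local_mem_bytes

    # Example: a small elementwise chain; peak use is 20 KiB (two 10 KiB buffers live).
    ops = ["conv", "relu", "add"]
    sizes = {"conv": 10_240, "relu": 10_240, "add": 10_240}
    uses = {"conv": ["relu"], "relu": ["add"]}
    print(fits_local_memory(ops, sizes, uses, local_mem_bytes=32_768))   # True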
-
Publication Number: US11537860B2
Publication Date: 2022-12-27
Application Number: US16826586
Application Date: 2020-03-23
Applicant: Arm Limited
Inventor: Rune Holm, John Wakefield Brothers, III
Abstract: A neural network processor is disclosed that includes a combined convolution and pooling circuit that can perform both convolution and pooling operations. The circuit can perform a convolution operation by a multiply circuit determining products of corresponding input feature map and convolution kernel weight values, and an add circuit accumulating the products determined by the multiply circuit in storage. The circuit can perform an average pooling operation by the add circuit accumulating input feature map data values in the storage, a divisor circuit determining a divisor value, and a division circuit dividing the data value accumulated in the storage by the determined divisor value. The circuit can perform a maximum pooling operation by a maximum circuit determining a maximum value of input feature map data values, and storing the determined maximum value in the storage.
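The three operating modes of the combined circuit can be illustrated with a small Python sketch over a flattened window of input-feature-map values; real hardware streams data through shared multiply, add and maximum paths, so this is only a functional analogue, and the window_op name and list-based layout are assumptions.

    # Illustrative sketch only: the convolution, average-pooling and max-pooling
    # modes described in the abstract, expressed over one flattened window.
    from typing import List, Optional

    def window_op(values: List[float],
                  mode: str,
                  weights: Optional[List[float]] = None) -> float:
        if mode == "conv":                       # multiply, then accumulate
            assert weights is not None, "conv mode needs one weight per value"
            acc = 0.0
            for v, w in zip(values, weights):
                acc += v * w                     # products accumulated in "storage"
            return acc
        if mode == "avg_pool":                   # accumulate, then divide by a divisor
            acc = 0.0
            for v in values:
                acc += v
            return acc / len(values)             # divisor determined from the window size
        if mode == "max_pool":                   # keep a running maximum in "storage"
            m = values[0]
            for v in values[1:]:
                m = max(m, v)
            return m
        raise ValueError(f"unknown mode: {mode}")

    # Example usage on a 2x2 window flattened to a list.
    print(window_op([1.0, 2.0, 3.0, 4.0], "conv", weights=[0.5, 0.5, 0.5, 0.5]))  # 5.0
    print(window_op([1.0, 2.0, 3.0, 4.0], "avg_pool"))                            # 2.5
    print(window_op([1.0, 2.0, 3.0, 4.0], "max_pool"))                            # 4.0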
-
Publication Number: US20210133542A1
Publication Date: 2021-05-06
Application Number: US16670140
Application Date: 2019-10-31
Applicant: Arm Limited
Inventor: Rune Holm, John Wakefield Brothers, III
Abstract: When performing a matrix-vector multiply operation for neural network processing, a set of one or more input vectors to be multiplied by a matrix of data values is scanned to identify data positions of the input vector(s) for which the data value is non-zero in at least one of the input vectors. For each of the data positions identified as having a non-zero value in at least one of the input vectors, the set of data values from the matrix of data values for that data position is fetched from memory. The matrix-vector multiply operation is then performed using the data values of the input vectors for the data positions identified as being non-zero and the fetched set(s) of data values from the matrix of data values for those data position(s).
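A rough Python sketch of the zero-skipping idea follows: the input vectors are scanned for positions that are non-zero in at least one vector, and only the matrix data for those positions is "fetched" (indexed) and multiplied. The dense list-of-lists layout and the sparse_matvec name are assumptions for readability, not the claimed hardware behaviour.

    # Illustrative sketch only: skip fetching matrix data for input positions that
    # are zero in every vector of the batch.
    from typing import List

    def sparse_matvec(matrix: List[List[float]],
                      vectors: List[List[float]]) -> List[List[float]]:
        rows, cols = len(matrix), len(matrix[0])
        # 1. Scan the input vectors for positions non-zero in at least one vector.
        nonzero_positions = [j for j in range(cols)
                             if any(v[j] != 0.0 for v in vectors)]
        results = [[0.0] * rows for _ in vectors]
        # 2. Fetch the matrix data only for those positions and accumulate products.
        for j in nonzero_positions:
            column = [matrix[i][j] for i in range(rows)]   # the fetched set of values
            for v_idx, v in enumerate(vectors):
                if v[j] == 0.0:
                    continue
                for i in range(rows):
                    results[v_idx][i] += column[i] * v[j]
        return results

    print(sparse_matvec([[1.0, 2.0], [3.0, 4.0]], [[0.0, 5.0]]))   # [[10.0, 20.0]]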
-
Publication Number: US20240036919A1
Publication Date: 2024-02-01
Application Number: US18358995
Application Date: 2023-07-26
Applicant: Arm Limited
Inventor: Alexander Eugene Chalfin, John Wakefield Brothers, III, Rune Holm, Samuel James Edward Martin
CPC classification number: G06F9/4881, G06T1/20
Abstract: A method and processor comprising a command processing unit to receive, from a host processor, a sequence of commands to be executed, and to generate, based on the sequence of commands, a plurality of tasks. The processor also comprises a plurality of compute units, each having a first processing module for executing tasks of a first task type, a second processing module for executing tasks of a second task type, different from the first task type, and a local cache shared by at least the first processing module and the second processing module. The command processing unit issues the plurality of tasks to at least one of the plurality of compute units, and at least one of the plurality of compute units processes at least one of the plurality of tasks.
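A toy Python sketch of the described arrangement is given below: a command stream is turned into tasks of two types, and the tasks are issued to compute units that each pair two processing modules with a shared local cache. The task types, the round-robin issue policy, and all class and function names are assumptions for illustration.

    # Illustrative sketch only: commands -> tasks of two types -> compute units,
    # each compute unit pairing two processing modules with a shared local cache.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Task:
        task_type: str        # e.g. "neural" (first type) or "graphics" (second type)
        payload: str

    @dataclass
    class ComputeUnit:
        local_cache: Dict[str, bytes] = field(default_factory=dict)  # shared by both modules

        def run(self, task: Task) -> str:
            module = "module_a" if task.task_type == "neural" else "module_b"
            return f"{module} executed {task.payload}"

    def command_processing_unit(commands: List[str],
                                compute_units: List[ComputeUnit]) -> List[str]:
        # Generate a plurality of tasks from the sequence of commands (1:1 here).
        tasks = [Task("neural" if c.startswith("nn:") else "graphics", c) for c in commands]
        # Issue the tasks to the compute units (simple round robin).
        return [compute_units[i % len(compute_units)].run(t) for i, t in enumerate(tasks)]

    print(command_processing_unit(["nn:conv", "draw:tri", "nn:pool"],
                                  [ComputeUnit(), ComputeUnit()]))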
-
Publication Number: US20230315677A1
Publication Date: 2023-10-05
Application Number: US17709255
Application Date: 2022-03-30
Applicant: Arm Limited
Inventor: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III
IPC: G06F15/80
CPC classification number: G06F15/80
Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast hubs for multi-processor arrangements. A processing tile may comprise a broadcast hub to obtain a plurality of parameters applicable in a particular operation from at least one of a plurality of processing tiles and initiate distribution of the plurality of parameters to the plurality of processing tiles, wherein the plurality of processing tiles may execute the particular operation based at least in part on the plurality of distributed parameters.
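The following Python sketch illustrates, under assumed Tile and BroadcastHub abstractions, one tile's parameters being obtained by a hub and distributed to every tile before a shared operation runs; it is a functional analogue only, not the disclosed hardware.

    # Illustrative sketch only: a broadcast hub gathers parameters from one tile
    # and distributes them so all tiles can execute the same operation.
    from typing import Callable, Dict, List, Optional

    class Tile:
        def __init__(self, tile_id: int, params: Optional[Dict[str, float]] = None):
            self.tile_id = tile_id
            self.params: Dict[str, float] = dict(params or {})

        def execute(self, op: Callable[[Dict[str, float]], float]) -> float:
            return op(self.params)

    class BroadcastHub:
        def __init__(self, tiles: List[Tile]):
            self.tiles = tiles

        def broadcast_from(self, source_tile: int) -> None:
            params = dict(self.tiles[source_tile].params)   # obtain parameters from one tile
            for tile in self.tiles:                         # initiate distribution to all tiles
                tile.params.update(params)

    tiles = [Tile(0, {"scale": 0.5}), Tile(1), Tile(2)]
    BroadcastHub(tiles).broadcast_from(0)
    print([t.execute(lambda p: 8.0 * p["scale"]) for t in tiles])   # [4.0, 4.0, 4.0]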
-
Publication Number: US11288066B2
Publication Date: 2022-03-29
Application Number: US16626701
Application Date: 2018-06-08
Applicant: ARM LIMITED
Inventor: David Hennah Mansell, Rune Holm, Ian Michael Caulfield, Jelena Milanovic
Abstract: Techniques for performing matrix multiplication in a data processing apparatus are disclosed, comprising apparatuses, matrix multiply instructions, methods of operating the apparatuses, and virtual machine implementations. Registers, each for storing at least four data elements, are referenced by a matrix multiply instruction, and in response to the matrix multiply instruction a matrix multiply operation is carried out. First and second matrices of data elements are extracted from first and second source registers, and plural dot product operations, acting on respective rows of the first matrix and respective columns of the second matrix, are performed to generate a square matrix of result data elements, which is applied to a destination register. A higher computation density for a given number of register operands is achieved with respect to vector-by-element techniques.
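The arithmetic performed by such an instruction can be sketched in Python, assuming 2x2 matrices stored row-major as four data elements per source register; the register width, element type, and layout are assumptions rather than the architected format.

    # Illustrative sketch only of the instruction's arithmetic: each source register
    # holds a small row-major matrix, and the result register receives the square
    # matrix of row-by-column dot products.
    from typing import List

    def matrix_multiply_2x2(src1: List[int], src2: List[int]) -> List[int]:
        """src1, src2: four data elements each, row-major 2x2. Returns row-major 2x2."""
        assert len(src1) == 4 and len(src2) == 4
        dst = [0] * 4
        for i in range(2):              # row of the first matrix
            for j in range(2):          # column of the second matrix
                dst[2 * i + j] = sum(src1[2 * i + k] * src2[2 * k + j] for k in range(2))
        return dst

    # One instruction performs four dot products rather than repeated vector-by-element steps.
    print(matrix_multiply_2x2([1, 2, 3, 4], [5, 6, 7, 8]))   # [19, 22, 43, 50]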
-
Publication Number: US20220092409A1
Publication Date: 2022-03-24
Application Number: US17030176
Application Date: 2020-09-23
Applicant: Arm Limited
IPC: G06N3/08
Abstract: To perform neural network processing that modifies an input data array to generate a corresponding output data array using a filter comprising an array of weight data, at least one of the input data array and the filter is subdivided into a plurality of portions, a plurality of neural network processing passes using the portions are performed, and the outputs generated by the processing passes are combined to provide the output data array.
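A small Python sketch of the portioning idea, assuming a channel-wise split of a dot-product-style (1x1 convolution) computation: one processing pass runs per portion and the partial outputs are combined by summation. The split granularity and the function names are illustrative assumptions.

    # Illustrative sketch only: split the channels of an input and its filter into
    # portions, run one processing pass per portion, and combine the outputs.
    def conv_pass(inputs, weights):
        """One processing pass: dot product over a portion of the channels."""
        return sum(x * w for x, w in zip(inputs, weights))

    def conv_in_portions(inputs, weights, portion):
        """Split channels into chunks of `portion`, run one pass per chunk, then combine."""
        partials = [conv_pass(inputs[i:i + portion], weights[i:i + portion])
                    for i in range(0, len(inputs), portion)]
        return sum(partials)                                  # combine the per-pass outputs

    x, w = [1, 2, 3, 4], [5, 6, 7, 8]
    assert conv_in_portions(x, w, portion=2) == conv_pass(x, w)   # same result, smaller passes
    print(conv_in_portions(x, w, portion=2))                      # 70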
-
Publication Number: US12001369B2
Publication Date: 2024-06-04
Application Number: US17709293
Application Date: 2022-03-30
Applicant: Arm Limited
Inventor: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III
Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast regions for multi-processor arrangements.
-
Publication Number: US20230315670A1
Publication Date: 2023-10-05
Application Number: US17709293
Application Date: 2022-03-30
Applicant: Arm Limited
Inventor: Erik Persson, Graeme Leslie Ingram, Rune Holm, John Wakefield Brothers, III
Abstract: The present disclosure relates generally to multi-processor arrangements and, more particularly, to broadcast regions for multi-processor arrangements.
-