NEURAL PROCESSING UNIT FOR ATTENTION-BASED INFERENCE

    Publication number: US20240028877A1

    Publication date: 2024-01-25

    Application number: US17870038

    Filing date: 2022-07-21

    Applicant: Arm Limited

    CPC classification number: G06N3/063

    Abstract: There is provided a neural processing unit for calculating an attention matrix during machine learning inference. The neural processing unit is configured to calculate: a first score matrix based on differences between a query matrix and a key matrix; a second score matrix based on differences between the key matrix and a learned key matrix; a similarity matrix based on a combination of the first score matrix and the second score matrix; and an attention matrix obtained by applying a normalisation function to the similarity matrix. Also provided is an apparatus comprising at least one such neural processing unit and at least one memory, the memory configured to pass, on demand, a learned key matrix to the neural processing unit. Also provided is a computer program product having computer readable program code stored thereon which, when executed by the neural processing unit, causes the unit to perform said calculations.
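
    Purely as an illustration of the calculation described in the abstract, the following is a minimal NumPy sketch. The squared-difference scoring, the reduction of the second score matrix to one score per key, the broadcast-sum combination, and the softmax normalisation are all assumptions chosen for clarity; the abstract does not specify these details, and every name used is hypothetical.

        import numpy as np

        def attention_matrix(Q, K, K_learned):
            # First score matrix: based on differences between the query matrix
            # and the key matrix (negative squared distance, chosen for illustration).
            s1 = -np.sum((Q[:, None, :] - K[None, :, :]) ** 2, axis=-1)
            # Second score matrix: based on differences between the key matrix
            # and the learned key matrix (one score per key row, an assumption).
            s2 = -np.sum((K - K_learned) ** 2, axis=-1)
            # Similarity matrix: a combination of the two score matrices (broadcast sum).
            sim = s1 + s2[None, :]
            # Attention matrix: apply a normalisation function (row-wise softmax).
            e = np.exp(sim - sim.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        Q = np.random.rand(4, 8)          # 4 queries of dimension 8
        K = np.random.rand(6, 8)          # 6 keys
        K_learned = np.random.rand(6, 8)  # learned key matrix, same shape as K
        A = attention_matrix(Q, K, K_learned)
        print(A.shape, A.sum(axis=-1))    # (4, 6), each row sums to 1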

    APPARATUS AND METHOD FOR MAINTAINING CACHE COHERENCE DATA FOR MEMORY BLOCKS OF DIFFERENT SIZE GRANULARITIES USING A SNOOP FILTER STORAGE COMPRISING AN N-WAY SET ASSOCIATIVE STORAGE STRUCTURE

    Publication number: US20210294743A1

    Publication date: 2021-09-23

    Application number: US16821271

    Filing date: 2020-03-17

    Applicant: Arm Limited

    Abstract: An apparatus is provided for receiving requests from a plurality of processing units, at least some of which may have associated cache storage. A snoop unit implements a cache coherency protocol when a request received by the apparatus identifies a cacheable memory address. Snoop filter storage is provided comprising an N-way set associative storage structure with a plurality of entries. Each entry stores coherence data for an associated address range identifying a memory block, and the coherence data is used to determine which cache storages need to be subjected to a snoop operation when implementing the cache coherency protocol in response to a received request. The snoop filter storage stores coherence data for memory blocks of at least P different size granularities and is organised as a plurality of at least P banks that are accessible in parallel, where each bank has entries within each of the N ways of the snoop filter storage. Snoop control circuitry controls access to the snoop filter storage and, in response to a received address, creates a group of indexes comprising an index for each of the P different size granularities, with each index in the group constrained to identify an entry in a different bank of the snoop filter storage. The snoop control circuitry uses the group of indexes to perform a lookup operation in parallel within the snoop filter storage in order to determine, taking into account each of the different size granularities, whether an entry stores coherence data for the received address.
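
    As a rough software model of the index-generation step, the sketch below assumes P = 2 size granularities (64-byte and 1 KiB memory blocks), 256 sets per bank, and a trivial modulo hash. All of these values and names are hypothetical, and the bank-selection rule shown is just one way of guaranteeing that the indexes in a group land in distinct banks.

        NUM_SETS_PER_BANK = 256
        GRANULARITY_SHIFTS = [6, 10]        # log2 block size per granularity (assumed)
        P = len(GRANULARITY_SHIFTS)         # number of banks = number of granularities

        def index_group(address):
            # Base bank derived from the coarsest-granularity block number, so that
            # adding the granularity id g (0..P-1) always yields P distinct banks.
            base_bank = (address >> GRANULARITY_SHIFTS[-1]) % P
            group = []
            for g, shift in enumerate(GRANULARITY_SHIFTS):
                set_index = (address >> shift) % NUM_SETS_PER_BANK   # toy index hash
                bank = (base_bank + g) % P
                group.append((g, bank, set_index))
            return group

        # The P lookups can proceed in parallel because each probes a different bank.
        print(index_group(0x12345678))      # one (granularity, bank, set index) per granularity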

    MATRIX MULTIPLICATION IN A DYNAMICALLY SPATIALLY AND DYNAMICALLY TEMPORALLY DIVIDABLE ARCHITECTURE

    Publication number: US20240320292A1

    Publication date: 2024-09-26

    Application number: US18125432

    Filing date: 2023-03-23

    Applicant: Arm Limited

    CPC classification number: G06F17/16

    Abstract: A data processing apparatus includes input circuitry that receives a matrix having values in a first format. Output circuitry outputs the matrix with its values in a second format, while adjustment circuitry performs a modification of the matrix from the first format to the second format. The second format is computationally contiguous with respect to a data processing apparatus that has first and second vector registers, both configured to be dynamically spatially and dynamically temporally divided, performing a matrix multiplication.
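
    The abstract does not define the two formats, but the kind of rearrangement it implies can be pictured with a simple tiling transform in which values consumed together by a tiled matrix multiply are made contiguous in memory. The tile shape and the reshape/transpose recipe below are purely illustrative assumptions, not the claimed format.

        import numpy as np

        def to_tiles(matrix, tile_rows, tile_cols):
            # Rearrange a row-major matrix so that each tile_rows x tile_cols tile
            # is laid out contiguously, tile by tile.
            r, c = matrix.shape
            assert r % tile_rows == 0 and c % tile_cols == 0
            return (matrix.reshape(r // tile_rows, tile_rows, c // tile_cols, tile_cols)
                          .transpose(0, 2, 1, 3)   # group by tile, then within-tile
                          .reshape(-1))            # flatten: tiles are now contiguous

        A = np.arange(16).reshape(4, 4)
        print(to_tiles(A, 2, 2))   # [0 1 4 5 2 3 6 7 8 9 12 13 10 11 14 15]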

    AN APPARATUS AND METHOD FOR PROVIDING COHERENCE DATA FOR USE WHEN IMPLEMENTING A CACHE COHERENCY PROTOCOL

    Publication number: US20230139212A1

    Publication date: 2023-05-04

    Application number: US17905566

    Filing date: 2021-01-18

    Applicant: Arm Limited

    Abstract: An apparatus and method are provided for receiving requests from a plurality of processing units, where multiple of those processing units have associated cache storage. A snoop unit is used to implement a cache coherency protocol when a request is received that identifies a cacheable memory address. The snoop unit has snoop filter storage comprising a plurality of snoop filter tables organised in a hierarchical arrangement. The snoop filter tables comprise a primary snoop filter table at the highest level in the hierarchy, and each snoop filter table at a lower level in the hierarchy forms a backup snoop filter table for an adjacent snoop filter table at a higher level in the hierarchy. Each snoop filter table is arranged as a multi-way set associative storage structure, and each backup snoop filter table has a different number of sets than the adjacent snoop filter table.
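
    A toy software model of the hierarchical lookup might look like the following. The two-level hierarchy, the table sizes, the eviction policy, and all names are hypothetical; the sketch is only meant to show the primary-then-backup probing order and the differing numbers of sets per table.

        class SnoopFilterTable:
            def __init__(self, num_sets, num_ways):
                self.num_sets = num_sets
                self.num_ways = num_ways
                # Each set is a small dict: tag -> coherence data (e.g. a sharer bit vector).
                self.sets = [dict() for _ in range(num_sets)]

            def lookup(self, address):
                index = address % self.num_sets
                tag = address // self.num_sets
                return self.sets[index].get(tag)

            def insert(self, address, coherence_data):
                index = address % self.num_sets
                tag = address // self.num_sets
                ways = self.sets[index]
                if tag not in ways and len(ways) >= self.num_ways:
                    ways.pop(next(iter(ways)))     # naive eviction of the oldest entry
                ways[tag] = coherence_data

        # Primary table at the highest level; the lower level acts as a backup with a
        # different (here smaller) number of sets. Sizes are illustrative only.
        hierarchy = [SnoopFilterTable(1024, 8), SnoopFilterTable(256, 8)]

        def find_coherence_data(block_address):
            for table in hierarchy:                # probe the primary first, then backups
                data = table.lookup(block_address)
                if data is not None:
                    return data
            return None                            # no sharers tracked for this block

        hierarchy[0].insert(0x80, {"sharers": 0b0011})
        print(find_coherence_data(0x80))           # {'sharers': 3}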

    MIXED-ELEMENT-SIZE INSTRUCTION

    Publication number: US20210389948A1

    Publication date: 2021-12-16

    Application number: US16897483

    Filing date: 2020-06-10

    Applicant: Arm Limited

    Abstract: A mixed-element-size instruction is described, which specifies a first operand and a second operand stored in registers. In response to the mixed-element-size instruction, an instruction decoder controls processing circuitry to perform an arithmetic/logical operation on two or more first data elements of the first operand and two or more second data elements of the second operand, where the first data elements have a larger data element size than the second data elements. This is particularly useful for machine learning applications to improve processing throughput and memory bandwidth utilisation.
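
    The abstract leaves the exact pairing of large and small elements open, so the model below simply assumes that each 16-bit first element is combined with the two 8-bit second elements sharing its lane, accumulating in 32 bits. The element widths, the pairing rule, and the function name are illustrative assumptions only.

        import numpy as np

        def mixed_size_mla(first_i16, second_i8):
            # Toy model: multiply each 16-bit first element by the sum of the two
            # 8-bit second elements in its lane, accumulating in 32 bits.
            second_pairs = second_i8.astype(np.int32).reshape(-1, 2).sum(axis=1)
            return first_i16.astype(np.int32) * second_pairs

        first = np.array([100, -200, 300, 400], dtype=np.int16)      # larger elements
        second = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int8)   # smaller elements
        print(mixed_size_mla(first, second))   # [300 -1400 3300 6000]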

    MATRIX MULTIPLICATION IN A DYNAMICALLY SPATIALLY AND DYNAMICALLY TEMPORALLY DIVIDABLE ARCHITECTURE

    Publication number: US20240320005A1

    Publication date: 2024-09-26

    Application number: US18125416

    Filing date: 2023-03-23

    Applicant: Arm Limited

    CPC classification number: G06F9/30145 G06F9/3001 G06F9/30098

    Abstract: A data processing apparatus includes first vector registers and second vector registers, both dynamically spatially and dynamically temporally dividable. Decode circuitry receives one or more matrix multiplication instructions that indicate a set of first elements in the first vector registers and a set of second elements in the second vector registers and, in response, generates a matrix multiplication operation. The matrix multiplication operation causes one or more execution units to perform a matrix multiplication of the set of first elements by the set of second elements, where an average bit width of the first elements is different from an average bit width of the second elements.
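
    Numerically, the described effect of feeding one matrix multiply with operands of different element widths can be modelled in a few lines of NumPy. The specific widths (8-bit by 16-bit with 32-bit accumulation) are an assumption for the example, not something stated in the abstract.

        import numpy as np

        rng = np.random.default_rng(0)
        # First operand held as 8-bit elements, second as 16-bit elements, so the
        # average bit widths of the two element sets differ (widths are illustrative).
        A = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)
        B = rng.integers(-32768, 32768, size=(8, 4), dtype=np.int16)
        # Widen before multiplying so the products accumulate without overflow.
        C = A.astype(np.int32) @ B.astype(np.int32)
        print(C.shape)   # (4, 4)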

    COUNTING ELEMENTS IN DATA ITEMS IN A DATA PROCESSING APPARATUS

    Publication number: US20190042253A1

    Publication date: 2019-02-07

    Application number: US15665781

    Filing date: 2017-08-01

    Applicant: ARM Limited

    Abstract: An apparatus and method of operating the apparatus are provided for performing a count operation. Instruction decoder circuitry is responsive to a count instruction specifying an input data item to generate control signals to control the data processing circuitry to perform a count operation. The count operation determines a count value indicative of a number of input elements of a subset of elements in the specified input data item which have a value which matches a reference value in a reference element in a reference data item. A plurality of count operations may be performed to determine a count data item corresponding to the input data item. A register scatter storage instruction, a gather index generation instruction, and respective apparatuses responsive to them, as well as simulator implementations, are also provided.
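
    A plain-Python model of the count operation described above could look like this; the predicate-mask argument used to select the subset and the function name are assumptions made for the example.

        import numpy as np

        def count_matches(input_elems, active_mask, reference_value):
            # Count how many input elements in the active subset equal the
            # reference value (a software model only; names are illustrative).
            elems = np.asarray(input_elems)
            mask = np.asarray(active_mask, dtype=bool)
            return int(np.count_nonzero(elems[mask] == reference_value))

        data = [3, 7, 3, 3, 9, 3, 1, 3]
        mask = [1, 1, 1, 0, 1, 1, 1, 1]       # element 3 is excluded from the subset
        print(count_matches(data, mask, 3))   # 4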

    MATCHING CONSECUTIVE VALUES IN A DATA PROCESSING APPARATUS

    Publication number: US20190042190A1

    Publication date: 2019-02-07

    Application number: US15665715

    Filing date: 2017-08-01

    Applicant: ARM Limited

    Abstract: An apparatus and a method of operating the apparatus are provided for performing a comparison operation to match a given sequence of values within an input vector. Instruction decoder circuitry is responsive to a string match instruction specifying a segment of an input vector to generate control signals that control the data processing circuitry to perform a comparison operation. The comparison operation determines a comparison value indicating whether each input element of a required set of consecutive input elements of the segment has a value which matches a respective value in consecutive reference elements of a reference data item. A plurality of comparison operations may be performed to determine a match vector corresponding to the segment of the input vector, indicating the start position of the matched sequence in the input vector. A string match instruction, as well as simulator virtual machine implementations, are also provided.
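
    As a software model of the comparison behaviour, the sketch below marks every position in a segment where the full reference sequence of consecutive values begins. The function name and the 0/1 match-vector encoding are illustrative assumptions.

        def match_vector(segment, reference):
            # Return a vector with 1 at every position of the segment where the
            # full reference sequence of consecutive values starts.
            seg = list(segment)
            ref = list(reference)
            out = [0] * len(seg)
            for start in range(len(seg) - len(ref) + 1):
                if seg[start:start + len(ref)] == ref:
                    out[start] = 1
            return out

        segment   = [5, 2, 9, 2, 9, 4, 7]
        reference = [2, 9, 4]
        print(match_vector(segment, reference))   # [0, 0, 0, 1, 0, 0, 0]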
