Hardware accelerator for IM2COL operation

    Publication Number: US11783163B2

    Publication Date: 2023-10-10

    Application Number: US16901542

    Application Date: 2020-06-15

    Applicant: Arm Limited

    CPC classification number: G06N3/04 G06F9/30105 G06F17/16 G06N3/08

    Abstract: The present disclosure advantageously provides a matrix expansion unit that includes an input data selector, a first register set, a second register set, and an output data selector. The input data selector is configured to receive first matrix data in a columnwise format. The first register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a first shift loop. The second register set is coupled to the input data selector, and includes a plurality of data selectors and a plurality of registers arranged in a second shift loop. The output data selector is coupled to the first register set and the second register set, and is configured to output second matrix data in a rowwise format.
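
    The columnwise-in, rowwise-out expansion this unit performs is the im2col rearrangement that turns convolution into a matrix multiply. The NumPy sketch below is purely illustrative (the function name, shapes, and stride handling are assumptions, not taken from the patent) and shows the data movement the hardware shift loops implement:

```python
# Minimal software sketch of the im2col transformation that the patented
# matrix expansion unit performs in hardware. Function and parameter names
# are illustrative only and do not come from the patent.
import numpy as np

def im2col(feature_map: np.ndarray, kernel_h: int, kernel_w: int,
           stride: int = 1) -> np.ndarray:
    """Expand a (C, H, W) feature map into a matrix whose rows are
    flattened kernel-sized patches, so convolution becomes a GEMM."""
    c, h, w = feature_map.shape
    out_h = (h - kernel_h) // stride + 1
    out_w = (w - kernel_w) // stride + 1
    rows = []
    for y in range(0, out_h * stride, stride):
        for x in range(0, out_w * stride, stride):
            patch = feature_map[:, y:y + kernel_h, x:x + kernel_w]
            rows.append(patch.reshape(-1))    # emit one row-wise patch
    return np.stack(rows)                     # shape (out_h*out_w, C*kh*kw)

# Example: a 3x4x4 input expanded for a 3x3 kernel yields a (4, 27) matrix.
expanded = im2col(np.arange(48, dtype=np.float32).reshape(3, 4, 4), 3, 3)
assert expanded.shape == (4, 27)
```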

    Memory for an Artificial Neural Network Accelerator

    Publication Number: US20220164137A1

    Publication Date: 2022-05-26

    Application Number: US17103629

    Application Date: 2020-11-24

    Applicant: Arm Limited

    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of read word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each read word selector has a plurality of input ports and an output port, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the read word selectors of the first bank and the second bank, and configured to select a combination of read word selectors from at least one of the first bank and the second bank based on a bank select signal.
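
    As a rough software model of this read path (the patent describes hardware selectors, not code, and every class, signal, and function name below is invented for the sketch), each read word selector picks one byte of its word from the selected word line, and the bank selector chooses which banks contribute to the output:

```python
# Illustrative model (not the patented RTL) of the read path described in the
# abstract: each read word selector picks one byte of its word from the
# selected word line; the bank selector then picks which bank(s) drive the
# output. All names here are invented for the sketch.

class Bank:
    def __init__(self, word_lines):
        # word_lines: list of word lines; each word line is a list of words;
        # each word is a bytes object (a plurality of bytes).
        self.word_lines = word_lines

    def read_byte(self, line_sel: int, word_idx: int, byte_sel: int) -> int:
        """Read word selector: select one byte of the corresponding word
        of the selected word line, based on the byte select signal."""
        return self.word_lines[line_sel][word_idx][byte_sel]

def bank_read(banks, bank_sel, line_sel, word_idx, byte_sel):
    """Bank selector: combine read word selectors from the selected bank(s)."""
    return [banks[b].read_byte(line_sel, word_idx, byte_sel) for b in bank_sel]

# Two banks, two word lines each, two 4-byte words per word line.
bank0 = Bank([[b"\x00\x01\x02\x03", b"\x04\x05\x06\x07"],
              [b"\x08\x09\x0a\x0b", b"\x0c\x0d\x0e\x0f"]])
bank1 = Bank([[b"\x10\x11\x12\x13", b"\x14\x15\x16\x17"],
              [b"\x18\x19\x1a\x1b", b"\x1c\x1d\x1e\x1f"]])
print(bank_read([bank0, bank1], bank_sel=[0, 1],
                line_sel=1, word_idx=0, byte_sel=2))
# -> [10, 26]  (0x0a from bank 0 and 0x1a from bank 1)
```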

    Time Domain Unrolling Sparse Matrix Multiplication System and Method

    Publication Number: US20220035890A1

    Publication Date: 2022-02-03

    Application Number: US17103676

    Application Date: 2020-11-24

    Applicant: Arm Limited

    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix, including, for each element i,j of the output matrix, calculating a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Alternatively, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix to generate the output matrix, including, for each element i,j of the output matrix, calculating a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.
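
    A minimal NumPy sketch of the bitmap-guided multiply described in this application (and in its granted counterpart below) follows; the data layout and function name are illustrative assumptions. Each row of the compressed first matrix keeps only its surviving elements, and the row's bitmap records which original columns they occupy, so every dot product reads only those columns of the second matrix:

```python
# Sketch (not the MMA hardware) of a bitmap-guided multiply of a compressed
# first matrix with a dense second matrix B. Each row of the first matrix
# keeps only its nonzero values; the bitmap records their original columns.
import numpy as np

def bitmap_matmul(a_values, a_bitmap, b):
    """a_values[i] : nonzero values of row i of the compressed first matrix
       a_bitmap[i] : boolean mask over the original columns of row i
       b           : dense second matrix (original_cols x n)
       Returns the m x n output matrix of bitmap-guided dot products."""
    m, n = len(a_values), b.shape[1]
    out = np.zeros((m, n), dtype=b.dtype)
    for i in range(m):
        cols = np.flatnonzero(a_bitmap[i])        # columns kept for row i
        for j in range(n):
            # dot product of row i of the compressed matrix and column j of B
            out[i, j] = np.dot(a_values[i], b[cols, j])
    return out

# 50% sparse 2x4 first matrix, dense 4x3 second matrix.
a_bitmap = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], dtype=bool)
a_values = [np.array([2.0, 3.0]), np.array([4.0, 5.0])]
b = np.arange(12, dtype=np.float64).reshape(4, 3)
dense_a = np.array([[2, 0, 3, 0], [0, 4, 0, 5]], dtype=np.float64)
assert np.allclose(bitmap_matmul(a_values, a_bitmap, b), dense_a @ b)
```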

    Time domain unrolling sparse matrix multiplication system and method

    Publication Number: US11928176B2

    Publication Date: 2024-03-12

    Application Number: US17103676

    Application Date: 2020-11-24

    Applicant: Arm Limited

    CPC classification number: G06F17/16 G06F7/5443 G06F15/80 G06F9/3893

    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Alternatively, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix and to generate the output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.

    Multi-Dimensional Data Path Architecture

    Publication Number: US20220382690A1

    Publication Date: 2022-12-01

    Application Number: US17334960

    Application Date: 2021-05-31

    Applicant: Arm Limited

    Abstract: Various implementations described herein are directed to a device having a multi-layered logic structure with a first logic layer and a second logic layer arranged vertically in a stacked configuration. The device may have a memory array that provides data, and also, the device may have an inter-layer data bus that vertically couples the memory array to the multi-layered logic structure. The inter-layer data bus may provide multiple data paths to the first logic layer and the second logic layer for reuse of the data provided by the memory array.

    Activation Compression Method for Deep Learning Acceleration

    Publication Number: US20220164663A1

    Publication Date: 2022-05-26

    Application Number: US17157319

    Application Date: 2021-01-25

    Applicant: Arm Limited

    Abstract: A system and method for multiplying matrices, and method for training a convolutional neural network (CNN), are provided. The system includes a processor and a matrix multiply accelerator (MMA). The processor is configured to generate, based on an input tensor, a number of basic block matrices, each basic block matrix including a number of elements; for each basic block matrix: prune, based on a sparsity value, the elements of the basic block matrix, generate a mask for the basic block matrix, each mask including a number of bits, each bit corresponding to a different element of the basic block matrix, and compress the basic block matrix to generate a compressed basic block matrix having fewer elements than the basic block matrix. The MMA is configured to multiply, based on the masks, the compressed basic block matrices and a weight matrix to generate an output matrix.
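
    The prune / mask / compress flow the processor applies to each basic block matrix can be sketched as follows; the block size shown and the magnitude-based pruning rule are assumptions made for illustration and are not prescribed by this abstract:

```python
# Sketch of the per-basic-block prune / mask / compress flow described in the
# abstract. The block size and the magnitude-based pruning rule are
# illustrative assumptions, not details taken from the patent.
import numpy as np

def prune_mask_compress(block: np.ndarray, sparsity: float):
    """Prune the smallest-magnitude elements of a basic block matrix so that
    a `sparsity` fraction is zeroed, emit a bitmask (one bit per element),
    and return the compressed block containing only the surviving elements."""
    flat = block.ravel()
    n_prune = int(round(sparsity * flat.size))
    keep = np.ones(flat.size, dtype=bool)
    if n_prune > 0:
        # zero out the n_prune elements with the smallest magnitude
        keep[np.argsort(np.abs(flat))[:n_prune]] = False
    mask = keep.reshape(block.shape)      # one bit per element of the block
    compressed = flat[keep]               # fewer elements than the block
    return mask, compressed

block = np.array([[0.1, -2.0, 0.3, 4.0],
                  [0.2, -0.5, 1.5, -3.0]])
mask, compressed = prune_mask_compress(block, sparsity=0.5)
print(mask.astype(int))   # [[0 1 0 1]
                          #  [0 0 1 1]]
print(compressed)         # [-2.   4.   1.5 -3. ]
```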

    Memory for an Artificial Neural Network Accelerator

    Publication Number: US20220164127A1

    Publication Date: 2022-05-26

    Application Number: US17103632

    Application Date: 2020-11-24

    Applicant: Arm Limited

    Abstract: A memory for an artificial neural network (ANN) accelerator is provided. The memory includes a first bank, a second bank and a bank selector. Each bank includes at least two word lines and a plurality of write word selectors. Each word line stores a plurality of words, and each word has a plurality of bytes. Each write word selector has an input port and a plurality of output ports, is coupled to a corresponding word in each word line, and is configured to select a byte of the corresponding word of a selected word line based on a byte select signal. The bank selector is coupled to the write word selectors of the first bank and the second bank, and configured to select a combination of write word selectors from at least one of the first bank and the second bank based on a bank select signal.
