-
Publication No.: US20220012563A1
Publication Date: 2022-01-13
Application No.: US17484226
Filing Date: 2021-09-24
Applicant: Alejandro Castro Gonzalez , Praveen Nair , Somnath Paul , Sudheendra Kadri , Palanivel Guruvareddiar , Aaron Gubrud , Vinodh Gopal
Inventor: Alejandro Castro Gonzalez , Praveen Nair , Somnath Paul , Sudheendra Kadri , Palanivel Guruvareddiar , Aaron Gubrud , Vinodh Gopal
IPC Class: G06N3/04
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for high-throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to: determine sizes of data lanes in a partition of neural network weights; determine a slice size based on a size difference between a first data lane and a second data lane in the partition, the first data lane including first data and the second data lane including second data of a smaller size than the first data; cut a portion of the first data from the first data lane based on the slice size; and append that portion of the first data to the second data lane.
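The lane-balancing step in this abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the patented implementation: the function name `balance_lanes`, the list representation of lanes, and the halve-the-difference slice rule are all assumptions.

```python
def balance_lanes(first_lane, second_lane):
    """Move a slice from the larger lane to the smaller one.

    Lanes are modeled as Python lists of weight values; the real
    apparatus would operate on packed lane buffers.
    """
    # Slice size derived from the size difference between the lanes;
    # taking half the difference equalizes the two lanes.
    slice_size = (len(first_lane) - len(second_lane)) // 2
    if slice_size <= 0:
        return first_lane, second_lane
    # Cut a portion from the tail of the first lane and
    # append it to the second lane.
    cut = first_lane[-slice_size:]
    return first_lane[:-slice_size], second_lane + cut

first, second = balance_lanes(list(range(10)), list(range(4)))
# Both lanes now hold 7 elements each.
```

Equalizing lane sizes this way keeps all lanes busy for roughly the same number of cycles, which is the usual motivation for this kind of load balancing.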
-
Publication No.: US20230072082A1
Publication Date: 2023-03-09
Application No.: US18050944
Filing Date: 2022-10-28
Applicant: Sudheendra Kadri , Andrea Deidda , Hassan Kamal , Martin-Thomas Grymel , Alfonso Tarazona Martinez , David Thomas Bernard
Inventor: Sudheendra Kadri , Andrea Deidda , Hassan Kamal , Martin-Thomas Grymel , Alfonso Tarazona Martinez , David Thomas Bernard
IPC Class: G06N3/08
Abstract: A system includes a first memory, a compiler, and a DNN accelerator. The DNN accelerator includes a DMA engine, an acceleration module, and a compute block. The compute block includes a second memory. The compiler may generate a task for transferring activations from the second memory to the first memory. The DMA engine may receive the task and read the activations from the second memory. The acceleration module may compress the activations to generate compressed activation data and write the compressed activation data into the first memory (an external memory). The acceleration module may also store the size of the compressed activation data in a local memory; the DMA engine may later use this size to read the activations from the first memory back into the second memory. The compressed activation data may include non-zero activations and sparsity bitmaps. The compressed activation data may also include a header or zeropoint marker.
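The non-zero-plus-bitmap encoding described above can be sketched as a round trip. This is a minimal Python sketch under assumed names (`compress_activations`, `decompress_activations`); it omits the header/zeropoint marker and the packed binary layout a real accelerator would use.

```python
def compress_activations(activations):
    """Split dense activations into non-zero values plus a sparsity bitmap."""
    bitmap = [1 if a != 0 else 0 for a in activations]
    nonzeros = [a for a in activations if a != 0]
    return nonzeros, bitmap

def decompress_activations(nonzeros, bitmap):
    """Reinsert zeros according to the bitmap to recover the dense data."""
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in bitmap]

dense = [0, 3, 0, 0, 5, 1]
nonzeros, bitmap = compress_activations(dense)
assert decompress_activations(nonzeros, bitmap) == dense
```

Storing the compressed size alongside the data, as the abstract describes, is what lets the DMA engine later issue a read of exactly the right length without decompressing first.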
-
Publication No.: US20230017662A1
Publication Date: 2023-01-19
Application No.: US17946231
Filing Date: 2022-09-16
Applicant: Sudheendra Kadri , Darren Crews , Deepak Abraham Mathaikutty , Andrea Deidda , Arnab Raha , Kevin Brady , David Thomas Bernard
Inventor: Sudheendra Kadri , Darren Crews , Deepak Abraham Mathaikutty , Andrea Deidda , Arnab Raha , Kevin Brady , David Thomas Bernard
Abstract: A DNN accelerator includes a DMA engine that can rearrange weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on a structure of a PE array, e.g., based on the number of activated PE columns in the PE array. Then the DMA engine may partition a virtual bank into a plurality of virtual sub-banks. The DMA engine may also identify data blocks from different ones of the plurality of virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine forms a linear data structure by interleaving the data blocks. The DMA engine can write the linear data structure into another memory (e.g., SRAM).
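The partition-and-interleave layout rearrangement can be sketched roughly as follows. This is an illustrative Python sketch, not the patented method: the strided bank partition, the flat-list tensor model, and the name `interleave_blocks` are assumptions (a real DMA engine would partition by activated PE columns over a packed 3D tensor).

```python
def interleave_blocks(weights, num_banks, block_size):
    """Partition a flat weight list into virtual banks, cut each bank
    into fixed-size blocks, and interleave the blocks round-robin
    into one linear data structure."""
    # Strided partition into virtual banks (a stand-in for the
    # PE-column-based partition described in the abstract).
    banks = [weights[i::num_banks] for i in range(num_banks)]
    # Cut each bank into blocks of a predetermined storage size.
    blocks_per_bank = [
        [bank[j:j + block_size] for j in range(0, len(bank), block_size)]
        for bank in banks
    ]
    # Round-robin interleave: block 0 of every bank, then block 1, ...
    linear = []
    for blk_idx in range(max(len(b) for b in blocks_per_bank)):
        for bank_blocks in blocks_per_bank:
            if blk_idx < len(bank_blocks):
                linear.extend(bank_blocks[blk_idx])
    return linear

layout = interleave_blocks(list(range(12)), num_banks=2, block_size=2)
# layout starts [0, 2, 1, 3, ...]: block 0 of bank 0, then block 0 of bank 1.
```

Interleaving blocks from different banks into one linear structure lets a single sequential SRAM write feed all PE columns in the order they will consume the weights.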
-