-
Publication No.: US20220012563A1
Publication Date: 2022-01-13
Application No.: US17484226
Filing Date: 2021-09-24
Applicant: Alejandro Castro Gonzalez , Praveen Nair , Somnath Paul , Sudheendra Kadri , Palanivel Guruvareddiar , Aaron Gubrud , Vinodh Gopal
Inventor: Alejandro Castro Gonzalez , Praveen Nair , Somnath Paul , Sudheendra Kadri , Palanivel Guruvareddiar , Aaron Gubrud , Vinodh Gopal
IPC Class: G06N3/04
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for high-throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to: determine sizes of data lanes in a partition of neural network weights; determine a slice size based on a size difference between a first data lane and a second data lane in the partition, the first data lane including first data and the second data lane including second data of a smaller size than the first data; cut a portion of the first data from the first data lane based on the slice size; and append that portion of the first data to the second data lane.
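The lane-balancing step in this abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the patented implementation: the function name `balance_lanes`, the list representation of lanes, and the halve-the-difference slice rule are all assumptions.

```python
def balance_lanes(first_lane, second_lane):
    """Move a slice from the larger lane to the smaller one.

    Lanes are modeled as Python lists of weight values; the real
    apparatus would operate on packed lane buffers.
    """
    # Slice size derived from the size difference between the lanes;
    # taking half the difference equalizes the two lanes.
    slice_size = (len(first_lane) - len(second_lane)) // 2
    if slice_size <= 0:
        return first_lane, second_lane
    # Cut a portion from the tail of the first lane and
    # append it to the second lane.
    cut = first_lane[-slice_size:]
    return first_lane[:-slice_size], second_lane + cut

first, second = balance_lanes(list(range(10)), list(range(4)))
# Both lanes now hold 7 elements each.
```

Equalizing lane sizes this way keeps all lanes busy for roughly the same number of cycles, which is the usual motivation for this kind of load balancing.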
-
Publication No.: US20230072082A1
Publication Date: 2023-03-09
Application No.: US18050944
Filing Date: 2022-10-28
Applicant: Sudheendra Kadri , Andrea Deidda , Hassan Kamal , Martin-Thomas Grymel , Alfonso Tarazona Martinez , David Thomas Bernard
Inventor: Sudheendra Kadri , Andrea Deidda , Hassan Kamal , Martin-Thomas Grymel , Alfonso Tarazona Martinez , David Thomas Bernard
IPC Class: G06N3/08
Abstract: A system includes a first memory, a compiler, and a DNN accelerator. The DNN accelerator includes a DMA engine, an acceleration module, and a compute block. The compute block includes a second memory. The compiler may generate a task for transferring activations from the second memory to the first memory. The DMA engine may receive the task and read the activations from the second memory. The acceleration module may compress the activations to generate compressed activation data and write the compressed activation data into the first memory (an external memory). The acceleration module may also store the size of the compressed activation data in a local memory; the DMA engine may later use this size to read the activations from the first memory back into the second memory. The compressed activation data may include non-zero activations and sparsity bitmaps. The compressed activation data may also include a header or zeropoint marker.
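The non-zero-plus-bitmap encoding described above can be sketched as a round trip. This is a minimal Python sketch under assumed names (`compress_activations`, `decompress_activations`); it omits the header/zeropoint marker and the packed binary layout a real accelerator would use.

```python
def compress_activations(activations):
    """Split dense activations into non-zero values plus a sparsity bitmap."""
    bitmap = [1 if a != 0 else 0 for a in activations]
    nonzeros = [a for a in activations if a != 0]
    return nonzeros, bitmap

def decompress_activations(nonzeros, bitmap):
    """Reinsert zeros according to the bitmap to recover the dense data."""
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in bitmap]

dense = [0, 3, 0, 0, 5, 1]
nonzeros, bitmap = compress_activations(dense)
assert decompress_activations(nonzeros, bitmap) == dense
```

Storing the compressed size alongside the data, as the abstract describes, is what lets the DMA engine later issue a read of exactly the right length without decompressing first.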
-
Publication No.: US20230017662A1
Publication Date: 2023-01-19
Application No.: US17946231
Filing Date: 2022-09-16
Applicant: Sudheendra Kadri , Darren Crews , Deepak Abraham Mathaikutty , Andrea Deidda , Arnab Raha , Kevin Brady , David Thomas Bernard
Inventor: Sudheendra Kadri , Darren Crews , Deepak Abraham Mathaikutty , Andrea Deidda , Arnab Raha , Kevin Brady , David Thomas Bernard
Abstract: A DNN accelerator includes a DMA engine that can rearrange weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on a structure of a PE array, e.g., based on the number of activated PE columns in the PE array. Then the DMA engine may partition a virtual bank into a plurality of virtual sub-banks. The DMA engine may also identify data blocks from different ones of the plurality of virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine forms a linear data structure by interleaving the data blocks. The DMA engine can write the linear data structure into another memory (e.g., SRAM).
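The partition-and-interleave layout rearrangement can be sketched roughly as follows. This is an illustrative Python sketch, not the patented method: the strided bank partition, the flat-list tensor model, and the name `interleave_blocks` are assumptions (a real DMA engine would partition by activated PE columns over a packed 3D tensor).

```python
def interleave_blocks(weights, num_banks, block_size):
    """Partition a flat weight list into virtual banks, cut each bank
    into fixed-size blocks, and interleave the blocks round-robin
    into one linear data structure."""
    # Strided partition into virtual banks (a stand-in for the
    # PE-column-based partition described in the abstract).
    banks = [weights[i::num_banks] for i in range(num_banks)]
    # Cut each bank into blocks of a predetermined storage size.
    blocks_per_bank = [
        [bank[j:j + block_size] for j in range(0, len(bank), block_size)]
        for bank in banks
    ]
    # Round-robin interleave: block 0 of every bank, then block 1, ...
    linear = []
    for blk_idx in range(max(len(b) for b in blocks_per_bank)):
        for bank_blocks in blocks_per_bank:
            if blk_idx < len(bank_blocks):
                linear.extend(bank_blocks[blk_idx])
    return linear

layout = interleave_blocks(list(range(12)), num_banks=2, block_size=2)
# layout starts [0, 2, 1, 3, ...]: block 0 of bank 0, then block 0 of bank 1.
```

Interleaving blocks from different banks into one linear structure lets a single sequential SRAM write feed all PE columns in the order they will consume the weights.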
-