-
Publication No.: US20230018857A1
Publication Date: 2023-01-19
Application No.: US17947642
Filing Date: 2022-09-19
Applicant: Martin Power, Conor Byrne, Niall Hanrahan, Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, David Thomas Bernard, Kevin Brady, Martin-Thomas Grymel
Inventors: Martin Power, Conor Byrne, Niall Hanrahan, Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, David Thomas Bernard, Kevin Brady, Martin-Thomas Grymel
Abstract: Sparsity processing within a compute block can be done on unpacked data. The compute block includes a sparsity decoder that generates a combined sparsity vector from an activation sparsity vector and a weight sparsity vector. The activation sparsity vector indicates positions of non-zero valued activations in an activation context. The weight sparsity vector indicates positions of non-zero valued weights in a weight context. The combined sparsity vector comprises one or more zero valued bits and one or more non-zero valued bits. The sparsity decoder may determine the position of a non-zero valued bit in the combined sparsity vector and determine an address for the non-zero valued activation and the non-zero valued weight based on that position. The non-zero valued activation and the non-zero valued weight may be provided to a PE for performing MAC operations.
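The combine-and-address scheme described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it assumes sparsity vectors are lists of 0/1 bits, and it uses one common addressing convention in which non-zero values are stored densely, so a value's address equals the count of set bits before its position.

```python
def combine_sparsity(act_bitmap, wgt_bitmap):
    # Elementwise AND: a position contributes to a MAC operation only
    # if both the activation and the weight there are non-zero.
    return [a & w for a, w in zip(act_bitmap, wgt_bitmap)]

def nonzero_addresses(combined, act_bitmap, wgt_bitmap):
    # For each set bit in the combined vector, compute the address of
    # the corresponding activation and weight in densely stored
    # (zeros removed) arrays: the address is the number of set bits
    # preceding that position in the respective bitmap.
    pairs = []
    for pos, bit in enumerate(combined):
        if bit:
            act_addr = sum(act_bitmap[:pos])
            wgt_addr = sum(wgt_bitmap[:pos])
            pairs.append((pos, act_addr, wgt_addr))
    return pairs
```

Only positions where both bitmaps agree generate work for the PE, which is the point of combining the two sparsity vectors before address generation.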
-
Publication No.: US20230014656A1
Publication Date: 2023-01-19
Application No.: US17934665
Filing Date: 2022-09-23
Applicant: Raymond Jit-Hung Sung, Deepak Abraham Mathaikutty, Amit Agarwal, David Thomas Bernard, Steven Hsu, Martin Power, Conor Byrne, Arnab Raha
Inventors: Raymond Jit-Hung Sung, Deepak Abraham Mathaikutty, Amit Agarwal, David Thomas Bernard, Steven Hsu, Martin Power, Conor Byrne, Arnab Raha
IPC Class: G01R31/3177, G06N3/04
Abstract: A memory array of a compute tile may store activations or weights of a DNN. The memory array may include databanks for storing contexts, context MUXs, and byte MUXs. A databank may store a context with flip-flop arrays, each of which includes a sequence of flip-flops. A logic gate and an ICG unit may gate the flip-flops and control whether their states can be changed. The data gating can prevent a context not selected for the databank from inadvertently toggling and wasting power. A context MUX may read a context from different flip-flop arrays in a databank based on gray-coded addresses. A byte MUX can combine bits from different bytes in a context read by the context MUX. The memory array may be implemented with bit packing to shorten the distance between context MUXs and byte MUXs, reducing the lengths of the wires that connect them.
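Gray-coded addressing, mentioned above, makes consecutive addresses differ in exactly one bit, which limits toggling on address lines. The conversions below are the standard binary/Gray formulas, shown as an illustration rather than anything taken from the patent:

```python
def to_gray(n):
    # Binary-to-Gray: XOR the value with itself shifted right by one.
    # Adjacent integers map to codes that differ in a single bit.
    return n ^ (n >> 1)

def from_gray(g):
    # Gray-to-binary: fold the shifted value back in until it is zero.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

For a sequential context read, stepping through Gray-coded addresses means only one address wire toggles per access, which complements the data gating described above as a power-saving measure.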
-
Publication No.: US20230020929A1
Publication Date: 2023-01-19
Application No.: US17946311
Filing Date: 2022-09-16
Abstract: A compute tile includes a WCB (write combine buffer) that receives a workload of writing an output tensor of a convolution into a local memory of the compute tile. The local memory may be an SRAM. The WCB receives write transactions. A write transaction includes a data block, which is a part of the output tensor, and metadata describing one or more attributes of the data block. The WCB may store write transactions in its internal buffers. The WCB may determine whether to combine two write transactions, e.g., based on an operation mode or the metadata in the write transactions. In embodiments where the WCB determines to combine the two write transactions, it may combine them into a new write transaction and write the new write transaction into the local memory or an internal memory of the WCB. The total number of write transactions for the workload can thereby be reduced.
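The combining decision can be illustrated with a toy model. This sketch is hypothetical, not the WCB's actual logic: it assumes a write transaction carries a start address and a byte block, and merges two transactions only when their blocks are contiguous in memory.

```python
from dataclasses import dataclass

@dataclass
class WriteTransaction:
    addr: int    # start address of the data block in local memory
    data: bytes  # the data block (a slice of the output tensor)

def try_combine(a, b):
    # Merge b into a only when b's block begins exactly where a's
    # ends, so one wider write can replace two narrower ones.
    # Returns None when the transactions cannot be combined.
    if a.addr + len(a.data) == b.addr:
        return WriteTransaction(a.addr, a.data + b.data)
    return None
```

A real WCB would also consult the transaction metadata and the operation mode before merging, as the abstract notes; adjacency is just the simplest criterion to show.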
-
Publication No.: US20230017662A1
Publication Date: 2023-01-19
Application No.: US17946231
Filing Date: 2022-09-16
Applicant: Sudheendra Kadri, Darren Crews, Deepak Abraham Mathaikutty, Andrea Deidda, Arnab Raha, Kevin Brady, David Thomas Bernard
Inventors: Sudheendra Kadri, Darren Crews, Deepak Abraham Mathaikutty, Andrea Deidda, Arnab Raha, Kevin Brady, David Thomas Bernard
Abstract: A DNN accelerator includes a DMA engine that can rearrange the weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on the structure of a PE array, e.g., based on the number of activated PE columns in the PE array. The DMA engine may then partition a virtual bank into a plurality of virtual sub-banks and identify data blocks from different ones of the virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine forms a linear data structure by interleaving the data blocks and can write the linear data structure into another memory (e.g., SRAM).
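The interleaving step can be sketched as a round-robin merge of fixed-size blocks from the virtual sub-banks. This is a hypothetical illustration: the flat-list representation of a sub-bank and the block size are assumptions, not the patent's data layout.

```python
def interleave_blocks(sub_banks, block_size):
    # Round-robin over virtual sub-banks, emitting one fixed-size
    # block from each in turn, so blocks destined for different PE
    # columns end up adjacent in the linear data structure.
    linear = []
    offsets = [0] * len(sub_banks)
    while any(off < len(bank) for off, bank in zip(offsets, sub_banks)):
        for i, bank in enumerate(sub_banks):
            if offsets[i] < len(bank):
                linear.extend(bank[offsets[i]:offsets[i] + block_size])
                offsets[i] += block_size
    return linear
```

The resulting flat structure can then be written to the second memory with a single sequential DMA pass, which is the benefit of rearranging the layout up front.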
-
Publication No.: US20230008622A1
Publication Date: 2023-01-12
Application No.: US17934265
Filing Date: 2022-09-22
Applicant: Richard Boyd, David Thomas Bernard, Deepak Abraham Mathaikutty, Martin Power, Niall Hanrahan
Inventors: Richard Boyd, David Thomas Bernard, Deepak Abraham Mathaikutty, Martin Power, Niall Hanrahan
Abstract: A DNN accelerator may perform 1×N kernel decomposition to decompose a convolutional kernel into kernel vectors, each of which includes multiple weights. Through the kernel decomposition, a weight operand may be generated from a filter. The DNN accelerator converts an input tensor into input operands. An input operand includes activations and has the same size as the weight operand. The DNN accelerator may read a first activation in the input operand from memory into an internal memory of a first PE, and read a second activation in the input operand from memory into an internal memory of a second PE. The first PE may receive the second activation from the second PE through activation broadcasting between the two PEs and perform MAC operations on the input operand and the weight operand. The second PE may perform MAC operations on another input operand in the input tensor and the weight operand.
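The arithmetic behind 1×N decomposition can be illustrated by computing a 2-D convolution as a sum of row-vector (1×K) correlations. This is a toy sketch of the math, not the accelerator's implementation; tensors are plain nested lists.

```python
def decompose_kernel(kernel):
    # Split a KxK kernel into K weight operands of shape 1xK (its rows).
    return [row[:] for row in kernel]

def conv2d_via_rows(inp, kernel):
    # Accumulate one 1xK correlation per kernel row; the sum over rows
    # equals the full KxK convolution (valid padding, stride 1).
    K = len(kernel)
    H, W = len(inp), len(inp[0])
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    for r, wrow in enumerate(decompose_kernel(kernel)):
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[i][j] += sum(wrow[k] * inp[i + r][j + k] for k in range(K))
    return out
```

Because each partial result uses a 1×K weight operand against a 1×K input operand, neighboring PEs can share overlapping activations (the broadcasting described above) instead of each re-reading them from memory.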
-
Publication No.: US20230072082A1
Publication Date: 2023-03-09
Application No.: US18050944
Filing Date: 2022-10-28
Applicant: Sudheendra Kadri, Andrea Deidda, Hassan Kamal, Martin-Thomas Grymel, Alfonso Tarazona Martinez, David Thomas Bernard
Inventors: Sudheendra Kadri, Andrea Deidda, Hassan Kamal, Martin-Thomas Grymel, Alfonso Tarazona Martinez, David Thomas Bernard
IPC Class: G06N3/08
Abstract: A system includes a first memory, a compiler, and a DNN accelerator. The DNN accelerator includes a DMA engine, an acceleration module, and a compute block. The compute block includes a second memory. The compiler may generate a task for transferring activations from the second memory to the first memory. The DMA engine may receive the task and read the activations from the second memory. The acceleration module may compress the activations to generate compressed activation data and write the compressed activation data into the first memory (e.g., an external memory). The acceleration module may also store the size of the compressed activation data in the local memory, which the DMA engine may later use to read the activations from the first memory back into the second memory. The compressed activation data may include non-zero activations and sparsity bitmaps. The compressed activation data may also include a header or zero-point marker.
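Compressed activation data of the kind described (non-zero values plus a sparsity bitmap, with the compressed size recorded for the later read-back) can be sketched as a round trip. A minimal illustration, not the acceleration module's format; headers and zero-point markers are omitted.

```python
def compress(activations):
    # Zero-value compression: keep only the non-zero activations and a
    # bitmap marking which positions held them. len(nonzeros) plays the
    # role of the stored size that the DMA engine later needs.
    bitmap = [1 if a != 0 else 0 for a in activations]
    nonzeros = [a for a in activations if a != 0]
    return nonzeros, bitmap

def decompress(nonzeros, bitmap):
    # Re-expand by walking the bitmap and consuming one stored value
    # per set bit, emitting zeros elsewhere.
    it = iter(nonzeros)
    return [next(it) if b else 0 for b in bitmap]
```

The size must be stored separately because, unlike uncompressed tensors, the compressed payload's length depends on the data, so the DMA engine cannot infer it from the tensor shape alone.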
-
Publication No.: US20230016455A1
Publication Date: 2023-01-19
Application No.: US17935163
Filing Date: 2022-09-26
Abstract: A deconvolution can be decomposed into multiple convolutions whose results together constitute the output of the deconvolution. Zeros may be added to the input tensor of the deconvolution to generate an upsampled input tensor. Subtensors having the same size as the kernel of the deconvolution may be identified from the upsampled input tensor. A subtensor may include one or more input activations and one or more zeros. Subtensors having the same distribution pattern of input activations may be used to generate a reduced kernel, which includes a subset of the kernel. The position of a weight in the reduced kernel may be the same as the position of the corresponding input activation in the subtensor. Multiple reduced kernels may be generated based on subtensors having different distribution patterns of activations. Each of the convolutions may use the input tensor and a different one of the reduced kernels.
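The decomposition can be sketched in one dimension. This is a hypothetical illustration, not the patent's procedure: for stride s, real activations in the zero-upsampled input recur with period s, so each of the s phases aligns the kernel with the same activation pattern and yields one reduced kernel containing only the weights that ever meet a real activation.

```python
def upsample_with_zeros(x, stride):
    # Insert (stride - 1) zeros between consecutive input activations,
    # with no trailing zeros after the last element.
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (stride - 1))
    return out[:len(out) - (stride - 1)]

def reduced_kernels_1d(kernel, stride):
    # One reduced kernel per phase: phase p keeps every stride-th
    # weight starting at p, i.e., the subset of weights that align
    # with real (non-inserted-zero) activations for that phase.
    return [kernel[p::stride] for p in range(stride)]
```

Each reduced kernel drives one small convolution over the original (un-upsampled) input, which avoids multiplying by the inserted zeros at all.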
-