-
Publication No.: US20230018857A1
Publication Date: 2023-01-19
Application No.: US17947642
Filing Date: 2022-09-19
Applicant: Martin Power, Conor Byrne, Niall Hanrahan, Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, David Thomas Bernard, Kevin Brady, Martin-Thomas Grymel
Inventors: Martin Power, Conor Byrne, Niall Hanrahan, Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, David Thomas Bernard, Kevin Brady, Martin-Thomas Grymel
Abstract: Sparsity processing within a compute block can be done on unpacked data. The compute block includes a sparsity decoder that generates a combined sparsity vector from an activation sparsity vector and a weight sparsity vector. The activation sparsity vector indicates positions of non-zero valued activations in an activation context. The weight sparsity vector indicates positions of non-zero valued weights in a weight context. The combined sparsity vector comprises one or more zero valued bits and one or more non-zero valued bits. The sparsity decoder may determine the position of a non-zero valued bit in the combined sparsity vector and determine an address for the non-zero valued activation and the non-zero valued weight based on that position. The non-zero valued activation and the non-zero valued weight may be provided to a PE for performing MAC operations.
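The combine-and-address scheme described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it assumes sparsity vectors are lists of 0/1 bits, and it uses one common addressing convention in which non-zero values are stored densely, so a value's address equals the count of set bits before its position.

```python
def combine_sparsity(act_bitmap, wgt_bitmap):
    # Elementwise AND: a position contributes to a MAC operation only
    # if both the activation and the weight there are non-zero.
    return [a & w for a, w in zip(act_bitmap, wgt_bitmap)]

def nonzero_addresses(combined, act_bitmap, wgt_bitmap):
    # For each set bit in the combined vector, compute the address of
    # the corresponding activation and weight in densely stored
    # (zeros removed) arrays: the address is the number of set bits
    # preceding that position in the respective bitmap.
    pairs = []
    for pos, bit in enumerate(combined):
        if bit:
            act_addr = sum(act_bitmap[:pos])
            wgt_addr = sum(wgt_bitmap[:pos])
            pairs.append((pos, act_addr, wgt_addr))
    return pairs
```

Only positions where both bitmaps agree generate work for the PE, which is the point of combining the two sparsity vectors before address generation.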
-
Publication No.: US20230014656A1
Publication Date: 2023-01-19
Application No.: US17934665
Filing Date: 2022-09-23
Applicant: Raymond Jit-Hung Sung, Deepak Abraham Mathaikutty, Amit Agarwal, David Thomas Bernard, Steven Hsu, Martin Power, Conor Byrne, Arnab Raha
Inventors: Raymond Jit-Hung Sung, Deepak Abraham Mathaikutty, Amit Agarwal, David Thomas Bernard, Steven Hsu, Martin Power, Conor Byrne, Arnab Raha
IPC Class: G01R31/3177, G06N3/04
Abstract: A memory array of a compute tile may store activations or weights of a DNN. The memory array may include databanks for storing contexts, context MUXs, and byte MUXs. A databank may store a context with flip-flop arrays, each of which includes a sequence of flip-flops. A logic gate and an ICG unit may gate the flip-flops and control whether their states can be changed. The data gating can prevent a context not selected for the databank from inadvertently toggling and wasting power. A context MUX may read a context from different flip-flop arrays in a databank based on gray-coded addresses. A byte MUX can combine bits from different bytes in a context read by the context MUX. The memory array may be implemented with bit packing to shorten the distance between context MUXs and byte MUXs, reducing the lengths of the wires that connect them.
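Gray-coded addressing, mentioned above, makes consecutive addresses differ in exactly one bit, which limits toggling on address lines. The conversions below are the standard binary/Gray formulas, shown as an illustration rather than anything taken from the patent:

```python
def to_gray(n):
    # Binary-to-Gray: XOR the value with itself shifted right by one.
    # Adjacent integers map to codes that differ in a single bit.
    return n ^ (n >> 1)

def from_gray(g):
    # Gray-to-binary: fold the shifted value back in until it is zero.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

For a sequential context read, stepping through Gray-coded addresses means only one address wire toggles per access, which complements the data gating described above as a power-saving measure.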
-
Publication No.: US20230020929A1
Publication Date: 2023-01-19
Application No.: US17946311
Filing Date: 2022-09-16
Abstract: A compute tile includes a WCB (write combine buffer) that receives a workload of writing an output tensor of a convolution into a local memory of the compute tile. The local memory may be an SRAM. The WCB receives write transactions. A write transaction includes a data block, which is a part of the output tensor, and metadata describing one or more attributes of the data block. The WCB may store write transactions in its internal buffers. The WCB may determine whether to combine two write transactions, e.g., based on an operation mode or the metadata in the write transactions. In embodiments where the WCB determines to combine the two write transactions, it may combine them into a new write transaction and write the new write transaction into the local memory or an internal memory of the WCB. The total number of write transactions for the workload can thereby be reduced.
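The combining decision can be illustrated with a toy model. This sketch is hypothetical, not the WCB's actual logic: it assumes a write transaction carries a start address and a byte block, and merges two transactions only when their blocks are contiguous in memory.

```python
from dataclasses import dataclass

@dataclass
class WriteTransaction:
    addr: int    # start address of the data block in local memory
    data: bytes  # the data block (a slice of the output tensor)

def try_combine(a, b):
    # Merge b into a only when b's block begins exactly where a's
    # ends, so one wider write can replace two narrower ones.
    # Returns None when the transactions cannot be combined.
    if a.addr + len(a.data) == b.addr:
        return WriteTransaction(a.addr, a.data + b.data)
    return None
```

A real WCB would also consult the transaction metadata and the operation mode before merging, as the abstract notes; adjacency is just the simplest criterion to show.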
-
Publication No.: US20230017662A1
Publication Date: 2023-01-19
Application No.: US17946231
Filing Date: 2022-09-16
Applicant: Sudheendra Kadri, Darren Crews, Deepak Abraham Mathaikutty, Andrea Deidda, Arnab Raha, Kevin Brady, David Thomas Bernard
Inventors: Sudheendra Kadri, Darren Crews, Deepak Abraham Mathaikutty, Andrea Deidda, Arnab Raha, Kevin Brady, David Thomas Bernard
Abstract: A DNN accelerator includes a DMA engine that can rearrange the weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on the structure of a PE array, e.g., based on the number of activated PE columns in the PE array. The DMA engine may then partition a virtual bank into a plurality of virtual sub-banks and identify data blocks from different ones of the virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine forms a linear data structure by interleaving the data blocks and can write the linear data structure into another memory (e.g., SRAM).
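The interleaving step can be sketched as a round-robin merge of fixed-size blocks from the virtual sub-banks. This is a hypothetical illustration: the flat-list representation of a sub-bank and the block size are assumptions, not the patent's data layout.

```python
def interleave_blocks(sub_banks, block_size):
    # Round-robin over virtual sub-banks, emitting one fixed-size
    # block from each in turn, so blocks destined for different PE
    # columns end up adjacent in the linear data structure.
    linear = []
    offsets = [0] * len(sub_banks)
    while any(off < len(bank) for off, bank in zip(offsets, sub_banks)):
        for i, bank in enumerate(sub_banks):
            if offsets[i] < len(bank):
                linear.extend(bank[offsets[i]:offsets[i] + block_size])
                offsets[i] += block_size
    return linear
```

The resulting flat structure can then be written to the second memory with a single sequential DMA pass, which is the benefit of rearranging the layout up front.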
-
Publication No.: US20230008622A1
Publication Date: 2023-01-12
Application No.: US17934265
Filing Date: 2022-09-22
Applicant: Richard Boyd, David Thomas Bernard, Deepak Abraham Mathaikutty, Martin Power, Niall Hanrahan
Inventors: Richard Boyd, David Thomas Bernard, Deepak Abraham Mathaikutty, Martin Power, Niall Hanrahan
Abstract: A DNN accelerator may perform 1×N kernel decomposition to decompose a convolutional kernel into kernel vectors, each of which includes multiple weights. Through the kernel decomposition, a weight operand may be generated from a filter. The DNN accelerator converts an input tensor into input operands. An input operand includes activations and has the same size as the weight operand. The DNN accelerator may read a first activation in the input operand from memory into an internal memory of a first PE, and read a second activation in the input operand from memory into an internal memory of a second PE. The first PE may receive the second activation from the second PE through activation broadcasting between the two PEs and perform MAC operations on the input operand and the weight operand. The second PE may perform MAC operations on another input operand in the input tensor and the weight operand.
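The arithmetic behind 1×N decomposition can be illustrated by computing a 2-D convolution as a sum of row-vector (1×K) correlations. This is a toy sketch of the math, not the accelerator's implementation; tensors are plain nested lists.

```python
def decompose_kernel(kernel):
    # Split a KxK kernel into K weight operands of shape 1xK (its rows).
    return [row[:] for row in kernel]

def conv2d_via_rows(inp, kernel):
    # Accumulate one 1xK correlation per kernel row; the sum over rows
    # equals the full KxK convolution (valid padding, stride 1).
    K = len(kernel)
    H, W = len(inp), len(inp[0])
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    for r, wrow in enumerate(decompose_kernel(kernel)):
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[i][j] += sum(wrow[k] * inp[i + r][j + k] for k in range(K))
    return out
```

Because each partial result uses a 1×K weight operand against a 1×K input operand, neighboring PEs can share overlapping activations (the broadcasting described above) instead of each re-reading them from memory.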
-
Publication No.: US20230072082A1
Publication Date: 2023-03-09
Application No.: US18050944
Filing Date: 2022-10-28
Applicant: Sudheendra Kadri, Andrea Deidda, Hassan Kamal, Martin-Thomas Grymel, Alfonso Tarazona Martinez, David Thomas Bernard
Inventors: Sudheendra Kadri, Andrea Deidda, Hassan Kamal, Martin-Thomas Grymel, Alfonso Tarazona Martinez, David Thomas Bernard
IPC Class: G06N3/08
Abstract: A system includes a first memory, a compiler, and a DNN accelerator. The DNN accelerator includes a DMA engine, an acceleration module, and a compute block. The compute block includes a second memory. The compiler may generate a task for transferring activations from the second memory to the first memory. The DMA engine may receive the task and read the activations from the second memory. The acceleration module may compress the activations to generate compressed activation data and write the compressed activation data into the first memory (e.g., an external memory). The acceleration module may also store the size of the compressed activation data in the local memory, which the DMA engine may later use to read the activations from the first memory back into the second memory. The compressed activation data may include non-zero activations and sparsity bitmaps. The compressed activation data may also include a header or zero-point marker.
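Compressed activation data of the kind described (non-zero values plus a sparsity bitmap, with the compressed size recorded for the later read-back) can be sketched as a round trip. A minimal illustration, not the acceleration module's format; headers and zero-point markers are omitted.

```python
def compress(activations):
    # Zero-value compression: keep only the non-zero activations and a
    # bitmap marking which positions held them. len(nonzeros) plays the
    # role of the stored size that the DMA engine later needs.
    bitmap = [1 if a != 0 else 0 for a in activations]
    nonzeros = [a for a in activations if a != 0]
    return nonzeros, bitmap

def decompress(nonzeros, bitmap):
    # Re-expand by walking the bitmap and consuming one stored value
    # per set bit, emitting zeros elsewhere.
    it = iter(nonzeros)
    return [next(it) if b else 0 for b in bitmap]
```

The size must be stored separately because, unlike uncompressed tensors, the compressed payload's length depends on the data, so the DMA engine cannot infer it from the tensor shape alone.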
-
Publication No.: US20230016455A1
Publication Date: 2023-01-19
Application No.: US17935163
Filing Date: 2022-09-26
Abstract: A deconvolution can be decomposed into multiple convolutions whose results together constitute the output of the deconvolution. Zeros may be added to the input tensor of the deconvolution to generate an upsampled input tensor. Subtensors having the same size as the kernel of the deconvolution may be identified from the upsampled input tensor. A subtensor may include one or more input activations and one or more zeros. Subtensors having the same distribution pattern of input activations may be used to generate a reduced kernel, which includes a subset of the kernel. The position of a weight in the reduced kernel may be the same as the position of the corresponding input activation in the subtensor. Multiple reduced kernels may be generated based on subtensors having different distribution patterns of activations. Each of the convolutions may use the input tensor and a different one of the reduced kernels.
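The decomposition can be sketched in one dimension. This is a hypothetical illustration, not the patent's procedure: for stride s, real activations in the zero-upsampled input recur with period s, so each of the s phases aligns the kernel with the same activation pattern and yields one reduced kernel containing only the weights that ever meet a real activation.

```python
def upsample_with_zeros(x, stride):
    # Insert (stride - 1) zeros between consecutive input activations,
    # with no trailing zeros after the last element.
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (stride - 1))
    return out[:len(out) - (stride - 1)]

def reduced_kernels_1d(kernel, stride):
    # One reduced kernel per phase: phase p keeps every stride-th
    # weight starting at p, i.e., the subset of weights that align
    # with real (non-inserted-zero) activations for that phase.
    return [kernel[p::stride] for p in range(stride)]
```

Each reduced kernel drives one small convolution over the original (un-upsampled) input, which avoids multiplying by the inserted zeros at all.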
-