-
Publication No.: EP4328802A1
Publication Date: 2024-02-28
Application No.: EP23186330.9
Filing Date: 2023-07-19
Applicant: INTEL Corporation
Inventors: RAHA, Arnab , CHEEMA, Umer Iftikhar , GUPTA, Praveen Kumar , MATHAIKUTTY, Deepak Abraham , SUNG, Raymond Jit-Hung
IPC Classes: G06N3/063 , G06N3/0464 , G06N3/048 , G06N3/09
Abstract: A DNN accelerator includes one or more heterogeneous tile sets. A heterogeneous tile set includes tiles of different sizes, e.g., PE arrays with different numbers of columns or rows. The DNN accelerator may identify a tile set from the tile sets for running a DNN model based on the dimensions of the output tensors of convolutional layers in the DNN. Within the selected tile set, a tile may be selected for a convolutional layer in the DNN, e.g., based on the dimensions of the layer's output tensor and the size of the tile. After the tile is selected, the workload of running the layer's convolutional operation may be partitioned and assigned to individual PEs in the tile by partitioning the output tensor into output tensor segments. The workload of computing an individual output tensor segment can be assigned to an individual PE in the tile.
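As a rough illustration of the scheme this abstract describes, the Python sketch below selects the tile whose PE array leaves the fewest PEs idle for a layer's output tensor and then splits that tensor into per-PE segments. The Tile class, select_tile, partition_output, and the idle-PE heuristic are all assumptions made for illustration; the patent does not specify them.

```python
# Hypothetical sketch of tile selection and output-tensor partitioning.
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tile:
    rows: int  # number of PE rows in this tile
    cols: int  # number of PE columns in this tile

def select_tile(tiles: List[Tile], out_h: int, out_w: int) -> Tile:
    """Pick the tile that leaves the fewest PEs idle for an
    out_h x out_w output tensor (one plausible selection rule)."""
    def idle_pes(t: Tile) -> int:
        return t.rows * t.cols - min(t.rows, out_h) * min(t.cols, out_w)
    return min(tiles, key=idle_pes)

def partition_output(out_h: int, out_w: int, tile: Tile) -> List[Tuple[int, int, int, int]]:
    """Split the output tensor into segments (r0, r1, c0, c1), one per PE."""
    seg_h = math.ceil(out_h / tile.rows)
    seg_w = math.ceil(out_w / tile.cols)
    return [(r, min(r + seg_h, out_h), c, min(c + seg_w, out_w))
            for r in range(0, out_h, seg_h)
            for c in range(0, out_w, seg_w)]
```

For example, for a 7 × 3 output, select_tile([Tile(4, 4), Tile(8, 2)], 7, 3) returns the 8 × 2 tile, since it leaves 2 PEs idle versus 4 for the 4 × 4 tile.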
-
Publication No.: EP4343565A1
Publication Date: 2024-03-27
Application No.: EP23183425.0
Filing Date: 2023-07-04
Applicant: INTEL Corporation
Inventors: SUNG, Raymond Jit-Hung , MATHAIKUTTY, Deepak Abraham , AGARWAL, Amit , BERNARD, David Thomas , HSU, Steven , POWER, Martin , BYRNE, Conor , RAHA, Arnab
IPC Classes: G06F15/167 , G06N3/0464 , G06N3/063 , G06N3/084 , G11C11/54
Abstract: A memory array of a compute tile may store activations or weights of a DNN. The memory array may include databanks for storing contexts, context MUXs, and byte MUXs. A databank may store a context with flip-flop arrays, each of which includes a sequence of flip-flops. A logic gate and an ICG unit may gate the flip-flops and control whether their states can be changed. This data gating prevents a context not selected for the databank from inadvertently toggling and wasting power. A context MUX may read a context from different flip-flop arrays in a databank based on gray-coded addresses. A byte MUX can combine bits from different bytes in a context read by the context MUX. The memory array may be implemented with bit packing to reduce the distance between each context MUX and byte MUX, shortening the wires that connect them.
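The gray-coded addressing mentioned in the abstract has a simple software analogue: consecutive Gray codes differ in exactly one bit, so a context MUX stepping through flip-flop arrays in order toggles only one select bit per read. The helpers below are the standard binary/Gray conversions, shown only as background; they are not code from the patent.

```python
# Standard binary <-> Gray-code conversions, illustrating the addresses a
# context MUX might step through when reading flip-flop arrays in sequence.
def to_gray(n: int) -> int:
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# For 3-bit addresses: [to_gray(i) for i in range(8)] == [0, 1, 3, 2, 6, 7, 5, 4].
# Adjacent codes differ by exactly one bit, and the mapping is invertible.
assert all(bin(to_gray(i) ^ to_gray(i + 1)).count("1") == 1 for i in range(7))
assert all(from_gray(to_gray(i)) == i for i in range(8))
```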
-
Publication No.: EP4437456A1
Publication Date: 2024-10-02
Application No.: EP22899259.0
Filing Date: 2022-10-14
Applicant: Intel Corporation
Inventors: RAHA, Arnab , MOHAPATRA, Debabrata , MATHAIKUTTY, Deepak Abraham , SUNG, Raymond Jit-Hung , BRICK, Cormac Michael
CPC Classes: G06F2207/4824 , G06F7/76 , G06F7/5443 , G06N3/08 , G06N3/063 , G06N3/048 , G06N3/045
-
Publication No.: EP4357978A1
Publication Date: 2024-04-24
Application No.: EP23191148.8
Filing Date: 2023-08-11
Applicant: INTEL Corporation
Inventors: MATHAIKUTTY, Deepak Abraham , RAHA, Arnab , SUNG, Raymond Jit-Hung , POWER, Martin , CHEEMA, Umer Iftikhar , BERNARD, David Thomas
IPC Classes: G06N3/0464 , G06N3/048 , G06N3/0495 , G06N3/063
CPC Classes: G06N3/0464 , G06N3/0495 , G06N3/063 , G06N3/048
Abstract: A DNN accelerator may include a PE array that performs MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zero points from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied by a quantization scale to produce a floating-point value.
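A minimal functional model of the PE datapath described here might look as follows; the function name, the flat-list data layout, and the single accumulation loop are assumptions for the sketch, not the claimed hardware.

```python
# Illustrative model of one PE: subtract zero points once, store the
# intermediates, accumulate their products, then rescale to floating point.
from typing import Sequence

def pe_quantized_mac(q_acts: Sequence[int], q_wts: Sequence[int],
                     act_zp: int, wt_zp: int, scale: float) -> float:
    # Subtractors sit outside the MAC unit but inside the PE; the
    # intermediates they produce are stored for reuse across MAC cycles.
    i_acts = [qa - act_zp for qa in q_acts]
    i_wts = [qw - wt_zp for qw in q_wts]
    # MAC unit: sequential multiply-accumulate cycles over the stored
    # intermediates (real hardware would spread these over multipliers).
    acc = 0
    for a, w in zip(i_acts, i_wts):
        acc += a * w
    # Multiply by the quantization scale to produce a floating-point output.
    return acc * scale
```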
-
Publication No.: EP4345655A1
Publication Date: 2024-04-03
Application No.: EP23190177.8
Filing Date: 2023-08-08
Applicant: Intel Corporation
Inventors: BOYD, Richard , BERNARD, David Thomas , MATHAIKUTTY, Deepak Abraham , POWER, Martin , HANRAHAN, Niall
IPC Classes: G06F17/16 , G06N3/045 , G06N3/0464 , G06N3/063 , G06F7/544
Abstract: A DNN accelerator may perform 1 × N kernel decomposition to decompose a convolutional kernel into kernel vectors, each of which includes multiple weights. Through the kernel decomposition, a weight operand may be generated from a filter. The DNN accelerator converts an input tensor into input operands. An input operand includes activations and has the same size as the weight operand. The DNN accelerator may read a first activation in the input operand from memory into an internal memory of a first PE and read a second activation in the input operand from the memory into an internal memory of a second PE. The first PE may receive the second activation from the second PE through activation broadcasting between the two PEs and perform MAC operations on the input operand and the weight operand. The second PE may perform MAC operations on another input operand in the input tensor and the weight operand.
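To make the overlap-and-broadcast idea concrete, the sketch below decomposes a kernel into 1 × N weight operands and shows two PEs whose input operands overlap by N − 1 activations, so one PE can receive its missing activation from its neighbor instead of re-reading memory. All names and the single-row setting are illustrative assumptions, not the patented design.

```python
# Illustrative 1 x N kernel decomposition and activation broadcasting
# between two neighboring PEs computing adjacent output positions.
from typing import List

def decompose_kernel(kernel: List[List[float]]) -> List[List[float]]:
    """Split a K x N kernel into K weight operands (1 x N kernel vectors)."""
    return [row[:] for row in kernel]

def mac(acts: List[float], wts: List[float]) -> float:
    """Multiply-accumulate over one input operand and one weight operand."""
    return sum(a * w for a, w in zip(acts, wts))

N = 3
row = [1.0, 2.0, 3.0, 4.0]        # one row of the input tensor
wts = [0.5, -1.0, 2.0]            # a 1 x N weight operand
pe0_local = row[0:N - 1]          # activations PE0 read from memory
pe1_local = row[1:1 + N]          # activations PE1 read from memory
# Broadcast: PE0 receives row[2] from PE1 rather than reading it again.
pe0_operand = pe0_local + [pe1_local[1]]
out0 = mac(pe0_operand, wts)      # PE0's output position
out1 = mac(pe1_local, wts)        # PE1's output position
```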
-
Publication No.: EP4433897A1
Publication Date: 2024-09-25
Application No.: EP22896300.5
Filing Date: 2022-10-14
Applicant: Intel Corporation
Inventors: MOHAPATRA, Debabrata , RAHA, Arnab , MATHAIKUTTY, Deepak Abraham , SUNG, Raymond Jit-Hung , BRICK, Cormac Michael
CPC Classes: G06F7/5443 , G06F2207/4824 , G06F9/3012 , G06N3/063 , G06N3/084 , G06N3/048 , G06N3/045
-
Publication No.: EP4354348A1
Publication Date: 2024-04-17
Application No.: EP23190370.9
Filing Date: 2023-08-08
Applicant: Intel Corporation
Inventors: POWER, Martin , BYRNE, Conor , HANRAHAN, Niall , MATHAIKUTTY, Deepak Abraham , RAHA, Arnab , SUNG, Raymond Jit-Hung , BERNARD, David Thomas , BRADY, Kevin , GRYMEL, Martin-Thomas
IPC Classes: G06N3/0495 , G06N3/063 , G06N3/0464
CPC Classes: G06N3/0464 , G06N3/0495 , G06N3/063
Abstract: Sparsity processing within a compute block can be done on unpacked data. The compute block includes a sparsity decoder that generates a combined sparsity vector from an activation sparsity vector and a weight sparsity vector. The activation sparsity vector indicates positions of non-zero valued activations in an activation context. The weight sparsity vector indicates positions of non-zero valued weights in a weight context. The combined sparsity vector comprises one or more zero valued bits and one or more non-zero valued bits. The sparsity decoder may determine the position of a non-zero valued bit in the combined sparsity vector and determine an address for the non-zero valued activation and the non-zero valued weight based on that position. The non-zero valued activation and the non-zero valued weight may then be provided to a PE for MAC operations.
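In software terms, the combined sparsity vector is a bitwise AND of the two sparsity vectors, and because the data is unpacked, the position of each set bit can serve directly as the operand address. The sketch below is an illustrative model under that reading, not the patented decoder.

```python
# Illustrative sparsity decoding: AND the sparsity vectors, then use each
# set bit's position as the address into the unpacked operand buffers.
from typing import List

def mac_with_sparsity(act_sv: List[int], wt_sv: List[int],
                      acts: List[int], wts: List[int]) -> int:
    combined = [a & w for a, w in zip(act_sv, wt_sv)]  # combined sparsity vector
    acc = 0
    for pos, bit in enumerate(combined):
        if bit:  # both operands are non-zero, so the pair goes to a PE
            acc += acts[pos] * wts[pos]
    return acc

# Only positions 1 and 3 have both a non-zero activation and a non-zero weight.
assert mac_with_sparsity([1, 1, 0, 1], [0, 1, 1, 1],
                         [5, 7, 0, 2], [0, 3, 4, 6]) == 7 * 3 + 2 * 6
```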
-
Publication No.: EP4343635A1
Publication Date: 2024-03-27
Application No.: EP23186375.4
Filing Date: 2023-07-19
Applicant: INTEL Corporation
Inventors: KADRI, Sudheendra , CREWS, Darren , MATHAIKUTTY, Deepak Abraham , DEIDDA, Andrea , RAHA, Arnab , BRADY, Kevin , BERNARD, David Thomas
IPC Classes: G06N3/063 , G06N3/0464 , G06F13/28 , G06N3/09
Abstract: A DNN accelerator includes a DMA engine that can rearrange the weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on a structure of a PE array, e.g., based on the number of activated PE columns in the PE array. Then the DMA engine may partition a virtual bank into a plurality of virtual sub-banks. The DMA engine may also identify data blocks from different ones of the plurality of virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine may form a linear data structure by interleaving the data blocks. The DMA engine can write the linear data structure into another memory (e.g., SRAM).
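The rearrangement might be modeled roughly as below: one virtual bank per activated PE column, each bank split into sub-banks, and fixed-size blocks interleaved round-robin across the sub-banks into a single linear structure. The axis choices, block handling, and every name here are assumptions; the abstract does not fix a concrete layout.

```python
# Rough model of the DMA weight-layout rearrangement described above.
import math
import numpy as np

def rearrange_weights(weights: np.ndarray, n_pe_cols: int,
                      n_sub_banks: int, block: int) -> np.ndarray:
    linear = []
    # One virtual bank per activated PE column (split on the leading axis).
    for bank in np.array_split(weights, n_pe_cols, axis=0):
        sub_banks = [sb.reshape(-1)
                     for sb in np.array_split(bank, n_sub_banks, axis=0)]
        n_blocks = max(math.ceil(sb.size / block) for sb in sub_banks)
        # Interleave: block 0 of every sub-bank, then block 1, and so on.
        for i in range(n_blocks):
            for sb in sub_banks:
                linear.extend(sb[i * block:(i + 1) * block])
    return np.asarray(linear)

# Example: 8 output channels x 4 input channels x 3 kernel taps.
w = np.arange(8 * 4 * 3).reshape(8, 4, 3)
flat = rearrange_weights(w, n_pe_cols=2, n_sub_banks=2, block=6)
assert flat.size == w.size  # every weight appears exactly once
```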