-
Publication No.: US20240119269A1
Publication Date: 2024-04-11
Application No.: US18543356
Filing Date: 2023-12-18
IPC Classification: G06N3/048
CPC Classification: G06N3/048
Abstract: A deep neural network (DNN) accelerator may facilitate dynamic sparsity-based acceleration and operate in various sparsity modes, including a combined sparsity mode, a weight sparsity mode, an activation sparsity mode, and a dense mode. The DNN accelerator may receive a configuration parameter indicating whether to accelerate a layer based on sparsity in the layer's weight tensor. The configuration parameter may be generated offline, e.g., before execution of the DNN starts. The DNN accelerator computes one or more activations of the layer while executing a previous layer of the DNN. The one or more activations are elements of the layer's activation tensor. The DNN accelerator may determine a sparsity mode for the layer based on the configuration parameter and the sparsity in the activation tensor. One or more sparse cells in the DNN accelerator may execute the layer in the selected sparsity mode.
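A minimal Python sketch of the per-layer mode decision this abstract describes; the SparsityMode enum, the select_sparsity_mode name, and the zero-fraction threshold heuristic are illustrative assumptions, not the claimed hardware logic:

```python
from enum import Enum

class SparsityMode(Enum):
    COMBINED = "combined"       # exploit both weight and activation sparsity
    WEIGHT = "weight"           # exploit weight sparsity only
    ACTIVATION = "activation"   # exploit activation sparsity only
    DENSE = "dense"             # no sparsity-based acceleration

def select_sparsity_mode(weight_sparsity_enabled, activations, threshold=0.0):
    """Pick a per-layer sparsity mode from an offline configuration
    parameter and the runtime sparsity of the layer's activation tensor."""
    zero_fraction = sum(1 for a in activations if a == 0) / len(activations)
    activations_sparse = zero_fraction > threshold
    if weight_sparsity_enabled and activations_sparse:
        return SparsityMode.COMBINED
    if weight_sparsity_enabled:
        return SparsityMode.WEIGHT
    if activations_sparse:
        return SparsityMode.ACTIVATION
    return SparsityMode.DENSE

print(select_sparsity_mode(True, [0, 3, 0, 7]))  # SparsityMode.COMBINED
```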
-
Publication No.: US20240028895A1
Publication Date: 2024-01-25
Application No.: US18476594
Filing Date: 2023-09-28
Applicant: Arnab Raha, Deepak Abraham Mathaikutty, Dinakar Kondru, Umer Iftikhar Cheema, Martin Power, Niall Hanrahan
Inventor: Arnab Raha, Deepak Abraham Mathaikutty, Dinakar Kondru, Umer Iftikhar Cheema, Martin Power, Niall Hanrahan
IPC Classification: G06N3/08, G06N3/0464
CPC Classification: G06N3/08, G06N3/0464
Abstract: A load module in a deep neural network (DNN) accelerator may receive a configuration parameter indicating a selection between an activation sparsity mode and a weight sparsity mode. The load module may read a sparse activation tensor, an activation sparsity bitmap, a sparse weight tensor, and a weight sparsity bitmap from memory. Based on the selected sparsity mode, the load module may densify one of the compressed tensors and leave the other as is. The load module may then load the dense tensor and the remaining sparse tensor into a sparse cell. The sparse cell includes a sparsity module that may select one or more elements of the dense tensor based on the sparsity bitmap of the sparse tensor. The sparse cell also includes multiply-accumulate (MAC) units that perform MAC operations on the selected elements and the sparse tensor. MAC operations on unselected elements of the dense tensor are skipped.
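A minimal Python sketch of the densify-then-select flow described above; the densify and sparse_mac helpers are illustrative assumptions, not the patented load-module or sparse-cell design:

```python
def densify(compressed, bitmap):
    """Expand a compressed (zero-stripped) tensor to dense form using
    its sparsity bitmap (bit 1 = a nonzero element is present)."""
    values = iter(compressed)
    return [next(values) if bit else 0 for bit in bitmap]

def sparse_mac(dense, sparse, sparse_bitmap):
    """Accumulate products only where the sparse operand is nonzero;
    MAC operations on unselected elements of the dense tensor are skipped."""
    acc, values = 0, iter(sparse)
    for i, bit in enumerate(sparse_bitmap):
        if bit:  # the sparse tensor's bitmap selects the matching dense element
            acc += dense[i] * next(values)
    return acc

# Weight sparsity mode: densify the activations, keep the weights sparse.
acts = densify([2, 3], [1, 1, 0, 0])           # -> [2, 3, 0, 0]
print(sparse_mac(acts, [5, 6], [0, 1, 1, 0]))  # 3*5 + 0*6 = 15
```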
-
Publication No.: US20230221994A1
Publication Date: 2023-07-13
Application No.: US18184921
Filing Date: 2023-03-16
Applicant: Arnab Raha, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung, Umer Iftikhar Cheema, Dinakar Kondru, Soumendu Kumar Ghosh
Inventor: Arnab Raha, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung, Umer Iftikhar Cheema, Dinakar Kondru, Soumendu Kumar Ghosh
CPC Classification: G06F9/5027, G06F9/54
Abstract: A compute block can dynamically uncompress compressed data for executing a channel-separable operation. The compressed data includes one or more nonzero-valued data elements and may be stored in a datastore along with a sparsity bitmap of the input operand that includes the compressed data. An uncompressing module may determine whether the input operand includes any zero-valued data elements, e.g., by determining whether the sparsity bitmap includes a zero-valued bit. After determining that the sparsity bitmap includes a zero-valued bit, the uncompressing module inserts a zero-valued data element into the compressed data at the position indicated by that bit, generates uncompressed data, and updates the sparsity bitmap so that all of its bits become ones. The uncompressed dense data is transmitted to one or more processing elements (PEs) in the compute block for computing an output operand.
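A minimal Python sketch of the zero re-insertion step this abstract describes; the uncompress name and the list-based tensor representation are illustrative assumptions:

```python
def uncompress(compressed, bitmap):
    """Re-insert zero-valued data elements at the positions of the
    zero-valued bits in the sparsity bitmap, then set every bitmap
    bit to one so downstream PEs receive dense data."""
    if 0 not in bitmap:                 # operand is already dense
        return list(compressed), list(bitmap)
    values = iter(compressed)
    dense = [next(values) if bit else 0 for bit in bitmap]
    return dense, [1] * len(bitmap)

data, bits = uncompress([7, 9], [1, 0, 0, 1])
print(data, bits)  # [7, 0, 0, 9] [1, 1, 1, 1]
```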
-
Publication No.: US20230351181A1
Publication Date: 2023-11-02
Application No.: US18346992
Filing Date: 2023-07-05
Applicant: Umer Iftikhar Cheema, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru, Raymond Jit-Hung Sung, Soumendu Kumar Ghosh
Inventor: Umer Iftikhar Cheema, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru, Raymond Jit-Hung Sung, Soumendu Kumar Ghosh
Abstract: An activation function unit can compute activation functions approximated by Taylor series. The activation function unit may include a plurality of compute elements, each comprising two multipliers and an accumulator. The first multiplier may compute intermediate products from an activation, such as an output activation of a DNN layer. The second multiplier may compute terms of the Taylor series approximating an activation function, based on the intermediate products from the first multiplier and the coefficients of the series. The accumulator may compute a partial sum of the terms as the output of the activation function. The number of terms may be determined by a predetermined accuracy for the output of the activation function. The activation function unit may process multiple activations: different activations may be input into different compute elements in different clock cycles. The activation function unit may also compute activation functions with different accuracies.
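A minimal Python sketch of the two-multiplier datapath this abstract describes; taylor_activation and the exp coefficients are illustrative assumptions, not the patented compute-element design:

```python
import math

def taylor_activation(x, coeffs, num_terms):
    """Approximate an activation function by the first num_terms terms of
    its Taylor series: one multiplier builds the powers of x (the
    intermediate products), a second multiplier scales them by the series
    coefficients, and an accumulator forms the partial sum."""
    acc, power = 0.0, 1.0          # power holds x**k, starting at k = 0
    for k in range(num_terms):
        acc += coeffs[k] * power   # second multiplier feeds the accumulator
        power *= x                 # first multiplier: next intermediate product
    return acc

# Example: exp(x) via coefficients 1/k!; accuracy grows with num_terms.
exp_coeffs = [1.0 / math.factorial(k) for k in range(8)]
print(taylor_activation(0.5, exp_coeffs, 8))  # ~1.6487, close to math.exp(0.5)
```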
-