-
Publication No.: US20220230058A1
Publication Date: 2022-07-21
Application No.: US17713176
Application Date: 2022-04-04
Applicant: Qualcomm Incorporated
Inventor: Jinxia BAI , Rosario CAMMAROTA , Michael GOLDFARB
Abstract: A neural processing unit (NPU) is described. The NPU includes an NPU direct memory access (NDMA) core. The NDMA core includes a read engine having a read buffer. The NDMA core also includes a write engine having a write buffer. The NPU also includes a controller. The controller is configured to direct the NDMA core to perform hardware memory bandwidth optimization for reading/writing NDMA data in the read buffer and/or NDMA data in the write buffer. The NDMA core is also configured to transparently combine NDMA transaction requests for a data stripe to increase local access to available tensors in artificial neural networks.
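The transaction-combining idea in the abstract can be pictured in software terms: small NDMA requests that touch adjacent regions of a data stripe are merged into fewer, larger bursts. The following is a minimal illustrative sketch, not the patented hardware logic; the `(start_address, length)` request model is an assumption made for the example.

```python
# Hypothetical sketch: coalescing DMA transaction requests for a data stripe.
# Each request is a (start_address, length) pair; adjacent or overlapping
# requests are merged so the stripe is fetched in fewer, larger bursts.

def combine_requests(requests):
    """Merge adjacent/overlapping (start, length) requests into bursts."""
    if not requests:
        return []
    # Sort by start address so mergeable requests become neighbors.
    ordered = sorted(requests)
    merged = [ordered[0]]
    for start, length in ordered[1:]:
        last_start, last_len = merged[-1]
        if start <= last_start + last_len:  # contiguous or overlapping
            new_end = max(last_start + last_len, start + length)
            merged[-1] = (last_start, new_end - last_start)
        else:
            merged.append((start, length))
    return merged

# Four small requests collapse into two bursts.
print(combine_requests([(0, 64), (64, 64), (256, 32), (288, 32)]))
# → [(0, 128), (256, 64)]
```

Merging requests this way increases the effective burst size, which is the usual route to better memory bandwidth utilization.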
-
Publication No.: US20230185532A1
Publication Date: 2023-06-15
Application No.: US18105159
Application Date: 2023-02-02
Applicant: QUALCOMM Incorporated
Inventor: Rexford Alan HILL , Aaron Douglass LAMB , Michael GOLDFARB , Amin ANSARI , Christopher LOTT
CPC classification number: G06F7/5443 , G06F5/06 , G06N3/063
Abstract: A method of exploiting activation sparsity in deep neural networks is described. The method includes retrieving an activation tensor and a weight tensor where the activation tensor is a sparse activation tensor. The method also includes generating a compressed activation tensor comprising non-zero activations of the activation tensor, where the compressed activation tensor has fewer columns than the activation tensor. The method further includes processing the compressed activation tensor and the weight tensor to generate an output tensor.
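The compression step the abstract describes can be sketched in plain Python: drop the zero activations, remember each survivor's original column index, and run the multiply-accumulate only over the survivors. This is an illustrative stand-in for the hardware datapath, assuming a simple (index, value) packed format; the function names are hypothetical.

```python
# Hypothetical sketch: exploiting activation sparsity by compressing out
# zero activations before the multiply-accumulate step.

def compress(activations):
    """Keep only non-zero activations, tagged with their column indices."""
    return [[(j, a) for j, a in enumerate(row) if a != 0] for row in activations]

def sparse_matmul(compressed, weights):
    """output[i][k] = sum over non-zero a[i][j] of a[i][j] * w[j][k]."""
    cols = len(weights[0])
    out = [[0] * cols for _ in compressed]
    for i, row in enumerate(compressed):
        for j, a in row:
            for k in range(cols):
                out[i][k] += a * weights[j][k]
    return out

acts = [[0, 3, 0, 1],   # sparse activation tensor: most entries are zero
        [2, 0, 0, 0]]
w = [[1, 0], [0, 1], [1, 1], [2, 0]]
packed = compress(acts)  # fewer stored entries than the dense tensor
print(sparse_matmul(packed, w))
# → [[2, 3], [2, 0]]
```

The work done is proportional to the number of non-zero activations rather than the dense tensor size, which is the payoff of the compressed representation.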
-
Publication No.: US20190325289A1
Publication Date: 2019-10-24
Application No.: US15956674
Application Date: 2018-04-18
Applicant: QUALCOMM Incorporated
Inventor: Rosario CAMMAROTA , Michael GOLDFARB , Manu RASTOGI , Sarang OZARDE
Abstract: An apparatus for optimizing a computational network is configured to receive an input at a first processing component. The first processing component may include at least a first programmable processing component and a second programmable processing component. The first programmable processing component is configured to compute a first nonlinear function and the second programmable processing component is configured to compute a second nonlinear function which is different from the first nonlinear function. The computational network, which may be a recurrent neural network such as a long short-term memory (LSTM), may be operated to generate an inference based at least in part on outputs of the first programmable processing component and the second programmable processing component.
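An LSTM is a natural fit for two distinct nonlinear units, since its gates use a sigmoid while its candidate and output paths use tanh. The sketch below is illustrative only: it configures the two programmable components as sigmoid and tanh and wires them into one scalar LSTM step; the weights and wiring are simplified assumptions, not the patented design.

```python
import math

# Hypothetical sketch: two programmable nonlinear units, configured here as
# sigmoid and tanh, combined the way an LSTM cell uses them.

def unit_a(x):           # first programmable component: sigmoid
    return 1.0 / (1.0 + math.exp(-x))

def unit_b(x):           # second programmable component: tanh
    return math.tanh(x)

def lstm_step(x, h, c, w):
    """One scalar LSTM step built from the two nonlinear units."""
    i = unit_a(w["i"] * x + h)      # input gate (sigmoid)
    f = unit_a(w["f"] * x + h)      # forget gate (sigmoid)
    o = unit_a(w["o"] * x + h)      # output gate (sigmoid)
    g = unit_b(w["g"] * x + h)      # candidate cell state (tanh)
    c_new = f * c + i * g
    h_new = o * unit_b(c_new)
    return h_new, c_new

h, c = lstm_step(x=1.0, h=0.0, c=0.0,
                 w={"i": 1.0, "f": 1.0, "o": 1.0, "g": 1.0})
print(round(h, 4), round(c, 4))
```

The inference output is driven by both units' results, mirroring the abstract's point that the network's inference depends on the outputs of both programmable components.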
-