-
Publication No.: US12131258B2
Publication Date: 2024-10-29
Application No.: US17030315
Filing Date: 2020-09-23
Applicant: QUALCOMM Incorporated
Inventor: Yadong Lu, Ying Wang, Tijmen Pieter Frederik Blankevoort, Christos Louizos, Matthias Reisser, Jilei Hou
Abstract: A method for compressing a deep neural network includes determining a pruning ratio for a channel and a mixed-precision quantization bit-width based on an operational budget of a device implementing the deep neural network. The method further includes quantizing a weight parameter of the deep neural network and/or an activation parameter of the deep neural network based on the quantization bit-width. The method also includes pruning the channel of the deep neural network based on the pruning ratio.
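Illustrative sketch (not taken from the patent text): the two compression steps named in the abstract can be pictured as follows, assuming the pruning ratio and bit-width have already been chosen. How those values are derived from the device's operational budget is not shown. The function names, the L1-norm ranking of channels, and the symmetric per-channel uniform quantizer are all assumptions made for illustration.

```python
from typing import List


def quantize_uniform(values: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantization (assumed scheme); returns de-quantized floats."""
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed grid")
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    scale = max_abs / levels              # grid step for this channel
    return [round(v / scale) * scale for v in values]


def prune_channels(channels: List[List[float]], ratio: float) -> List[List[float]]:
    """Zero out the fraction `ratio` of channels with the smallest L1 norm (assumed criterion)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    n_prune = int(len(channels) * ratio)
    order = sorted(range(len(channels)),
                   key=lambda i: sum(abs(w) for w in channels[i]))
    pruned = set(order[:n_prune])
    return [[0.0] * len(ch) if i in pruned else list(ch)
            for i, ch in enumerate(channels)]


def compress_layer(channels: List[List[float]], ratio: float, bits: int) -> List[List[float]]:
    """Prune channels, then quantize the surviving weights channel by channel."""
    return [quantize_uniform(ch, bits) for ch in prune_channels(channels, ratio)]
```

For example, `compress_layer(weights, ratio=0.5, bits=4)` would zero half of the channels and snap the remaining weights to a 4-bit grid.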
-
Publication No.: US11790241B2
Publication Date: 2023-10-17
Application No.: US17016130
Filing Date: 2020-09-09
Applicant: QUALCOMM Incorporated
Inventor: Matthias Reisser, Saurabh Kedar Pitre, Xiaochun Zhu, Edward Harrison Teague, Zhongze Wang, Max Welling
Abstract: In one embodiment, a method of simulating an operation of an artificial neural network on a binary neural network processor includes receiving a binary input vector for a layer including a probabilistic binary weight matrix and performing vector-matrix multiplication of the input vector with the probabilistic binary weight matrix, wherein the multiplication results are modified by simulated binary-neural-processing hardware noise, to generate a binary output vector, where the simulation is performed in the forward pass of a training algorithm for a neural network model for the binary-neural-processing hardware.
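Illustrative sketch (not taken from the patent text): one way to picture the simulated forward pass described in the abstract. The abstract does not specify the encoding or the noise model, so the following are assumptions: values are encoded as -1/+1, each weight is drawn from an independent Bernoulli distribution, hardware noise is additive Gaussian on the pre-activation sum, and the binary output is obtained by taking the sign. All function names are hypothetical.

```python
import random
from typing import List


def sample_binary_weights(probs: List[List[float]], rng: random.Random) -> List[List[int]]:
    """Draw a {-1, +1} weight matrix; probs[i][j] is the probability that w[i][j] = +1."""
    return [[1 if rng.random() < p else -1 for p in row] for row in probs]


def noisy_binary_layer(x: List[int], probs: List[List[float]],
                       noise_std: float, rng: random.Random) -> List[int]:
    """Binary vector-matrix product with additive Gaussian noise (assumed noise model)."""
    w = sample_binary_weights(probs, rng)
    out = []
    for j in range(len(probs[0])):
        pre = sum(x[i] * w[i][j] for i in range(len(x)))
        pre += rng.gauss(0.0, noise_std)      # simulated hardware noise
        out.append(1 if pre >= 0 else -1)     # binarize the noisy result
    return out
```

Calling `noisy_binary_layer(x, probs, noise_std=0.5, rng=random.Random(0))` inside the forward pass of a training loop would expose the model to the simulated noise; the backward pass is out of scope for this sketch.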
-
Publication No.: US11562208B2
Publication Date: 2023-01-24
Application No.: US16413535
Filing Date: 2019-05-15
Applicant: QUALCOMM Incorporated
Abstract: A method for quantizing a neural network includes modeling noise of parameters of the neural network. The method also includes assigning grid values to each realization of the parameters according to a concrete distribution that depends on a local fixed-point quantization grid and the modeled noise. The method further includes computing a fixed-point value representing parameters of a hard fixed-point quantized neural network.
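Illustrative sketch (not taken from the patent text): a rough picture of assigning grid values through a concrete distribution. The abstract does not give the noise model or the sampling procedure, so the following are assumptions: the parameter noise is logistic with scale `sigma`, the probability of each grid point is the noise mass falling in its bin, the concrete sample is drawn with the Gumbel-softmax construction, and the hard value is the nearest grid point. All names are hypothetical.

```python
import math
import random
from typing import List


def _sigmoid(t: float) -> float:
    """Logistic CDF, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def grid_probs(x: float, grid: List[float], step: float, sigma: float) -> List[float]:
    """Probability that x plus logistic noise lands in the bin around each grid point."""
    raw = [_sigmoid((g + step / 2 - x) / sigma) - _sigmoid((g - step / 2 - x) / sigma)
           for g in grid]
    total = sum(raw)
    if total <= 0.0:  # x is far outside the grid: put all mass on the nearest point
        nearest = min(range(len(grid)), key=lambda i: abs(grid[i] - x))
        return [1.0 if i == nearest else 0.0 for i in range(len(grid))]
    return [r / total for r in raw]


def concrete_sample(probs: List[float], temperature: float, rng: random.Random) -> List[float]:
    """Gumbel-softmax (concrete) relaxation of a categorical sample."""
    logits = []
    for p in probs:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        logits.append((math.log(max(p, 1e-300)) + gumbel) / temperature)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def soft_quantize(x: float, grid: List[float], step: float, sigma: float,
                  temperature: float, rng: random.Random) -> float:
    """Training-time value: concrete-weighted average of the grid points."""
    z = concrete_sample(grid_probs(x, grid, step, sigma), temperature, rng)
    return sum(zi * g for zi, g in zip(z, grid))


def hard_quantize(x: float, grid: List[float]) -> float:
    """Fixed-point value: the nearest grid point."""
    return min(grid, key=lambda g: abs(g - x))
```

In this sketch, `soft_quantize` would be used while training so that the choice of grid point stays differentiable, and `hard_quantize` would produce the final fixed-point parameters.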
-
-