-
Publication No.: US11604987B2
Publication Date: 2023-03-14
Application No.: US16826472
Filing Date: 2020-03-23
Applicant: QUALCOMM Incorporated
Abstract: Various embodiments include methods, and neural network computing devices implementing the methods, for generating an approximation neural network. Various embodiments may include performing approximation operations on a weights tensor associated with a layer of a neural network to generate an approximation weights tensor, determining an expected output error of the layer in the neural network due to the approximation weights tensor, subtracting the expected output error from a bias parameter of the layer to determine an adjusted bias parameter, and substituting the adjusted bias parameter for the bias parameter in the layer. Such operations may be performed for one or more layers in a neural network to produce an approximation version of the neural network for execution on a resource-limited processor.
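The bias-correction step can be illustrated in a few lines. Below is a minimal NumPy sketch, not the patent's implementation: it assumes a linear layer y = Wx + b, uses uniform symmetric quantization as a stand-in for the approximation operation, and takes E[x] as given (in practice it would be estimated from calibration data or batch-norm statistics); all function names are illustrative.

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Uniform symmetric quantization -- a stand-in for any approximation op."""
    scale = np.abs(w).max() / (2 ** (num_bits - 1) - 1)
    return np.round(w / scale) * scale

def bias_correct_layer(weights, bias, expected_input, num_bits=8):
    """Approximate the weights, then absorb the expected output error
    E[(W_approx - W) x] = (W_approx - W) E[x] into the bias."""
    w_approx = quantize_weights(weights, num_bits)
    expected_error = (w_approx - weights) @ expected_input  # per output channel
    return w_approx, bias - expected_error

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 8)), rng.normal(size=4)
ex = rng.normal(size=8)                     # stand-in for E[x]
W_q, b_adj = bias_correct_layer(W, b, ex)
print(np.allclose(W @ ex + b, W_q @ ex + b_adj))  # True at x = E[x]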
-
Publication No.: US11704571B2
Publication Date: 2023-07-18
Application No.: US17067233
Filing Date: 2020-10-09
Applicant: QUALCOMM Incorporated
Inventor: Kambiz Azarian Yazdi, Tijmen Pieter Frederik Blankevoort, Jin Won Lee, Yash Sanjay Bhalgat
Abstract: A method for pruning weights of an artificial neural network based on a learned threshold includes determining a pruning threshold for pruning a first set of pre-trained weights, of multiple pre-trained weights, based on a function of a classification loss and a regularization loss. A weight is pruned from the first set of pre-trained weights when a first value of that weight is less than the pruning threshold. A second set of pre-trained weights of the multiple pre-trained weights is fine-tuned or adjusted in response to a second value of each pre-trained weight in the second set being greater than the pruning threshold.
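As a rough illustration of the magnitude-versus-learned-threshold split, here is a NumPy sketch. The abstract does not give the exact form of the combined loss, so the objective below is a labeled guess, and all names are illustrative.

```python
import numpy as np

def prune_by_threshold(weights, threshold):
    """Zero weights whose magnitude is below the learned threshold; the
    surviving weights are the ones that would be fine-tuned."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def combined_objective(classification_loss, threshold, reg_strength=1e-2):
    """Illustrative (assumed) trade-off: the regularization term rewards a
    larger threshold (more pruning), while the classification loss
    penalizes over-pruning, so the threshold itself can be learned."""
    return classification_loss - reg_strength * threshold

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=1000)
pruned, mask = prune_by_threshold(w, threshold=0.05)
print(f"pruned {1 - mask.mean():.0%} of weights")
```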
-
Publication No.: US20200372361A1
Publication Date: 2020-11-26
Application No.: US16419509
Filing Date: 2019-05-22
Applicant: QUALCOMM Incorporated
Abstract: A computing device may be equipped with a generalized framework for accomplishing conditional computation or gating in a neural network. The computing device may receive input in a neural network layer that includes two or more filters. The computing device may intelligently determine whether the two or more filters are relevant to the received input. The computing device may deactivate filters that are determined not to be relevant to the received input (or activate filters that are determined to be relevant to the received input), and apply the received input to active filters in the layer to generate an activation.
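A toy, fully connected version of such gating might look as follows. This is a sketch of the general idea only (a small gate scores per-filter relevance, and inactive filters are skipped); the shapes, names, and thresholding rule are chosen for illustration.

```python
import numpy as np

def gated_layer(x, filters, gate_weights):
    """Conditionally compute a layer: a lightweight gate scores each
    filter's relevance to this input, and only relevant filters run."""
    relevance = gate_weights @ x               # one score per filter
    active = np.where(relevance > 0)[0]        # filters deemed relevant
    out = np.zeros(filters.shape[0])
    out[active] = filters[active] @ x          # skip inactive filters entirely
    return np.maximum(out, 0.0), active        # activation + which filters ran

rng = np.random.default_rng(2)
x = rng.normal(size=16)
filters = rng.normal(size=(8, 16))             # 8 "filters" (rows)
gate_w = rng.normal(size=(8, 16))              # gating parameters
y, active = gated_layer(x, filters, gate_w)
print(f"{active.size}/8 filters active for this input")
```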
-
Publication No.: US12271800B2
Publication Date: 2025-04-08
Application No.: US17097811
Filing Date: 2020-11-13
Applicant: QUALCOMM Incorporated
Inventor: Davide Abati, Babak Ehteshami Bejnordi, Jakub Mikolaj Tomczak, Tijmen Pieter Frederik Blankevoort
IPC: G06N3/02, G06N3/04, G06N3/0464, G06N3/048, G06N3/0495, G06N3/08, G06N20/10, G06F18/20, G06N3/063
Abstract: Various aspects provide methods for learning, such as continual learning, that support task-incremental learning using a multi-head classification architecture. Various aspects may enable conditional computing to support multi-head classification. Various aspects provide methods for learning, such as continual learning, that support class-incremental learning using a single-head classification architecture. Various aspects may enable conditional computing to support single-head classification by predicting the task associated with a given test input and selecting an associated classification head based at least in part on the task prediction.
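The class-incremental path (predict the task, then pick that task's head) can be sketched as below. The task predictor and classification heads here are plain linear maps chosen for illustration, not the patent's architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def class_incremental_predict(features, task_predictor, heads):
    """Single-head-style inference over a multi-head model: predict the
    task for this test input, then route features to that task's head."""
    task_id = int(np.argmax(task_predictor @ features))  # task prediction
    return task_id, softmax(heads[task_id] @ features)   # selected head

rng = np.random.default_rng(3)
feats = rng.normal(size=32)
task_pred = rng.normal(size=(3, 32))                  # 3 tasks
heads = [rng.normal(size=(5, 32)) for _ in range(3)]  # 5 classes per task
task, probs = class_incremental_predict(feats, task_pred, heads)
print(f"task {task}, class {probs.argmax()}")
```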
-
Publication No.: US12131258B2
Publication Date: 2024-10-29
Application No.: US17030315
Filing Date: 2020-09-23
Applicant: QUALCOMM Incorporated
Inventor: Yadong Lu, Ying Wang, Tijmen Pieter Frederik Blankevoort, Christos Louizos, Matthias Reisser, Jilei Hou
Abstract: A method for compressing a deep neural network includes determining a pruning ratio for a channel and a mixed-precision quantization bit-width based on an operational budget of a device implementing the deep neural network. The method further includes quantizing a weight parameter of the deep neural network and/or an activation parameter of the deep neural network based on the quantization bit-width. The method also includes pruning the channel of the deep neural network based on the pruning ratio.
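Once a per-layer pruning ratio and bit-width have been chosen against the budget, applying them is mechanical. The sketch below assumes L2-norm channel importance and uniform symmetric quantization, both stand-ins rather than the patent's method.

```python
import numpy as np

def quantize(t, bits):
    """Uniform symmetric quantization to the given bit-width (illustrative)."""
    scale = np.abs(t).max() / (2 ** (bits - 1) - 1)
    return np.round(t / scale) * scale

def compress_layer(weights, pruning_ratio, bits):
    """Prune the weakest channels (rows), then quantize the survivors.
    pruning_ratio and bits would be chosen so the whole network meets the
    device's operational budget (memory, latency, energy)."""
    n_keep = max(1, round(weights.shape[0] * (1 - pruning_ratio)))
    keep = np.argsort(np.linalg.norm(weights, axis=1))[-n_keep:]
    out = np.zeros_like(weights)
    out[keep] = quantize(weights[keep], bits)
    return out

rng = np.random.default_rng(4)
w = rng.normal(size=(16, 32))
w_c = compress_layer(w, pruning_ratio=0.5, bits=4)
print(f"{(np.abs(w_c).sum(axis=1) > 0).sum()}/16 channels kept")
```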
-
Publication No.: US12242956B2
Publication Date: 2025-03-04
Application No.: US16826524
Filing Date: 2020-03-23
Applicant: QUALCOMM Incorporated
Abstract: Various embodiments include methods, and neural network computing devices implementing the methods, for performing quantization in neural networks. Various embodiments may include equalizing the ranges of weight tensors or output channel weights within a first layer of the neural network by scaling each of the output channel weights of the first layer by a corresponding scaling factor, and scaling each of a second, adjacent layer's corresponding input channel weights by applying an inverse of the corresponding scaling factor to the input channel weights. The corresponding scaling factor may be determined based on heuristics, equalization of dynamic ranges, equalization of range extrema (minima or maxima), differential learning using straight-through estimator (STE) methods and a local or global loss, or by using a black-box optimizer that minimizes a quantization error metric with respect to the scaling.
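The scaling identity is easy to verify numerically. The sketch below uses the dynamic-range-equalization choice s_i = sqrt(r2_i / r1_i), one of the options the abstract lists, and assumes two dense layers with a ReLU between them, for which per-channel rescaling leaves the composed function unchanged.

```python
import numpy as np

def equalize_ranges(w1, b1, w2):
    """Scale layer-1 output channels by s and layer-2 input channels by
    1/s so both per-channel ranges become sqrt(r1 * r2), while the
    function is preserved (ReLU is positively homogeneous)."""
    r1 = np.abs(w1).max(axis=1)    # range of each layer-1 output channel
    r2 = np.abs(w2).max(axis=0)    # range of each layer-2 input channel
    s = np.sqrt(r2 / r1)           # equalizing scale factors
    return w1 * s[:, None], b1 * s, w2 / s[None, :]

rng = np.random.default_rng(5)
W1, b1, W2 = rng.normal(size=(6, 4)), rng.normal(size=6), rng.normal(size=(3, 6))
x = rng.normal(size=4)
W1e, b1e, W2e = equalize_ranges(W1, b1, W2)
same = np.allclose(W2 @ np.maximum(W1 @ x + b1, 0),
                   W2e @ np.maximum(W1e @ x + b1e, 0))
print(same)  # True: the scaling and its inverse cancel through the ReLU
```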
-
Publication No.: US11562208B2
Publication Date: 2023-01-24
Application No.: US16413535
Filing Date: 2019-05-15
Applicant: QUALCOMM Incorporated
Abstract: A method for quantizing a neural network includes modeling noise of parameters of the neural network. The method also includes assigning grid values to each realization of the parameters according to a concrete distribution that depends on a local fixed-point quantization grid and the modeled noise. The method further includes computing a fixed-point value representing parameters of a hard fixed-point quantized neural network.
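A single-parameter sketch of the idea, loosely based on Gumbel-softmax (concrete) relaxations: the Gaussian noise model, distance-based logits, and temperature below are all assumptions for illustration, not the patent's exact construction.

```python
import numpy as np

def concrete_grid_assignment(theta, grid, sigma=0.05, temperature=0.5, rng=None):
    """Assign a noisy parameter to fixed-point grid values via a concrete
    (Gumbel-softmax) relaxation; the hard fixed-point value is the argmax."""
    rng = rng or np.random.default_rng()
    noisy = theta + rng.normal(scale=sigma)          # modeled parameter noise
    logits = -np.abs(grid - noisy) / sigma           # nearer grid points score higher
    gumbel = -np.log(-np.log(rng.uniform(size=grid.size)))
    z = (logits + gumbel) / temperature
    soft = np.exp(z - z.max())
    soft /= soft.sum()                               # differentiable assignment
    return soft, grid[np.argmax(soft)]               # soft probs, hard value

grid = np.linspace(-1, 1, 2 ** 4)                    # a 4-bit local grid
soft, hard = concrete_grid_assignment(0.33, grid, rng=np.random.default_rng(6))
print(f"hard fixed-point value: {hard:.3f}")
```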