-
Publication No.: US20200234089A1
Publication Date: 2020-07-23
Application No.: US16844572
Filing Date: 2020-04-09
Applicant: Samsung Electronics Co., Ltd.
Inventor: Georgios Georgiadis , Weiran Deng
Abstract: A system and method for pruning. A neural network includes a plurality of long short-term memory cells, each of which includes an input having a weight matrix Wc, an input gate having a weight matrix Wi, a forget gate having a weight matrix Wf, and an output gate having a weight matrix Wo. In some embodiments, after initial training, one or more of the weight matrices Wi, Wf, and Wo are pruned, and the weight matrix Wc is left unchanged. The neural network is then retrained, the pruned weights being constrained to remain zero during retraining.
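The gate-selective pruning described in this abstract can be sketched as follows. This is a toy NumPy illustration, not the patented implementation: the matrix sizes, the magnitude-based pruning criterion, the 50% sparsity level, and the learning rate are all assumptions for the example.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude entries of W and return the
    pruned matrix plus a binary mask marking the surviving weights."""
    k = int(W.size * sparsity)                     # number of weights to drop
    threshold = np.sort(np.abs(W), axis=None)[k]   # k-th smallest magnitude
    mask = (np.abs(W) >= threshold).astype(W.dtype)
    return W * mask, mask

# Toy LSTM cell weights (hidden size 4, input size 4 -> 4x8 matrices).
rng = np.random.default_rng(0)
W = {name: rng.standard_normal((4, 8)) for name in ("Wc", "Wi", "Wf", "Wo")}

# Prune the three gate matrices Wi, Wf, Wo; leave the input weight
# matrix Wc untouched, as the abstract describes.
masks = {}
for name in ("Wi", "Wf", "Wo"):
    W[name], masks[name] = magnitude_prune(W[name], sparsity=0.5)

# During retraining, pruned weights are constrained to remain zero by
# masking each gradient update (illustrated with a dummy gradient).
grad = rng.standard_normal((4, 8))
W["Wi"] -= 0.01 * grad * masks["Wi"]   # update only surviving weights
```

Masking the gradient rather than the weights alone guarantees a pruned position can never drift away from zero during retraining, which is the constraint the abstract states.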
-
Publication No.: US11775611B2
Publication Date: 2023-10-03
Application No.: US16816247
Filing Date: 2020-03-11
Applicant: Samsung Electronics Co., Ltd.
Inventor: Jun Fang , Joseph H. Hassoun , Ali Shafiee Ardestani , Hamzah Ahmed Ali Abdelaziz , Georgios Georgiadis , Hui Chen , David Philip Lloyd Thorsley
Abstract: In some embodiments, a method of quantizing an artificial neural network includes dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. In some embodiments, linear or nonlinear quantization is applied to values of the tensor in the first region and the second region. In some embodiments, the method includes locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. In some embodiments, the expected quantization error is minimized by solving analytically and/or searching numerically.
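The region-wise quantization and the numerical breakpoint search described in this abstract might be sketched as below. This is a speculative toy version: the uniform grid in each region, the 4-bit default, the candidate grid for the search, and the mean-squared error objective are all assumptions; the patent also covers nonlinear grids and analytic breakpoint selection.

```python
import numpy as np

def two_region_quantize(x, breakpoint, bits=4):
    """Quantize |x| with separate uniform grids below and above a
    breakpoint, then restore the sign."""
    levels = 2 ** bits
    lo = np.clip(np.abs(x), 0, breakpoint)           # first-region part
    hi = np.clip(np.abs(x) - breakpoint, 0, None)    # second-region part
    step_lo = breakpoint / levels
    step_hi = max((np.abs(x).max() - breakpoint) / levels, 1e-12)
    q = np.round(lo / step_lo) * step_lo + np.round(hi / step_hi) * step_hi
    return np.sign(x) * q

def search_breakpoint(x, candidates, bits=4):
    """Pick the breakpoint that minimizes mean squared quantization
    error over the tensor -- the numerical-search variant."""
    errors = [np.mean((x - two_region_quantize(x, b, bits)) ** 2)
              for b in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(1)
w = rng.standard_normal(10_000)          # bell-shaped tensor values
cands = np.linspace(0.5, 3.0, 26)
best = search_breakpoint(w, cands)
```

Splitting the range lets the dense region near zero and the sparse tail each get a full set of levels, which is the motivation for separate regions.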
-
Publication No.: US11475308B2
Publication Date: 2022-10-18
Application No.: US16396619
Filing Date: 2019-04-26
Applicant: Samsung Electronics Co., Ltd.
Inventor: Georgios Georgiadis , Weiran Deng
Abstract: A system and a method generate a neural network that includes at least one layer having weights and output feature maps that have been jointly pruned and quantized. The weights of the layer are pruned using an analytic threshold function. Each weight remaining after pruning is quantized based on a weighted average of a quantization and dequantization of the weight for all quantization levels to form quantized weights for the layer. Output feature maps of the layer are generated based on the quantized weights of the layer. Each output feature map of the layer is quantized based on a weighted average of a quantization and dequantization of the output feature map for all quantization levels. Parameters of the analytic threshold function, the weighted average of all quantization levels of the weights and the weighted average of each output feature map of the layer are updated using a cost function.
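The weighted-average quantization described in this abstract resembles a differentiable "soft" quantizer, which can be sketched as follows. This is only one plausible reading: the softmax weighting and the `temperature` sharpness knob are assumptions; the abstract specifies only that each value becomes a weighted average of its quantization/dequantization over all levels, trainable through a cost function.

```python
import numpy as np

def soft_quantize(w, levels, temperature=10.0):
    """Map each value in the 1-D array w to a softmax-weighted average
    of all quantization levels, so that level assignment is smooth and
    can be updated by backpropagation."""
    # squared distance of every value to every level -> (n, L) scores
    d = -temperature * (w[:, None] - levels[None, :]) ** 2
    p = np.exp(d - d.max(axis=1, keepdims=True))     # stable softmax
    p /= p.sum(axis=1, keepdims=True)
    return p @ levels                  # weighted average over all levels

levels = np.linspace(-1.0, 1.0, 5)     # 5 uniform quantization levels
w = np.array([-0.97, -0.2, 0.05, 0.51, 0.88])
wq = soft_quantize(w, levels, temperature=50.0)
# As temperature grows, wq approaches hard nearest-level rounding,
# while small temperatures keep the mapping smooth and trainable.
```

Because the output is differentiable in `w`, the same machinery can be applied to both weights and output feature maps and updated jointly by a cost function, matching the abstract's joint scheme.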
-
Publication No.: US11250325B2
Publication Date: 2022-02-15
Application No.: US15894921
Filing Date: 2018-02-12
Applicant: Samsung Electronics Co., Ltd.
Inventor: Weiran Deng , Georgios Georgiadis
Abstract: A technique to prune weights of a neural network using an analytic threshold function h(w) provides a neural network having weights that have been optimally pruned. The neural network includes a plurality of layers in which each layer includes a set of weights w associated with the layer that enhance the speed of the neural network, the accuracy of the neural network, or a combination thereof. Each set of weights is based on a cost function C that has been minimized by back-propagating an output of the neural network in response to input training data. The cost function C is also minimized based on a derivative of the cost function C with respect to a first parameter of the analytic threshold function h(w) and on a derivative of the cost function C with respect to a second parameter of the analytic threshold function h(w).
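One plausible form of the analytic threshold function h(w) in this abstract is a sigmoid-gated soft threshold, sketched below. The particular gate shape is an assumption; `alpha` and `beta` stand in for the two learnable parameters whose derivatives of the cost function C the abstract refers to.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def analytic_threshold(w, alpha, beta):
    """Smooth, differentiable threshold h(w): weights with |w| well
    below alpha are attenuated toward zero, while larger weights pass
    through almost unchanged. beta controls the sharpness of the cut."""
    return w * sigmoid(beta * (np.abs(w) - alpha))

w = np.array([-1.2, -0.05, 0.02, 0.4, 0.9])
pruned = analytic_threshold(w, alpha=0.1, beta=50.0)
# Because h is differentiable in alpha and beta, dC/dalpha and
# dC/dbeta exist, so the pruning threshold itself can be learned by
# backpropagation rather than fixed by hand, as the abstract describes.
```

The key contrast with a hard threshold is differentiability: a step function has no usable gradient with respect to its cutoff, whereas this analytic form does.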
-
Publication No.: US10657426B2
Publication Date: 2020-05-19
Application No.: US15937558
Filing Date: 2018-03-27
Applicant: Samsung Electronics Co., Ltd.
Inventor: Georgios Georgiadis , Weiran Deng
Abstract: A system and method for pruning. A neural network includes a plurality of long short-term memory cells, each of which includes an input having a weight matrix Wc, an input gate having a weight matrix Wi, a forget gate having a weight matrix Wf, and an output gate having a weight matrix Wo. In some embodiments, after initial training, one or more of the weight matrices Wi, Wf, and Wo are pruned, and the weight matrix Wc is left unchanged. The neural network is then retrained, the pruned weights being constrained to remain zero during retraining.
-
Publication No.: US20190228274A1
Publication Date: 2019-07-25
Application No.: US15937558
Filing Date: 2018-03-27
Applicant: Samsung Electronics Co., Ltd.
Inventor: Georgios Georgiadis , Weiran Deng
Abstract: A system and method for pruning. A neural network includes a plurality of long short-term memory cells, each of which includes an input having a weight matrix Wc, an input gate having a weight matrix Wi, a forget gate having a weight matrix Wf, and an output gate having a weight matrix Wo. In some embodiments, after initial training, one or more of the weight matrices Wi, Wf, and Wo are pruned, and the weight matrix Wc is left unchanged. The neural network is then retrained, the pruned weights being constrained to remain zero during retraining.
-
Publication No.: US11588499B2
Publication Date: 2023-02-21
Application No.: US16223105
Filing Date: 2018-12-17
Applicant: Samsung Electronics Co., Ltd.
Inventor: Georgios Georgiadis
Abstract: A system and a method provide compression and decompression of weights of a layer of a neural network. For compression, the values of the weights are pruned and the weights of a layer are configured as a tensor having a tensor size of H×W×C in which H represents a height of the tensor, W represents a width of the tensor, and C represents a number of channels of the tensor. The tensor is formatted into at least one block of values. Each block is encoded independently from other blocks of the tensor using at least one lossless compression mode. For decoding, each block is decoded independently from other blocks using at least one decompression mode corresponding to the at least one compression mode used to compress the block, and deformatted into a tensor having the size of H×W×C.
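The format/encode/decode/deformat pipeline in this abstract can be sketched with one simple lossless mode. The bitmask-plus-nonzeros encoding below is an assumed stand-in for the patent's menu of compression modes, and the blocking along the channel axis with these toy dimensions is likewise an assumption.

```python
import numpy as np

def encode_block(block):
    """Losslessly encode one block as (bitmask, nonzero values) -- a
    simple sparsity-aware scheme that benefits from pruned weights."""
    flat = block.ravel()
    mask = flat != 0
    return mask, flat[mask]

def decode_block(mask, nonzeros, shape):
    """Invert encode_block: scatter the nonzeros back under the mask."""
    flat = np.zeros(mask.size, dtype=float)
    flat[mask] = nonzeros
    return flat.reshape(shape)

# A pruned HxWxC weight tensor, formatted into blocks along C.
H, W, C, block_c = 2, 2, 8, 4
rng = np.random.default_rng(2)
tensor = rng.standard_normal((H, W, C))
tensor[np.abs(tensor) < 0.8] = 0.0     # pruning leaves many zeros

blocks = [tensor[:, :, i:i + block_c] for i in range(0, C, block_c)]
encoded = [encode_block(b) for b in blocks]            # each block independent
decoded = [decode_block(m, nz, (H, W, block_c)) for m, nz in encoded]
restored = np.concatenate(decoded, axis=2)             # deformat to HxWxC
```

Encoding each block independently is what allows a decoder to decompress blocks in parallel or on demand without touching the rest of the tensor.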
-
Publication No.: US11151428B2
Publication Date: 2021-10-19
Application No.: US16844572
Filing Date: 2020-04-09
Applicant: Samsung Electronics Co., Ltd.
Inventor: Georgios Georgiadis , Weiran Deng
Abstract: A system and method for pruning. A neural network includes a plurality of long short-term memory cells, each of which includes an input having a weight matrix Wc, an input gate having a weight matrix Wi, a forget gate having a weight matrix Wf, and an output gate having a weight matrix Wo. In some embodiments, after initial training, one or more of the weight matrices Wi, Wf, and Wo are pruned, and the weight matrix Wc is left unchanged. The neural network is then retrained, the pruned weights being constrained to remain zero during retraining.