Jointly pruning and quantizing deep neural networks
Abstract:
A system and a method generate a neural network that includes at least one layer having weights and output feature maps that have been jointly pruned and quantized. The weights of the layer are pruned using an analytic threshold function. Each weight remaining after pruning is quantized based on a weighted average of a quantization and dequantization of the weight for all quantization levels to form quantized weights for the layer. Output feature maps of the layer are generated based on the quantized weights of the layer. Each output feature map of the layer is quantized based on a weighted average of a quantization and dequantization of the output feature map for all quantization levels. Parameters of the analytic threshold function, the weighted average of all quantization levels of the weights and the weighted average of each output feature map of the layer are updated using a cost function.
Public/Granted literature
Information query
Patent Agency Ranking
0/0