-
Publication No.: WO2022169497A1
Publication Date: 2022-08-11
Application No.: PCT/US2021/060710
Filing Date: 2021-11-24
Applicant: QUALCOMM INCORPORATED
Inventor: SRINIVAS, Suraj; BLANKEVOORT, Tijmen Pieter Frederik; KUZMIN, Andrey; NAGEL, Markus; VAN BAALEN, Marinus Willem; SKLIAR, Andrii
Abstract: Various embodiments include methods and devices for neural network pruning. Embodiments may include receiving as an input a weight tensor for a neural network, increasing a level of sparsity of the weight tensor to generate a sparse weight tensor, updating the neural network using the sparse weight tensor to generate an updated weight tensor, decreasing a level of sparsity of the updated weight tensor to generate a dense weight tensor, increasing the level of sparsity of the dense weight tensor to generate a final sparse weight tensor, and using the neural network with the final sparse weight tensor to generate inferences. Some embodiments may include increasing a level of sparsity of a first sparse weight tensor to generate a second sparse weight tensor, updating the neural network using the second sparse weight tensor to generate a second updated weight tensor, and decreasing the level of sparsity of the second updated weight tensor.
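The prune, update, densify, re-prune cycle described in the abstract can be sketched in NumPy. This is an illustrative sketch only: the magnitude-based pruning criterion, the 50% sparsity level, and the placeholder "update" step (a small random perturbation standing in for training updates) are assumptions, not details taken from the filing.

```python
import numpy as np

def prune(w, sparsity):
    """Increase sparsity: zero out the smallest-magnitude fraction of entries."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # Threshold at the k-th smallest absolute value.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > thresh, w, 0.0)

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16))      # input weight tensor for the network

# Increase sparsity of the weight tensor -> sparse weight tensor.
sparse_w = prune(w, 0.5)

# Placeholder "update" step: training repopulates zeroed entries,
# which decreases the level of sparsity -> dense weight tensor.
dense_w = sparse_w + 0.01 * rng.standard_normal(w.shape)

# Increase sparsity of the dense tensor -> final sparse weight tensor
# used by the network to generate inferences.
final_sparse_w = prune(dense_w, 0.5)
```

In this sketch the densify step falls out of the update itself: adding a dense perturbation refills the zeroed entries, after which a final pruning pass restores the target sparsity.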
-
Publication No.: WO2023059723A1
Publication Date: 2023-04-13
Application No.: PCT/US2022/045785
Filing Date: 2022-10-05
Applicant: QUALCOMM INCORPORATED
Inventor: KUZMIN, Andrey; VAN BAALEN, Marinus Willem; NAGEL, Markus; BEHBOODI, Arash
Abstract: A processor-implemented method includes retrieving, for a layer of a set of layers of an artificial neural network (ANN), a dense quantized matrix representing a codebook and a sparse quantized matrix representing linear coefficients. The dense quantized matrix and the sparse quantized matrix may be associated with a weight tensor of the layer. The processor-implemented method also includes determining, for the layer of the set of layers, the weight tensor based on a product of the dense quantized matrix and the sparse quantized matrix. The processor-implemented method further includes processing, at the layer, an input based on the weight tensor.
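The factorization described above, reconstructing a layer's weight tensor as the product of a dense codebook matrix and a sparse matrix of linear coefficients, can be sketched as follows. The matrix shapes, the coarse rounding standing in for quantization, and the one-nonzero-per-column sparsity pattern are illustrative assumptions, not details from the filing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense quantized matrix representing a codebook
# (coarse rounding stands in for a real quantization scheme).
codebook = np.round(rng.standard_normal((64, 8)), 1)

# Sparse quantized matrix of linear coefficients:
# here, one nonzero coefficient per output column.
coeffs = np.zeros((8, 32))
coeffs[rng.integers(0, 8, size=32), np.arange(32)] = 1.0

# Determine the layer's weight tensor as the product of the two factors.
weight = codebook @ coeffs

# Process an input at the layer based on the reconstructed weight tensor.
x = rng.standard_normal(64)
out = x @ weight
```

Storing the small dense codebook plus the sparse coefficient matrix can be far cheaper than storing the full 64×32 weight matrix, which is the point of the decomposition.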
-