-
Publication No.: US20230306233A1
Publication Date: 2023-09-28
Application No.: US18103428
Filing Date: 2023-01-30
Applicant: QUALCOMM Incorporated
Inventor: Marinus Willem VAN BAALEN , Brian KAHNE , Eric Wayne MAHURIN , Tijmen Pieter Frederik BLANKEVOORT , Andrey KUZMIN , Andrii SKLIAR , Markus NAGEL
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: A processor-implemented method includes bit shifting a binary representation of a neural network parameter. The neural network parameter has fewer bits, b, than the number of hardware bits, B, supported by the hardware that processes the neural network parameter. The bit shifting effectively multiplies the neural network parameter by 2^(B-b). The method also includes dividing a quantization scale by 2^(B-b) to obtain an updated quantization scale. The method further includes quantizing the bit-shifted binary representation with the updated quantization scale to obtain a value for the neural network parameter.
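The scale compensation described in the abstract can be sketched as follows. This is a minimal illustration of the arithmetic, not the patented implementation; the function name and signature are hypothetical. Left-shifting the b-bit integer by (B - b) bits multiplies it by 2^(B-b), and dividing the quantization scale by the same factor leaves the represented real value unchanged.

```python
def shift_to_hw_width(q_param: int, scale: float, b: int, B: int):
    """Shift a b-bit quantized parameter into B hardware bits.

    Left-shifting by (B - b) multiplies the integer by 2**(B - b);
    dividing the scale by the same factor compensates, so the
    dequantized value q * scale is unchanged.
    """
    shift = B - b
    q_shifted = q_param << shift           # multiply by 2**(B - b)
    scale_updated = scale / (2 ** shift)   # compensate in the scale
    return q_shifted, scale_updated

# A 4-bit value moved into 8 hardware bits: the integer grows by 2**4
# while the scale shrinks by 2**4, so the product is identical.
q, s = 5, 0.25
q_hw, s_hw = shift_to_hw_width(q, s, b=4, B=8)
assert q * s == q_hw * s_hw
```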
-
Publication No.: US20230376272A1
Publication Date: 2023-11-23
Application No.: US18102582
Filing Date: 2023-01-27
Applicant: QUALCOMM Incorporated
Inventor: Marinus Willem VAN BAALEN , Jorn Wilhelmus Timotheus PETERS , Markus NAGEL , Tijmen Pieter Frederik BLANKEVOORT , Andrey KUZMIN
Abstract: A processor-implemented method for fast floating point simulations with learnable parameters includes receiving a single precision input. An integer quantization process is performed on the input. Each element of the input is scaled based on a scaling parameter to generate an m-bit floating point output, where m is an integer.
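One way to read the abstract is that an m-bit floating-point value can be simulated by integer-quantizing each scaled element on the power-of-two grid implied by its exponent. The sketch below illustrates that idea for a single element; the function name, signature, and grid construction are assumptions for illustration, and the scale parameter stands in for the learnable scaling parameter.

```python
import math

def simulate_fp(x: float, mantissa_bits: int, scale: float) -> float:
    """Simulate a low-bit floating-point format for one element.

    The element is divided by a (learnable) scale, its power-of-two
    exponent is found, the value is rounded onto the grid spanned by
    `mantissa_bits` mantissa bits, and the result is rescaled.
    """
    if x == 0.0:
        return 0.0
    v = x / scale
    e = math.floor(math.log2(abs(v)))   # exponent of the enclosing binade
    grid = 2.0 ** (e - mantissa_bits)   # spacing of the mantissa grid
    v_q = round(v / grid) * grid        # integer quantization on that grid
    return v_q * scale
```

For example, with 2 mantissa bits and unit scale, 1.3 lands on the grid point 1.25, because the grid spacing in the binade [1, 2) is 2^-2 = 0.25.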
-
Publication No.: US20220245457A1
Publication Date: 2022-08-04
Application No.: US17456318
Filing Date: 2021-11-23
Applicant: QUALCOMM Incorporated
Inventor: Suraj SRINIVAS , Tijmen Pieter Frederik BLANKEVOORT , Andrey KUZMIN , Markus NAGEL , Marinus Willem VAN BAALEN , Andrii SKLIAR
Abstract: Various embodiments include methods and devices for neural network pruning. Embodiments may include receiving as an input a weight tensor for a neural network, increasing a level of sparsity of the weight tensor to generate a sparse weight tensor, updating the neural network using the sparse weight tensor to generate an updated weight tensor, decreasing a level of sparsity of the updated weight tensor to generate a dense weight tensor, increasing the level of sparsity of the dense weight tensor to generate a final sparse weight tensor, and using the neural network with the final sparse weight tensor to generate inferences. Some embodiments may include increasing a level of sparsity of a first sparse weight tensor to generate a second sparse weight tensor, updating the neural network using the second sparse weight tensor to generate a second updated weight tensor, and decreasing the level of sparsity of the second updated weight tensor.
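The alternating increase and decrease of sparsity in the abstract is built around a pruning step; a common choice (assumed here, the patent does not specify the criterion) is magnitude pruning, which zeroes the smallest-magnitude fraction of the weights. A minimal sketch, with a flat list standing in for the weight tensor and an illustrative function name:

```python
def prune_by_magnitude(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Ties at the threshold may prune slightly more than the requested
    fraction; a production implementation would break ties explicitly.
    """
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

In the schedule the abstract describes, this step would generate the sparse weight tensor, the network would then be updated with it, sparsity would be relaxed (pruned weights allowed to regrow during training) to yield a dense weight tensor, and a final pruning pass would produce the final sparse weight tensor used for inference.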
-