-
Publication No.: US20210264279A1
Publication Date: 2021-08-26
Application No.: US16796397
Filing Date: 2020-02-20
Inventors: Steve Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
Abstract: Learned step size quantization in artificial neural networks is provided. In various embodiments, a system comprises an artificial neural network and a computing node. The artificial neural network comprises: a quantizer having a configurable step size, the quantizer adapted to receive a plurality of input values and quantize the plurality of input values according to the configurable step size to produce a plurality of quantized input values; at least one matrix multiplier configured to receive the plurality of quantized input values from the quantizer and to apply a plurality of weights to the quantized input values to determine a plurality of output values having a first precision; and a multiplier configured to scale the output values to a second precision. The computing node is operatively coupled to the artificial neural network and is configured to: provide training input data to the artificial neural network, and optimize the configurable step size based on a gradient through the quantizer and the training input data.
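The abstract above describes a quantizer with a learnable step size feeding an integer matrix multiply, whose accumulator is then rescaled. A minimal numpy sketch of that data flow, not the patented implementation: the 4-bit signed code range, the step sizes, and the straight-through-style step-size gradient are all illustrative assumptions.

```python
import numpy as np

QMIN, QMAX = -8, 7  # assumed 4-bit signed code range

def quantize(v, s):
    # Map real values to integer codes using the configurable step size s
    return np.clip(np.round(v / s), QMIN, QMAX)

def step_size_grad(v, s):
    # Gradient of the dequantized output q*s with respect to s, using a
    # straight-through estimate in the spirit of learned step size quantization
    w = v / s
    q = np.clip(np.round(w), QMIN, QMAX)
    inside = (w > QMIN) & (w < QMAX)
    return np.where(inside, q - w, q)

# Forward pass: quantized inputs feed an integer matrix multiply (first
# precision); a scalar multiplier rescales the accumulator (second precision).
rng = np.random.default_rng(0)
x, W = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))
s_x, s_w = 0.1, 0.05
acc = quantize(x, s_x) @ quantize(W, s_w)  # integer-valued accumulation
y = acc * (s_x * s_w)                      # output rescaled to second precision
```

In training, `step_size_grad` would let an optimizer update `s_x` and `s_w` by backpropagation, which is the "optimize the configurable step size based on a gradient through the quantizer" step of the abstract.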
-
Publication No.: US20210174176A1
Publication Date: 2021-06-10
Application No.: US16705565
Filing Date: 2019-12-06
Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Steve Esser, Myron D. Flickner, Jeffrey McKinstry, Dharmendra S. Modha, Jun Sawada, Brian Taba
Abstract: Neural inference chips are provided. A neural core of the neural inference chip comprises a vector-matrix multiplier; a vector processor; and an activation unit operatively coupled to the vector processor. The vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
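The abstract above leaves "variable precision" abstract. One way to read it is a vector-matrix multiply whose operands are quantized to a selectable bit width; the numpy sketch below is a toy illustration of that reading, not the chip's datapath, and the per-tensor symmetric scaling it uses is an assumption.

```python
import numpy as np

def quantize_to_bits(v, bits):
    # Symmetric uniform quantization to a signed integer code of width `bits`
    qmax = 2 ** (bits - 1) - 1
    peak = np.max(np.abs(v))
    s = peak / qmax if peak > 0 else 1.0
    return np.clip(np.round(v / s), -qmax, qmax), s

def vmm_at_precision(x, W, bits):
    # Vector-matrix multiply with both operands held at the chosen precision
    qx, sx = quantize_to_bits(x, bits)
    qW, sW = quantize_to_bits(W, bits)
    return (qx @ qW) * (sx * sW)

x = np.array([0.5, -1.0, 0.25])
W = np.eye(3)
y8 = vmm_at_precision(x, W, 8)  # higher precision: close to x @ W
y2 = vmm_at_precision(x, W, 2)  # lower precision: coarser result, cheaper math
```

Running the same operation at 8 bits versus 2 bits shows the accuracy/cost trade that a variable-precision unit exposes.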
-
Publication No.: US11537859B2
Publication Date: 2022-12-27
Application No.: US16705565
Filing Date: 2019-12-06
Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Steve Esser, Myron D. Flickner, Jeffrey McKinstry, Dharmendra S. Modha, Jun Sawada, Brian Taba
Abstract: Neural inference chips are provided. A neural core of the neural inference chip comprises a vector-matrix multiplier; a vector processor; and an activation unit operatively coupled to the vector processor. The vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
-
Publication No.: US11823054B2
Publication Date: 2023-11-21
Application No.: US16796397
Filing Date: 2020-02-20
Inventors: Steve Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
Abstract: Learned step size quantization in artificial neural networks is provided. In various embodiments, a system comprises an artificial neural network and a computing node. The artificial neural network comprises: a quantizer having a configurable step size, the quantizer adapted to receive a plurality of input values and quantize the plurality of input values according to the configurable step size to produce a plurality of quantized input values; at least one matrix multiplier configured to receive the plurality of quantized input values from the quantizer and to apply a plurality of weights to the quantized input values to determine a plurality of output values having a first precision; and a multiplier configured to scale the output values to a second precision. The computing node is operatively coupled to the artificial neural network and is configured to: provide training input data to the artificial neural network, and optimize the configurable step size based on a gradient through the quantizer and the training input data.
-
Publication No.: US20210209450A1
Publication Date: 2021-07-08
Application No.: US16733393
Filing Date: 2020-01-03
Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Steve Esser, Myron D. Flickner, Dharmendra S. Modha, Jun Sawada
IPC Class: G06N3/063
Abstract: A neural inference chip includes a global weight memory; at least one neural core; and a network connecting the global weight memory to the at least one neural core. The neural core comprises a local weight memory. The local weight memory comprises a plurality of memory banks. Each of the plurality of memory banks is uniquely addressable by at least one index. The neural inference chip is adapted to store in the global weight memory a compressed weight block comprising at least one compressed weight matrix. The neural inference chip is adapted to transmit the compressed weight block from the global weight memory to the core via the network. The core is adapted to decode the at least one compressed weight matrix into a decoded weight matrix and store the decoded weight matrix in its local weight memory. The core is adapted to apply the decoded weight matrix to a plurality of input activations to produce a plurality of output activations.
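The abstract above describes a store → transmit → decode → apply pipeline for compressed weights. The sketch below illustrates that pipeline with a deliberately simple, hypothetical compression scheme (nonzero entries plus their indices); the patent does not specify this format, and the `compress`/`decode` helpers are illustrative names.

```python
import numpy as np

def compress(W):
    # Hypothetical compression: keep only nonzero entries and their indices
    idx = np.nonzero(W)
    return W.shape, np.stack(idx, axis=1), W[idx]

def decode(shape, indices, values):
    # The "core" reconstructs the dense weight matrix into local memory
    W = np.zeros(shape)
    W[tuple(indices.T)] = values
    return W

# "Global memory" holds the compressed block; the core decodes it locally,
# then applies the decoded weights to input activations.
W = np.array([[0.0, 2.0, 0.0],
              [1.0, 0.0, 0.0]])
block = compress(W)        # stored in global weight memory, sent over the network
local_W = decode(*block)   # decoded into the core's local weight memory
x = np.array([3.0, 4.0])   # input activations
y = x @ local_W            # output activations
```

For a sparse matrix like this one, the compressed block carries fewer values than the dense matrix, which is the bandwidth saving motivating the global-to-local transfer in the abstract.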