LEARNED STEP SIZE QUANTIZATION
    1.
    Patent application

    Publication No.: US20210264279A1

    Publication date: 2021-08-26

    Application No.: US16796397

    Filing date: 2020-02-20

    IPC classification: G06N3/08 G06F17/16 G06N3/04

    Abstract: Learned step size quantization in an artificial neural network is provided. In various embodiments, a system comprises an artificial neural network and a computing node. The artificial neural network comprises: a quantizer having a configurable step size, the quantizer adapted to receive a plurality of input values and quantize the plurality of input values according to the configurable step size to produce a plurality of quantized input values; at least one matrix multiplier configured to receive the plurality of quantized input values from the quantizer and to apply a plurality of weights to the quantized input values to determine a plurality of output values having a first precision; and a multiplier configured to scale the output values to a second precision. The computing node is operatively coupled to the artificial neural network and is configured to: provide training input data to the artificial neural network, and optimize the configurable step size based on a gradient through the quantizer and the training input data.
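
The data path in the abstract above (quantize with a step size, multiply by weights, rescale) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the function names, the clipping bounds `q_neg` and `q_pos`, the use of the step sizes as the rescaling factor, and the straight-through form of the step-size gradient, which follows the published learned step size quantization method rather than anything stated in the abstract.

```python
def quantize(values, step, q_neg, q_pos):
    """Map real values to integer codes: scale by 1/step, clip, round.

    Note: Python's round() sends exact .5 ties to the nearest even integer.
    """
    return [round(min(max(v / step, -q_neg), q_pos)) for v in values]


def step_size_grad(v, step, q_neg, q_pos):
    """Assumed straight-through estimate of d(step * code)/d(step) for one value."""
    x = v / step
    if x <= -q_neg:          # clipped at the lower bound
        return float(-q_neg)
    if x >= q_pos:           # clipped at the upper bound
        return float(q_pos)
    return round(x) - x      # inside the range: rounding error term


def matmul_and_rescale(codes, int_weights, act_step, weight_step):
    """Integer matrix-vector product, then one multiply per output to rescale.

    Rescaling by the product of the two step sizes is an assumption about
    what "scale the output values to a second precision" means.
    """
    outputs = []
    for row in int_weights:
        acc = sum(w * c for w, c in zip(row, codes))  # integer accumulation
        outputs.append(acc * act_step * weight_step)
    return outputs


codes = quantize([0.3, -1.0, 5.0], step=0.5, q_neg=4, q_pos=3)
outputs = matmul_and_rescale(codes, [[1, 0, 2], [-1, 1, 0]], 0.5, 0.25)
```

In a training loop, the value returned by `step_size_grad` would be multiplied by the upstream loss gradient and summed over all inputs to update the step size. That loop is omitted here.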

    Learned step size quantization
    4.
    Granted patent

    Publication No.: US11823054B2

    Publication date: 2023-11-21

    Application No.: US16796397

    Filing date: 2020-02-20

    Abstract: Learned step size quantization in an artificial neural network is provided. In various embodiments, a system comprises an artificial neural network and a computing node. The artificial neural network comprises: a quantizer having a configurable step size, the quantizer adapted to receive a plurality of input values and quantize the plurality of input values according to the configurable step size to produce a plurality of quantized input values; at least one matrix multiplier configured to receive the plurality of quantized input values from the quantizer and to apply a plurality of weights to the quantized input values to determine a plurality of output values having a first precision; and a multiplier configured to scale the output values to a second precision. The computing node is operatively coupled to the artificial neural network and is configured to: provide training input data to the artificial neural network, and optimize the configurable step size based on a gradient through the quantizer and the training input data.

    COMPRESSED WEIGHT DISTRIBUTION IN NETWORKS OF NEURAL PROCESSORS

    Publication No.: US20210209450A1

    Publication date: 2021-07-08

    Application No.: US16733393

    Filing date: 2020-01-03

    IPC classification: G06N3/063

    Abstract: A neural inference chip includes a global weight memory; a neural core; and a network connecting the global weight memory to the neural core. The neural core comprises a local weight memory. The local weight memory comprises a plurality of memory banks. Each of the plurality of memory banks is uniquely addressable by at least one index. The neural inference chip is adapted to store in the global weight memory a compressed weight block comprising at least one compressed weight matrix. The neural inference chip is adapted to transmit the compressed weight block from the global weight memory to the core via the network. The core is adapted to decode the at least one compressed weight matrix into a decoded weight matrix and store the decoded weight matrix in its local weight memory. The core is adapted to apply the decoded weight matrix to a plurality of input activations to produce a plurality of output activations.
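
The decode-then-apply flow in the abstract above can be sketched in a few lines of Python. The abstract does not say how the weight matrix is compressed, so this sketch assumes, purely for illustration, a codebook encoding in which each compressed entry is an index into a small table of weight values. The names `decode_weight_matrix`, `apply_weights`, and `codebook` are hypothetical.

```python
def decode_weight_matrix(codebook, index_rows):
    """Expand a matrix of codebook indices into a matrix of weight values.

    The codebook scheme is an assumption; the abstract leaves the
    compression format unspecified.
    """
    return [[codebook[i] for i in row] for row in index_rows]


def apply_weights(weights, activations):
    """Matrix-vector product of decoded weights with input activations."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


codebook = [-1, 0, 1]                     # hypothetical weight values
compressed = [[2, 1, 0], [0, 0, 2]]       # compressed matrix: indices only
local_weights = decode_weight_matrix(codebook, compressed)  # held in the core
output_activations = apply_weights(local_weights, [3, 5, 2])
```

Under this assumed encoding, only the small index matrix travels over the on-chip network, and each core expands it once into its local weight memory before computing.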