FOUR-BIT TRAINING FOR MACHINE LEARNING

    Publication number: US20220180171A1

    Publication date: 2022-06-09

    Application number: US17112528

    Application date: 2020-12-04

    Abstract: An apparatus includes a floating-point gradient register, an integer register, a memory bank, and an array of processing units. Each of the units includes a plurality of binary shifters, each having an integer input configured to obtain corresponding bits of a 4-bit integer multiplicand and a shift-specifying input configured to obtain corresponding bits in the exponent field of a 4-bit floating-point multiplier. The multiplier is specified in a mantissaless four-bit floating-point format consisting of a sign bit, three exponent bits, and no mantissa bits. An adder tree has a plurality of inputs coupled to outputs of the plurality of shifters, and a rounder has an input coupled to an output of the adder tree. The integer inputs are connected to the integer register, the shift-specifying inputs are connected to the floating-point gradient register, and the outputs of the rounders are coupled to the memory bank.
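    As a rough illustration of the arithmetic this hardware performs, the sketch below multiplies an INT4 multiplicand by a mantissaless FP4 multiplier using only a binary shift, as each shifter in a processing unit would. The exponent bias of 3, the helper names, and the two's-complement integer handling are assumptions not fixed by the abstract.

```python
BIAS = 3  # assumed bias for the 3-bit exponent field


def decode_fp4(bits: int) -> float:
    """Decode a mantissaless FP4 value: 1 sign bit, 3 exponent bits."""
    sign = -1.0 if (bits >> 3) & 0x1 else 1.0
    return sign * 2.0 ** ((bits & 0b111) - BIAS)


def shift_multiply(int4: int, fp4_bits: int) -> float:
    """Multiply an INT4 value by an FP4 value with a shift instead of a
    multiplier: with no mantissa bits, the product is just the integer
    shifted left by the exponent, with sign and bias applied afterwards."""
    sign = -1 if (fp4_bits >> 3) & 0x1 else 1
    exponent = fp4_bits & 0b111
    shifted = int4 << exponent            # what the binary shifter computes
    return sign * shifted * 2.0 ** -BIAS  # bias folded in as a final scale


# The adder tree would sum many such shifted products before rounding:
products = [shift_multiply(a, w) for a, w in [(3, 0b0101), (-2, 0b1100)]]
print(sum(products))  # 12.0 + 4.0 = 16.0
```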

    Robust gradient weight compression schemes for deep learning applications

    Publication number: US11295208B2

    Publication date: 2022-04-05

    Application number: US15830170

    Application date: 2017-12-04

    IPC classification: G06N3/08 G06N3/04

    Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner of a plurality of learners, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.
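    One plausible reading of the binning-and-quantization step is sketched below. The bin size of 64, the threshold rule tying transmitted weights to each bin's peak magnitude, and the sign-times-maximum quantization are all assumptions; the abstract fixes only the uniform bins and the quantized subset.

```python
import numpy as np


def compress_residue(residue: np.ndarray, bin_size: int = 64):
    """Split the residue vector into uniform bins, quantize the residual
    weights near each bin's peak magnitude, and carry the quantization
    error forward in the updated residue."""
    sent_idx, sent_val = [], []
    new_residue = residue.copy()
    for start in range(0, len(residue), bin_size):   # uniform-size bins
        chunk = residue[start:start + bin_size]
        peak = np.abs(chunk).max()
        if peak == 0.0:
            continue
        for i in np.nonzero(np.abs(chunk) >= 0.5 * peak)[0]:  # assumed rule
            j = start + int(i)
            q = peak * np.sign(residue[j])           # quantized weight
            sent_idx.append(j)
            sent_val.append(q)
            new_residue[j] -= q                      # residual error stays
    return np.array(sent_idx), np.array(sent_val), new_residue
```

    The (index, value) pairs stand in for the compressed current residue vector sent to another learner or to the parameter server; the returned residue is folded into the next mini-batch's gradients.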

    NEURAL NETWORK CIRCUITRY HAVING FLOATING POINT FORMAT WITH ASYMMETRIC RANGE

    Publication number: US20210064976A1

    Publication date: 2021-03-04

    Application number: US16558554

    Application date: 2019-09-03

    IPC classification: G06N3/063 G06N3/08

    Abstract: An apparatus includes circuitry for a neural network that is configured to perform forward propagation neural network operations on floating point numbers having a first n-bit floating point format. The first n-bit floating point format has a configuration consisting of a sign bit, m exponent bits and p mantissa bits where m is greater than p. The circuitry is further configured to perform backward propagation neural network operations on floating point numbers having a second n-bit floating point format that is different than the first n-bit floating point format. The second n-bit floating point format has a configuration consisting of a sign bit, q exponent bits and r mantissa bits where q is greater than m and r is less than p.
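    Taking n = 8 as a concrete instance (the abstract only requires m > p for the forward format and q > m, r < p for the backward one), a minimal decoder for the two formats might look like the sketch below. The specific bit widths, the subnormal handling, and the absence of reserved Inf/NaN encodings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    exponent_bits: int
    mantissa_bits: int

    @property
    def bias(self) -> int:
        return (1 << (self.exponent_bits - 1)) - 1

    def decode(self, bits: int) -> float:
        """Decode a bit pattern (sign, exponent, mantissa), ignoring any
        reserved Inf/NaN encodings for simplicity."""
        sign = -1.0 if bits >> (self.exponent_bits + self.mantissa_bits) else 1.0
        exp = (bits >> self.mantissa_bits) & ((1 << self.exponent_bits) - 1)
        man = bits & ((1 << self.mantissa_bits) - 1)
        if exp == 0:  # subnormal: no implicit leading one
            return sign * man * 2.0 ** (1 - self.bias - self.mantissa_bits)
        return sign * (1.0 + man * 2.0 ** -self.mantissa_bits) * 2.0 ** (exp - self.bias)


FORWARD = FloatFormat(exponent_bits=4, mantissa_bits=3)   # m=4 > p=3
BACKWARD = FloatFormat(exponent_bits=5, mantissa_bits=2)  # q=5 > m=4, r=2 < p=3

# Trading a mantissa bit for an exponent bit greatly widens the dynamic
# range available to back-propagated gradients:
print(FORWARD.decode(0b0_1111_111))   # 480.0
print(BACKWARD.decode(0b0_11111_11))  # 114688.0
```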

    ROBUST GRADIENT WEIGHT COMPRESSION SCHEMES FOR DEEP LEARNING APPLICATIONS

    Publication number: US20190171935A1

    Publication date: 2019-06-06

    Application number: US15830170

    Application date: 2017-12-04

    IPC classification: G06N3/08 G06N3/04

    Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner of a plurality of learners, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.
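    The "residual gradient weights" imply an error-feedback loop: whatever a learner does not transmit for one mini-batch is carried into the next. The sketch below shows that loop with a deliberately minimal per-bin quantizer that transmits only each bin's peak weight; the quantizer, the bin size, and the training-loop details are assumptions.

```python
import numpy as np


def quantize_bins(residue: np.ndarray, bin_size: int = 64) -> np.ndarray:
    """Keep only the largest-magnitude residual weight of each uniform bin
    (a deliberately minimal stand-in for the compression step)."""
    sent = np.zeros_like(residue)
    for start in range(0, len(residue), bin_size):
        chunk = residue[start:start + bin_size]
        j = start + int(np.abs(chunk).argmax())
        sent[j] = residue[j]          # transmit only the bin's peak weight
    return sent


rng = np.random.default_rng(0)
carried = np.zeros(256)               # residue persists across mini-batches
for step in range(3):
    gradient = rng.normal(size=256)   # stand-in for the layer's gradients
    residue = gradient + carried      # current residue vector
    sent = quantize_bins(residue)     # compressed vector for a peer/server
    carried = residue - sent          # quantization error feeds next step
```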