Neural network training with decreased memory consumption and processor utilization

    Publication Number: US11526761B2

    Publication Date: 2022-12-13

    Application Number: US16550229

    Application Date: 2019-08-24

    Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices composed of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round-to-nearest, can be utilized to exchange weight values in reduced-precision formats while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as the brain floating-point format (bfloat16) can be utilized.
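
    A minimal sketch of the stochastic-rounding idea, assuming numpy and a software emulation of bfloat16 (the function name and test values are illustrative, not from the patent):

```python
import numpy as np

def to_bfloat16_stochastic(x: np.ndarray) -> np.ndarray:
    """Stochastically round float32 values to bfloat16 precision.

    bfloat16 keeps the top 16 bits of a float32 bit pattern; adding
    uniform random noise below the cut before truncating makes the
    rounding unbiased in expectation. (Sketch only: NaN and overflow
    handling are omitted.)
    """
    bits = x.astype(np.float32).view(np.uint32)
    noise = np.random.randint(0, 1 << 16, size=bits.shape).astype(np.uint32)
    return ((bits + noise) & np.uint32(0xFFFF0000)).view(np.float32)

# A weight update far below bfloat16's precision at 1.0 survives on
# average, which is why no separate full-precision copy is needed.
w = np.full(100000, 1.0 + 2.0**-12, dtype=np.float32)
print(to_bfloat16_stochastic(w).mean())  # ~1.000244, not 1.0
```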

    Adjusting activation compression for neural network training

    Publication Number: US12165038B2

    Publication Date: 2024-12-10

    Application Number: US16276395

    Application Date: 2019-02-14

    Abstract: Apparatus and methods are disclosed for training a neural network accelerator using quantized-precision data formats, and in particular for adjusting the floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, the exponent format, the use of non-uniform mantissas, and/or the use of outlier values to express some of the mantissas.
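
    A rough illustration of metric-driven format selection: try progressively narrower mantissa widths and keep the narrowest one whose quantization error stays acceptable. The mean-squared-error proxy and the threshold are assumptions for illustration, not details from the patent:

```python
import numpy as np

def quantize_mantissa(x: np.ndarray, mantissa_bits: int) -> np.ndarray:
    """Emulate a narrower float format by masking low float32 mantissa bits."""
    mask = np.uint32((0xFFFFFFFF << (23 - mantissa_bits)) & 0xFFFFFFFF)
    return (x.astype(np.float32).view(np.uint32) & mask).view(np.float32)

def select_second_format(activations: np.ndarray, max_mse: float = 1e-4) -> int:
    """Return the narrowest mantissa width whose error metric is acceptable."""
    for bits in (2, 4, 8, 16):             # candidate formats, narrowest first
        compressed = quantize_mantissa(activations, bits)
        if np.mean((compressed - activations) ** 2) <= max_mse:
            return bits
    return 23                              # fall back to full float32 mantissa

acts = np.random.randn(4096).astype(np.float32)
print("selected mantissa bits:", select_second_format(acts))
```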

    NEURAL NETWORK LAYER PROCESSING WITH SCALED QUANTIZATION

    Publication Number: US20200272881A1

    Publication Date: 2020-08-27

    Application Number: US16284407

    Application Date: 2019-02-25

    Inventor: Daniel Lo

    Abstract: Processors and methods for neural network processing are provided. A method includes receiving a subset of data corresponding to a layer of a neural network. The method further includes, prior to performing any matrix operations using the subset of the data, scaling the subset of the data by a scaling factor to generate a scaled subset of data. The method further includes quantizing the scaled subset of the data to generate a scaled and quantized subset of data. The method further includes performing the matrix operations using the scaled and quantized subset of the data to generate a subset of results of the matrix operations. The method further includes descaling the subset of the results of the matrix operations, by multiplying the subset of the results of the matrix operations by an inverse of the scaling factor, to generate a descaled subset of results of the matrix operations.
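
    The scale → quantize → multiply → descale pipeline can be sketched in a few lines of numpy. The int8 target and the max-abs per-tensor scaling factors are illustrative choices, not the patent's specific method:

```python
import numpy as np

def scaled_quantized_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Scale inputs into the int8 range, multiply, then descale the result."""
    # Scaling factors chosen so each operand's largest magnitude maps to 127.
    sa = 127.0 / np.max(np.abs(a))
    sb = 127.0 / np.max(np.abs(b))
    qa = np.round(a * sa).astype(np.int8)             # scaled + quantized
    qb = np.round(b * sb).astype(np.int8)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)   # integer matrix multiply
    # Descale by multiplying with the inverse of the combined scaling factor.
    return acc.astype(np.float32) * (1.0 / (sa * sb))

a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 4).astype(np.float32)
print(np.max(np.abs(scaled_quantized_matmul(a, b) - a @ b)))  # small error
```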

    ADJUSTING PRECISION AND TOPOLOGY PARAMETERS FOR NEURAL NETWORK TRAINING BASED ON A PERFORMANCE METRIC

    Publication Number: US20200210840A1

    Publication Date: 2020-07-02

    Application Number: US16237308

    Application Date: 2018-12-31

    Abstract: Apparatus and methods are disclosed for training neural networks based on a performance metric, including adjusting numerical precision and topology as training progresses. In some examples, block floating-point formats having relatively lower accuracy are used during early stages of training. The accuracy of the floating-point format can be increased as training progresses based on a determined performance metric. In some examples, values for the neural network are transformed to normal-precision floating-point formats. The performance metric can be determined based on the entropy of values for the neural network, the accuracy of the neural network, or other suitable techniques. Accelerator hardware, including hardware having direct support for block floating-point formats, can be used for certain implementations.
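
    One reading of the entropy-based metric, sketched below: estimate the Shannon entropy of the network's values and widen the format when the values carry more information than the current mantissa width can represent. The doubling schedule and the 0.9 threshold are hypothetical:

```python
import numpy as np

def value_entropy(values: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of a histogram of the network's values."""
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def adjust_precision(values: np.ndarray, mantissa_bits: int,
                     max_bits: int = 16) -> int:
    """Widen the block floating-point format as measured entropy grows."""
    if mantissa_bits < max_bits and value_entropy(values) > 0.9 * mantissa_bits:
        return mantissa_bits * 2       # e.g. 4 -> 8 -> 16 as training matures
    return mantissa_bits

weights = np.random.randn(10000).astype(np.float32)
print("mantissa bits:", adjust_precision(weights, mantissa_bits=4))
```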

    NEURAL NETWORK ACTIVATION COMPRESSION WITH OUTLIER BLOCK FLOATING-POINT

    Publication Number: US20200210839A1

    Publication Date: 2020-07-02

    Application Number: US16237202

    Application Date: 2018-12-31

    Abstract: Apparatus and methods are disclosed for training a neural network accelerator using quantized-precision data formats having outlier values, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent, are stored in ancillary storage for a subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
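
    A toy version of the compression step: one shared exponent per block, narrow mantissas, and a small ancillary store holding exact copies of the values the narrow format represents worst. The float32 source, the 4-bit mantissas, and the two-outlier budget are assumptions for illustration:

```python
import numpy as np

def compress_block(block: np.ndarray, mantissa_bits: int = 4,
                   n_outliers: int = 2):
    """Block floating-point compression with outliers kept on the side."""
    shared_exp = int(np.floor(np.log2(np.max(np.abs(block)) + 1e-30)))
    scale = 2.0 ** (shared_exp - (mantissa_bits - 2))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), lo, hi)
    # Ancillary storage: indices and exact values of the worst-served entries.
    worst = np.argsort(np.abs(block - mantissas * scale))[-n_outliers:]
    outliers = {int(i): float(block[i]) for i in worst}
    return mantissas.astype(np.int8), shared_exp, outliers

def decompress_block(mantissas, shared_exp, outliers, mantissa_bits=4):
    scale = 2.0 ** (shared_exp - (mantissa_bits - 2))
    block = mantissas.astype(np.float32) * scale
    for i, v in outliers.items():      # patch outlier positions back in
        block[i] = v
    return block

x = np.random.randn(16).astype(np.float32)
x[3] = 40.0                            # one large value skews the exponent
restored = decompress_block(*compress_block(x))
print(np.abs(restored - x).max())      # error from non-outlier entries only
```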

    Block floating point computations using reduced bit-width vectors

    Publication Number: US10691413B2

    Publication Date: 2020-06-23

    Application Number: US15971904

    Application Date: 2018-05-04

    Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers, each having a mantissa portion with a bit-width that is smaller than the bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit-width computations allow higher-precision mathematical operations to be performed on lower-precision processors with improved accuracy.
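
    The decomposition is easy to verify with integer mantissas. Splitting each 8-bit mantissa into 4-bit halves turns one wide dot product into four narrow ones whose shifted sum is exact; the 8-bit/4-bit split here is an assumed example, not the patent's fixed choice:

```python
import numpy as np

# Two vectors of 8-bit mantissas that share a block exponent.
m1 = np.random.randint(0, 256, size=64).astype(np.int64)
m2 = np.random.randint(0, 256, size=64).astype(np.int64)

# Decompose: m = 16*hi + lo, so
# m1.m2 = 256*(hi1.hi2) + 16*(hi1.lo2 + lo1.hi2) + lo1.lo2
hi1, lo1 = m1 >> 4, m1 & 0xF
hi2, lo2 = m2 >> 4, m2 & 0xF

# Four dot products over 4-bit operands, then a shifted sum.
dot = ((hi1 @ hi2) << 8) + ((hi1 @ lo2 + lo1 @ hi2) << 4) + (lo1 @ lo2)
assert dot == m1 @ m2   # the reduced bit-width result is exact
print(dot)
```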

    Reduced Memory Nucleotide Sequence Comparison

    Publication Number: US20180137085A1

    Publication Date: 2018-05-17

    Application Number: US15351372

    Application Date: 2016-11-14

    CPC classification number: G06F17/16 G06F19/22

    Abstract: Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that implements a Smith-Waterman analysis in a reduced memory footprint, storing and referencing only individual portions, or subsections, of the two-dimensional matrix that represents the comparison between the two nucleotide sequences. As the backtracking proceeds, backtracking metadata corresponding to a cell from a subsection that is not currently retained in memory can be required. Such a subsection can be regenerated from previously generated scores associated with checkpoint cells of the two-dimensional matrix that comprise two edges of the subsection being regenerated. Moreover, to further reduce memory consumption, the backtracking metadata stored for each cell can comprise four binary digits: two indicative of a directional assignment, one indicative of whether the corresponding cell is part of a deletion stretching across multiple contiguous cells, and one analogously indicative of insertions stretching across multiple contiguous cells.
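
    The four-bit-per-cell metadata layout can be sketched as a bit-packing scheme; the specific bit assignments below are assumptions for illustration:

```python
# Pack Smith-Waterman backtracking metadata into four bits per cell:
# bits 0-1: direction the score came from; bit 2: cell extends a
# multi-cell deletion; bit 3: cell extends a multi-cell insertion.
DIAGONAL, UP, LEFT = 0, 1, 2           # 2-bit direction codes

def pack_cell(direction: int, in_deletion: bool, in_insertion: bool) -> int:
    return direction | (in_deletion << 2) | (in_insertion << 3)

def unpack_cell(meta: int):
    return meta & 0b11, bool(meta & 0b100), bool(meta & 0b1000)

# Two 4-bit cells fit in one byte, halving the backtracking storage
# relative to one byte per cell.
cells = [pack_cell(DIAGONAL, False, False), pack_cell(UP, True, False)]
packed_byte = cells[0] | (cells[1] << 4)
assert unpack_cell(packed_byte >> 4) == (UP, True, False)
```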
