-
Publication Number: US11526761B2
Publication Date: 2022-12-13
Application Number: US16550229
Application Date: 2019-08-24
Applicant: Microsoft Technology Licensing, LLC
Inventor: Taesik Na , Daniel Lo , Haishan Zhu , Eric Sen Chung
IPC: G06N3/08
Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices composed of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round-to-nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as the brain floating-point format (bfloat16) can be utilized.
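As a rough illustration of the stochastic rounding this abstract relies on, the sketch below (NumPy; the function name, grid step, and sample weights are illustrative, not from the patent) rounds so that the expected quantized value equals the original, which is what lets reduced-precision weights absorb small updates without a separately stored full-precision copy:

```python
import numpy as np

def stochastic_round(x, step):
    # Round each value to a multiple of `step`, rounding up with
    # probability equal to the fractional remainder. In expectation
    # the result equals the input, so tiny weight updates are not
    # systematically rounded away, unlike round-to-nearest.
    scaled = x / step
    floor = np.floor(scaled)
    frac = scaled - floor                      # fractional part in [0, 1)
    round_up = np.random.random(x.shape) < frac
    return (floor + round_up) * step

weights = np.array([0.126, -0.374, 0.503])
print(stochastic_round(weights, step=0.25))    # e.g. [0.25, -0.5, 0.5]
```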
-
Publication Number: US12165038B2
Publication Date: 2024-12-10
Application Number: US16276395
Application Date: 2019-02-14
Applicant: Microsoft Technology Licensing, LLC
Inventor: Daniel Lo , Bita Darvish Rouhani , Eric S. Chung , Yiren Zhao , Amar Phanishayee , Ritchie Zhao
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and, in particular, for adjusting floating-point formats used to store activation values during training. In certain examples of the disclosed technology, a computing system includes processors, memory, and a floating-point compressor in communication with the memory. The computing system is configured to produce a neural network comprising activation values expressed in a first floating-point format, select a second floating-point format for the neural network based on a performance metric, convert at least one of the activation values to the second floating-point format, and store the compressed activation values in the memory. Aspects of the second floating-point format that can be adjusted include the number of bits used to express mantissas, exponent format, use of non-uniform mantissas, and/or use of outlier values to express some of the mantissas.
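One adjustable aspect the abstract names is the number of mantissa bits. A minimal sketch of emulating a narrower mantissa on float32 data by bit masking (truncation rather than rounding; the function name and candidate bit widths are assumptions for illustration):

```python
import numpy as np

def truncate_mantissa(x, mantissa_bits):
    # Emulate a narrower floating-point format by zeroing the
    # low-order mantissa bits of each float32 value.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    drop = np.uint32(23 - mantissa_bits)       # float32 stores 23 mantissa bits
    mask = np.uint32(0xFFFFFFFF) << drop
    return (bits & mask).view(np.float32)

activations = np.random.randn(4).astype(np.float32)
for m in (7, 4, 2):                            # candidate second formats
    print(m, truncate_mantissa(activations, m))
```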
-
Publication Number: US12045724B2
Publication Date: 2024-07-23
Application Number: US16237202
Application Date: 2018-12-31
Applicant: Microsoft Technology Licensing, LLC
Inventor: Daniel Lo , Amar Phanishayee , Eric S. Chung , Yiren Zhao , Ritchie Zhao
CPC classification number: G06N3/084 , G06F7/49915 , G06F9/30025 , G06F9/5027 , G06N5/046 , G06N20/00
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent, are stored in ancillary storage for a subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
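A compressor along the lines this abstract sketches might, under loose assumptions, quantize a block of activations against a shared exponent and park the hardest-to-represent values in an ancillary outlier table. The fraction, bit width, and full-precision outlier storage below are illustrative simplifications, not the patent's scheme:

```python
import numpy as np

def compress_with_outliers(x, mantissa_bits=4, outlier_frac=0.05):
    # Quantize against one shared exponent (block floating point),
    # then store the values with the worst quantization error at
    # full precision in an ancillary table keyed by index.
    shared_exp = np.floor(np.log2(np.max(np.abs(x)) + 1e-30))
    step = 2.0 ** (shared_exp - mantissa_bits + 1)
    q = np.round(x / step) * step
    k = max(1, int(outlier_frac * x.size))
    outlier_idx = np.argsort(np.abs(x - q))[-k:]
    outliers = {int(i): float(x[i]) for i in outlier_idx}
    return q, outliers

def decompress(q, outliers):
    out = q.copy()
    for i, v in outliers.items():
        out[i] = v                             # outliers restore lost precision
    return out

acts = np.random.randn(64)
q, outl = compress_with_outliers(acts)
print(np.abs(acts - decompress(q, outl)).max())
```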
-
Publication Number: US11741362B2
Publication Date: 2023-08-29
Application Number: US15974637
Application Date: 2018-05-08
Applicant: Microsoft Technology Licensing, LLC
Inventor: Daniel Lo , Eric Sen Chung , Bita Darvish Rouhani
CPC classification number: G06N3/084 , G06F7/5443 , G06F18/214 , G06N3/082 , H03M7/24
Abstract: A system for training a neural network receives training data and performs lower precision format training calculations using lower precision format data at one or more training phases. One or more results from the lower precision format training calculations are converted to higher precision format data, and higher precision format training calculations are performed using the higher precision format data at one or more additional training phases. The neural network is modified using the results from the one or more additional training phases. The mixed precision format training calculations train the neural network more efficiently while maintaining overall accuracy.
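A toy rendering of the mixed-precision idea, with float16 and float32 standing in for the patent's lower- and higher-precision formats (the learning rate and placeholder gradient are invented for the example):

```python
import numpy as np

def forward_lower_precision(x, w):
    # Run the expensive matmul in the lower-precision format, then
    # hand the result back in higher precision for later phases.
    y = x.astype(np.float16) @ w.astype(np.float16)
    return y.astype(np.float32)

w = np.random.randn(8, 4).astype(np.float32)   # master copy in higher precision
x = np.random.randn(2, 8).astype(np.float32)
y = forward_lower_precision(x, w)
grad_w = np.random.randn(*w.shape).astype(np.float32)  # placeholder gradient
w -= 0.01 * grad_w                             # update applied in float32
```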
-
Publication Number: US20230267319A1
Publication Date: 2023-08-24
Application Number: US18141272
Application Date: 2023-04-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Bita Darvish Rouhani , Taesik Na , Eric S. Chung , Daniel Lo , Douglas C. Burger
Abstract: Technology related to training a neural network accelerator using mixed precision data formats is disclosed. In one example of the disclosed technology, a neural network accelerator is configured to accelerate a given layer of a multi-layer neural network. An input tensor for the given layer can be converted from a normal-precision floating-point format to a quantized-precision floating-point format, such as a block floating-point format. A tensor operation can be performed using the converted input tensor. A result of the tensor operation can be converted from the quantized-precision floating-point format back to the normal-precision floating-point format. The converted result can be used to generate an output tensor of the layer of the neural network, where the output tensor is in the normal-precision floating-point format.
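Read as pseudocode, the flow is: quantize the input tensor, run the tensor operation, convert the result back. A minimal NumPy sketch with a shared-exponent format standing in for the accelerator's quantized format (helper names are mine, not the patent's):

```python
import numpy as np

def to_quantized(x, mantissa_bits=8):
    # Shared exponent plus integer mantissas, a simple stand-in for
    # the block floating-point format the accelerator would use.
    shared_exp = np.floor(np.log2(np.max(np.abs(x)) + 1e-30))
    scale = 2.0 ** (shared_exp - mantissa_bits + 1)
    return np.round(x / scale), scale

def quantized_matmul(a, b):
    ma, sa = to_quantized(a)
    mb, sb = to_quantized(b)
    # Tensor op on quantized mantissas, then convert the result
    # back to normal-precision floating point.
    return (ma @ mb) * (sa * sb)

a, b = np.random.randn(3, 5), np.random.randn(5, 2)
print(np.abs(a @ b - quantized_matmul(a, b)).max())
```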
-
Publication Number: US20200272881A1
Publication Date: 2020-08-27
Application Number: US16284407
Application Date: 2019-02-25
Applicant: Microsoft Technology Licensing, LLC
Inventor: Daniel Lo
Abstract: Processors and methods for neural network processing are provided. A method includes receiving a subset of data corresponding to a layer of a neural network. The method further includes prior to performing any matrix operations using the subset of the data, scaling the subset of the data by a scaling factor to generate a scaled subset of data. The method further includes quantizing the scaled subset of the data to generate a scaled and quantized subset of data. The method further includes performing the matrix operations using the scaled and quantized subset of the data to generate a subset of results of the matrix operations. The method further includes descaling the subset of the results of the matrix operations, by multiplying the subset of the results of the matrix operations with an inverse of the scaling factor, to generate a descaled subset of results of the matrix operations.
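The abstract's four steps (scale, quantize, multiply, descale with the inverse factor) map onto a short sketch like the following, where the scaling factor and number of quantization levels are invented for illustration:

```python
import numpy as np

def scaled_quantized_matmul(x, w, scale=64.0, levels=256):
    xs = x * scale                             # 1. scale before any matrix op
    step = 2 * (np.max(np.abs(xs)) + 1e-30) / levels
    xq = np.round(xs / step) * step            # 2. quantize the scaled data
    y = xq @ w                                 # 3. matrix operation
    return y * (1.0 / scale)                   # 4. descale with the inverse factor

x, w = np.random.randn(2, 4), np.random.randn(4, 3)
print(np.abs(x @ w - scaled_quantized_matmul(x, w)).max())
```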
-
Publication Number: US20200210840A1
Publication Date: 2020-07-02
Application Number: US16237308
Application Date: 2018-12-31
Applicant: Microsoft Technology Licensing, LLC
Inventor: Bita Darvish Rouhani , Eric S. Chung , Daniel Lo , Douglas C. Burger
Abstract: Apparatus and methods for training neural networks based on a performance metric, including adjusting numerical precision and topology as training progresses, are disclosed. In some examples, block floating-point formats having relatively lower accuracy are used during early stages of training. The accuracy of the floating-point format can be increased as training progresses, based on a determined performance metric. In some examples, values for the neural network are transformed to normal-precision floating-point formats. The performance metric can be determined based on the entropy of values for the neural network, the accuracy of the neural network, or other suitable techniques. Accelerator hardware, including hardware having direct support for block floating-point formats, can be used in certain implementations.
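Entropy is one metric the abstract names for deciding when to add precision. A contrived sketch of that control loop (the threshold, schedule, and placeholder weights are all assumptions):

```python
import numpy as np

def entropy_bits(values, bins=32):
    # Shannon entropy of the value histogram, one candidate
    # performance metric for driving precision adjustment.
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

mantissa_bits = 3                              # start training at low accuracy
for epoch in range(10):
    weights = np.random.randn(1000)            # placeholder for real training state
    if entropy_bits(weights) > 4.0 and mantissa_bits < 10:
        mantissa_bits += 1                     # widen the format as training progresses
```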
-
Publication Number: US20200210839A1
Publication Date: 2020-07-02
Application Number: US16237202
Application Date: 2018-12-31
Applicant: Microsoft Technology Licensing, LLC
Inventor: Daniel Lo , Amar Phanishayee , Eric S. Chung , Yiren Zhao , Ritchie Zhao
Abstract: Apparatus and methods for training a neural network accelerator using quantized precision data formats having outlier values are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. Outlier values, comprising additional bits of mantissa and/or exponent, are stored in ancillary storage for a subset of the activation values. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
-
Publication Number: US10691413B2
Publication Date: 2020-06-23
Application Number: US15971904
Application Date: 2018-05-04
Applicant: Microsoft Technology Licensing, LLC
Inventor: Daniel Lo , Eric S. Chung , Douglas C. Burger
Abstract: A system for block floating point computation in a neural network receives a block floating point number comprising a mantissa portion. A bit-width of the block floating point number is reduced by decomposing the block floating point number into a plurality of numbers each having a mantissa portion with a bit-width that is smaller than a bit-width of the mantissa portion of the block floating point number. One or more dot product operations are performed separately on each of the plurality of numbers to obtain individual results, which are summed to generate a final dot product value. The final dot product value is used to implement the neural network. The reduced bit width computations allow higher precision mathematical operations to be performed on lower-precision processors with improved accuracy.
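The decomposition works because a wide mantissa splits into narrow pieces whose dot products recombine linearly: if a = hi·2^k + lo, then dot(a, b) = 2^k·dot(hi, b) + dot(lo, b). A sketch with 8-bit mantissas split into two 4-bit halves (the widths are illustrative, not the patent's):

```python
import numpy as np

def decomposed_dot(m_a, m_b, low_bits=4):
    # Split each mantissa of `m_a` into high and low halves, take a
    # dot product per half, then recombine:
    # dot(a, b) = (dot(hi, b) << low_bits) + dot(lo, b)
    lo = m_a & ((1 << low_bits) - 1)
    hi = m_a >> low_bits
    return ((hi @ m_b) << low_bits) + (lo @ m_b)

a = np.random.randint(0, 256, size=8)          # 8-bit mantissas
b = np.random.randint(0, 16, size=8)
assert decomposed_dot(a, b) == a @ b           # matches the full-width dot product
```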
-
Publication Number: US20180137085A1
Publication Date: 2018-05-17
Application Number: US15351372
Application Date: 2016-11-14
Applicant: Microsoft Technology Licensing, LLC
Inventor: Daniel Lo , Eric Chung , Kalin Ovtcharov , Ravindra Pandya , David Heckerman , Roman Snytsar
IPC: G06F17/16
Abstract: Comparisons between two nucleotide sequences can be performed by customized integrated circuitry that can implement a Smith-Waterman analysis in a reduced memory footprint, storing and referencing only individual portions, or subsections, of a two-dimensional matrix that is representative of the comparison between the two nucleotide sequences. As backtracking proceeds, backtracking metadata may be required for a cell in a subsection that is not currently retained in memory. Such a subsection can be regenerated from previously generated scores associated with checkpoint cells of the two-dimensional matrix that comprise two edges of the subsection being regenerated. Moreover, to further reduce memory consumption, the backtracking metadata stored for each cell can comprise four binary digits: two indicative of a directional assignment, one indicative of whether the corresponding cell is part of a deletion stretching across multiple contiguous cells, and one analogously indicative of insertions stretching across multiple contiguous cells.
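The four-bit metadata layout can be sketched directly; the abstract fixes only the bit counts (two direction bits plus two run flags), so the field ordering and direction codes below are assumptions:

```python
DIAG, UP, LEFT = 0, 1, 2                       # 2-bit directional assignment codes

def pack_cell(direction, in_deletion, in_insertion):
    # Four bits per cell: bits 0-1 hold the direction, bit 2 flags
    # "part of a multi-cell deletion", bit 3 flags "part of a
    # multi-cell insertion".
    return direction | (int(in_deletion) << 2) | (int(in_insertion) << 3)

def unpack_cell(meta):
    return meta & 0b11, bool(meta & 0b100), bool(meta & 0b1000)

m = pack_cell(UP, in_deletion=True, in_insertion=False)
assert unpack_cell(m) == (UP, True, False)
assert m < 16                                  # fits in four bits
```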
-