Systems and methods for compression and distribution of machine learning models
Abstract:
The present disclosure provides systems and methods for compressing and/or distributing machine learning models. In one example, a computer-implemented method for compressing machine-learned models includes obtaining, by one or more computing devices, a machine-learned model. The method includes selecting, by the one or more computing devices, a weight to be quantized and quantizing, by the one or more computing devices, the weight. The method includes propagating, by the one or more computing devices, at least a part of the resulting quantization error to one or more non-quantized weights and then quantizing, by the one or more computing devices, one or more of those non-quantized weights. The method includes providing, by the one or more computing devices, a quantized machine-learned model.
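The quantize-and-propagate loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify a quantization grid, a selection order, or how the error is divided, so everything here is an assumption made for illustration: the function name `quantize_with_error_propagation`, the uniform grid of spacing `step`, the left-to-right selection order, the even split of the error across all remaining weights, and the `fraction` parameter that controls how much of the error is propagated.

```python
def quantize_with_error_propagation(weights, step=0.25, fraction=1.0):
    """Quantize weights one at a time, pushing part of each rounding
    error onto the weights that have not been quantized yet.

    Assumptions (not from the source): uniform grid of spacing `step`,
    left-to-right selection, error split evenly over remaining weights.
    """
    pending = list(weights)  # working copy; later entries absorb earlier errors
    quantized = []
    for i in range(len(pending)):
        w = pending[i]                  # select the next weight to quantize
        q = round(w / step) * step      # snap it to the nearest grid level
        quantized.append(q)
        remaining = len(pending) - i - 1
        if remaining:
            # Propagate `fraction` of the quantization error (w - q),
            # shared equally among the still non-quantized weights.
            share = fraction * (w - q) / remaining
            for j in range(i + 1, len(pending)):
                pending[j] += share
    return quantized


if __name__ == "__main__":
    original = [0.1, 0.1, 0.1, 0.1]
    print(quantize_with_error_propagation(original, step=0.25))
```

With these assumed settings, rounding each of the four weights independently would map all of them to 0.0 and lose their entire sum of 0.4, whereas the sketch returns `[0.0, 0.25, 0.0, 0.25]`, whose sum of 0.5 stays within half a grid step of the original. A real implementation might instead weight the propagated error by correlations between weights or restrict it to weights in the same layer; the abstract leaves that choice open.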