LARGE LANGUAGE MODEL (LLM) QUANTIZATION

    Publication Number: US20240428006A1

    Publication Date: 2024-12-26

    Application Number: US18211967

    Filing Date: 2023-06-20

    Applicant: GOOGLE LLC

    Abstract: Implementations relate to asymmetric quantization of large language models (LLMs). Processor(s) of a system can: obtain a trained LLM, wherein the trained LLM includes a plurality of layers, each layer comprising a respective plurality of weights; for each layer of the plurality of layers: calculate an optimal clipping range for the respective plurality of weights, and clip one or more weights of the respective plurality of weights that lie outside of the optimal clipping range to produce a clipped layer; quantize the LLM to generate a quantized LLM, wherein the instructions to quantize include instructions to map weights of the plurality of clipped layers of the LLM from continuous values to discrete values; and provide the quantized LLM for downstream processing.
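    The per-layer procedure in the abstract — compute a clipping range for a layer's weights, clip outliers, then map the clipped continuous values to a discrete integer grid — can be sketched as below. This is a minimal illustration, not the patented method: the patent's "optimal clipping range" computation is not specified in this record, so a percentile-based range is assumed here as a stand-in, and the names `clip_and_quantize`/`dequantize` are hypothetical.

    ```python
    import numpy as np

    def clip_and_quantize(weights, num_bits=8, percentile=99.9):
        """Clip a layer's weights to an assumed range, then asymmetrically
        quantize them to unsigned integers.

        NOTE: the percentile-based clipping range is an illustrative
        assumption; the patent's "optimal clipping range" is not detailed
        in this abstract.
        """
        lo = float(np.percentile(weights, 100.0 - percentile))
        hi = float(np.percentile(weights, percentile))
        clipped = np.clip(weights, lo, hi)

        # Asymmetric quantization: map the continuous range [lo, hi]
        # onto the discrete integer grid [0, 2^b - 1].
        qmax = 2 ** num_bits - 1
        scale = (hi - lo) / qmax
        zero_point = int(round(-lo / scale))
        q = np.clip(np.round(clipped / scale) + zero_point, 0, qmax)
        return q.astype(np.uint8), scale, zero_point

    def dequantize(q, scale, zero_point):
        # Recover approximate continuous weights for downstream use.
        return (q.astype(np.float32) - zero_point) * scale
    ```

    Applying this independently to each layer's weight tensor mirrors the per-layer loop the abstract describes; the integer tensors plus per-layer (scale, zero_point) pairs would then constitute the quantized model handed off for downstream processing.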
