CHIPLET AWARE ADAPTABLE QUANTIZATION

    Publication No.: US20240403258A1

    Publication Date: 2024-12-05

    Application No.: US18328852

    Filing Date: 2023-06-05

    Abstract: A chiplet-based architecture may quantize, or reduce, the number of bits at various stages of the data path in an artificial-intelligence processor. The architecture may exploit the synergy of quantizing multiple dimensions together to greatly decrease memory usage and data-path bandwidth. Internal weights may be quantized statically after a training procedure, while accumulator bits and activation bits may be quantized dynamically during an inference operation. New hardware logic may be configured to quantize the outputs of each operation directly from the core or other processing node before the tensor is stored in memory. Quantization may use a statistic from a previous tensor to quantize a current output tensor, while also calculating a statistic to be used on a subsequent output tensor. In addition to quantizing based on a statistic, bits can be further quantized using a Kth-percentile clamping operation.
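    The dynamic part of this scheme lends itself to a short software sketch. The snippet below is a minimal illustration, not the patented hardware logic: it quantizes each output tensor with a scale derived from a statistic gathered on the previous tensor, computes a fresh statistic from the current tensor for use on the next one, and clamps values at the Kth percentile of absolute magnitudes before rounding. Names such as DynamicQuantizer, percentile, and num_bits are illustrative assumptions, not terms taken from the patent.

    # Minimal sketch of statistic-carrying dynamic quantization with
    # Kth-percentile clamping (illustrative assumption, not the claimed hardware).
    import numpy as np

    class DynamicQuantizer:
        def __init__(self, num_bits: int = 8, percentile: float = 99.0):
            self.num_bits = num_bits      # target activation/accumulator width
            self.percentile = percentile  # Kth-percentile clamp threshold
            self.prev_stat = None         # statistic carried over from the previous tensor

        def _statistic(self, tensor: np.ndarray) -> float:
            # Kth percentile of absolute values; clamping to this level discards
            # rare outliers so the quantization grid covers the bulk of the data.
            return float(np.percentile(np.abs(tensor), self.percentile))

        def quantize(self, tensor: np.ndarray) -> tuple[np.ndarray, float]:
            # Compute the statistic that will be used on the NEXT tensor
            # while quantizing the CURRENT one.
            next_stat = self._statistic(tensor)

            # The very first tensor has no prior statistic, so fall back to
            # the one just computed.
            clamp = self.prev_stat if self.prev_stat is not None else next_stat
            self.prev_stat = next_stat

            # Symmetric quantization: clamp, scale to the signed integer range,
            # round. (int8 storage assumes num_bits <= 8 in this sketch.)
            qmax = 2 ** (self.num_bits - 1) - 1
            scale = clamp / qmax if clamp > 0 else 1.0
            clamped = np.clip(tensor, -clamp, clamp)
            return np.round(clamped / scale).astype(np.int8), scale

    if __name__ == "__main__":
        quant = DynamicQuantizer(num_bits=8, percentile=99.0)
        for _ in range(3):  # e.g., successive layer outputs during inference
            activations = np.random.randn(4, 16).astype(np.float32)
            q, scale = quant.quantize(activations)
            print(q.dtype, q.min(), q.max(), f"scale={scale:.4f}")

    Carrying the previous tensor's statistic forward means the current tensor can be quantized in a single pass as it streams out of the core, without buffering it to measure its range first, which is the property the abstract attributes to the new hardware logic.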
