-
Publication Number: US20220383121A1
Publication Date: 2022-12-01
Application Number: US17330096
Filing Date: 2021-05-25
Applicant: Applied Materials, Inc.
Inventor: Tameesh Suri , Bor-Chau Juang , Nathaniel See , Bilal Shafi Sheikh , Naveed Zaman , Myron Shak , Sachin Dangayach , Udaykumar Diliprao Hanmante
Abstract: A method of inducing sparsity for outputs of a neural network layer may include receiving outputs from a layer of a neural network; partitioning the outputs into a plurality of partitions; identifying first partitions in the plurality of partitions that can be treated as having zero values; generating an encoding that identifies locations of the first partitions among remaining second partitions in the plurality of partitions; and sending the encoding and the second partitions to a subsequent layer in the neural network.
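As a rough illustration of the partition-and-encode flow the abstract describes, a minimal NumPy sketch follows. The partition size, the magnitude-threshold test for deciding which partitions "can be treated as having zero values," and the boolean-mask encoding are assumptions for illustration only; the filing does not fix these details.

```python
import numpy as np

def induce_sparsity(outputs: np.ndarray, partition_size: int = 4,
                    threshold: float = 1e-3):
    """Partition layer outputs, drop near-zero partitions, and build an
    encoding marking where the dropped (first) partitions sit among the
    remaining (second) partitions. Assumes len(outputs) is a multiple
    of partition_size."""
    flat = outputs.reshape(-1, partition_size)
    # One plausible zero test: every element's magnitude is below a threshold.
    is_zero = np.all(np.abs(flat) < threshold, axis=1)
    encoding = ~is_zero              # True where a partition carries real data
    second_partitions = flat[encoding]
    return encoding, second_partitions

def decode(encoding: np.ndarray, second_partitions: np.ndarray,
           partition_size: int = 4) -> np.ndarray:
    """Reconstruct the dense tensor for the subsequent layer."""
    dense = np.zeros((encoding.size, partition_size),
                     dtype=second_partitions.dtype)
    dense[encoding] = second_partitions
    return dense.reshape(-1)
```

Only the encoding (one bit per partition) and the surviving partitions travel to the next layer, which is where the claimed bandwidth savings would come from.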
-
Publication Number: US20240403258A1
Publication Date: 2024-12-05
Application Number: US18328852
Filing Date: 2023-06-05
Applicant: Applied Materials, Inc.
Inventor: Bilal Shafi Sheikh , Tameesh Suri , Nathaniel See , Sutapa Dutta , Yun-Ting Sun , Udaykumar Diliprao Hanmante , Naveed Zaman
Abstract: A chiplet-based architecture may quantize, or reduce, the number of bits at various stages of the data path in an artificial-intelligence processor. This architecture may leverage the synergy of quantizing multiple dimensions together to greatly decrease memory usage and data path bandwidth. Internal weights may be quantized statically after a training procedure. Accumulator bits and activation bits may be quantized dynamically during an inference operation. New hardware logic may be configured to quantize the outputs of each operation directly from the core or other processing node before the tensor is stored in memory. Quantization may use a statistic from a previous tensor for a current output tensor, while also calculating a statistic to be used on a subsequent output tensor. In addition to quantizing based on a statistic, bits can be further quantized using a Kth percentile clamping operation.
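A minimal sketch of the dynamic quantization step described here, assuming a max-absolute-value statistic, an int8 target width, and a symmetric scale; these choices, and the function name, are illustrative and not taken from the filing. The key point it models is that the current tensor is scaled with the statistic gathered from the previous output tensor, while a fresh statistic is produced for the next one.

```python
import numpy as np

def quantize_output(tensor: np.ndarray, prev_stat: float,
                    num_bits: int = 8, k_percentile: float = 99.9):
    """Quantize one output tensor using the previous tensor's statistic,
    and return the statistic to use on the subsequent tensor."""
    # Kth-percentile clamping: clip outliers so a few extreme values do
    # not waste the limited integer range.
    clip = np.percentile(np.abs(tensor), k_percentile)
    clamped = np.clip(tensor, -clip, clip)

    # Scale with the PREVIOUS tensor's statistic, avoiding a second pass
    # over the current tensor inside the data path.
    qmax = 2 ** (num_bits - 1) - 1
    scale = prev_stat / qmax if prev_stat > 0 else 1.0
    quantized = np.round(clamped / scale).clip(-qmax - 1, qmax).astype(np.int8)

    # Statistic handed forward for the subsequent output tensor.
    next_stat = float(np.abs(clamped).max())
    return quantized, scale, next_stat
```

Using the prior tensor's statistic is what lets the hardware logic quantize each result as it leaves the core, rather than buffering the full-precision tensor to compute its range first.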
-