Publication No.: US20250045586A1
Publication Date: 2025-02-06
Application No.: US18717275
Filing Date: 2022-03-04
Applicant: Intel Corporation
Inventor: Hengyu MENG, Jiong GONG, Xudong LIU, Haihao SHEN
IPC: G06N3/082
Abstract: The application provides a method and apparatus for accelerating deep learning inference based on a hardware-aware (HW-aware) sparsity pattern. The method may include: determining a hardware-aware sparsity pattern based on a register width specified by an instruction set architecture (ISA) of a hardware unit that implements a deep neural network (DNN) for deep learning inference, the sparsity pattern specifying a block size and a sparsity ratio for block-wise sparsification of a weight matrix of an operator in the DNN; performing, during a training process of the DNN, the block-wise sparsification of the weight matrix based on the sparsity pattern to obtain a sparse weight matrix; compressing the sparse weight matrix into a concentrated weight matrix by removing all-zero blocks from the sparse weight matrix; and generating a mask that indicates an index of each row of non-zero blocks in the sparse weight matrix, to enable extraction of corresponding elements from an activation matrix of the operator.
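The steps in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: blocks are taken to be 1 x B row segments, with B equal to the register width divided by the element width; blocks are ranked by L1 norm; and pruning is done in one shot, whereas the abstract performs sparsification during training. All function names (block_width, sparsify, compress, sparse_matmul) are hypothetical.

```python
def block_width(register_bits, element_bits):
    # Number of elements that fit in one vector register (assumed block width).
    return register_bits // element_bits


def sparsify(weight, block, ratio):
    """Zero the lowest-L1-norm 1 x `block` blocks in each column strip."""
    rows, cols = len(weight), len(weight[0])
    assert cols % block == 0, "column count must be a multiple of the block width"
    n_zero = int(rows * ratio)  # blocks to zero per strip
    sparse = [row[:] for row in weight]
    for c0 in range(0, cols, block):
        # Rank the row blocks of this strip from smallest to largest magnitude.
        order = sorted(
            range(rows),
            key=lambda r: sum(abs(v) for v in weight[r][c0:c0 + block]),
        )
        for r in order[:n_zero]:
            sparse[r][c0:c0 + block] = [0.0] * block
    return sparse


def compress(sparse, block):
    """Drop all-zero blocks; return per-strip concentrated blocks and row-index mask."""
    cols = len(sparse[0])
    concentrated, mask = [], []
    for c0 in range(0, cols, block):
        kept = [r for r, row in enumerate(sparse) if any(row[c0:c0 + block])]
        mask.append(kept)
        concentrated.append([sparse[r][c0:c0 + block] for r in kept])
    return concentrated, mask


def sparse_matmul(x, concentrated, mask, block):
    """Compute x @ W using only the kept blocks; the mask selects activation elements."""
    out = [[0.0] * (len(mask) * block) for _ in x]
    for s, (blocks, rows) in enumerate(zip(concentrated, mask)):
        for blk, k in zip(blocks, rows):
            for m, x_row in enumerate(x):
                a = x_row[k]  # activation element picked out by the mask index
                for j, w in enumerate(blk):
                    out[m][s * block + j] += a * w
    return out


# Usage: a 128-bit register with 32-bit elements gives a block width of 4.
B = block_width(128, 32)
W = [[float((r * 7 + c * 3) % 5 - 2) for c in range(8)] for r in range(4)]
X = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 1.0, 2.0]]
W_sparse = sparsify(W, B, 0.5)
W_conc, W_mask = compress(W_sparse, B)
Y = sparse_matmul(X, W_conc, W_mask, B)
```

Because each kept block spans exactly one register's worth of weights, the inner loop over `j` corresponds to a single vector multiply-accumulate on the target hardware; the mask replaces a dense walk over every row of the weight matrix with a gather of only the needed activation elements.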