METHOD AND APPARATUS FOR ACCELERATING DEEP LEARNING INFERENCE BASED ON HW-AWARE SPARSITY PATTERN

    Publication No.: US20250045586A1

    Publication Date: 2025-02-06

    Application No.: US18717275

    Application Date: 2022-03-04

    Abstract: The application provides a method and apparatus for accelerating deep learning inference based on a hardware-aware (HW-aware) sparsity pattern. The method may include: determining a hardware-aware sparsity pattern based on a register width specified by an instruction set architecture (ISA) of a hardware unit for implementing a deep neural network (DNN) for deep learning inference, the sparsity pattern specifying a block size and a sparsity ratio for block-wise sparsification of a weight matrix of an operator in the DNN; performing, during a training process of the DNN, the block-wise sparsification of the weight matrix based on the sparsity pattern to obtain a sparse weight matrix; compressing the sparse weight matrix into a concentrated weight matrix by removing all-zero blocks from the sparse weight matrix; and generating a mask that indicates an index of each row of non-zero blocks in the sparse weight matrix, to enable extraction of corresponding elements from an activation matrix of the operator.
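
The steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the claimed implementation: the function names, the 1 x block block shape, the rule that block width equals register width divided by element width, and the L1-norm pruning criterion are all assumptions made for illustration. The abstract states only that the pattern depends on the register width and specifies a block size and a sparsity ratio.

```python
def block_size_from_register(register_bits, element_bits):
    # Assumption: one block fills one vector register (e.g. 512 // 32 = 16 lanes).
    return register_bits // element_bits


def sparsify(weight, block, ratio):
    """Zero the lowest-magnitude 1 x block blocks within each column group."""
    rows, cols = len(weight), len(weight[0])
    sparse = [row[:] for row in weight]
    n_prune = int(rows * ratio)
    for c0 in range(0, cols, block):
        # Rank the blocks of this column group by L1 norm (assumed criterion).
        norms = sorted(
            (sum(abs(v) for v in weight[r][c0:c0 + block]), r) for r in range(rows)
        )
        for _, r in norms[:n_prune]:
            for c in range(c0, min(c0 + block, cols)):
                sparse[r][c] = 0.0
    return sparse


def compress(sparse, block):
    """Remove all-zero blocks; return concentrated blocks and a row-index mask."""
    rows, cols = len(sparse), len(sparse[0])
    concentrated, mask = [], []
    for c0 in range(0, cols, block):
        kept = [r for r in range(rows) if any(sparse[r][c0:c0 + block])]
        mask.append(kept)  # row indices of the non-zero blocks in this group
        concentrated.append([sparse[r][c0:c0 + block] for r in kept])
    return concentrated, mask


def sparse_matmul(act, concentrated, mask, block, cols):
    """Multiply activations by the concentrated weights, gathering via the mask."""
    out = [[0.0] * cols for _ in act]
    for g, (blocks, idx) in enumerate(zip(concentrated, mask)):
        c0 = g * block
        for m, a_row in enumerate(act):
            gathered = [a_row[r] for r in idx]  # extract matching activation elements
            for a, w_block in zip(gathered, blocks):
                for j, w in enumerate(w_block):
                    out[m][c0 + j] += a * w
    return out


def dense_matmul(act, weight):
    """Reference dense product, used only to check the sparse path."""
    cols = len(weight[0])
    return [
        [sum(a_row[k] * weight[k][j] for k in range(len(weight))) for j in range(cols)]
        for a_row in act
    ]


# Toy example: 4 x 4 weights, block width 2, 50% of blocks pruned per column group.
W = [
    [1.0, 1.0, 0.1, 0.1],
    [0.1, 0.1, 2.0, 2.0],
    [3.0, 3.0, 0.2, 0.2],
    [0.2, 0.2, 4.0, 4.0],
]
A = [[1.0, 2.0, 3.0, 4.0]]
W_sparse = sparsify(W, block=2, ratio=0.5)
W_conc, W_mask = compress(W_sparse, block=2)
result = sparse_matmul(A, W_conc, W_mask, block=2, cols=4)
```

In this toy case the mask records which weight rows survive in each column group, so the inner loop touches only half of the activation elements while producing the same output as a dense product with the sparsified weights.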
