DYNAMIC PRUNING OF NEURONS ON-THE-FLY TO ACCELERATE NEURAL NETWORK INFERENCES

    Publication No.: US20250005364A1

    Publication date: 2025-01-02

    Application No.: US18761714

    Filing date: 2024-07-02

    Abstract: Systems, apparatuses and methods may provide for technology that aggregates contextual information from a first network layer in a neural network having a second network layer coupled to an output of the first network layer, wherein the context information is to be aggregated in real-time and after a training of the neural network, and wherein the context information is to include channel values. Additionally, the technology may conduct an importance classification of the aggregated context information and selectively exclude one or more channels in the first network layer from consideration by the second network layer based on the importance classification.
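
The three steps named in the abstract (aggregate channel values at inference time, classify their importance, exclude channels from the next layer) can be sketched in NumPy. This is only an illustration under stated assumptions: the abstract does not say how the context is aggregated or how importance is classified, so the global average pooling, the top-k rule, the `keep_ratio` parameter, the 1x1-convolution second layer, and all function names below are hypothetical choices, not the claimed method.

```python
import numpy as np


def aggregate_context(features):
    """Reduce a (channels, height, width) activation to one value per channel.

    Assumption: global average pooling stands in for the aggregation step.
    """
    return features.mean(axis=(1, 2))


def classify_importance(context, keep_ratio=0.5):
    """Return a boolean mask marking the channels to keep.

    Assumption: a channel is 'important' if its aggregated magnitude is
    among the top `keep_ratio` fraction; at least one channel is kept.
    """
    k = max(1, int(round(keep_ratio * context.size)))
    order = np.argsort(-np.abs(context), kind="stable")
    keep = np.zeros(context.size, dtype=bool)
    keep[order[:k]] = True
    return keep


def second_layer(features, weights, keep):
    """Apply a 1x1 convolution that only reads the kept input channels.

    weights has shape (out_channels, in_channels). Slicing both operands
    by `keep` means excluded channels cost no multiply-accumulates.
    """
    return np.einsum("oc,chw->ohw", weights[:, keep], features[keep])
```

Because the mask is computed from the activations of each input, a different set of channels may be skipped per inference, which is what distinguishes this from pruning a network once after training. Slicing by the mask gives the same result as zeroing the excluded channels and running the full layer, but with fewer operations.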

    WEIGHT QUANTIZATION ADAPTATION TECHNOLOGY

    Publication No.: US20250028965A1

    Publication date: 2025-01-23

    Application No.: US18904364

    Filing date: 2024-10-02

    Abstract: Systems, apparatuses and methods may provide for technology that selects a subset of linear layers from a plurality of linear layers in a pre-trained artificial intelligence (AI) model, wherein a quantization error of the subset of linear layers exceeds an error threshold. For each linear layer in the subset of linear layers, the technology solves a singular value decomposition (SVD) approximation, generates a first adapter layer and a second adapter layer based on the SVD decomposition, wherein the first adapter layer and the second adapter layer include weight matrices having a first dimension that is less than a first rank threshold and a second dimension that is greater than a second rank threshold, and determines an inference output based on the linear layer, the first adapter layer and the second adapter layer.
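
A minimal NumPy sketch of the flow described in the abstract follows: pick the linear layers whose quantization error exceeds a threshold, build two low-rank adapter matrices from an SVD, and add their contribution to the quantized layer's output. Several details are assumptions, since the abstract does not give them: the symmetric round-to-nearest quantizer, the relative Frobenius-norm error measure, applying the SVD to the quantization residual `w - w_q`, splitting the singular values evenly between the two adapters, and every function and parameter name used here.

```python
import numpy as np


def quantize(w, bits=4):
    """Symmetric per-tensor round-to-nearest quantization (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    if scale == 0:
        return w.copy()  # all-zero weights quantize to themselves
    return np.round(w / scale) * scale


def build_adapters(w, w_q, rank):
    """Rank-`rank` SVD approximation of the quantization residual.

    Returns first (rank, in_features) and second (out_features, rank),
    so that second @ first approximates w - w_q.
    """
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    root = np.sqrt(s[:rank])
    first = vt[:rank] * root[:, None]
    second = u[:, :rank] * root
    return first, second


def adapt_model(layers, error_threshold, rank, bits=4):
    """Quantize every layer; attach adapters only where the error is too high."""
    adapted = {}
    for name, w in layers.items():
        w_q = quantize(w, bits)
        error = np.linalg.norm(w - w_q) / np.linalg.norm(w)
        adapters = build_adapters(w, w_q, rank) if error > error_threshold else None
        adapted[name] = (w_q, adapters)
    return adapted


def forward(x, w_q, adapters):
    """Inference output from the quantized layer plus the two adapter layers."""
    y = w_q @ x
    if adapters is not None:
        first, second = adapters
        y = y + second @ (first @ x)
    return y
```

Each adapter has one small dimension (the rank) and one large dimension (the layer width), so the extra cost per adapted layer is two thin matrix products rather than a second full-size one. Layers whose quantization error stays under the threshold are left without adapters.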

    DYNAMIC PRUNING OF NEURONS ON-THE-FLY TO ACCELERATE NEURAL NETWORK INFERENCES

    Publication No.: US20210027166A1

    Publication date: 2021-01-28

    Application No.: US16958080

    Filing date: 2018-04-09

    Abstract: Systems, apparatuses and methods may provide for technology that aggregates contextual information from a first network layer in a neural network having a second network layer coupled to an output of the first network layer, wherein the context information is to be aggregated in real-time and after a training of the neural network, and wherein the context information is to include channel values. Additionally, the technology may conduct an importance classification of the aggregated context information and selectively exclude one or more channels in the first network layer from consideration by the second network layer based on the importance classification.
