DYNAMIC PRUNING OF NEURONS ON-THE-FLY TO ACCELERATE NEURAL NETWORK INFERENCES

    Publication No.: US20250005364A1

    Publication Date: 2025-01-02

    Application No.: US18761714

    Application Date: 2024-07-02

    Abstract: Systems, apparatuses and methods may provide for technology that aggregates context information from a first network layer in a neural network having a second network layer coupled to an output of the first network layer, wherein the context information is to be aggregated in real-time and after a training of the neural network, and wherein the context information is to include channel values. Additionally, the technology may conduct an importance classification of the aggregated context information and selectively exclude one or more channels in the first network layer from consideration by the second network layer based on the importance classification.
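As a rough illustration of the abstract above, the following is a minimal sketch of per-channel dynamic pruning at inference time. Global average pooling as the context aggregator and a top-k cutoff as the importance classifier are illustrative assumptions, not the patent's specific mechanisms:

```python
import numpy as np

def prune_channels(activations, keep_ratio=0.5):
    """Select channels of the first layer's output on-the-fly.

    activations: array of shape (channels, height, width).
    Aggregates per-channel context via global average pooling, then
    keeps only the channels with the largest mean response; the rest
    are excluded from the second layer's computation.
    """
    context = activations.mean(axis=(1, 2))     # per-channel context values
    k = max(1, int(len(context) * keep_ratio))  # number of channels to keep
    keep = np.argsort(context)[-k:]             # indices of "important" channels
    mask = np.zeros(len(context), dtype=bool)
    mask[keep] = True
    return mask  # the second layer skips channels where mask is False

acts = np.random.rand(8, 4, 4)
mask = prune_channels(acts, keep_ratio=0.25)
print(mask.sum())  # → 2
```

Because the mask is recomputed from live activations for every input, the set of excluded channels adapts per inference rather than being fixed at training time.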

    Real-time automatic vehicle camera calibration

    Publication No.: US10694175B2

    Publication Date: 2020-06-23

    Application No.: US15769124

    Application Date: 2016-11-11

    Abstract: A camera facing the front of a vehicle while the vehicle is moving on the road may be calibrated by receiving sequential images from the camera. Image key points in the area limited by the road location are selected. The key points are tracked using an optical flow method. A filtering procedure is applied to the key points to identify the straight-line motion of the vehicle. At least two straight lines corresponding to opposite sides of the road are identified. A calibration algorithm is applied to the at least two lines to determine a vanishing point. The pitch and/or yaw angles of the camera are then calculated.
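The final two steps of the abstract can be sketched as follows. The homogeneous-coordinate line intersection and the pinhole-model angle formulas are standard geometry; the sign conventions and the sample intrinsics (fx, fy, cx, cy) are assumptions for illustration:

```python
import math

def line_from_points(p1, p2):
    # Homogeneous line through two image points (cross product).
    (x1, y1), (x2, y2) = p1, p2
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def vanishing_point(l1, l2):
    # Intersection of two homogeneous lines, dehomogenized.
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    return ((b1 * c2 - b2 * c1) / w, (a2 * c1 - a1 * c2) / w)

def pitch_yaw(vp, fx, fy, cx, cy):
    # Pinhole model: angles of the ray through the vanishing point.
    vx, vy = vp
    yaw = math.atan2(vx - cx, fx)
    pitch = math.atan2(vy - cy, fy)
    return pitch, yaw

# Two road-edge lines meeting at the principal point (640, 360):
l1 = line_from_points((0, 720), (640, 360))
l2 = line_from_points((1280, 720), (640, 360))
vp = vanishing_point(l1, l2)
pitch, yaw = pitch_yaw(vp, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(vp, pitch, yaw)  # vp ≈ (640.0, 360.0); pitch ≈ 0, yaw ≈ 0
```

When the vanishing point coincides with the principal point, both angles are zero; any offset translates directly into a pitch or yaw correction for the camera.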

    METHODS AND APPARATUS TO EVICT TOKENS FROM A KEY VALUE CACHE

    Publication No.: US20250036876A1

    Publication Date: 2025-01-30

    Application No.: US18913538

    Application Date: 2024-10-11

    Abstract: Systems, apparatus, articles of manufacture, and methods are disclosed to evict tokens from a key value cache. An example apparatus includes interface circuitry, machine readable instructions, and programmable circuitry to at least one of instantiate or execute the machine readable instructions to: determine score history values for tokens based on attention scores associated with the tokens, wherein a token is a numerical representation of text, after a number of tokens present in the key value cache exceeds a threshold number of tokens, compute group importance scores for groups of tokens based on score history values of the tokens in the groups of tokens, identify low-ranked groups of tokens having lowest group importance scores, the low-ranked groups of tokens associated with an eviction range in the key value cache, and remove an identified low-ranked group of tokens from the eviction range of the key value cache.
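The eviction logic described above can be sketched as follows. Summing score histories as the group importance, evicting a single lowest-ranked group, and the (start, end) form of the eviction range are illustrative assumptions, not the claimed specifics:

```python
def evict_groups(score_history, group_size, eviction_range, n_evict_groups=1):
    """Group-wise token eviction sketch for a key value cache.

    score_history: accumulated attention-score history per cached token.
    Only tokens inside eviction_range = (start, end) are candidates,
    so e.g. the most recent tokens can be protected from eviction.
    Returns the token indices to remove from the cache.
    """
    start, end = eviction_range
    groups = []
    for g_start in range(start, end, group_size):
        idx = list(range(g_start, min(g_start + group_size, end)))
        groups.append((sum(score_history[i] for i in idx), idx))
    groups.sort(key=lambda g: g[0])  # lowest group importance first
    evicted = []
    for _, idx in groups[:n_evict_groups]:
        evicted.extend(idx)
    return evicted

history = [0.9, 0.8, 0.1, 0.2, 0.7, 0.6, 0.05, 0.1]
print(evict_groups(history, group_size=2, eviction_range=(0, 6)))  # → [2, 3]
```

In the example, tokens 6-7 sit outside the eviction range and survive despite having the lowest scores; within the range, the group [2, 3] has the smallest summed history and is evicted.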

    WEIGHT QUANTIZATION ADAPTATION TECHNOLOGY

    Publication No.: US20250028965A1

    Publication Date: 2025-01-23

    Application No.: US18904364

    Application Date: 2024-10-02

    Abstract: Systems, apparatuses and methods may provide for technology that selects a subset of linear layers from a plurality of linear layers in a pre-trained artificial intelligence (AI) model, wherein a quantization error of the subset of linear layers exceeds an error threshold. For each linear layer in the subset of linear layers, the technology solves a singular value decomposition (SVD) approximation, generates a first adapter layer and a second adapter layer based on the SVD, wherein the first adapter layer and the second adapter layer include weight matrices having a first dimension that is less than a first rank threshold and a second dimension that is greater than a second rank threshold, and determines an inference output based on the linear layer, the first adapter layer and the second adapter layer.
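One way to read the abstract is that a low-rank SVD of a layer's quantization error yields the two adapter matrices. The sketch below follows that reading; the uniform quantizer and the fixed rank are illustrative assumptions, not the patent's thresholds:

```python
import numpy as np

def quantize(w, n_levels=16):
    # Crude uniform quantizer, standing in for the model's real scheme.
    scale = np.abs(w).max() / (n_levels / 2)
    return np.round(w / scale) * scale

def svd_adapters(w, rank=8, n_levels=16):
    """Build two low-rank adapter matrices approximating the
    quantization error of one linear layer's weight matrix w."""
    wq = quantize(w, n_levels)
    u, s, vt = np.linalg.svd(w - wq, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # first adapter:  (in_features, rank)
    b = vt[:rank, :]            # second adapter: (rank, out_features)
    return wq, a, b

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))
wq, a, b = svd_adapters(w, rank=8)

# The adapters recover part of the quantization error (Eckart-Young):
err_full = np.linalg.norm(w - wq)
err_low = np.linalg.norm(w - (wq + a @ b))
print(err_low < err_full)  # → True
```

At inference, the layer output would be computed as x @ wq + (x @ a) @ b, so the correction costs only two thin matrix multiplies on top of the quantized layer.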

    DYNAMIC PRUNING OF NEURONS ON-THE-FLY TO ACCELERATE NEURAL NETWORK INFERENCES

    Publication No.: US20210027166A1

    Publication Date: 2021-01-28

    Application No.: US16958080

    Application Date: 2018-04-09

    Abstract: Systems, apparatuses and methods may provide for technology that aggregates context information from a first network layer in a neural network having a second network layer coupled to an output of the first network layer, wherein the context information is to be aggregated in real-time and after a training of the neural network, and wherein the context information is to include channel values. Additionally, the technology may conduct an importance classification of the aggregated context information and selectively exclude one or more channels in the first network layer from consideration by the second network layer based on the importance classification.
