Processor with outlier accommodation

    Publication Number: US12229659B2

    Publication Date: 2025-02-18

    Application Number: US17110266

    Application Date: 2020-12-02

    Abstract: A system and method for performing sets of multiplications in a manner that accommodates outlier values. In some embodiments, the method includes: forming a first set of products, each product of the first set of products being a product of a first activation value and a respective weight of a first plurality of weights. The forming of the first set of products may include multiplying, in a first multiplier, the first activation value and a least significant sub-word of a first weight to form a first partial product; multiplying, in a second multiplier, the first activation value and a least significant sub-word of a second weight; multiplying, in a third multiplier, the first activation value and a most significant sub-word of the first weight to form a second partial product; and adding the first partial product and the second partial product.
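
The sub-word scheme in the abstract can be illustrated in software: a weight is split into least and most significant sub-words, each is multiplied by the activation separately, and the high partial product is shifted back into place before the add. This is a minimal sketch; the 4-bit sub-word width, the function names, and the plain integer arithmetic are illustrative assumptions, not the claimed hardware.

```python
# Illustrative sketch only: sub-word width and names are assumptions.
SUB_WORD_BITS = 4

def split_weight(w: int) -> tuple[int, int]:
    """Split a weight into (least significant, most significant) sub-words."""
    lsw = w & ((1 << SUB_WORD_BITS) - 1)
    msw = w >> SUB_WORD_BITS
    return lsw, msw

def multiply_with_subwords(activation: int, weight: int) -> int:
    """Form a product from two partial products, as the abstract describes."""
    lsw, msw = split_weight(weight)
    first_partial = activation * lsw    # low sub-word ("first multiplier")
    second_partial = activation * msw   # high sub-word ("third multiplier")
    # Shift the high partial product into place before adding.
    return first_partial + (second_partial << SUB_WORD_BITS)

# Sanity check against a direct multiply:
assert multiply_with_subwords(7, 0xB3) == 7 * 0xB3
```

The second multiplier in the abstract handles the low sub-word of a *different* weight in parallel, which a sequential sketch like this cannot show; the point here is only how the partial products of one weight recombine.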

    Runtime reconfigurable compression format conversion

    Publication Number: US12224774B2

    Publication Date: 2025-02-11

    Application Number: US18096551

    Application Date: 2023-01-12

    Abstract: A runtime data-format optimizer for a processing element includes a sparsity-detector and a compression-converter. The sparsity-detector selects a first compression-conversion format during a runtime of the processing element based on a performance model that is based on a first sparsity pattern of first data stored in a first memory that is exterior to the processing element and a second sparsity pattern of second data that is to be stored in a second memory within the processing element. The second sparsity pattern is based on a runtime configuration of the processing element. The first data is stored in the first memory using a first compression format and the second data is to be stored in the second memory using a second compression format. The compression-converter converts the first compression format of the first data to the second compression format of the second data based on the first compression-conversion format.
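
As a rough software analogue of the runtime format selection described above, the sketch below picks a target compression format from a toy sparsity model and converts dense data accordingly. The format names (`bitmap`, `coo`), the 0.5 sparsity threshold, and the "performance model" are all illustrative assumptions, not the patented mechanism.

```python
# Illustrative sketch only: formats and threshold are assumptions.
def sparsity(values):
    """Fraction of zero elements."""
    return sum(1 for v in values if v == 0) / len(values)

def select_format(values):
    # Toy "performance model": index-value pairs (COO-like) when mostly
    # zero, a bitmap plus packed non-zeros otherwise.
    return "coo" if sparsity(values) > 0.5 else "bitmap"

def convert(values):
    """Convert dense data to the selected compression format."""
    fmt = select_format(values)
    if fmt == "coo":
        return fmt, [(i, v) for i, v in enumerate(values) if v != 0]
    mask = [int(v != 0) for v in values]
    return fmt, (mask, [v for v in values if v != 0])

convert([0, 0, 3, 0, 0, 0, 7, 0])  # → ("coo", [(2, 3), (6, 7)])
```

In the patented system the selection also depends on the destination memory's runtime configuration; this sketch folds everything into a single sparsity ratio for brevity.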

    PROCESSOR WITH OUTLIER ACCOMMODATION

    Publication Number: US20220114425A1

    Publication Date: 2022-04-14

    Application Number: US17110266

    Application Date: 2020-12-02

    Abstract: A system and method for performing sets of multiplications in a manner that accommodates outlier values. In some embodiments, the method includes: forming a first set of products, each product of the first set of products being a product of a first activation value and a respective weight of a first plurality of weights. The forming of the first set of products may include multiplying, in a first multiplier, the first activation value and a least significant sub-word of a first weight to form a first partial product; multiplying, in a second multiplier, the first activation value and a least significant sub-word of a second weight; multiplying, in a third multiplier, the first activation value and a most significant sub-word of the first weight to form a second partial product; and adding the first partial product and the second partial product.

    Mixed-precision neural processing unit (NPU) using spatial fusion with load balancing

    Publication Number: US12001929B2

    Publication Date: 2024-06-04

    Application Number: US16898433

    Application Date: 2020-06-10

    CPC classification number: G06N20/00 H04L67/1001

    Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into a data subdivision. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading the computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load a computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique. The machine learning system may include a computation circuit configured to compute a partial computation result based, at least in part, upon the selected data subdivision and the weight subdivision.
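
The subdivision and alternating-load idea above can be sketched as follows. The 8-bit precision boundary, the two-way subdivision, the round-robin pairing, and the zero-skipping partial computation are all assumptions made for illustration, not the patented circuit.

```python
# Illustrative sketch only: precision boundary and policies are assumptions.
def subdivide_by_precision(values, bits=8):
    """Split values into a low-precision and a high-precision subdivision."""
    limit = 1 << (bits - 1)
    low = [v for v in values if -limit <= v < limit]
    high = [v for v in values if not (-limit <= v < limit)]
    return [low, high]

def alternating_schedule(data_subs, weight_subs):
    """Alternately pair data and weight subdivisions for the compute circuit."""
    return [(d, weight_subs[i % len(weight_subs)])
            for i, d in enumerate(data_subs)]

def partial_result(data_sub, weight_sub):
    """One partial computation; skipping zeros exploits sparsity."""
    return sum(d * w for d, w in zip(data_sub, weight_sub) if d and w)
```

The actual NPU fuses subdivisions spatially across multipliers; this sequential sketch only shows the scheduling shape, i.e. that each compute step receives one data-subdivision/weight-subdivision pair.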

    SYSTEM AND METHOD FOR PERFORMING COMPUTATIONS FOR DEEP NEURAL NETWORKS

    Publication Number: US20230047273A1

    Publication Date: 2023-02-16

    Application Number: US17966488

    Application Date: 2022-10-14

    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel along an input dimension of the array during a first processing period, and second input values are provided in parallel along the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. An adder coupled to a first set of the PE units generates a first sum of the results of the computations by the first set of PE units during the first processing period, and generates a second sum of the results of the computations during the second processing period. A first accumulator coupled to the adder stores the first sum, and shifts the first sum to a second accumulator prior to storing the second sum.
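
A behavioral sketch of the adder/accumulator flow described above: each processing period, the PEs multiply their stored weights by that period's inputs, an adder sums the column, and the first accumulator shifts its previous sum to the second accumulator before storing the new one. The class and attribute names are hypothetical.

```python
# Illustrative sketch only: one PE column with a two-stage accumulator chain.
class PEColumn:
    def __init__(self, weights):
        self.weights = weights   # one stored weight per PE
        self.acc1 = 0            # first accumulator
        self.acc2 = 0            # second accumulator (shift target)

    def cycle(self, inputs):
        """One processing period: multiply, sum, shift, store."""
        s = sum(w * x for w, x in zip(self.weights, inputs))  # adder
        self.acc2 = self.acc1    # shift the prior sum downstream
        self.acc1 = s            # store the new sum

col = PEColumn([1, 2, 3])
col.cycle([1, 1, 1])   # acc1 holds 1+2+3 = 6
col.cycle([2, 0, 1])   # acc1 holds 2+0+3 = 5; acc2 now holds 6
```

The shift-before-store ordering is the key detail: it lets the second accumulator drain the previous period's sum while the column computes the next one.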

    Accelerating 2D convolutional layer mapping on a dot product architecture

    Publication Number: US12112141B2

    Publication Date: 2024-10-08

    Application Number: US16900819

    Application Date: 2020-06-12

    CPC classification number: G06F7/5443 G06F9/30105 G06N3/063

    Abstract: A method for performing a convolution operation includes storing a convolution kernel in a first storage device, the convolution kernel having dimensions x by y; storing, in a second storage device, a first subset of element values of an input feature map having dimensions n by m; performing a first simultaneous multiplication of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel; for each remaining value of the x*y elements of the convolution kernel, performing a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map; for each simultaneous multiplication, storing a result of the simultaneous multiplication in an accumulator; and outputting the values of the accumulator as a first row of an output feature map.
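
The kernel-element-wise accumulation described above can be sketched in software: each of the x*y kernel elements is broadcast against the matching subset of the input feature map and accumulated into one output row. Valid-convolution boundaries and the nested-list representation are assumptions made for illustration.

```python
# Illustrative sketch only: one output row of a valid 2D convolution,
# computed one kernel element at a time as the abstract describes.
def conv_row(ifm, kernel, row):
    x, y = len(kernel), len(kernel[0])
    n_out = len(ifm[0]) - y + 1
    acc = [0] * n_out                # accumulator for one output row
    for i in range(x):
        for j in range(y):
            k = kernel[i][j]
            # "Simultaneous multiplication": one kernel element against a
            # contiguous subset of input-feature-map values.
            subset = ifm[row + i][j:j + n_out]
            acc = [a + k * v for a, v in zip(acc, subset)]
    return acc

ifm = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
conv_row(ifm, [[1, 0], [0, 1]], 0)   # → [6, 8]
```

On the dot-product hardware, the inner list comprehension corresponds to one broadcast multiply across all lanes per cycle, so an x-by-y kernel needs only x*y such steps per output row.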

    System and method for performing computations for deep neural networks

    Publication Number: US11507817B2

    Publication Date: 2022-11-22

    Application Number: US16900845

    Application Date: 2020-06-12

    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel along an input dimension of the array during a first processing period, and second input values are provided in parallel along the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. An adder coupled to a first set of the PE units generates a first sum of the results of the computations by the first set of PE units during the first processing period, and generates a second sum of the results of the computations during the second processing period. A first accumulator coupled to the adder stores the first sum, and shifts the first sum to a second accumulator prior to storing the second sum.
