SYSTEM AND METHOD FOR PERFORMING COMPUTATIONS FOR DEEP NEURAL NETWORKS

    Publication Number: US20210326686A1

    Publication Date: 2021-10-21

    Application Number: US16900845

    Filing Date: 2020-06-12

    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel in an input dimension of the array during a first processing period, and second input values are provided in parallel in the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. A first adder coupled to a first set of the PE units generates a first sum of the results of the computations by the first set of PE units during the first processing period, and generates a second sum of the results of the computations during the second processing period. A first accumulator coupled to the first adder stores the first sum, and further shifts the first sum to a second accumulator prior to storing the second sum.
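
    The following is a minimal Python sketch (an illustration written for this listing, not code from the patent) of the accumulate-and-shift behavior the abstract describes: each PE multiplies one of the parallel input values by its stored weight, an adder sums the PE results once per processing period, and each accumulator shifts its held sum toward the next accumulator before storing the new one. The helper names (pe_column_multiply, run_cycles) and the two-accumulator chain are assumptions made for the example.

    from typing import List

    def pe_column_multiply(inputs: List[float], weights: List[float]) -> List[float]:
        # Each PE multiplies one parallel input value by its stored weight value.
        return [x * w for x, w in zip(inputs, weights)]

    def run_cycles(input_vectors: List[List[float]],
                   weights: List[float],
                   num_accumulators: int) -> List[float]:
        # One input vector is fed per processing period; the adder sums the PE results,
        # and partial sums ripple through a chain of accumulators (shift, then store).
        accumulators = [0.0] * num_accumulators
        for inputs in input_vectors:
            products = pe_column_multiply(inputs, weights)  # computations by the PE units
            period_sum = sum(products)                      # adder coupled to the PE set
            # Shift the previously stored sums toward the last accumulator...
            for i in range(num_accumulators - 1, 0, -1):
                accumulators[i] = accumulators[i - 1]
            accumulators[0] = period_sum                    # ...then store this period's sum.
        return accumulators

    if __name__ == "__main__":
        first_inputs = [1.0, 2.0, 3.0]     # provided in parallel during the first period
        second_inputs = [4.0, 5.0, 6.0]    # provided in parallel during the second period
        stored_weights = [0.5, 0.25, 0.125]
        # After two periods the second accumulator holds the first sum and the
        # first accumulator holds the second sum: [4.0, 1.375]
        print(run_cycles([first_inputs, second_inputs], stored_weights, num_accumulators=2))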

    Mixed-precision neural processing unit (NPU) using spatial fusion with load balancing

    Publication Number: US12001929B2

    Publication Date: 2024-06-04

    Application Number: US16898433

    Filing Date: 2020-06-10

    CPC classification number: G06N20/00 H04L67/1001

    Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to determine a precision level of data and to divide the data into data subdivisions. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading a computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load the computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique. The machine learning system may include the computation circuit, which may be configured to compute a partial computation result based, at least in part, upon the selected data subdivision and the selected weight subdivision.
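
    As a rough illustration of the mixed-precision idea in the abstract, the Python sketch below (written for this listing, not taken from the patent) divides 8-bit data and weight values into 4-bit subdivisions, alternates over the data/weight subdivision combinations, skips combinations containing a zero subdivision (exploiting sparsity), and accumulates the shifted partial products into the full-precision result. The function names (split_nibbles, fused_dot) and the nibble-level split are assumptions made for the example.

    from typing import List, Tuple

    def split_nibbles(value: int) -> Tuple[int, int]:
        # Divide an unsigned 8-bit value into its high and low 4-bit subdivisions.
        return (value >> 4) & 0xF, value & 0xF

    def fused_dot(data: List[int], weights: List[int]) -> int:
        # Dot product computed from shifted partial products of the nibble subdivisions,
        # alternating the data/weight subdivision combinations and skipping zero ones.
        total = 0
        for d, w in zip(data, weights):
            d_hi, d_lo = split_nibbles(d)
            w_hi, w_lo = split_nibbles(w)
            # The four subdivision combinations, each with the shift restoring its place value.
            combinations = [(d_hi, w_hi, 8), (d_hi, w_lo, 4), (d_lo, w_hi, 4), (d_lo, w_lo, 0)]
            for a, b, shift in combinations:
                if a == 0 or b == 0:        # exploit sparsity: skip zero subdivisions
                    continue
                total += (a * b) << shift   # partial computation result
        return total

    if __name__ == "__main__":
        data = [0x1A, 0x00, 0x7F]
        weights = [0x03, 0x55, 0x10]
        # The fused nibble-wise result equals the plain 8-bit dot product.
        assert fused_dot(data, weights) == sum(d * w for d, w in zip(data, weights))
        print(fused_dot(data, weights))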

    SYSTEM AND METHOD FOR PERFORMING COMPUTATIONS FOR DEEP NEURAL NETWORKS

    Publication Number: US20230047273A1

    Publication Date: 2023-02-16

    Application Number: US17966488

    Filing Date: 2022-10-14

    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel in an input dimension of the array during a first processing period, and second input values are provided in parallel in the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. A first adder coupled to a first set of the PE units generates a first sum of the results of the computations by the first set of PE units during the first processing period, and generates a second sum of the results of the computations during the second processing period. A first accumulator coupled to the first adder stores the first sum, and further shifts the first sum to a second accumulator prior to storing the second sum.

    System and method for performing computations for deep neural networks

    Publication Number: US11507817B2

    Publication Date: 2022-11-22

    Application Number: US16900845

    Filing Date: 2020-06-12

    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel in an input dimension of the array during a first processing period, and second input values are provided in parallel in the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. A first adder coupled to a first set of the PE units generates a first sum of the results of the computations by the first set of PE units during the first processing period, and generates a second sum of the results of the computations during the second processing period. A first accumulator coupled to the first adder stores the first sum, and further shifts the first sum to a second accumulator prior to storing the second sum.