NEURAL NETWORK ACCELERATOR PERFORMING OPERATION WITH MIXED-FORMAT WEIGHTS

    Publication number: US20250060940A1

    Publication date: 2025-02-20

    Application number: US18931973

    Application date: 2024-10-30

    Abstract: A data processing unit may include a memory, processing elements (PEs), and a control unit. The memory may store weight blocks within a weight tensor of a neural network operation. Each weight block has an input channel (IC) dimension and an output channel (OC) dimension and includes subblocks. A subblock includes one or more weights having a first data precision and one or more other weights having a second data precision. The second data precision is lower than the first data precision. The control unit may distribute different ones of the subblocks to different ones of the PEs. A PE may receive a subblock and perform a first multiply-accumulate (MAC) operation on a weight having the first data precision and a second MAC operation on a weight having the second data precision. The first MAC operation may consume more computation cycles or more multipliers than the second MAC operation.
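    The mixed-precision MAC behavior described above can be illustrated with a minimal sketch. The format names, relative cycle costs, and function names below are assumptions for illustration, not details taken from the patent.

```python
# Illustrative sketch (not the patented implementation): a PE runs MAC
# operations over a subblock whose weights mix two data precisions, and
# a higher-precision weight costs more computation cycles.

CYCLES_PER_FORMAT = {"fp16": 2, "int8": 1}  # assumed relative cycle costs

def pe_mac(subblock, activations):
    """Accumulate weight * activation products and tally cycle cost.

    subblock: list of (format, weight) pairs mixing two precisions.
    activations: matching list of input activations.
    """
    acc = 0.0
    cycles = 0
    for (fmt, weight), act in zip(subblock, activations):
        acc += weight * act
        cycles += CYCLES_PER_FORMAT[fmt]
    return acc, cycles

subblock = [("fp16", 0.5), ("int8", 2), ("int8", -1), ("fp16", 0.25)]
activations = [1.0, 2.0, 3.0, 4.0]
result, cycles = pe_mac(subblock, activations)
# The two fp16 weights cost 2 cycles each, the two int8 weights 1 each,
# so this subblock takes 6 cycles in total.
```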

    OUTPUT DRAIN PATH FACILITATING FLEXIBLE SCHEDULE-BASED DEEP NEURAL NETWORK ACCELERATOR

    Publication number: US20240013040A1

    Publication date: 2024-01-11

    Application number: US18474464

    Application date: 2023-09-26

    CPC classification number: G06N3/063 G06N3/048 G06N3/0464

    Abstract: A drain module may drain activations in an output tensor of a convolution from a PE array that performs the convolution. The drain module may extract activations generated in a collection of PE columns. The activations generated in the PE columns in the collection may be concatenated, e.g., activations generated in the first PE column of the collection may be followed by activations generated in the second PE column of the collection, and so on. The activations in the output tensor may be rearranged into activation vectors. Each activation vector may include activations in different output channels of the deep learning operation. The activations in each activation vector may have the same (X, Y) coordinate in the output tensor. The drain module may determine a memory address for an activation based on the activation's (X, Y, Z) coordinate in the output tensor and write the activation to the memory address.
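    The (X, Y, Z)-coordinate-to-address mapping mentioned above can be sketched as a simple linearization. The channel-innermost layout, function name, and parameters below are assumptions chosen to match the abstract's activation vectors, which share one (X, Y) coordinate across output channels; the patent's actual addressing scheme may differ.

```python
def activation_address(x, y, z, width, channels, base=0, elem_bytes=1):
    """Map an activation's (X, Y, Z) output-tensor coordinate to a flat
    memory address (illustrative layout, not the patented scheme)."""
    # Z (output channel) varies fastest, so all channels of one (X, Y)
    # position, i.e. one activation vector, land in consecutive addresses.
    return base + ((y * width + x) * channels + z) * elem_bytes

# All 8 output channels at (x=1, y=2) are written contiguously.
addrs = [activation_address(1, 2, z, width=4, channels=8) for z in range(8)]
```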

    SCHEDULING COMPUTATIONS IN DEEP NEURAL NETWORK BASED ON SPARSITY

    Publication number: US20230229507A1

    Publication date: 2023-07-20

    Application number: US18180415

    Application date: 2023-03-08

    CPC classification number: G06F9/5027 G06N3/04 H04L41/16

    Abstract: Computations in processing elements (PEs) executing a deep neural network are scheduled by a computation scheduler based on sparsity in the input data of the computations, reducing voltage droops. Each PE may perform a computation on an input operand and a weight operand. The computation scheduler may predict a PE's workload for a computation from a combined sparsity bitmap, which may be generated from a sparsity bitmap of the input operand and a sparsity bitmap of the weight operand. The computation scheduler can schedule the starts of the computations in the PEs based on the predicted workloads. The computation scheduler may instruct the PE having the highest workload to start its computation first and instruct the other PEs to start later. In some embodiments, the computations in the PEs may end in the same clock cycle.
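    The bitmap combination and staggered-start scheduling described above can be sketched as follows. The elementwise-AND combination, the one-MAC-per-cycle assumption, and all function names are illustrative, not taken from the patent.

```python
def combined_bitmap(input_bitmap, weight_bitmap):
    # A position yields a nonzero product only when both operands are
    # nonzero, so AND the two sparsity bitmaps elementwise.
    return [i & w for i, w in zip(input_bitmap, weight_bitmap)]

def predicted_workload(bitmap):
    # Workload is the number of effective (nonzero) MACs.
    return sum(bitmap)

def start_offsets(workloads):
    # The busiest PE starts first (offset 0); lighter PEs start later,
    # so all PEs can finish in the same clock cycle (assuming one MAC
    # per cycle). Staggered starts avoid every PE drawing peak current
    # at once, which is what causes voltage droop.
    longest = max(workloads)
    return [longest - w for w in workloads]

bm = combined_bitmap([1, 1, 0, 1], [1, 0, 1, 1])  # only shared nonzeros
offsets = start_offsets([5, 2, 4])                # busiest PE starts first
```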

    DATA REUSE IN DEEP LEARNING
    Invention Application

    Publication number: US20220188638A1

    Publication date: 2022-06-16

    Application number: US17684764

    Application date: 2022-03-02

    Abstract: An apparatus for convolution operations is provided. The apparatus includes a PE array, a datastore, writing modules, reading modules, and a controlling module. The PE array performs MAC operations. The datastore includes databanks, each of which stores data to be used by a column of the PE array. The writing modules transfer data from a memory to the datastore. The reading modules transfer data from the datastore to the PE array. Each reading module may transfer data to a particular column of the PE array. The controlling module can determine the rounds of a convolution operation. Each round includes MAC operations based on a weight. The controlling module controls the writing modules and reading modules so that the same data in a databank can be reused in multiple rounds. For different rounds, the controlling module can provide a reading module access to different databanks.
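    The round-based reuse described above can be sketched minimally. The idea that one databank load serves every round's weight comes from the abstract; the function and variable names are assumptions for illustration.

```python
def convolve_in_rounds(databank, round_weights):
    """Return one partial result per round (illustrative sketch).

    databank: input data cached in the datastore once.
    round_weights: one weight per round. Each round re-reads the same
    cached databank with a different weight instead of fetching the
    data from memory again, which is the reuse the abstract describes.
    """
    results = []
    for w in round_weights:  # one round per weight
        results.append(sum(w * x for x in databank))
    return results

# The same three cached inputs are reused across both rounds.
partials = convolve_in_rounds([1, 2, 3], [2, 10])
```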
