Data volume sculptor for deep learning acceleration

    公开(公告)号:US11610362B2

    公开(公告)日:2023-03-21

    申请号:US17194055

    申请日:2021-03-05

    Abstract: A device include on-board memory, an applications processor, a digital signal processor (DSP) cluster, a configurable accelerator framework (CAF), and at least one communication bus architecture. The communication bus communicatively couples the applications processor, the DSP cluster, and the CAF to the on-board memory. The CAF includes a reconfigurable stream switch and data volume sculpting circuitry, which has an input and an output coupled to the reconfigurable stream switch. The data volume sculpting circuitry receives a series of frames, each frame formed as a two dimensional (2D) data structure, and determines a first dimension and a second dimension of each frame of the series of frames. Based on the first and second dimensions, the data volume sculpting circuitry determines for each frame a position and a size of a region-of-interest to be extracted from the respective frame, and extracts from each frame, data in the frame that is within the region-of-interest.

    Elements for in-memory compute
    8.
    发明授权

    公开(公告)号:US11474788B2

    公开(公告)日:2022-10-18

    申请号:US16890870

    申请日:2020-06-02

    Abstract: A memory array arranged in multiple columns and rows. Computation circuits that each calculate a computation value from cell values in a corresponding column. A column multiplexer cycles through multiple data lines that each corresponds to a computation circuit. Cluster cycle management circuitry determines a number of multiplexer cycles based on a number of columns storing data of a compute cluster. A sensing circuit obtains the computation values from the computation circuits via the column multiplexer as the column multiplexer cycles through the data lines. The sensing circuit combines the obtained computation values over the determined number of multiplexer cycles. A first clock may initiate the multiplexer to cycle through its data lines for the determined number of multiplexer cycles, and a second clock may initiate each individual cycle. The multiplexer or additional circuitry may be utilized to modify the order in which data is written to the columns.

    Hardware accelerator method, system and device

    公开(公告)号:US11442700B2

    公开(公告)日:2022-09-13

    申请号:US16833340

    申请日:2020-03-27

    Abstract: A system includes an addressable memory array, one or more processing cores, and an accelerator framework coupled to the addressable memory. The accelerator framework includes a Multiply ACcumulate (MAC) hardware accelerator cluster. The MAC hardware accelerator cluster has a binary-to-residual converter, which, in operation, converts binary inputs to a residual number system. Converting a binary input to the residual number system includes a reduction modulo 2m and a reduction modulo 2m−1, where m is a positive integer. A plurality of MAC hardware accelerators perform modulo 2m multiply-and-accumulate operations and modulo 2m−1 multiply-and-accumulate operations using the converted binary input. A residual-to-binary converter generates a binary output based on the output of the MAC hardware accelerators.

Patent Agency Ranking