DILATED CONVOLUTION USING SYSTOLIC ARRAY

    公开(公告)号:US20220292163A1

    公开(公告)日:2022-09-15

    申请号:US17832039

    申请日:2022-06-03

    Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.

    Transpose operations using processing element array

    公开(公告)号:US11347480B2

    公开(公告)日:2022-05-31

    申请号:US17122136

    申请日:2020-12-15

    Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of results buffer can then be mapped to row partitions of a buffer memory for further processing.

    Compile-time scheduling
    23.
    发明授权

    公开(公告)号:US11003429B1

    公开(公告)日:2021-05-11

    申请号:US16266915

    申请日:2019-02-04

    Abstract: Scheduling of the operations of an integrated circuit device such as a hardware accelerator, including scheduling of movement of data into and out of the accelerator, can be performed by a compiler that produces program code for the accelerator. The compiler can produce a graph that represents operations to be performed by the accelerator. Using the graph, the compiler can determine estimated execution times for the operations represented by each node in the graph. The compiler can schedule operations by determining an estimated execution time for set of dependent operations that depend from an operation. The compiler can then select an operation that has a shortest estimated execution time from among a set of operations and which has a set of dependent operations that has a longest estimated execution time as compared to other sets of dependent operations.

    TRANSPOSE OPERATIONS USING PROCESSING ELEMENT ARRAY

    公开(公告)号:US20210096823A1

    公开(公告)日:2021-04-01

    申请号:US17122136

    申请日:2020-12-15

    Abstract: Provided are integrated circuits and methods for transposing a tensor using processing element array operations. In some cases, it may be necessary to transpose elements of a tensor to perform a matrix operation. The tensor may be decomposed into blocks of data elements having dimensions consistent with the dimensions of a systolic array. An identity multiplication may be performed on each block of data elements loaded into a systolic array and the multiplication products summed in column partitions of a results buffer. The data elements in the column partitions of results buffer can then be mapped to row partitions of a buffer memory for further processing.

    Registers for restricted memory
    25.
    发明授权

    公开(公告)号:US10678479B1

    公开(公告)日:2020-06-09

    申请号:US16204943

    申请日:2018-11-29

    Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank, and can read from and write to only the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.

Patent Agency Ranking