Merging Buffer Access Operations in a Coarse-grained Reconfigurable Computing System

    Publication No.: US20230325312A1

    Publication Date: 2023-10-12

    Application No.: US17974910

    Filing Date: 2022-10-27

    IPC Classification: G06F12/0802

    CPC Classification: G06F12/0802 G06F2212/1041

    Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph, responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer-readable medium are also disclosed herein.
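    A minimal sketch of how such a merging pass might look, assuming a toy resource model in which every separate buffer node costs one memory unit and a merged indexing buffer also costs one; the node, operation, and cost names below are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field

# Assumed set of operations that can be expressed as pure memory indexing.
INDEXING_OPS = {"gather", "transpose", "reshape"}

@dataclass
class Node:
    name: str
    op: str
    consumers: list = field(default_factory=list)

def buffer_cost(node):
    return 1  # placeholder: one memory unit per separate buffer node

def merged_cost(producer, consumer):
    return 1  # placeholder: a merged indexing buffer reuses a single memory unit

def merge_buffer_nodes(graph):
    """Replace a producer/consumer pair with a merged buffer node when the
    producer is a memory indexing operation feeding exactly one consumer and
    the merged node is cheaper than two separate buffer nodes."""
    for node in list(graph):
        if node not in graph or node.op not in INDEXING_OPS:
            continue
        if len(node.consumers) != 1:
            continue
        consumer = node.consumers[0]
        if consumer not in graph or consumer.op not in INDEXING_OPS:
            continue
        if merged_cost(node, consumer) < buffer_cost(node) + buffer_cost(consumer):
            graph.remove(node)
            graph.remove(consumer)
            graph.append(Node(f"{node.name}+{consumer.name}", "merged_index"))
    return graph

# Example: a gather feeding exactly one transpose collapses into one merged node.
t = Node("transpose0", "transpose")
g = Node("gather0", "gather", consumers=[t])
print([n.name for n in merge_buffer_nodes([g, t])])  # ['gather0+transpose0']
```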

    COMPILER-BASED INPUT SYNCHRONIZATION FOR PROCESSOR WITH VARIANT STAGE LATENCIES

    Publication No.: US20230205501A1

    Publication Date: 2023-06-29

    Application No.: US18089157

    Filing Date: 2022-12-27

    IPC Classification: G06F8/41

    Abstract: The technology disclosed provides a system that comprises a processor with compute units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages, with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for the difference between the first and second stage latencies.
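    A small sketch of the register-delay idea, assuming stage latencies are known in whole clock cycles at compile time; the function and stage names are hypothetical:

```python
def balance_stage_latencies(producer_latencies):
    """Return the number of delay registers to insert after each producer stage
    so that all outputs reach the consumer stage in the same cycle.

    producer_latencies: dict mapping a producer stage name to its latency in
    cycles (in the patent, latency depends on operation type and operand format).
    """
    longest = max(producer_latencies.values())
    return {stage: longest - latency for stage, latency in producer_latencies.items()}

# Example: an fp32 multiply (4 cycles) and an int32 add (1 cycle) feed one consumer;
# the add output is delayed by 3 registers so both operands arrive together.
print(balance_stage_latencies({"mul_fp32": 4, "add_int32": 1}))
# -> {'mul_fp32': 0, 'add_int32': 3}
```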

    Compile Time Logic for Detecting Streaming Compatible and Broadcast Compatible Data Access Patterns

    Publication No.: US20220092247A1

    Publication Date: 2022-03-24

    Application No.: US17031679

    Filing Date: 2020-09-24

    Abstract: A dataflow graph for an application has operation units that are configured to be producer operation units that produce tensors for execution of the application, and consumer operation units that consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify the memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
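    A toy sketch of the layout record and compatibility check the abstract describes, assuming a layout is fully captured by a vector dimension, a vector ordering, and a data alignment; all names are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryLayout:
    vector_dimension: int   # which tensor dimension is vectorized
    vector_ordering: tuple  # order in which dimensions are traversed
    data_alignment: int     # alignment of each vector, in bytes

def streaming_compatible(current_layout, expected_consumer_layout):
    """A tensor can stream to its consumer without reformatting when its
    current layout already matches the layout the consumer expects
    (a simplified check; the real pass also considers broadcast cases)."""
    return current_layout == expected_consumer_layout

producer_out = MemoryLayout(vector_dimension=1, vector_ordering=(0, 1), data_alignment=64)
consumer_in = MemoryLayout(vector_dimension=1, vector_ordering=(0, 1), data_alignment=64)
print(streaming_compatible(producer_out, consumer_in))  # True: no conversion needed
```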

    Anti-Congestion Flow Control for Reconfigurable Processors

    Publication No.: US20210373867A1

    Publication Date: 2021-12-02

    Application No.: US16890841

    Filing Date: 2020-06-02

    IPC Classification: G06F8/41 G06F15/78

    Abstract: A compiler configures memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as the buffer depth of the corresponding downstream memory node; it decrements when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and increments when the particular upstream memory node receives a read ready token from the corresponding downstream memory node. The write credit counter of the particular upstream memory node is initialized with one or more write credits; it decrements when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and increments when the particular upstream memory node receives a write done token from the corresponding downstream memory node.
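    The flow-control scheme can be sketched with a small class, assuming a single write credit and tokens delivered as method calls; the class and method names are illustrative, not the patent's:

```python
class UpstreamNode:
    """Sketch of the per-node anti-congestion counters described in the abstract."""

    def __init__(self, downstream_buffer_depth, write_credits=1):
        self.read_credits = downstream_buffer_depth  # ready-to-read credit counter
        self.write_credits = write_credits           # write credit counter

    def begin_write(self):
        # Writing a buffer data unit consumes one write credit and, because the
        # unit will occupy a downstream buffer slot, one read credit.
        assert self.write_credits > 0 and self.read_credits > 0, "would congest downstream"
        self.write_credits -= 1
        self.read_credits -= 1

    def on_write_done_token(self):
        self.write_credits += 1  # downstream finished accepting the unit

    def on_read_ready_token(self):
        self.read_credits += 1   # downstream consumed the unit, freeing a slot

# Example: a downstream buffer two units deep, one write in flight at a time.
node = UpstreamNode(downstream_buffer_depth=2)
node.begin_write()
node.on_write_done_token()
node.on_read_ready_token()
```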

    OPTIMIZING TENSOR TILING IN NEURAL NETWORKS BASED ON A TILING COST MODEL

    Publication No.: US20230315410A1

    Publication Date: 2023-10-05

    Application No.: US18129714

    Filing Date: 2023-03-31

    IPC Classification: G06F8/41

    CPC Classification: G06F8/443

    Abstract: A method comprises a compiler analyzing a graph to determine a pipeline of operators based on a shared dimension of input and output tensors among the operators. The operators are included in the graph and the graph corresponds to a dataflow application. The compiler determines a tiling decision associated with the pipeline and a tiling cost associated with the tiling decision. The tiling decision can comprise a tile shape to slice tensors of operators of the pipeline. Based on the tiling cost, the compiler determines that the tiling decision improves an optimization objective and includes the pipeline and tiling decision in mapping decisions associated with executing the application on a computing system. The compiler can apply a tiling cost model to determine the tiling costs. A computer program product and a computing system can implement the method.
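    A compact sketch of choosing a tile shape against a cost model; the toy cost function below (a fixed per-tile overhead plus a penalty for tiles that overflow an assumed on-chip capacity) stands in for the patent's tiling cost model and is not taken from it:

```python
ON_CHIP_CAPACITY = 256  # assumed on-chip capacity along the shared dimension

def toy_tiling_cost(shared_dim, tile):
    """Assumed cost model: each tile pays a fixed launch overhead, and any tile
    that overflows on-chip memory pays a large spill penalty."""
    num_tiles = -(-shared_dim // tile)  # ceiling division
    spill_penalty = 1000 if tile > ON_CHIP_CAPACITY else 0
    return num_tiles * 1.0 + spill_penalty

def choose_tile(shared_dim, candidate_tiles, cost_fn=toy_tiling_cost):
    """Return the candidate tile shape that most improves on the untiled
    baseline, together with its estimated cost."""
    best_tile, best_cost = None, cost_fn(shared_dim, shared_dim)  # untiled baseline
    for tile in candidate_tiles:
        cost = cost_fn(shared_dim, tile)
        if cost < best_cost:
            best_tile, best_cost = tile, cost
    return best_tile, best_cost

# Example: tiling a shared dimension of 1024; a 256-wide tile wins under this model.
print(choose_tile(shared_dim=1024, candidate_tiles=[96, 128, 256]))  # (256, 4.0)
```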

    COMPILE TIME LOGIC FOR INSERTING A BUFFER BETWEEN A PRODUCER OPERATION UNIT AND A CONSUMER OPERATION UNIT IN A DATAFLOW GRAPH

    Publication No.: US20220147328A1

    Publication Date: 2022-05-12

    Application No.: US17582421

    Filing Date: 2022-01-24

    IPC Classification: G06F8/41 G06F15/78 G06F16/90

    Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
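    The conflict-detection and buffer-insertion step can be illustrated with a small sketch, assuming an access pattern is reduced to the order in which dimensions are traversed; the pattern encoding and function name are assumptions:

```python
def insert_buffers(edges):
    """For each producer -> consumer edge, insert a buffer node when the
    producer's write access pattern differs from the consumer's read access
    pattern, so the buffer can reorder elements between them."""
    resolved = []
    for producer, write_pattern, consumer, read_pattern in edges:
        if write_pattern != read_pattern:   # conflict: generation and consumption orders differ
            buffer_name = f"buffer_{producer}_{consumer}"
            resolved.append((producer, buffer_name))
            resolved.append((buffer_name, consumer))
        else:                               # no conflict: connect directly
            resolved.append((producer, consumer))
    return resolved

# Example: the producer writes row-major while the consumer reads column-major.
print(insert_buffers([("conv", ("row", "col"), "matmul", ("col", "row"))]))
# [('conv', 'buffer_conv_matmul'), ('buffer_conv_matmul', 'matmul')]
```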

    Merging Skip-Buffers
    Publication Type: Invention Publication

    Publication No.: US20230305823A1

    Publication Date: 2023-09-28

    Application No.: US18126610

    Filing Date: 2023-03-27

    IPC Classification: G06F8/41

    CPC Classification: G06F8/45 G06F8/4434

    Abstract: A method in a reconfigurable computing system includes connecting a plurality of tensor consumers to their corresponding tensor producers via skip-buffers, generating a plurality of skip-buffers. The method includes determining that at least one skip-buffer corresponding to a first set of tensor consumers and at least one skip-buffer corresponding to a second set of tensor consumers are compatible for a whole or partial merge. The method also includes merging, wholly or partially, the compatible skip-buffers to produce a merged skip-buffer having a minimal buffer depth. The described method may reduce memory unit consumption and latency.
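    A small sketch of the merging step, assuming compatibility is reduced to skip-buffers sharing a producer and that the merged buffer's minimal depth is the largest depth any of its consumers needs; the names and tuple encoding are illustrative:

```python
def merge_skip_buffers(skip_buffers):
    """Group skip-buffers that share a producer and merge each group into one
    buffer whose depth is the maximum any consumer in the group requires,
    which is the minimal depth that still serves every consumer."""
    by_producer = {}
    for producer, consumer, depth in skip_buffers:
        by_producer.setdefault(producer, []).append((consumer, depth))
    merged = []
    for producer, uses in by_producer.items():
        consumers = [consumer for consumer, _ in uses]
        merged_depth = max(depth for _, depth in uses)
        merged.append((producer, consumers, merged_depth))
    return merged

# Example: two consumers skip 2 and 5 pipeline stages after the same producer;
# one merged skip-buffer of depth 5 serves both.
print(merge_skip_buffers([("layer0", "layer2", 2), ("layer0", "layer5", 5)]))
# [('layer0', ['layer2', 'layer5'], 5)]
```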