-
Publication No.: US20230376292A1
Publication date: 2023-11-23
Application No.: US18136321
Filing date: 2023-04-18
Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin BROWN, Xiaoming GU
IPC classification: G06F8/41, G06F12/0842
CPC classification: G06F8/443, G06F12/0842, G06F8/433
Abstract: The technology disclosed relates to automatically assigning and optimizing the physical memory layouts of all intermediate dense tensor data in a program. The technology disclosed is an implementation of a compiler analysis and transformation pass that automatically determines the required physical layouts in light of kernel operation and performance requirements. The pass also inserts physical layout conversion operations where incompatibilities cannot otherwise be resolved. The pass takes as input an acyclic dataflow graph of the program and a set of physical layout constraints for every known operation.
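The conversion step described in this abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function name `insert_layout_conversions`, the layout names, the constraint tables, and the rule of choosing the alphabetically first accepted layout as the conversion target are all hypothetical.

```python
def insert_layout_conversions(edges, produces, accepts):
    """Add a conversion node on every edge whose producer layout the
    consumer cannot read.

    edges:    list of (producer, consumer) pairs from an acyclic graph
    produces: op name -> layout the op writes (hypothetical constraint table)
    accepts:  op name -> set of layouts the op can read (hypothetical)
    """
    rewritten = []
    for producer, consumer in edges:
        layout = produces[producer]
        if layout in accepts[consumer]:
            # Layouts already agree, so the edge is kept unchanged.
            rewritten.append((producer, consumer))
            continue
        # Assumption: any accepted layout is a valid target; take the first
        # one in sorted order so the result is deterministic.
        target = min(accepts[consumer])
        convert = f"convert[{producer}->{consumer}:{layout}->{target}]"
        rewritten.append((producer, convert))
        rewritten.append((convert, consumer))
    return rewritten


edges = [("matmul", "softmax"), ("softmax", "transpose")]
produces = {"matmul": "row_major", "softmax": "row_major"}
accepts = {"softmax": {"row_major"}, "transpose": {"col_major"}}
plan = insert_layout_conversions(edges, produces, accepts)
```

In this example only the `softmax` to `transpose` edge is incompatible, so it is the only edge that gains a conversion node.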
-
Publication No.: US20230325312A1
Publication date: 2023-10-12
Application No.: US17974910
Filing date: 2022-10-27
Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN
IPC classification: G06F12/0802
CPC classification: G06F12/0802, G06F2212/1041
Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.
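As a rough illustration of the merge test in this abstract, the Python sketch below folds two slice operations into one and applies the cost comparison. It assumes that a slice is a representative memory indexing operation and that resource costs are plain numbers; `compose_slices` and `should_merge` are hypothetical names, not taken from the patent.

```python
def compose_slices(first, second):
    """Fold two half-open (start, stop) slices into a single slice.

    `second` indexes into the result of `first`, so its bounds are
    offset by the start of `first`.
    """
    start, stop = first
    inner_start, inner_stop = second
    if not 0 <= inner_start <= inner_stop <= stop - start:
        raise ValueError("second slice falls outside the first")
    return (start + inner_start, start + inner_stop)


def should_merge(consumer_count, merged_cost, separate_costs):
    """Merge only when the producer feeds exactly one consumer and the
    merged node is strictly cheaper than the separate buffer nodes."""
    return consumer_count == 1 and merged_cost < sum(separate_costs)


data = list(range(10))
merged = compose_slices((2, 8), (1, 4))
same_elements = data[2:8][1:4] == data[merged[0]:merged[1]]
```

The single merged slice selects the same elements as the two chained slices, which is the property a merged indexing operation has to preserve.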
-
Publication No.: US20230325163A1
Publication date: 2023-10-12
Application No.: US18206829
Filing date: 2023-06-07
Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
CPC classification: G06F8/452, G06F15/7867, G06F8/41, G06F15/825
Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
-
Publication No.: US20230205501A1
Publication date: 2023-06-29
Application No.: US18089157
Filing date: 2022-12-27
IPC classification: G06F8/41
Abstract: The technology disclosed provides a system that comprises a processor with compute units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages, with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
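The latency compensation described in this abstract reduces to simple arithmetic, shown in the Python sketch below. It assumes latencies are whole cycle counts; the function name `register_delays` is hypothetical.

```python
def register_delays(first_latency, second_latency):
    """Return the register delay, in cycles, to add to each producer
    output so both reach the consumer stage in the same cycle."""
    arrival = max(first_latency, second_latency)
    return arrival - first_latency, arrival - second_latency


# A 3-cycle producer paired with a 5-cycle producer: only the faster
# output needs to be held in registers.
delays = register_delays(3, 5)
```

The delay on the faster path equals the difference between the two stage latencies, and the slower path needs no delay.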
-
Publication No.: US20220092247A1
Publication date: 2022-03-24
Application No.: US17031679
Filing date: 2020-09-24
IPC classification: G06F30/392, G06F30/33, G06F30/337, G06F8/41
Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
-
Publication No.: US20210373867A1
Publication date: 2021-12-02
Application No.: US16890841
Filing date: 2020-06-02
Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi Arun CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
Abstract: A compiler is configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as the buffer depth of a corresponding downstream memory node. The ready-to-read credit counter is configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a read ready token from the corresponding downstream memory node. The write credit counter of the particular upstream memory node is initialized with one or more write credits and is configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a write done token from the corresponding downstream memory node.
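The two counters in this abstract can be modeled with a small Python class. This is a sketch under stated assumptions rather than the patented design: the class and method names are hypothetical, a write is treated as one event that spends both kinds of credit, and the rule that a write is blocked when either counter is zero is an assumption the abstract implies but does not state.

```python
class UpstreamMemoryNode:
    """Toy model of the credit counters held by an upstream memory node."""

    def __init__(self, downstream_buffer_depth, write_credits=1):
        # One read credit per buffer slot in the downstream node.
        self.read_credits = downstream_buffer_depth
        self.write_credits = write_credits

    def can_write(self):
        # Assumption: a write needs one credit of each kind.
        return self.read_credits > 0 and self.write_credits > 0

    def begin_write(self):
        if not self.can_write():
            raise RuntimeError("no credits available")
        self.write_credits -= 1  # spent when the write begins
        self.read_credits -= 1   # one downstream slot is now occupied

    def on_write_done(self):
        # Token from downstream: the write has completed.
        self.write_credits += 1

    def on_read_ready(self):
        # Token from downstream: a slot has been read and is free again.
        self.read_credits += 1


node = UpstreamMemoryNode(downstream_buffer_depth=2)
node.begin_write()
node.on_write_done()
node.begin_write()
node.on_write_done()
blocked_when_full = not node.can_write()
node.on_read_ready()
unblocked_after_read = node.can_write()
```

With a downstream depth of two, the upstream node stalls after two writes and resumes only once a read ready token returns a read credit.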
-
Publication No.: US20230315410A1
Publication date: 2023-10-05
Application No.: US18129714
Filing date: 2023-03-31
Inventors: Bowen YANG, Zhuo CHEN, Chen LIU, Fei WANG, Ruobing WANG, Qinghua Li, Weiwei CHEN, Junjue WANG, Sumti JAIRATH
IPC classification: G06F8/41
CPC classification: G06F8/443
Abstract: A method comprises a compiler analyzing a graph to determine a pipeline of operators based on a shared dimension of input and output tensors among the operators. The operators are included in the graph and the graph corresponds to a dataflow application. The compiler determines a tiling decision associated with the pipeline and a tiling cost associated with the tiling decision. The tiling decision can comprise a tile shape to slice tensors of operators of the pipeline. Based on the tiling cost, the compiler determines that the tiling decision improves an optimization objective and includes the pipeline and tiling decision in mapping decisions associated with executing the application on a computing system. The compiler can apply a tiling cost model to determine the tiling costs. A computer program product and a computing system can implement the method.
-
Publication No.: US20220147328A1
Publication date: 2022-05-12
Application No.: US17582421
Filing date: 2022-01-24
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
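The conflict rule in this abstract can be sketched in Python by modeling each access pattern as an explicit list of element indices. That representation, and the names `has_conflict` and `resolve_conflicts`, are assumptions made for illustration only.

```python
def has_conflict(write_order, read_order):
    """A conflict exists when the producer emits elements in a different
    order from the one in which the consumer processes them."""
    return list(write_order) != list(read_order)


def resolve_conflicts(connections):
    """connections: list of (producer, consumer, write_order, read_order).

    Returns one path per connection, with a 'buffer' node placed between
    producer and consumer wherever the access orders disagree.
    """
    paths = []
    for producer, consumer, write_order, read_order in connections:
        if has_conflict(write_order, read_order):
            paths.append((producer, "buffer", consumer))
        else:
            paths.append((producer, consumer))
    return paths


# A 2x2 tensor written row by row.
row_major = [(0, 0), (0, 1), (1, 0), (1, 1)]
# The same tensor read column by column.
col_major = [(0, 0), (1, 0), (0, 1), (1, 1)]

paths = resolve_conflicts([
    ("gemm", "relu", row_major, row_major),
    ("relu", "transpose", row_major, col_major),
])
```

Only the second connection mixes row-wise writes with column-wise reads, so it is the only one that receives a buffer.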
-
Publication No.: US20230305823A1
Publication date: 2023-09-28
Application No.: US18126610
Filing date: 2023-03-27
Inventors: Fei WANG, David Alan KOEPLINGER, Kevin BROWN, Weiwei CHEN
IPC classification: G06F8/41
CPC classification: G06F8/45, G06F8/4434
Abstract: A method in a reconfigurable computing system includes connecting a plurality of tensor consumers to their corresponding tensor producers via skip-buffers, thereby generating a plurality of skip-buffers. The method includes determining that at least one skip-buffer of the plurality of skip-buffers corresponding to a first set of tensor consumers and at least one skip-buffer of the plurality of skip-buffers corresponding to a second set of tensor consumers are compatible to wholly or partially merge. The method also includes merging, wholly or partially, the compatible skip-buffers to produce a merged skip-buffer having a minimal buffer depth. The described method may reduce memory unit consumption and latency.