Abstract:
A device includes a data processing engine array having a plurality of data processing engines organized in a grid having a plurality of rows and a plurality of columns. Each data processing engine includes a core, a memory module including a memory and a direct memory access engine. Each data processing engine includes a stream switch connected to the core, the direct memory access engine, and the stream switch of one or more adjacent data processing engines. Each memory module includes a first memory interface directly coupled to the core in the same data processing engine and one or more second memory interfaces directly coupled to the core of each of the one or more adjacent data processing engines.
Abstract:
A device may include a plurality of data processing engines. Each of the data processing engines may include a memory pool having a plurality of memory banks, a plurality of cores each coupled to the memory pool and configured to access the plurality of memory banks, a memory mapped switch coupled to the memory pool and a memory mapped switch of at least one neighboring data processing engine, and a stream switch coupled to each of the plurality of cores and to a stream switch of the at least one neighboring data processing engine.
Abstract:
Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include a memory module (with memory banks for storing data) and an interconnect which provides connectivity between the engines. To transmit processed data, a data processing engine identifies a destination processing engine in the array. Once identified, the data processing engine can transmit the processed data using a reserved point-to-point communication path in the interconnect that couples the source and destination data processing engines.
Abstract:
A device includes a data processing engine array having a plurality of data processing engines organized in a grid having a plurality of rows and a plurality of columns. Each data processing engine includes a core, a memory module including a memory and a direct memory access engine. Each data processing engine includes a stream switch connected to the core, the direct memory access engine, and the stream switch of one or more adjacent data processing engines. Each memory module includes a first memory interface directly coupled to the core in the same data processing engine and one or more second memory interfaces directly coupled to the core of each of the one or more adjacent data processing engines.
Abstract:
Examples herein describe techniques for generating dataflow graphs using source code for defining kernels and communication links between those kernels. In one embodiment, the graph is formed using nodes (e.g., kernels) which are communicatively coupled by edges (e.g., the communication links between the kernels). A compiler converts the source code into a bit stream and/or binary code which configure a heterogeneous processing system of a SoC to execute the graph. The compiler uses the graph expressed in source code to determine where to assign the kernels in the heterogeneous processing system. Further, the compiler can select the specific communication techniques to establish the communication links between the kernels and whether synchronization should be used in a communication link. Thus, the programmer can express the dataflow graph at a high-level (using source code) without understanding about how the operator graph is implemented using the heterogeneous hardware in the SoC.
Abstract:
Embodiments herein use control application programming interfaces (APIs) to control the execution of a dataflow graph in a heterogeneous processing system. That is, embodiments herein describe a programming model along with associated APIs and methods that can control, interact, and at least partially reconfigure a user application (e.g., the dataflow graph) executing on the heterogeneous processing system through a local executing control program. Using the control APIs, users can manipulate such remotely executing graphs directly as local objects and perform control operations on them (e.g., for loading and initializing the graphs; dynamically adjusting parameters for adaptive control; monitoring application parameters, system states and events; scheduling operations to read and write data across the distributed memory boundary of the platform; controlling the execution life-cycle of a subsystem; and partially reconfiguring the computing resources for a new subsystem).
Abstract:
An example method of implementing an application for a system-on-chip (SOC) having a data processing engine (DPE) array includes determining a graph representation of the application, the graph representation including nodes representing kernels of the application and edges representing communication between the kernels, mapping, based on the graph, the kernels onto DPEs of the DPE array and data structures of the kernels onto memory in the DPE array, routing communication channels between DPEs and circuitry of the application configured in programmable logic of the SOC, and generating implementation data for programming the SOC to implement the application based on results of the mapping and the routing.
Abstract:
Embodiments herein use control application programming interfaces (APIs) to control the execution of a dataflow graph in a heterogeneous processing system. That is, embodiments herein describe a programming model along with associated APIs and methods that can control, interact, and at least partially reconfigure a user application (e.g., the dataflow graph) executing on the heterogeneous processing system through a local executing control program. Using the control APIs, users can manipulate such remotely executing graphs directly as local objects and perform control operations on them (e.g., for loading and initializing the graphs; dynamically adjusting parameters for adaptive control; monitoring application parameters, system states and events; scheduling operations to read and write data across the distributed memory boundary of the platform; controlling the execution life-cycle of a subsystem; and partially reconfiguring the computing resources for a new subsystem).
Abstract:
An integrated circuit (IC) includes a first region being static and providing an interface between the IC and a host processor. The first region includes a first interconnect circuit block having a first master interface and a second interconnect circuit block having a first slave interface. The IC includes a second region coupled to the first region. The second region implements a kernel of a heterogeneous, multiprocessor design and includes a slave interface coupled to the first master interface of the first interconnect circuit block and configured to receive commands from the host processor. The second region also includes a master interface coupled the first slave interface of the second interconnect circuit block, wherein the master interface of the second region is a master for a memory controller.
Abstract:
An example method of implementing an application for a system-on-chip (SOC) having a data processing engine (DPE) array includes determining a graph representation of the application, the graph representation including nodes representing kernels of the application and edges representing communication between the kernels, mapping, based on the graph, the kernels onto DPEs of the DPE array and data structures of the kernels onto memory in the DPE array, routing communication channels between DPEs and circuitry of the application configured in programmable logic of the SOC, and generating implementation data for programming the SOC to implement the application based on results of the mapping and the routing.