Abstract:
Examples described herein relate to dynamically structured single instruction, multiple data (SIMD) instructions, and systems and circuits implementing such dynamically structured SIMD instructions. An example is a method for processing data. A first SIMD structure is determined by a processor. A characteristic of the first SIMD structure is altered by the processor to obtain a second SIMD structure. An indication of the second SIMD structure is communicated from the processor to a numerical engine. Data is packed by the numerical engine into an SIMD instruction according to the second SIMD structure. The SIMD instruction is transmitted from the numerical engine.
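As a rough C++ sketch of the packing step described above, the following code shows how a descriptor of lane count and lane width could drive packing of elements into a single SIMD word. The type and function names (SimdStructure, alter, pack_simd) and the 64-bit word size are illustrative assumptions, not details from the abstract.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical descriptor of a SIMD structure: how many lanes and how wide.
struct SimdStructure {
    unsigned lanes;      // number of data elements packed side by side
    unsigned lane_bits;  // width of each element in bits
};

// Alter a characteristic of the first structure to obtain a second structure,
// e.g. double the lane count while halving the lane width.
SimdStructure alter(SimdStructure s) {
    return SimdStructure{s.lanes * 2, s.lane_bits / 2};
}

// Pack the low lane_bits bits of each element into a single 64-bit word
// according to the given structure (assumes lanes * lane_bits <= 64).
uint64_t pack_simd(const std::vector<uint64_t>& elems, SimdStructure s) {
    uint64_t mask = (s.lane_bits >= 64) ? ~0ull : ((1ull << s.lane_bits) - 1);
    uint64_t word = 0;
    for (unsigned i = 0; i < s.lanes && i < elems.size(); ++i)
        word |= (elems[i] & mask) << (i * s.lane_bits);
    return word;
}
```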
Abstract:
A device may include a plurality of data processing engines. Each data processing engine may include a core and a memory module. Each core may be configured to access the memory module in the same data processing engine and a memory module within at least one other data processing engine of the plurality of data processing engines.
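A minimal C++ model of this shared-memory arrangement is sketched below, assuming a core that can read either from the memory module in its own engine or from the memory module of one neighboring engine. The names and sizes are illustrative only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one data processing engine: a core plus a memory module.
struct MemoryModule {
    std::array<uint32_t, 1024> words{};
};

struct DataProcessingEngine {
    MemoryModule local;                        // memory module in the same engine
    DataProcessingEngine* neighbor = nullptr;  // another engine whose memory
                                               // module this core may access

    // The core reads either from its own memory module or, when available,
    // from the memory module of the neighboring engine.
    uint32_t read(std::size_t addr, bool from_neighbor) const {
        const MemoryModule& m =
            (from_neighbor && neighbor) ? neighbor->local : local;
        return m.words[addr % m.words.size()];
    }
};
```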
Abstract:
An integrated circuit (IC) includes a first region being static and providing an interface between the IC and a host processor. The first region includes a first interconnect circuit block having a first master interface and a second interconnect circuit block having a first slave interface. The IC includes a second region coupled to the first region. The second region implements a kernel of a heterogeneous, multiprocessor design and includes a slave interface coupled to the first master interface of the first interconnect circuit block and configured to receive commands from the host processor. The second region also includes a master interface coupled to the first slave interface of the second interconnect circuit block, wherein the master interface of the second region is a master for a memory controller.
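The following C++ sketch models the connectivity described above as plain data: the static region's first interconnect block exposes a master interface toward the kernel region, and the kernel region's master interface drives the static region's second interconnect block on the path to the memory controller. All identifiers are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical connectivity model of the two regions; every name is illustrative.
struct Interface {
    std::string owner;  // which region/block owns the interface
    std::string role;   // "master" or "slave"
};

struct Connection {
    Interface from;
    Interface to;
};

std::vector<Connection> topology() {
    // Static region: interconnect block 1 exposes a master interface,
    // interconnect block 2 exposes a slave interface.
    Interface ic1_master{"static.interconnect1", "master"};
    Interface ic2_slave{"static.interconnect2", "slave"};
    // Kernel region: a slave interface accepts host commands, and a master
    // interface drives the path toward the memory controller.
    Interface kernel_slave{"kernel", "slave"};
    Interface kernel_master{"kernel", "master"};
    return {
        {ic1_master, kernel_slave},   // host commands reach the kernel region
        {kernel_master, ic2_slave},   // kernel region masters the memory controller
    };
}
```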
Abstract:
Embodiments herein use control application programming interfaces (APIs) to control the execution of a dataflow graph in a heterogeneous processing system. That is, embodiments herein describe a programming model along with associated APIs and methods that can control, interact with, and at least partially reconfigure a user application (e.g., the dataflow graph) executing on the heterogeneous processing system through a local executing control program. Using the control APIs, users can manipulate such remotely executing graphs directly as local objects and perform control operations on them (e.g., loading and initializing the graphs; dynamically adjusting parameters for adaptive control; monitoring application parameters, system states, and events; scheduling operations to read and write data across the distributed memory boundary of the platform; controlling the execution life-cycle of a subsystem; and partially reconfiguring the computing resources for a new subsystem).
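A hedged sketch of what such a control program might look like is shown below. The class graph_handle and its methods are illustrative stand-ins for the control APIs described above, stubbed out so the example compiles and runs; a real implementation would communicate with the remotely executing graph.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handle through which a local control program manipulates a
// remotely executing graph as if it were a local object.
class graph_handle {
public:
    explicit graph_handle(std::string name) : name_(std::move(name)) {}
    // Load and initialize the graph.
    void init() { std::cout << name_ << ": init\n"; }
    // Control the execution life-cycle.
    void run(int iterations) { std::cout << name_ << ": run " << iterations << "\n"; }
    // Dynamically adjust a parameter for adaptive control.
    void update(const std::string& port, float v) {
        std::cout << name_ << ": " << port << " <- " << v << "\n";
    }
    // Schedule a read of data across the distributed memory boundary.
    std::vector<float> read(const std::string& port, std::size_t n) {
        std::cout << name_ << ": read " << n << " from " << port << "\n";
        return std::vector<float>(n, 0.0f);
    }
    // End the subsystem's execution.
    void end() { std::cout << name_ << ": end\n"; }
private:
    std::string name_;
};

int main() {
    graph_handle g("my_graph");
    g.init();
    g.update("filter.coeff", 0.5f);   // adaptive parameter adjustment
    g.run(100);
    auto samples = g.read("out", 256);
    g.end();
    return 0;
}
```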
Abstract:
Examples herein describe techniques for generating dataflow graphs using source code that defines kernels and the communication links between those kernels. In one embodiment, the graph is formed using nodes (e.g., kernels) which are communicatively coupled by edges (e.g., the communication links between the kernels). A compiler converts the source code into a bitstream and/or binary code that configures a heterogeneous processing system of a SoC to execute the graph. The compiler uses the graph expressed in source code to determine where to assign the kernels in the heterogeneous processing system. Further, the compiler can select the specific communication techniques used to establish the communication links between the kernels and decide whether synchronization should be used in a communication link. Thus, the programmer can express the dataflow graph at a high level (using source code) without needing to understand how the graph is implemented using the heterogeneous hardware in the SoC.
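To illustrate the idea of expressing the graph directly in source code, the sketch below builds a small three-kernel graph as nodes and edges. The Graph, Kernel, add_kernel, and connect names are assumptions made for illustration; in this scheme a compiler would then map the kernels to hardware and choose how each declared link is realized.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical source-level graph description: kernels are nodes and the
// declared communication links are edges. None of these names come from the
// compiler described above.
struct Kernel {
    std::string name;  // kernel function to be placed somewhere in the SoC
};

struct Graph {
    std::vector<Kernel> kernels;
    std::vector<std::pair<int, int>> edges;  // (producer index, consumer index)

    int add_kernel(std::string name) {
        kernels.push_back({std::move(name)});
        return static_cast<int>(kernels.size()) - 1;
    }
    // Declare a communication link; a compiler would later pick the concrete
    // mechanism (e.g. shared memory or streams) and whether to synchronize it.
    void connect(int producer, int consumer) {
        edges.push_back({producer, consumer});
    }
};

int main() {
    Graph g;
    int producer = g.add_kernel("producer");
    int filter   = g.add_kernel("filter");
    int consumer = g.add_kernel("consumer");
    g.connect(producer, filter);
    g.connect(filter, consumer);
    return 0;
}
```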
Abstract:
A device may include a plurality of data processing engines. Each of the data processing engines may include a core and a memory module. The plurality of data processing engines may be organized in a plurality of rows. Each core may be configured to communicate with other neighboring data processing engines of the plurality of data processing engines by shared access to the memory modules of the neighboring data processing engines.
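As a small illustration of the row organization, the following C++ sketch computes, for an engine at a given row and column of a hypothetical ROWS x COLS array, the indices of the neighboring engines whose memory modules it could share. The array dimensions are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical ROWS x COLS layout of the engine array; sizes are made up.
constexpr std::size_t ROWS = 4;
constexpr std::size_t COLS = 8;

// Linear indices of the engines neighboring the engine at (row, col); these
// are the engines whose memory modules it could access for communication.
std::vector<std::size_t> neighbors(std::size_t row, std::size_t col) {
    std::vector<std::size_t> out;
    if (col > 0)        out.push_back(row * COLS + (col - 1));  // left
    if (col + 1 < COLS) out.push_back(row * COLS + (col + 1));  // right
    if (row > 0)        out.push_back((row - 1) * COLS + col);  // previous row
    if (row + 1 < ROWS) out.push_back((row + 1) * COLS + col);  // next row
    return out;
}
```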
Abstract:
OpenCL program compilation may include generating, using a processor, a register transfer level (RTL) description of a first kernel of a heterogeneous, multiprocessor design and integrating the RTL description of the first kernel with a base platform circuit design. The base platform circuit design provides a static interface within a programmable integrated circuit to a host of the heterogeneous, multiprocessor design. A first configuration bitstream may be generated from the RTL description of the first kernel using the processor. The first configuration bitstream specifies a hardware implementation of the first kernel. The first configuration bitstream and supporting data for the configuration bitstream may be included within a binary container.
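As a sketch of how a host might consume such a binary container, the code below reads the container from a file and hands it to the standard OpenCL clCreateProgramWithBinary call before instantiating the kernel. The file name "kernel.xclbin" and kernel name "first_kernel" are assumptions, and error checking is omitted for brevity.

```cpp
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    // Read the binary container (assumed file name) from disk.
    std::FILE* f = std::fopen("kernel.xclbin", "rb");
    if (!f) return 1;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<unsigned char> binary(size);
    std::fread(binary.data(), 1, binary.size(), f);
    std::fclose(f);

    // Standard OpenCL host setup for an accelerator device.
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, nullptr);
    cl_int err;
    cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);

    // Create the program from the pre-built binary container rather than from source.
    const unsigned char* bin_ptr = binary.data();
    size_t bin_size = binary.size();
    cl_program program = clCreateProgramWithBinary(
        context, 1, &device, &bin_size, &bin_ptr, nullptr, &err);
    clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);

    // Instantiate the kernel implemented by the configuration bitstream.
    cl_kernel kernel = clCreateKernel(program, "first_kernel", &err);

    // ... set kernel arguments, enqueue work, and read back results ...

    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseContext(context);
    return 0;
}
```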