DATA PARALLEL PROGRAMMING-BASED TRANSPARENT TRANSFER ACROSS HETEROGENEOUS DEVICES

    Publication No.: US20220197715A1

    Publication Date: 2022-06-23

    Application No.: US17693010

    Filing Date: 2022-03-11

    Abstract: An apparatus to facilitate data parallel programming-based transparent transfer across heterogeneous devices is disclosed. The apparatus includes a processor to: identify a change in device status that triggers a device transfer process from an original device, wherein the original device is associated with a queue of an application program of a data parallel programming runtime; identify a new device that is compatible with the original device; migrate at least one of a state or data of the original device to the new device; logically map, without user intervention, the queue to the new device in the data parallel programming runtime; and initiate execution of the application program on the new device using the queue.
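
    The abstract gives only the claimed steps. The Python sketch below is a minimal, hypothetical illustration of the idea. It assumes a toy runtime in which the application holds a queue handle that the runtime may silently rebind to another device. `Device`, `Queue`, `find_compatible`, and `transparent_transfer` are invented names. The compatibility rule (same device kind, superset of features) is also an assumption, not the patent's criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Hypothetical device descriptor: "kind" and "features" stand in for
    # whatever compatibility criteria a real runtime would check.
    name: str
    kind: str
    features: frozenset
    healthy: bool = True
    memory: dict = field(default_factory=dict)


@dataclass
class Queue:
    # The application holds only this handle; the runtime may rebind the
    # underlying device without the application noticing.
    device: Device
    pending: list = field(default_factory=list)

    def submit(self, kernel):
        self.pending.append(kernel)

    def run(self):
        # Execute queued kernels on whichever device is currently bound.
        results = [kernel(self.device) for kernel in self.pending]
        self.pending.clear()
        return results


def find_compatible(original, candidates):
    # Assumed compatibility rule: healthy, same kind, and a superset of the
    # original device's features.
    for dev in candidates:
        if (dev is not original and dev.healthy
                and dev.kind == original.kind
                and original.features <= dev.features):
            return dev
    return None


def transparent_transfer(queue, candidates):
    original = queue.device
    if original.healthy:
        return queue                       # no status change: nothing to do
    new = find_compatible(original, candidates)
    if new is None:
        raise RuntimeError("no compatible device available")
    new.memory.update(original.memory)     # migrate state/data
    queue.device = new                     # remap the queue, no user action
    return queue


# Usage: the kernel never names a device; it runs wherever the queue points.
gpu0 = Device("gpu0", "gpu", frozenset({"fp64"}), memory={"buf": [1, 2, 3]})
gpu1 = Device("gpu1", "gpu", frozenset({"fp64", "fp16"}))
q = Queue(gpu0)
q.submit(lambda dev: (dev.name, sum(dev.memory["buf"])))
gpu0.healthy = False                       # device status change
transparent_transfer(q, [gpu0, gpu1])
result = q.run()                           # [('gpu1', 6)]
```

    Because the kernel reads its data through whatever device the queue is bound to, the application code is unchanged by the transfer. That is the "without user intervention" property the abstract claims, reduced to a toy model.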

    DATA PARALLEL PROGRAMMING TASK GRAPH OPTIMIZATION THROUGH DEVICE TELEMETRY

    Publication No.: US20220197615A1

    Publication Date: 2022-06-23

    Application No.: US17692425

    Filing Date: 2022-03-11

    Abstract: An apparatus to facilitate data parallel programming task graph optimization through device telemetry is disclosed. The apparatus includes a processor to: receive, from a compiler, compiled code generated from source code of an application, the compiled code to support a workload of the application; generate a task graph of the application using the compiled code, the task graph to represent at least one of a relationship or dependency of the compiled code; receive runtime telemetry data corresponding to execution of the compiled code on one or more accelerator devices; identify one or more scheduling optimizations for the one or more accelerator devices based on the task graph and the received telemetry data; and provide a scheduling command to cause the one or more scheduling optimizations to be implemented in the one or more accelerator devices.
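
    As a hedged illustration of combining a task graph with runtime telemetry, the sketch below substitutes a simple, well-known heuristic for the unspecified optimization step: greedy earliest-finish-time list scheduling. Each kernel, taken in dependency order, goes to the device on which measured telemetry says it would finish first. The kernel names, device names, and runtimes are made-up values. `schedule` is a hypothetical function, and its output tuples stand in for scheduling commands.

```python
from graphlib import TopologicalSorter


def schedule(graph, telemetry):
    """Greedy earliest-finish-time list scheduling (a simple HEFT-style
    heuristic, used here as a stand-in for the optimization step).

    graph:     kernel -> set of kernels it depends on
    telemetry: kernel -> {device: measured runtime}
    Returns scheduling commands as (kernel, device, start, finish) tuples.
    """
    device_free = {}   # device -> time at which it becomes idle
    finish = {}        # kernel -> scheduled finish time
    commands = []
    for kernel in TopologicalSorter(graph).static_order():
        # A kernel is ready once all of its dependencies have finished.
        ready = max((finish[dep] for dep in graph[kernel]), default=0.0)
        best = None
        for device, runtime in telemetry[kernel].items():
            start = max(ready, device_free.get(device, 0.0))
            end = start + runtime
            if best is None or end < best[3]:
                best = (kernel, device, start, end)
        commands.append(best)
        device_free[best[1]] = best[3]
        finish[kernel] = best[3]
    return commands


# Hypothetical task graph and telemetry (runtimes in arbitrary time units).
graph = {"load": set(), "conv": {"load"}, "pool": {"load"},
         "fc": {"conv", "pool"}}
telemetry = {
    "load": {"cpu": 1.0, "fpga": 2.0},
    "conv": {"cpu": 8.0, "fpga": 3.0},
    "pool": {"cpu": 2.0, "fpga": 2.0},
    "fc":   {"cpu": 1.0, "fpga": 4.0},
}
commands = schedule(graph, telemetry)
```

    In this example `conv` lands on the FPGA while `pool` overlaps with it on the CPU, giving a makespan of 5.0 units. Any real scheduler would also need to account for data-transfer costs between devices, which this sketch ignores.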

    INCREMENTAL JUST-IN-TIME (JIT) PERFORMANCE REFINEMENT FOR PROGRAMMABLE LOGIC DEVICE OFFLOAD

    Publication No.: US20220197610A1

    Publication Date: 2022-06-23

    Application No.: US17692413

    Filing Date: 2022-03-11

    Abstract: An apparatus to facilitate incremental just-in-time (JIT) performance refinement for programmable logic device offload is disclosed. The apparatus includes a processor to: initiate multiple just-in-time (JIT) compilation iterations of an application; program a first architecture of a first compilation of the multiple JIT compilation iterations to a programmable logic device and execute the application on the first architecture, wherein the first compilation has the fastest compilation time among the multiple JIT compilation iterations; identify a hotspot; determine that a second compilation of the multiple JIT compilation iterations is complete, wherein the second compilation has a slower compilation time than the first compilation; and program a second architecture of the second compilation of the multiple JIT compilation iterations to the programmable logic device and execute the application on the second architecture.
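
    A minimal sketch of the incremental refinement loop, assuming a thread pool stands in for the background compilers and `time.sleep` stands in for both compilation and application work. `Device`, `compile_design`, and `run_with_refinement` are hypothetical names, and the compile times are arbitrary. Hotspot identification is not modeled; the sketch shows only the pattern of running on a quick build and reprogramming once the slower build completes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Device:
    # Hypothetical stand-in for a programmable logic device.
    def __init__(self):
        self.architecture = None

    def program(self, architecture):
        self.architecture = architecture


def compile_design(app, effort, seconds):
    # Placeholder for a real JIT compile; by assumption, higher effort takes
    # longer but yields a faster architecture.
    time.sleep(seconds)
    return f"{app}@{effort}"


def run_with_refinement(app, iterations=10, step=0.02):
    device = Device()
    history = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Launch both compilation iterations concurrently.
        fast = pool.submit(compile_design, app, "fast", 0.0)
        slow = pool.submit(compile_design, app, "optimized", 0.05)
        device.program(fast.result())           # start on the quick build
        upgraded = False
        for _ in range(iterations):
            if not upgraded and slow.done():
                device.program(slow.result())   # swap in the better build
                upgraded = True
            history.append(device.architecture)
            time.sleep(step)                    # stand-in for one app step
    return history


history = run_with_refinement("app", iterations=50)
# Early entries are typically "app@fast", later ones "app@optimized".
```

    Polling `slow.done()` between application steps keeps the swap at a safe point; a real offload runtime would presumably also need to drain in-flight work before reprogramming the device.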

    Fast CAD Compilation Through Coarse Macro Lowering

    Publication No.: US20240020449A1

    Publication Date: 2024-01-18

    Application No.: US18475512

    Filing Date: 2023-09-27

    CPC classification number: G06F30/347

    Abstract: Systems or methods of the present disclosure may provide a library including multiple macros that may be pre-compiled prior to implementation of the design. For example, a design may be mapped to one or more macros in the library, and the one or more macros may be placed into and routed between a portion of a region, one region, or multiple regions of the integrated circuit device to implement the design. Since the macros may be pre-compiled, the compilation time experienced by the designer may correspond to the placement and routing of the one or more macros, which may be less than the compilation time for fine-grained operations. The pre-compiled logic within the macros may be set using a lookup table mask to set and/or adjust a functionality of the macro. Additionally or alternatively, the place and route operation may be performed at finer granularities to reduce bottlenecks.
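
    To illustrate how a lookup table mask can set the function of a pre-compiled macro, the sketch below models a hypothetical k-input LUT macro. Its placement and routing are assumed to be fixed in advance, so only the mask changes. Bit i of the mask is the output when the inputs encode index i (input 0 is the least significant bit). `LutMacro`, `MASKS`, and `map_design` are invented for illustration, and placement and routing themselves are not modeled.

```python
from itertools import product

# Truth-table masks for a 2-input LUT: bit i is the output when the inputs
# (a, b) encode the index i = a | (b << 1).
MASKS = {
    "and":  0b1000,
    "or":   0b1110,
    "xor":  0b0110,
    "nand": 0b0111,
}


class LutMacro:
    """Hypothetical pre-compiled k-input LUT macro.

    Its placement/routing is assumed fixed ahead of time; only the mask
    changes, so re-purposing it needs no recompilation of the logic.
    """

    def __init__(self, k):
        self.k = k
        self.mask = 0

    def set_mask(self, mask):
        # A k-input LUT has 2**k truth-table entries, one mask bit each.
        if not 0 <= mask < 1 << (1 << self.k):
            raise ValueError(f"mask does not fit a {self.k}-input LUT")
        self.mask = mask

    def evaluate(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = sum(bit << i for i, bit in enumerate(inputs))
        return (self.mask >> index) & 1


def map_design(ops):
    # Map each fine-grained op of a toy design onto a macro from the library.
    macros = []
    for op in ops:
        macro = LutMacro(2)
        macro.set_mask(MASKS[op])
        macros.append(macro)
    return macros


macro = map_design(["xor"])[0]
xor_table = [macro.evaluate(a, b) for a, b in product((0, 1), repeat=2)]
macro.set_mask(MASKS["and"])            # re-purpose the same placed macro
and_table = [macro.evaluate(a, b) for a, b in product((0, 1), repeat=2)]
```

    Changing the mask from XOR to AND alters the macro's function without touching its (assumed) placement, which mirrors the abstract's point that compilation cost then falls mainly on placing and routing macros.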

    CLOCK GATING AND CLOCK SCALING BASED ON RUNTIME APPLICATION TASK GRAPH INFORMATION

    Publication No.: US20220197613A1

    Publication Date: 2022-06-23

    Application No.: US17692405

    Filing Date: 2022-03-11

    Abstract: An apparatus to facilitate clock gating and clock scaling based on runtime application task graph information is disclosed. The apparatus includes a processor to: receive, from a compiler, a bitstream generated from code of an application, the bitstream related to a workload of the application; generate a task graph of the application using at least part of the bitstream, the task graph to represent one of a relationship or a dependency of the code; program the bitstream to an accelerator device, wherein the bitstream is to configure the accelerator device to support the workload of the application; execute one or more kernels of the code using the accelerator device; identify one or more optimizations for the accelerator device based on the task graph of the application; and transmit a command to cause the one or more optimizations to be implemented in at least one region of the accelerator device.
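
    The abstract does not say how the optimizations are derived from the task graph. One plausible approach, shown here as a hypothetical Python sketch, is critical-path analysis. A region that hosts no kernels is clock-gated. A region whose kernels all have free slack (time before any successor could start anyway) is clocked down just enough to consume that slack. The sketch assumes positive runtimes that scale inversely with clock frequency and no contention inside a region. `plan_clocks` and all example values are invented.

```python
from graphlib import TopologicalSorter


def plan_clocks(graph, runtime, region_of, regions, nominal_mhz):
    """graph:     kernel -> set of predecessor kernels
    runtime:   kernel -> runtime at the nominal clock (must be > 0)
    region_of: kernel -> accelerator region that executes it
    Returns region -> ("gate", 0.0) or ("scale", target_mhz).
    """
    order = list(TopologicalSorter(graph).static_order())
    # Forward pass: earliest start time of each kernel.
    es = {}
    for k in order:
        es[k] = max((es[p] + runtime[p] for p in graph[k]), default=0.0)
    makespan = max(es[k] + runtime[k] for k in order)
    successors = {k: [] for k in order}
    for k, preds in graph.items():
        for p in preds:
            successors[p].append(k)
    plan = {}
    for r in regions:
        kernels = [k for k in order if region_of[k] == r]
        if not kernels:
            plan[r] = ("gate", 0.0)      # idle region: gate its clock
            continue
        # Free slack never delays a successor, so each kernel can be
        # stretched independently; the region must honor its tightest one.
        factor = 0.0
        for k in kernels:
            next_start = min((es[s] for s in successors[k]), default=makespan)
            free_slack = next_start - (es[k] + runtime[k])
            factor = max(factor, runtime[k] / (runtime[k] + free_slack))
        plan[r] = ("scale", nominal_mhz * factor)
    return plan


graph = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
runtime = {"a": 2.0, "b": 6.0, "c": 3.0, "d": 1.0}
region_of = {"a": "r0", "b": "r0", "c": "r1", "d": "r0"}
plan = plan_clocks(graph, runtime, region_of, ["r0", "r1", "r2"], 400.0)
```

    Here `c` finishes at t = 5 but `d` cannot start before t = 8, so region `r1` can run at half its nominal clock, `r2` has no work and is gated, and `r0` sits on the critical path and keeps its nominal clock.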
