-
Publication No.: US20220197715A1
Publication Date: 2022-06-23
Application No.: US17693010
Filing Date: 2022-03-11
Applicant: Intel Corporation
Inventor: Ben J. Ashbaugh, Michael Kinsner, James Brodman, Rajesh Poornachandran
IPC: G06F9/50
Abstract: An apparatus to facilitate data parallel programming-based transparent transfer across heterogeneous devices is disclosed. The apparatus includes a processor to: identify a change in device status that triggers a device transfer process from an original device, wherein the original device is associated with a queue of an application program of a data parallel programming runtime; identify a new device that is compatible with the original device; migrate at least one of a state or data of the original device to the new device; logically map, without user intervention, the queue to the new device in the data parallel programming runtime; and initiate execution of the application program on the new device using the queue.
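The remapping step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Device` and `Queue` classes, the `kind` field used as the compatibility test, and the `migrate_queue` helper are all hypothetical names chosen for illustration. The point it shows is that the application keeps the same queue object while the runtime swaps the device behind it.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    kind: str                      # hypothetical compatibility key, e.g. "gpu"
    healthy: bool = True           # stands in for "device status"
    memory: dict = field(default_factory=dict)


@dataclass
class Queue:
    device: Device

    def submit(self, kernel, *args):
        # Kernels run against whatever device the queue currently maps to.
        return kernel(self.device, *args)


def migrate_queue(queue, candidates):
    """Remap the queue to a compatible healthy device, copying state first."""
    old = queue.device
    if old.healthy:
        return queue               # no status change, nothing to migrate
    for dev in candidates:
        if dev is not old and dev.healthy and dev.kind == old.kind:
            dev.memory.update(old.memory)   # migrate data/state
            queue.device = dev              # logical remap, same queue handle
            return queue
    raise RuntimeError("no compatible device available")
```

After `migrate_queue` returns, later `submit` calls on the same queue run on the new device without any change to application code.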
-
Publication No.: US20220197615A1
Publication Date: 2022-06-23
Application No.: US17692425
Filing Date: 2022-03-11
Applicant: Intel Corporation
Inventor: Michael Kinsner, Ben J. Ashbaugh, James Brodman, Rajesh Poornachandran
Abstract: An apparatus to facilitate data parallel programming task graph optimization through device telemetry is disclosed. The apparatus includes a processor to: receive, from a compiler, compiled code generated from source code of an application, the compiled code to support a workload of the application; generate a task graph of the application using the compiled code, the task graph to represent at least one of a relationship or dependency of the compiled code; receive runtime telemetry data corresponding to execution of the compiled code on the one or more accelerator devices; identify one or more scheduling optimizations for the one or more accelerator devices based on the task graph and the received telemetry data; and provide a scheduling command to cause the one or more scheduling optimizations to be implemented in the one or more accelerator devices.
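As a rough illustration of combining a task graph with telemetry, the Python sketch below orders tasks by their dependencies and assigns each one to the device with the lowest measured runtime. The function name `plan_schedule`, the dictionary layouts, and the "pick the fastest device" rule are assumptions made for this example; the abstract does not specify which scheduling optimizations are applied.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_schedule(deps, telemetry):
    """Return (task, device) pairs in an order that respects dependencies.

    deps:      task -> set of tasks that must finish first (the task graph)
    telemetry: task -> {device name: measured seconds} (hypothetical format)
    """
    order = TopologicalSorter(deps).static_order()
    # For each task, choose the device with the smallest measured runtime.
    return [(task, min(telemetry[task], key=telemetry[task].get))
            for task in order]
```

A real runtime would also weigh data-transfer cost and device occupancy; this sketch ignores both to keep the idea visible.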
-
Publication No.: US20220197610A1
Publication Date: 2022-06-23
Application No.: US17692413
Filing Date: 2022-03-11
Applicant: Intel Corporation
Inventor: Michael Kinsner, John Freeman, Ben J. Ashbaugh, Rajesh Poornachandran
Abstract: An apparatus to facilitate incremental just-in-time (JIT) performance refinement for programmable logic device offload is disclosed. The apparatus includes a processor to: initiate multiple just-in-time (JIT) compilation iterations of an application; program a first architecture of a first compilation of the multiple JIT compilation iterations to a programmable logic device and execute the application on the first architecture, wherein the first compilation comprises a faster compilation time amongst the multiple JIT compilation iterations; identify a hotspot; determine that a second compilation of the multiple JIT compilation iterations is complete, wherein the second compilation comprises a slower compilation time than the first compilation; and program a second architecture of the second compilation of the multiple JIT compilation iterations to the programmable logic device and execute the application on the second architecture.
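The fast-then-refined flow can be sketched in Python as follows. This is an illustrative analogy only: `run_with_refinement`, `compile_fast`, and `compile_slow` are hypothetical names, the "architectures" are plain callables rather than bitstreams, and hotspot identification is omitted. It shows the application starting on the quick build and switching to the slower, better build once that build is reported complete.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_refinement(compile_fast, compile_slow, work_items):
    """Process items on the fast build, switching to the slow build when ready."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        slow_future = pool.submit(compile_slow)  # refined build runs in background
        active = compile_fast()                  # quick build, available first
        switched = False
        results = []
        for item in work_items:
            if not switched and slow_future.done():
                active = slow_future.result()    # "reprogram" to second build
                switched = True
            results.append(active(item))
    return results
```

Both builds must compute the same result for the swap to be transparent, so only the performance changes from the caller's point of view.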
-
Publication No.: US20240020449A1
Publication Date: 2024-01-18
Application No.: US18475512
Filing Date: 2023-09-27
Applicant: Intel Corporation
Inventor: Byron Sinclair, Deshanand P. Singh, Gregg William Baeckler, Mahesh A. Iyer, Michael Kinsner, Chengping Liang, Victor Tzi-on Zhang
IPC: G06F30/347
CPC classification number: G06F30/347
Abstract: Systems or methods of the present disclosure may provide a library including multiple macros that may be pre-compiled prior to implementation of the design. For example, a design may be mapped to one or more macros in the library, and the one or more macros may be placed into and routed between a portion of a region, one region, or multiple regions of the integrated circuit device to implement the design. Since the macros may be pre-compiled, compilation time experienced by the designer may correspond to the placement and routing of the one or more macros, which may be less than compilation time for fine-grained operations. The pre-compiled logic within the macros may be set using a lookup table mask to set and/or adjust a functionality of the macro. Additionally or alternatively, the place and route operation may be performed at finer granularities to reduce bottlenecks.
-
Publication No.: US20230237231A1
Publication Date: 2023-07-27
Application No.: US18191785
Filing Date: 2023-03-28
Applicant: Intel Corporation
Inventor: Byron Sinclair, Michael Kinsner, Gabriel Quan, Victor Tzi-on Zhang, Mahesh A. Iyer, Chengping Liang, Deshanand P. Singh
IPC: G06F30/347, G06F30/31
CPC classification number: G06F30/347, G06F30/31
Abstract: Systems or methods of the present disclosure may provide an electronic device that includes memory storing instructions; and a processor, that when executing the instructions, is to receive a design for a programmable fabric of an integrated circuit device. The instructions are also to cause the processor to cause compilation of the design into a configuration during a compilation window. The instructions further are to cause the processor to determine at least some routing for the configuration outside of the compilation window.
-
Publication No.: US20230237230A1
Publication Date: 2023-07-27
Application No.: US18191789
Filing Date: 2023-03-28
Applicant: Intel Corporation
Inventor: Michael Kinsner, Byron Sinclair, Deshanand P. Singh, Scott Jeremy Weber, Anandh Venkateswaran, Mahesh A. Iyer
IPC: G06F30/343, G06F30/347
CPC classification number: G06F30/343, G06F30/347, G06F2119/12
Abstract: Systems or methods of the present disclosure may provide a library including multiple personas that may be pre-generated by a manufacturer and/or custom-generated by a designer, and that may be used to implement a design onto an integrated circuit device. The design may be decomposed into one or more personas to be implemented as coarse-grained operations on the integrated circuit device, thereby decreasing compilation time experienced by the designer. The personas may be loaded into one or more regions of the integrated circuit device to realize the design. That is, one persona may be implemented across multiple regions, one region may be configured by multiple personas, one persona may configure one region, or any combination thereof. Additionally or alternatively, the integrated circuit device may include networks-on-chip to improve data routing between the regions.
-
Publication No.: US20220197613A1
Publication Date: 2022-06-23
Application No.: US17692405
Filing Date: 2022-03-11
Applicant: Intel Corporation
Inventor: Michael Kinsner, Rajesh Poornachandran, John Freeman
IPC: G06F8/41
Abstract: An apparatus to facilitate clock gating and clock scaling based on runtime application task graph information is disclosed. The apparatus includes a processor to: receive, from a compiler, a bitstream generated from code of an application, the bitstream related to a workload of the application; generate a task graph of the application using at least part of the bitstream, the task graph to represent one of a relationship and dependency of the code; program the bitstream to an accelerator device, wherein the bitstream to configure the accelerator device to support the workload of the application; execute one or more kernels of the code using the accelerator device; identify one or more optimizations for the accelerator device based on the task graph of the application; and transmit a command to cause the one or more optimizations to be implemented in the at least one region of the accelerator device.