-
公开(公告)号:US20190043159A1
公开(公告)日:2019-02-07
申请号:US16144290
申请日:2018-09-27
摘要: Embodiments are disclosed for emulation of graphics processing unit instructions. An example apparatus includes a kernel accessor to access an instruction of an original GPU kernel, the original GPU kernel intended to be executed at a first GPU. An instruction support determiner is to determine whether execution of the instruction is supported by a second GPU different from the first GPU. An instruction modifier is to, in response to determining that the execution of the instruction is not supported by the second GPU, create an instrumented GPU kernel based on the original GPU kernel. The instrumented GPU kernel includes an emulation sequence. The emulation sequence is to, when executed by the second GPU, cause the second GPU to emulate execution of the instruction by the first GPU.
-
公开(公告)号:US20190139183A1
公开(公告)日:2019-05-09
申请号:US16235304
申请日:2018-12-28
摘要: Techniques and apparatus for profiling graphics processing unit (GPU) processes using binary instrumentation are described. In one embodiment, for example, an apparatus may include at least one memory comprising instructions and a processor coupled to the at least one memory. The processor may execute the instructions to determine a plurality of profiling modes for profiling an operating process of a graphics processing unit (GPU) application, access original binary code for the GPU application, and generate a multi-mode instrumented binary code comprising a plurality of instrumentation modes, each of the plurality of instrumentation modes corresponding to at least one of the plurality of profiling modes. Other embodiments are described.
-
公开(公告)号:US20190095309A1
公开(公告)日:2019-03-28
申请号:US15718435
申请日:2017-09-28
CPC分类号: G06F11/3409 , G06F11/302 , G06F11/3024 , G06F11/3466 , G06F11/3688 , G06F2201/865
摘要: Techniques and apparatus for performance analysis of a program are described. In one embodiment, for example, an apparatus may include at least one memory, and logic, at least a portion of comprised in hardware coupled to the at least one memory, to access a program for performance analysis, the program comprising at least one producer instruction and at least one consumer instruction for the at least one producer instruction, and generate an analysis program based on the program, the analysis program comprising a stall time instruction set to determine a stall time of the at least one producer instruction, the stall time instruction set comprising a first time stamp instruction immediately preceding a consumer instruction, a second time stamp instruction immediately following the consumer instruction, and a stall time instruction to determine the stall time as the difference between the second time stamp and the first time stamp. Other embodiments are described and claimed.
-
-