Abstract:
A system, method, and computer program product are provided for collecting trace information based on a computational workload. The method includes the steps of compiling source code to generate a program, launching a workload to be executed by a parallel processing unit, collecting one or more records of trace information associated with a plurality of threads configured to execute the program, and correlating the one or more records to one or more corresponding instructions included in the source code. Each record in the one or more records includes at least a value of a program counter and a scheduler state of a thread in the plurality of threads.
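The correlation step can be illustrated with a minimal sketch. It is not the claimed implementation: the record layout, the scheduler state labels, and the line table mapping program counter ranges to source lines are all hypothetical, chosen only to show how a record holding a program counter value and a scheduler state might be attributed to a source line.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    thread_id: int
    pc: int               # program counter value captured for the thread
    scheduler_state: str  # hypothetical labels such as "issued" or "stalled_memory"


# Hypothetical line table emitted at compile time: (first pc of range, source line),
# sorted by pc. Each range extends to the start of the next entry.
LINE_TABLE = [(0x00, 10), (0x10, 11), (0x28, 14)]


def source_line_for(pc, line_table=LINE_TABLE):
    """Return the source line whose pc range contains the given pc."""
    starts = [start for start, _ in line_table]
    index = bisect_right(starts, pc) - 1
    if index < 0:
        raise ValueError(f"pc {pc:#x} precedes the first line table entry")
    return line_table[index][1]


def correlate(records, line_table=LINE_TABLE):
    """Count scheduler states per source line across all trace records."""
    per_line = {}
    for record in records:
        line = source_line_for(record.pc, line_table)
        per_line.setdefault(line, Counter())[record.scheduler_state] += 1
    return per_line


records = [
    TraceRecord(thread_id=0, pc=0x04, scheduler_state="issued"),
    TraceRecord(thread_id=1, pc=0x10, scheduler_state="stalled_memory"),
    TraceRecord(thread_id=2, pc=0x2C, scheduler_state="stalled_memory"),
    TraceRecord(thread_id=3, pc=0x18, scheduler_state="stalled_memory"),
]
summary = correlate(records)
```

Here `summary` groups the four records by source line, so a reader can see which lines account for the most stalled samples.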
Abstract:
Apparatuses, systems, and techniques for hardware-driven call stack attribution are described. The apparatuses, systems, and techniques include generating and updating call stacks within a processing device during execution of an application. In particular, the techniques include determining a branch identifier associated with an instruction being executed by an execution thread, identifying a call stack identifier of the execution thread executing the instruction, and updating the call stack identifier of the execution thread based on the identified call stack identifier and the branch identifier.
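A software sketch of the update step follows. It is an illustration under stated assumptions, not the hardware mechanism: the abstract does not say how the new identifier is derived, so this example assumes a lookup table that interns each (current call stack identifier, branch identifier) pair and hands out sequential identifiers. The names `CallStackTable`, `update`, `unwind`, and `ROOT_STACK_ID` are hypothetical.

```python
ROOT_STACK_ID = 0  # hypothetical identifier for a thread with an empty call stack


class CallStackTable:
    """Maps (call stack id, branch id) pairs to derived call stack ids."""

    def __init__(self):
        self._ids = {}      # (parent stack id, branch id) -> derived stack id
        self._parents = {}  # derived stack id -> (parent stack id, branch id)

    def update(self, stack_id, branch_id):
        """Return the call stack id that results from taking the given branch."""
        key = (stack_id, branch_id)
        if key not in self._ids:
            new_id = len(self._ids) + 1  # sequential ids; 0 is reserved for the root
            self._ids[key] = new_id
            self._parents[new_id] = key
        return self._ids[key]

    def unwind(self, stack_id):
        """Recover the branch ids, outermost first, that led to a call stack id."""
        branches = []
        while stack_id != ROOT_STACK_ID:
            stack_id, branch_id = self._parents[stack_id]
            branches.append(branch_id)
        branches.reverse()
        return branches


table = CallStackTable()
outer = table.update(ROOT_STACK_ID, 0x100)  # thread takes the branch at 0x100
inner = table.update(outer, 0x200)          # then the branch at 0x200
path = table.unwind(inner)
```

Because the derived identifier depends only on the current identifier and the branch identifier, two threads that follow the same sequence of branches end up with the same call stack identifier, which is what makes per-stack attribution possible in this sketch.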