Abstract:
The present invention facilitates efficient and effective utilization of unified virtual addresses across multiple components. In one embodiment, the presented new approach or solution uses Operating System (OS) allocation on the central processing unit (CPU) combined with graphics processing unit (GPU) driver mappings to provide a unified virtual address (VA) across both GPU and CPU. The new approach helps ensure that a GPU VA pointer does not collide with a CPU pointer provided by OS CPU allocation (e.g., like one returned by “malloc” C runtime API, etc.).
Abstract:
One embodiment sets forth a method for assigning priorities to kernels launched by a software application and executed within a stream of work on a parallel processing subsystem that supports dynamic parallelism. First, the software application assigns a maximum nesting depth for dynamic parallelism. The software application then assigns a stream priority to a stream. These assignments cause a driver to map the stream priority to a device priority and, subsequently, associate the device priority with the stream. As part of the mapping, the driver ensures that each device priority is at least the maximum nesting depth higher than the device priorities associated with any lower priority streams. Subsequently, the driver launches any kernel included in the stream with the device priority associated with the stream. Advantageously, by strategically assigning the maximum nesting depth and prioritizing streams, an application developer may increase the overall processing efficiency of the software application.
Abstract:
One embodiment sets forth a method for guiding the order in which a parallel processing subsystem executes memory copies. A driver creates semaphores for all but the lowest priority included in a plurality of priorities and associates one priority with each copy hardware channel included in the parallel processing subsystem. The driver then aliases prioritized streams to the copy hardware channels based on the priorities. Upon receiving a request to execute a memory copy within one of the streams, the driver inserts commands into the aliased copy hardware channel. These commands use the semaphores to direct the parallel processing subsystem to execute the memory copy based on the priority of the copy hardware channel. Advantageously, by assigning priorities to streams and, subsequently, strategically requesting memory copies within the prioritized streams, an application developer may fine-tune their software application to increase the overall processing efficiency of the software application.
Abstract:
A method for handling parallel processing clients associated with a server in a GPU, the method comprising: receiving a failure indication for at least client running a thread in the GPU; determining threads in the GPU associated with the failing client; exiting threads in the GPU associated with the failing client; and continuing to execute remaining threads in the GPU for other clients running threads in the GPU.
Abstract:
One embodiment sets forth a method for assigning priorities to kernels launched by a software application and executed within a stream of work on a parallel processing subsystem. First, the software application assigns a desired priority to a stream using a call included in the API. The API receives this call and passes it to a driver. The driver maps the desired priority to an appropriate device priority associated with the parallel processing subsystem. Subsequently, if the software application launches a particular kernel within the stream, then the driver assigns the device priority associated with the stream to the kernel before adding the kernel to the stream for execution on the parallel processing subsystem. Advantageously, by assigning priorities to streams and, subsequently, strategically launching kernels within the prioritized streams, an application developer may fine-tune the software application to increase the overall processing efficiency of the software application.