Abstract:
A mechanism is described for facilitating efficient scheduling of graphics workloads at computing devices. A method of embodiments, as described herein, includes receiving a work request for processing a work item at a graphics processor, where the work request is placed by an application. The method may further include allowing the application to call directly into a graphics driver associated with the graphics processor to generate a work queue for the work item. Direct calling allows the application to bypass an intermediary call to the graphics driver and submit the work item directly to the graphics processor, and further includes notifying the graphics processor of the work item by writing into a memory location monitored by the graphics processor. The method may further include submitting the work item from the work queue to a submit queue of a plurality of submit queues, where one or more tasks associated with the work item are processed at the graphics processor.
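For illustration, a minimal C sketch of the direct-submission path described above, assuming a user-mapped ring buffer and a doorbell location that the graphics processor monitors; the layout and names (work_item, work_queue, submit_direct) are hypothetical, not taken from the abstract.

    #include <stdatomic.h>
    #include <stdint.h>

    /* One entry in the application's work queue (illustrative layout). */
    struct work_item {
        uint64_t cmds_addr;   /* GPU address of the command buffer */
        uint32_t cmds_size;
        uint32_t flags;
    };

    struct work_queue {
        struct work_item   items[256];   /* shared with the graphics processor */
        _Atomic uint32_t   tail;         /* next free slot                     */
        volatile uint32_t *doorbell;     /* memory location the GPU monitors   */
    };

    /* Direct submission: no intermediary driver call on the hot path. */
    void submit_direct(struct work_queue *q, const struct work_item *wi)
    {
        uint32_t slot = atomic_fetch_add(&q->tail, 1) % 256;
        q->items[slot] = *wi;
        atomic_thread_fence(memory_order_release);
        *q->doorbell = slot + 1;   /* notify the GPU by writing into the
                                      monitored memory location          */
    }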
Abstract:
Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
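One way to read the priority decision is as a per-instruction override of a default policy; the following C sketch is an assumption about that logic, with all names (cache_policy, resolve_cache_policy) hypothetical.

    /* Default-versus-instruction cache control, as the abstract describes. */
    enum cache_policy { CACHE_DEFAULT, CACHE_HIGH_PRIORITY,
                        CACHE_LOW_PRIORITY, CACHE_BYPASS };

    struct instruction {
        int has_cache_override;         /* does the instruction carry a hint? */
        enum cache_policy override;
    };

    enum cache_policy resolve_cache_policy(const struct instruction *insn,
                                           enum cache_policy defaults)
    {
        if (insn->has_cache_override)
            return insn->override;      /* the instruction controls the cache */
        return defaults;                /* otherwise default settings control  */
    }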
Abstract:
Methods and apparatus relating to transactional page fault handling. In an example, an apparatus comprises a processor to divide an execution thread of a graphics workload into a set of transactions which are to be executed atomically, initiate the execution of the thread, and manage the execution of the thread according to one of a first protocol in response to a determination that a page fault occurred in the execution of a transaction, or a second protocol in response to a determination that a page fault did not occur in the execution of a transaction. Other embodiments are also disclosed and claimed.
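As a sketch only, the two protocols can be pictured as the fault and no-fault arms of a transaction loop; the helper functions below are hypothetical, since the abstract does not specify the protocols' internals.

    struct thread;
    struct txn;
    struct txn *first_txn(struct thread *t);
    struct txn *next_txn(struct thread *t);
    void checkpoint(struct txn *x);     /* snapshot state so the txn is atomic */
    int  execute_txn(struct txn *x);    /* returns nonzero if a page fault hit */
    void rollback(struct txn *x);
    void service_fault(struct txn *x);
    void commit(struct txn *x);

    void run_thread_as_transactions(struct thread *t)
    {
        for (struct txn *x = first_txn(t); x; x = next_txn(t)) {
            checkpoint(x);
            while (execute_txn(x)) {    /* first protocol: a fault occurred  */
                rollback(x);            /* undo partial effects, service     */
                service_fault(x);       /* the fault, then re-execute        */
            }
            commit(x);                  /* second protocol: no fault, commit */
        }
    }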
Abstract:
An example system for generating hardware device input includes a gesture detector to detect an input device trigger from one of two coupled touch-enabled displays. The example system also includes a redirector to intercept touch data from a triggered touch-enabled display. The example system further includes an emulator to generate hardware input data based on the intercepted touch data and send the hardware input data to an operating system. The example system also includes a user interface to display a virtual input device on the triggered touch-enabled display and receive touch data via the virtual input device.
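The event flow can be sketched as below; the types and helpers (detect_trigger, emulate, send_to_os) are stand-ins for the gesture detector, emulator, and OS interface named in the abstract, not a real API.

    struct touch_event { int display, x, y; };
    struct hid_report  { unsigned char bytes[8]; };
    struct input_sys   { int redirecting; };

    int  detect_trigger(const struct touch_event *ev);        /* gesture detector */
    void show_virtual_device(int display);                    /* user interface   */
    struct hid_report emulate(const struct touch_event *ev);  /* emulator         */
    void send_to_os(const struct hid_report *r);

    void on_touch(struct input_sys *sys, const struct touch_event *ev)
    {
        if (detect_trigger(ev)) {              /* trigger on one display      */
            show_virtual_device(ev->display);  /* show the virtual device     */
            sys->redirecting = 1;              /* redirector starts intercept */
            return;
        }
        if (sys->redirecting) {
            struct hid_report r = emulate(ev); /* touch -> hardware input     */
            send_to_os(&r);                    /* OS sees ordinary HW input   */
        }
    }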
Abstract:
A computing device for performing scheduling operations for graphics hardware is described herein. The computing device includes a central processing unit (CPU) that is configured to execute an application. The computing device also includes a graphics scheduler configured to operate independently of the CPU. The graphics scheduler is configured to receive work queues relating to workloads from the application executing on the CPU and perform scheduling operations for any of a number of graphics engines based on the work queues.
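A rough sketch of such a scheduler's main loop, running on dedicated hardware or a microcontroller rather than the CPU; the queue and engine selection helpers are assumptions.

    struct work_queue;
    struct gfx_scheduler;

    struct work_queue *next_ready_queue(struct gfx_scheduler *s);
    int  find_idle_engine(struct gfx_scheduler *s);  /* render, blit, video... */
    void dispatch(struct gfx_scheduler *s, struct work_queue *wq, int engine);

    /* Runs independently of the CPU: the CPU only enqueues work queues. */
    void scheduler_loop(struct gfx_scheduler *s)
    {
        for (;;) {
            struct work_queue *wq = next_ready_queue(s);
            if (!wq)
                continue;
            int engine = find_idle_engine(s);
            if (engine >= 0)
                dispatch(s, wq, engine);     /* no CPU involvement here */
        }
    }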
Abstract:
Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs) and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache. The apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
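The gate the abstract describes translates directly into a small decision function; only the function and type names are invented here.

    enum prefetch_target { PREFETCH_TO_L1, PREFETCH_TO_L3 };

    enum prefetch_target choose_prefetch_target(double l1_hit_rate,
                                                double threshold)
    {
        if (l1_hit_rate >= threshold)
            return PREFETCH_TO_L3;  /* L1 already hits well; avoid polluting it */
        return PREFETCH_TO_L1;      /* L1 is missing often; prefetch into it    */
    }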
Abstract:
Methods and apparatus relating to predictive page fault handling. In an example, an apparatus comprises a processor to receive a virtual address that triggered a page fault for a compute process, check a virtual memory space for a virtual memory allocation for the compute process that triggered the page fault, and manage the page fault according to one of a first protocol, in response to a determination that the virtual address that triggered the page fault is a last page in the virtual memory allocation for the compute process, or a second protocol, in response to a determination that the virtual address that triggered the page fault is not a last page in the virtual memory allocation for the compute process. Other embodiments are also disclosed and claimed.
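The check splits into two arms, as sketched below; what each protocol does is left open by the abstract, so the comments only suggest one plausible reading.

    #include <stdint.h>

    struct vm_space;
    struct vma;   /* one virtual memory allocation */

    struct vma *find_allocation(struct vm_space *vm, uint64_t va);
    int  is_last_page(const struct vma *a, uint64_t va);
    void protocol_one(struct vma *a, uint64_t va);
    void protocol_two(struct vma *a, uint64_t va);

    void handle_page_fault(struct vm_space *vm, uint64_t fault_va)
    {
        struct vma *a = find_allocation(vm, fault_va);
        if (is_last_page(a, fault_va))
            protocol_one(a, fault_va);  /* last page of the allocation        */
        else
            protocol_two(a, fault_va);  /* more pages follow; e.g. also map
                                           the pages after it (an assumption) */
    }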
Abstract:
Methods and apparatus relating to advanced graphics power state management are described. In one embodiment, measurement logic detects information about idle transitions and active transitions of a power-well of a processor. In turn, determination logic determines performance loss and/or energy gain based at least in part on the detected information and the power-on latency of the power-well of the processor. Other embodiments are also disclosed and claimed.
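A first-order model of that trade-off might look like the following; this is not the patent's formula, just one plausible way to combine transition counts with power-on latency.

    struct pw_stats {
        unsigned idle_transitions;  /* idle <-> active transitions observed */
        double   avg_idle_ns;       /* average length of an idle period     */
    };

    /* Each idle->active transition stalls for the power-on latency. */
    double est_perf_loss_ns(const struct pw_stats *s, double power_on_ns)
    {
        return (double)s->idle_transitions * power_on_ns;
    }

    /* Energy saved by gating the power-well during idle periods. */
    double est_energy_gain_j(const struct pw_stats *s,
                             double idle_w, double gated_w)
    {
        return (idle_w - gated_w) * s->avg_idle_ns * 1e-9
               * (double)s->idle_transitions;
    }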
Abstract:
A virtual-to-virtual page table maps between a main surface containing the actual data and a metadata or auxiliary surface that gives information about compression of the main surface. In order to access the metadata that corresponds to the main surface, an additional virtual-to-virtual table may be used ahead of the regular page table mapping, avoiding the need to pass the metadata base address and x, y coordinates across a pipeline, which may result in multiple memory writes.
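Assuming page-granular entries indexed by the main surface's virtual page number, the extra lookup might be sketched as follows; the table layout is hypothetical.

    #include <stdint.h>

    #define PAGE_SHIFT 12

    struct v2v_table {
        uint64_t *aux_vpn_of;  /* indexed by main-surface virtual page number */
    };

    /* Resolve the metadata VA for a main-surface VA before the regular
       (virtual-to-physical) page walk; no metadata base address or
       (x, y) coordinates need to travel down the pipeline. */
    uint64_t metadata_va(const struct v2v_table *t, uint64_t main_va)
    {
        uint64_t vpn = main_va >> PAGE_SHIFT;
        return t->aux_vpn_of[vpn] << PAGE_SHIFT;
    }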
Abstract:
By scheduling/managing workload submission to a POSH pipe, one can exploit parallelism with minimal impact on the software scheduler in some embodiments. Software separates command sequences for each pipe to enable the POSH pipe to run ahead of a Render pipe. Infrastructure is provided to synchronize the two pipes through software-inserted commands. A Render plus POSH pipeline may be a single monolithic engine without changes to a software scheduler, removing the complexity and the latencies involved in scheduling.
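The software-inserted synchronization can be pictured as a signal/wait pair split across the two command sequences; the command encodings and emit helper below are illustrative only.

    enum cmd { CMD_POSH_WORK, CMD_RENDER_WORK, CMD_SIGNAL, CMD_WAIT };

    struct cmdbuf;
    void emit(struct cmdbuf *cb, enum cmd op, int token);  /* hypothetical */

    /* Software separates the command sequences per pipe and inserts the
       synchronization commands itself. */
    void build_streams(struct cmdbuf *posh, struct cmdbuf *render, int token)
    {
        emit(posh,   CMD_POSH_WORK,   token);  /* POSH pipe runs ahead       */
        emit(posh,   CMD_SIGNAL,      token);  /* ...and publishes progress  */
        emit(render, CMD_WAIT,        token);  /* Render pipe blocks on POSH */
        emit(render, CMD_RENDER_WORK, token);
    }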