Abstract:
Apparatus and method for scalable content protection. For example, one embodiment of an apparatus comprises: cryptographic management circuitry to securely store one or more keys associated with one or more media apps/applications; a plurality of processing engines, each processing engine comprising circuitry to process media content of the one or more media apps/applications; and a scheduler to schedule processing of the media content by the processing engines; wherein the cryptographic management circuitry is to restore a first cryptographic state including a first key associated with a first media app/application and/or first media content responsive to a request to process the first media content on a first processing engine.
Abstract:
In an embodiment, a system includes a graphics processing unit (GPU) that includes one or more GPU engines, and a microcontroller. The microcontroller is to assign a respective schedule slot for each of a plurality of virtual machines (VMs). When a particular VM is scheduled to access a first GPU engine, the particular VM has exclusive access to the first GPU engine. Other embodiments are described and claimed.
Abstract:
Described herein is a paging technique that can be implemented in any accelerator with attached memory and support for operating on encrypted data when the CPU is not within the trusted compute base (TCB). Memory storing data that is encrypted using hardware physical address (HPA)-based encrypted can be paged out of accelerator device memory by decoupling encryption from the hardware physical address and re-encrypting the data for page-out. Upon page-in, the data is decrypted, the integrity and authenticity of the data is verified, then the data is re-encrypted using HPA-based encryption.
Abstract:
Techniques are disclosed for processing rendering engine workload of a graphics system in a secure fashion, wherein at least some security processing of the workload is offloaded from software-based security parsing to hardware-based security parsing. In some embodiments, commands from a given application are received by a user mode driver (UMD), which is configured to generate a command buffer delineated into privileged and/or non-privileged command sections. The delineated command buffer can then be passed by the UMD to a kernel-mode driver (KMD), which is configured to parse and validate only privileged buffer sections, but to issue all other batch buffers with a privilege indicator set to non-privileged. A graphics processing unit (GPU) can receive the privilege-designated batch buffers from the KMD, and is configured to disallow execution of any privileged command from a non-privileged batch buffer, while any privileged commands from privileged batch buffers are unrestricted by the GPU
Abstract:
Memory-based semaphores are described that are useful for synchronizing processes between different processing engines. In one example, operations include executing a first process at a first processing engine, the executing including updating a memory register, sending a signal from the first processing engine to a second processing engine that the memory register has been updated, the signal including a memory register address to identify the updated memory register inline data and a dataword, fetching data from the memory register by the second processing engine, comparing the fetched data to the received dataword, and conditionally executing a next command of a second process at the second processing engine based on the comparison.
Abstract:
Apparatus and method for scalable error reporting. For example, one embodiment of an apparatus comprises error detection circuitry to detect an error in a component of a first tile within a tile-based hierarchy of a processing device; error classification circuitry to classify the error and record first error data based on the classification; a first tile interface to combine the first error data with second error data received from one or more other components associated with the first tile to generate first accumulated error data; and a master tile interface to combine the first accumulated error data with second accumulated error data received from at least one other tile interface to generate second accumulated error data and to provide the second accumulated error data to a host executing an application to process the second accumulated error data.
Abstract:
Apparatus and method for simultaneous command streamers. For example, one embodiment of an apparatus comprises: a plurality of work element queues to store work elements for a plurality of thread contexts, each work element associated with a context descriptor identifying a context storage region in memory; a plurality of command streamers, each command streamer associated with one of the plurality of work element queues, the command streamers to independently submit instructions for execution as specified by the work elements; a thread dispatcher to evaluate the thread contexts including priority values, to tag each instruction with an execution identifier (ID), and to responsively dispatch each instruction including the execution ID in accordance with the thread context; and a plurality of graphics functional units to independently execute each instruction dispatched by the thread dispatcher and to associate each instruction with a thread context based on its execution ID.
Abstract:
A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a register file having a plurality of channels to store data and an execution unit to examine data at each of the plurality of channels, read a data value from a first of the plurality of channels upon a determination that each of the plurality of channels has the same data and execute a single input multi data (SIMD) instruction based on the data value.
Abstract:
Methods and apparatus relating to mid-thread pre-emption with software assisted context switch are described. In an embodiment, one or more threads executing on a Graphics Processing Unit (GPU) are stopped at an instruction level granularity in response to a request to pre-empt the one or more threads. The context data of the one or more threads is copied to memory in response to completion of the one or more threads at the instruction level granularity and/or one or more instructions. Other embodiments are also disclosed and claimed.