Abstract:
An apparatus and method are described for efficiently rendering and transmitting video to a remote display. For example, one embodiment of a remote display apparatus comprises: a display engine to render a sequence of video images; an encoder to compress the sequence of video images to generate a sequence of compressed video images; a network interface controller to transmit the compressed video images over a network link to a remote display; a plurality of buffer pointer registers to store read pointers and write pointers identifying read locations and write locations, respectively, in a frame buffer and a compressed stream buffer; a central processing unit (CPU) to initialize the read pointers and write pointers for processing one or more of the video images; and the display engine to access a first write pointer to write to a specified location in the frame buffer, the encoder to begin reading from the frame buffer based on a first read pointer value, the encoder to write to the compressed stream buffer based on a second write pointer value, and the network interface controller to read from the compressed stream buffer based on a second read pointer value, the first and second write and read pointer values to be updated without intervention from the CPU as the display engine writes to the frame buffer, the encoder reads from the frame buffer and writes to the compressed stream buffer, and the network interface controller reads from the compressed stream buffer.
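As a rough illustration of the pointer handoff described in this abstract, the following C sketch models the display engine, encoder, and network interface controller as agents that advance shared ring-buffer pointers after a one-time CPU initialization. All structure names, buffer sizes, and the simulation loop are hypothetical assumptions, not details taken from the claimed apparatus.

```c
#include <stdint.h>
#include <stdio.h>

#define FRAME_BUF_SLOTS  8   /* hypothetical ring sizes */
#define STREAM_BUF_SLOTS 16

/* Buffer pointer registers: the CPU initializes them once; the display
 * engine, encoder, and NIC then advance them without CPU intervention. */
typedef struct {
    uint32_t fb_wr, fb_rd;   /* frame buffer write/read pointers  */
    uint32_t cs_wr, cs_rd;   /* compressed-stream write/read ptrs */
} ptr_regs_t;

/* CPU: one-time initialization for a batch of video images. */
static void cpu_init(ptr_regs_t *r) { r->fb_wr = r->fb_rd = r->cs_wr = r->cs_rd = 0; }

/* Display engine: renders a frame, advances the frame-buffer write pointer. */
static void display_engine_render(ptr_regs_t *r) { r->fb_wr = (r->fb_wr + 1) % FRAME_BUF_SLOTS; }

/* Encoder: consumes rendered frames, emits compressed output. */
static void encoder_step(ptr_regs_t *r) {
    while (r->fb_rd != r->fb_wr) {                /* data available to encode */
        r->fb_rd = (r->fb_rd + 1) % FRAME_BUF_SLOTS;
        r->cs_wr = (r->cs_wr + 1) % STREAM_BUF_SLOTS;
    }
}

/* NIC: drains the compressed stream buffer onto the network link. */
static void nic_step(ptr_regs_t *r) {
    while (r->cs_rd != r->cs_wr)
        r->cs_rd = (r->cs_rd + 1) % STREAM_BUF_SLOTS;
}

int main(void) {
    ptr_regs_t regs;
    cpu_init(&regs);                  /* CPU touches the pointers only here */
    for (int frame = 0; frame < 4; frame++) {
        display_engine_render(&regs);
        encoder_step(&regs);
        nic_step(&regs);
    }
    printf("fb_wr=%u cs_rd=%u\n", regs.fb_wr, regs.cs_rd);
    return 0;
}
```

The design point the sketch captures is that, after initialization, each agent reads only the pointer it consumes and writes only the pointer it produces, so no CPU round-trip is needed per frame.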
Abstract:
A mechanism for command stream processing is described. A method of embodiments, as described herein, includes fetching cache lines from a memory to fill a command first-in-first-out (FIFO) buffer, wherein the fetched cache lines include an overfetching of data necessary to process a command; a first parser to fetch and execute batch commands stored in the command FIFO; and a second parser to fetch and execute both batch commands and non-batch commands stored in the command FIFO.
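A minimal C sketch of the two-parser arrangement follows; the command encodings, cache-line width, and print statements are hypothetical stand-ins chosen only to make the fetch-and-parse flow concrete.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHELINE_WORDS 16   /* hypothetical: 64-byte lines, 4-byte commands */
#define FIFO_WORDS      64

enum { CMD_NOP = 0, CMD_BATCH = 1, CMD_DRAW = 2 };

static uint32_t memory[256];        /* command stream in memory        */
static uint32_t fifo[FIFO_WORDS];   /* command FIFO filled from memory */
static unsigned fifo_fill;

/* Fetch whole cache lines into the FIFO; the tail of the last line may
 * overfetch past the data actually needed for the current command. */
static void fetch_lines(unsigned start, unsigned lines) {
    memcpy(&fifo[fifo_fill], &memory[start], lines * CACHELINE_WORDS * 4);
    fifo_fill += lines * CACHELINE_WORDS;
}

/* First parser: batch commands only. */
static void batch_parser(void) {
    for (unsigned i = 0; i < fifo_fill; i++)
        if (fifo[i] == CMD_BATCH) printf("parser1: batch @%u\n", i);
}

/* Second parser: batch and non-batch commands alike. */
static void full_parser(void) {
    for (unsigned i = 0; i < fifo_fill; i++)
        if (fifo[i] != CMD_NOP) printf("parser2: cmd %u @%u\n", fifo[i], i);
}

int main(void) {
    memory[0] = CMD_BATCH; memory[1] = CMD_DRAW;
    fetch_lines(0, 1);     /* a full line fetched even for two commands */
    batch_parser();
    full_parser();
    return 0;
}
```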
Abstract:
Apparatuses, methods and storage medium associated with single pass parallel encryption are disclosed herein. In embodiments, an apparatus for computing may comprise an encryption engine to encrypt a video stream. The encryption engine may comprise a plurality of encryption pipelines to respectively encrypt a plurality of video sub-streams partitioned from the video stream in parallel in a single pass as the video sub-streams are being generated. The plurality of encryption pipelines may use a corresponding plurality of multi-part encryption counters to encrypt the corresponding video sub-streams as the video sub-streams are being generated. Each of the multi-part encryption counters used by one of the encryption pipelines may comprise a sub-portion that remains constant while encoding the corresponding video sub-stream, but is unique to the one encryption pipeline and differs from the corresponding sub-portions of the multi-part encryption counters used by the other encryption pipelines. Other embodiments may be disclosed or claimed.
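The following C sketch illustrates one plausible reading of the multi-part counter: a constant, per-pipeline sub-portion concatenated with an incrementing block counter, in the style of counter-mode encryption. The toy keystream function stands in for a real block cipher (e.g., AES-CTR), and all names and constants are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_PIPELINES 4

/* Multi-part counter: the high part stays constant for the life of a
 * sub-stream and is unique per pipeline; the low part increments per block. */
typedef struct {
    uint64_t pipeline_id;   /* constant, unique sub-portion */
    uint64_t block;         /* incrementing sub-portion     */
} ctr_t;

/* Toy keystream stand-in for a real block cipher such as AES-CTR. */
static uint8_t keystream_byte(const ctr_t *c, unsigned i) {
    return (uint8_t)((c->pipeline_id * 0x9E3779B97F4A7C15ULL ^ c->block ^ i) & 0xFF);
}

/* Encrypt one block of a sub-stream in place as it is generated. */
static void encrypt_block(ctr_t *c, uint8_t *buf, unsigned len) {
    for (unsigned i = 0; i < len; i++) buf[i] ^= keystream_byte(c, i);
    c->block++;                        /* only the low part advances */
}

int main(void) {
    ctr_t ctrs[NUM_PIPELINES];
    for (uint64_t p = 0; p < NUM_PIPELINES; p++)
        ctrs[p] = (ctr_t){ .pipeline_id = p, .block = 0 };  /* unique constants */

    uint8_t block[16] = "sub-stream data";
    encrypt_block(&ctrs[0], block, sizeof block);  /* pipeline 0, block 0 */
    printf("first byte: %02x\n", block[0]);
    return 0;
}
```

Because no pipeline ever shares a (pipeline_id, block) pair with another, the pipelines can run in parallel in a single pass without counter collisions.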
Abstract:
In an embodiment, a system includes a graphics processing unit (GPU) that includes one or more GPU engines, and a microcontroller. The microcontroller is to assign a respective schedule slot for each of a plurality of virtual machines (VMs). When a particular VM is scheduled to access a first GPU engine, the particular VM has exclusive access to the first GPU engine. Other embodiments are described and claimed.
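A short C sketch of the slot assignment follows; the round-robin rotation, slot duration, and VM count are hypothetical choices used to illustrate the exclusive-access property, not details fixed by the embodiment.

```c
#include <stdio.h>

#define NUM_VMS 3

/* Microcontroller-assigned schedule slots: one VM owns the engine per slot. */
typedef struct { int vm_id; unsigned slot_us; } sched_slot_t;

static sched_slot_t slots[NUM_VMS];

static void assign_slots(void) {
    for (int i = 0; i < NUM_VMS; i++)
        slots[i] = (sched_slot_t){ .vm_id = i, .slot_us = 1000 };
}

/* During its slot, the scheduled VM has exclusive access to the engine. */
static void run_engine(int tick) {
    int owner = slots[tick % NUM_VMS].vm_id;
    printf("slot %d: VM%d has exclusive access to engine 0\n", tick, owner);
}

int main(void) {
    assign_slots();
    for (int t = 0; t < 6; t++) run_engine(t);
    return 0;
}
```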
Abstract:
An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.
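The master/slave MMU relationship can be sketched in C as below: the slave has no direct path to the system-level MMU and always forwards to the master, which services the transaction from a local TLB when it can and otherwise forwards it upward. The TLB geometry and the fake physical-address mapping are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/* System-level MMU: the final authority for translations (toy mapping). */
static uint64_t system_mmu_translate(uint64_t va) { return va | (1ULL << 40); }

/* Master MMU: services a transaction locally on a TLB hit, otherwise
 * forwards it to the system-level MMU on its own or a slave's behalf. */
typedef struct { uint64_t va, pa; bool valid; } tlb_entry_t;
static tlb_entry_t master_tlb[16];

static uint64_t master_mmu_translate(uint64_t va) {
    unsigned idx = (va >> 12) & 15;
    if (master_tlb[idx].valid && master_tlb[idx].va == va)
        return master_tlb[idx].pa;               /* serviced locally */
    uint64_t pa = system_mmu_translate(va);      /* forwarded upward */
    master_tlb[idx] = (tlb_entry_t){ va, pa, true };
    return pa;
}

/* Slave MMU: no direct connection to the system MMU; it always sends
 * its memory transactions to the master. */
static uint64_t slave_mmu_translate(uint64_t va) { return master_mmu_translate(va); }

int main(void) {
    printf("slave VA 0x1000 -> PA 0x%llx\n",
           (unsigned long long)slave_mmu_translate(0x1000));
    printf("slave VA 0x1000 -> PA 0x%llx (master TLB hit)\n",
           (unsigned long long)slave_mmu_translate(0x1000));
    return 0;
}
```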
Abstract:
A mechanism is described for facilitating localized load-balancing for processors in computing devices. A method of embodiments, as described herein, includes facilitating hosting, at a processor of a computing device, a local load-balancing mechanism. The method may further include monitoring balancing of loads at the processor and serving as a local scheduler to maintain de-centralized load-balancing at the processor and between the processor and one or more other processors.
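One way to picture the de-centralized scheme is the C sketch below, where each processor hosts its own balancer that monitors local load and sheds work to a lighter-loaded peer without any central coordinator. The migration policy and load values are illustrative assumptions only.

```c
#include <stdio.h>

#define NUM_PROCS 4

/* Per-processor load counters; each processor hosts its own balancer. */
static int load[NUM_PROCS];

/* Local scheduler: monitors its own load and, without central
 * coordination, sheds work to the least-loaded peer when imbalanced. */
static void local_balance(int self) {
    int lightest = self;
    for (int p = 0; p < NUM_PROCS; p++)
        if (load[p] < load[lightest]) lightest = p;
    while (lightest != self && load[self] - load[lightest] > 1) {
        load[self]--;                  /* migrate one unit of work */
        load[lightest]++;
    }
}

int main(void) {
    load[0] = 8; load[1] = 1; load[2] = 3; load[3] = 2;
    for (int p = 0; p < NUM_PROCS; p++) local_balance(p);
    for (int p = 0; p < NUM_PROCS; p++) printf("proc %d: load %d\n", p, load[p]);
    return 0;
}
```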
Abstract:
One embodiment provides for a general-purpose graphics processing unit comprising multiple processing units and a pipeline manager to distribute a thread group to the multiple processing units, wherein the pipeline manager is to distribute the thread group as multiple thread sub-groups.
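A minimal sketch of the pipeline manager's distribution step, assuming an even split of one thread group into per-unit sub-groups; the unit count and group size are hypothetical.

```c
#include <stdio.h>

#define NUM_UNITS 4

/* Pipeline manager: splits one thread group into per-unit sub-groups. */
static void distribute_thread_group(int group_threads) {
    int base = group_threads / NUM_UNITS;
    int rem  = group_threads % NUM_UNITS;
    for (int u = 0; u < NUM_UNITS; u++) {
        int sub = base + (u < rem ? 1 : 0);   /* sub-group for unit u */
        printf("unit %d: sub-group of %d threads\n", u, sub);
    }
}

int main(void) {
    distribute_thread_group(70);   /* e.g., one 70-thread group */
    return 0;
}
```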
Abstract:
An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the tile cache becomes full.
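The per-VM partition plus spill-over region can be sketched in C as follows; the portion sizes and counters are hypothetical, and the spill path stands in for the system-memory region named in the abstract.

```c
#include <stdio.h>

/* Per-VM tile-cache portions plus a system-memory spill region. */
typedef struct { int capacity, used, spilled; } vm_alloc_t;

static vm_alloc_t vms[2] = {
    { .capacity = 5 },   /* first VM's portion of the tile cache  */
    { .capacity = 3 },   /* second VM's portion of the tile cache */
};

/* Write a tile: use the VM's cache portion until full, then spill
 * into the system-memory region reserved for overflow. */
static void write_tile(int vm) {
    if (vms[vm].used < vms[vm].capacity) vms[vm].used++;
    else vms[vm].spilled++;           /* spill-over to system memory */
}

int main(void) {
    for (int i = 0; i < 6; i++) write_tile(0);
    for (int vm = 0; vm < 2; vm++)
        printf("VM%d: %d cached, %d spilled\n", vm, vms[vm].used, vms[vm].spilled);
    return 0;
}
```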
Abstract:
In accordance with some embodiments, a command streamer may use a cache of programmable size to cache commands to improve memory bandwidth and reduce latency. The size of the command cache may be programmably set by the command streamer.
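A minimal C sketch of a programmably sized command cache, assuming a direct-mapped toy cache where a hit avoids a memory fetch; the register name, maximum size, and indexing scheme are illustrative assumptions.

```c
#include <stdio.h>
#include <stdint.h>

#define MAX_CACHE_ENTRIES 64

/* Command streamer with a command cache whose size is programmable. */
static uint32_t cmd_cache[MAX_CACHE_ENTRIES];
static unsigned cache_size;         /* programmed by the command streamer */

static void set_cache_size(unsigned entries) {
    cache_size = entries <= MAX_CACHE_ENTRIES ? entries : MAX_CACHE_ENTRIES;
}

/* Cached command fetch: a hit avoids a memory read, saving bandwidth
 * and latency; a miss simulates fetching the command from memory. */
static uint32_t fetch_command(uint32_t addr) {
    unsigned idx = addr % cache_size;
    if (cmd_cache[idx] == addr) { printf("hit  0x%x\n", addr); }
    else { printf("miss 0x%x (memory fetch)\n", addr); cmd_cache[idx] = addr; }
    return addr;
}

int main(void) {
    set_cache_size(16);      /* programmably sized cache */
    fetch_command(0x100);
    fetch_command(0x100);    /* second access hits */
    return 0;
}
```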