Abstract:
An apparatus to facilitate graphics rendering is disclosed. The apparatus comprises sequencer hardware to operate in a tile mode to render objects, including performing batch formation to generate one or more batches of received objects, performing tile sequencing to compute tile fill intersects for each of the objects, and performing a play sequencing of each of the objects.
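A minimal software sketch of the sequencing flow described above, in C. All names, the tile size, and the batch size are illustrative assumptions; the patent describes dedicated sequencer hardware, not an API.

    #include <stdio.h>

    #define TILE_W 64
    #define TILE_H 64
    #define BATCH_SIZE 4

    typedef struct { int x0, y0, x1, y1; } BBox;  /* object bounding box */

    /* Tile sequencing: report every tile the object's bounding box touches. */
    static void tile_intersects(BBox b, int tiles_per_row) {
        for (int ty = b.y0 / TILE_H; ty <= b.y1 / TILE_H; ty++)
            for (int tx = b.x0 / TILE_W; tx <= b.x1 / TILE_W; tx++)
                printf("    tile %d\n", ty * tiles_per_row + tx);
    }

    int main(void) {
        BBox objects[] = { {10, 10, 100, 50}, {200, 5, 260, 70} };
        int n = sizeof objects / sizeof objects[0];
        /* Batch formation: group received objects into fixed-size batches;
           play sequencing would then replay each batch per intersected tile. */
        for (int i = 0; i < n; i += BATCH_SIZE) {
            printf("batch starting at object %d\n", i);
            for (int j = i; j < n && j < i + BATCH_SIZE; j++) {
                printf("  object %d intersects:\n", j);
                tile_intersects(objects[j], 16);
            }
        }
        return 0;
    }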
Abstract:
Apparatus and method for scalable error reporting. For example, one embodiment of an apparatus comprises error detection circuitry to detect an error in a component of a first tile within a tile-based hierarchy of a processing device; error classification circuitry to classify the error and record first error data based on the classification; a first tile interface to combine the first error data with second error data received from one or more other components associated with the first tile to generate first accumulated error data; and a master tile interface to combine the first accumulated error data with second accumulated error data received from at least one other tile interface to generate second accumulated error data and to provide the second accumulated error data to a host executing an application to process the second accumulated error data.
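A minimal sketch, in C, of how classified error data might be combined as it moves up the tile hierarchy toward the host. The per-severity counters and the tree shape are assumptions; the abstract only specifies that tile interfaces accumulate error data from components and lower-level interfaces.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical classified error record. */
    typedef struct { uint32_t correctable, uncorrectable, fatal; } ErrData;

    static ErrData combine(ErrData a, ErrData b) {
        ErrData r = { a.correctable   + b.correctable,
                      a.uncorrectable + b.uncorrectable,
                      a.fatal         + b.fatal };
        return r;
    }

    int main(void) {
        ErrData comp0 = {2, 0, 0}, comp1 = {1, 1, 0}; /* components of tile 0 */
        ErrData tile0 = combine(comp0, comp1);        /* first tile interface  */
        ErrData tile1 = {0, 0, 1};                    /* another tile's total  */
        ErrData host  = combine(tile0, tile1);        /* master tile interface */
        printf("host sees %u correctable, %u uncorrectable, %u fatal\n",
               host.correctable, host.uncorrectable, host.fatal);
        return 0;
    }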
Abstract:
Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such a device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
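A software model of the pull-based distribution the abstract describes, in C: idle engines acquire the next work item from a shared queue with minimal latency. The queue layout and round-robin tile order are assumptions for illustration.

    #include <stdio.h>

    #define NUM_TILES 4

    static int queue[] = {101, 102, 103, 104, 105, 106, 107, 108};
    static int head = 0, tail = 8;

    /* An engine on any tile pulls the next pending item, if one exists. */
    static int acquire_work(void) {
        return head < tail ? queue[head++] : -1;
    }

    int main(void) {
        for (int tile = 0; ; tile = (tile + 1) % NUM_TILES) {
            int item = acquire_work();
            if (item < 0) break;                 /* queue drained */
            printf("tile %d executes work item %d\n", tile, item);
        }
        return 0;
    }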
Abstract:
Described herein is an accelerator device in which compaction of diverged lanes of a parallel processor is enabled to improve ALU utilization. One embodiment provides an accelerator device comprising a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including a parallel processing architecture configured to enable compaction of diverged lanes.
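A minimal software model of lane compaction, in C: lanes whose predicate bit is set are packed into contiguous ALU slots so no slot idles on a disabled lane. The 8-wide SIMD width and the mask are illustrative; the patent covers hardware compaction, not this software loop.

    #include <stdio.h>

    #define SIMD_WIDTH 8

    int main(void) {
        int active[SIMD_WIDTH] = {1, 0, 1, 1, 0, 0, 1, 0}; /* divergence mask */
        int compacted[SIMD_WIDTH], n = 0;

        /* Pack active lanes into the low slots, remembering each source lane
           so results can be scattered back after execution. */
        for (int lane = 0; lane < SIMD_WIDTH; lane++)
            if (active[lane])
                compacted[n++] = lane;

        printf("%d of %d slots busy after compaction:", n, SIMD_WIDTH);
        for (int i = 0; i < n; i++)
            printf(" lane%d", compacted[i]);
        printf("\n");
        return 0;
    }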
Abstract:
An apparatus to facilitate doorbell notifications is disclosed. The apparatus includes memory-mapped I/O (MMIO) base address registers, including a physical function (PF) and a plurality of virtual functions (VFs), wherein each function's base address register comprises a plurality of doorbell pages, and doorbell hardware including doorbell registers, each having an assignable function identifier (ID) and offset and comprising a plurality of doorbells to activate a doorbell notification in response to receiving a doorbell trigger from an associated doorbell page set upon detection of an access request.
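A sketch in C of the lookup implied above: a write to a doorbell page is matched against doorbell registers by function ID and offset, and a matching register activates a notification. Register contents and page offsets are hypothetical.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_DOORBELLS 4

    typedef struct { uint16_t func_id; uint32_t offset; } Doorbell;

    /* Assignable function ID and offset per doorbell register. */
    static Doorbell regs[NUM_DOORBELLS] = {
        { 0, 0x0000 },   /* PF,  page 0 */
        { 1, 0x1000 },   /* VF1, page 1 */
    };

    static void doorbell_page_write(uint16_t func_id, uint32_t offset) {
        for (int i = 0; i < NUM_DOORBELLS; i++)
            if (regs[i].func_id == func_id && regs[i].offset == offset) {
                printf("doorbell %d notifies for function %u\n", i, func_id);
                return;
            }
        printf("no doorbell assigned to function %u offset 0x%x\n",
               func_id, offset);
    }

    int main(void) {
        doorbell_page_write(1, 0x1000);  /* VF1 touches its doorbell page */
        doorbell_page_write(2, 0x2000);  /* unassigned: no notification   */
        return 0;
    }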
Abstract:
Described herein are technologies related to ensuring that graphics commands and graphics context are offloaded and scheduled for consumption as the commands and graphics context are sent from coherent to non-coherent memory/fabric in a “processor to processor” handoff or transaction.
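An illustrative ordering for such a handoff, in C, with every function name hypothetical: staged commands and context are flushed out of the coherent domain, and a fence orders those writes before the consumer behind the non-coherent fabric is told to schedule them.

    #include <stdio.h>

    static void flush_to_fabric(const char *what) { printf("flush %s\n", what); }
    static void memory_fence(void)                { printf("fence\n"); }
    static void signal_scheduler(int ctx_id)      { printf("schedule ctx %d\n", ctx_id); }

    int main(void) {
        flush_to_fabric("command buffer");    /* commands leave coherent memory */
        flush_to_fabric("graphics context");
        memory_fence();                       /* order writes before handoff    */
        signal_scheduler(7);                  /* consumer may now read safely   */
        return 0;
    }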
Abstract:
Methods and systems may provide for executing, by a physically distributed set of compute slices, a plurality of work items. Additionally, the coherency of one or more memory lines associated with the plurality of work items may be maintained, by a cache fabric, across a graphics processor, a system memory and one or more host processors. In one example, a plurality of crossbar nodes track the one or more memory lines, wherein the coherency of the one or more memory lines is maintained across a plurality of level one (L1) caches and a physically distributed cache structure. Each L1 cache may be dedicated to an execution block of a compute slice and each crossbar node may be dedicated to a compute slice.
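A sketch of how crossbar nodes might divide tracking duty, in C: each memory line is assigned a home node by hashing its address, a common directory scheme. The hash, the 64-byte line size, and the node count are assumptions; the abstract does not specify them.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_NODES 4   /* one crossbar node per compute slice */

    /* Home node for a memory line, assuming 64-byte lines. */
    static int home_node(uint64_t line_addr) {
        return (int)((line_addr >> 6) % NUM_NODES);
    }

    int main(void) {
        uint64_t lines[] = {0x1000, 0x1040, 0x2380};
        for (int i = 0; i < 3; i++)
            printf("line 0x%llx tracked by crossbar node %d\n",
                   (unsigned long long)lines[i], home_node(lines[i]));
        return 0;
    }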
Abstract:
Technologies are presented that allow a portion of a cache to be used as a front memory when dynamically needed based on system demand. A computing system may include at least one processor, a memory controlled by a controller and communicatively coupled with the at least one processor, a cache communicatively coupled with the at least one processor and the memory, and mapping logic communicatively coupled with the at least one processor, the memory, and the cache. The mapping logic may map a portion of the cache to a portion of the memory, wherein the portion of the cache is to be used by the at least one processor as a local memory, and wherein the mapping is dynamic based on system demand and managed by the controller in a physical address domain.
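A minimal model of the mapping decision, in C: when the dynamic mapping is enabled, physical addresses inside a window are served from the reserved cache portion rather than memory. The window bounds and the enable flag stand in for the controller-managed state and are purely illustrative.

    #include <stdint.h>
    #include <stdio.h>

    static int cache_as_memory_enabled = 1;   /* toggled on system demand */
    static const uint64_t WIN_BASE = 0x8000000ull, WIN_SIZE = 0x100000ull;

    /* Route a physical address to the cache portion or to memory. */
    static const char *route(uint64_t paddr) {
        if (cache_as_memory_enabled &&
            paddr >= WIN_BASE && paddr < WIN_BASE + WIN_SIZE)
            return "reserved cache portion (local memory)";
        return "memory via controller";
    }

    int main(void) {
        printf("0x8000040 -> %s\n", route(0x8000040ull));
        printf("0x0001000 -> %s\n", route(0x0001000ull));
        return 0;
    }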