Abstract:
Apparatus and method for context-aware compression. For example, one embodiment of an apparatus comprises: ray traversal/intersection circuitry to traverse rays through a hierarchical acceleration data structure to identify intersections between rays and primitives of a graphics scene; matrix compression circuitry/logic to compress hierarchical transformation matrices to generate compressed hierarchical transformation matrices by quantizing N-bit floating point data elements associated with child transforms of the hierarchical transformation matrices to variable-bit floating point numbers or integers comprising offsets from a parent transform of the child transform; and an instance processor to generate a plurality of instances of one or more base geometric objects in accordance with the compressed hierarchical transformation matrices.
Abstract:
Embodiments described herein provide data processing device comprising a processor, a memory, and a large draw monitor comprising a processing unit to determine whether a vertex count for a graphics workload exceeds a threshold value, and in response to a determination that the vertex count for the graphics workload exceeds the threshold value, to divide the graphics workload over graphics processing units instantiated on multiple separate tiles. Other embodiments may be described and claimed.
Abstract:
A mechanism is described for facilitating efficient graphics command generation and execution for improved graphics performance at computing devices. A method of embodiments, as described herein, includes detecting an application programming interface (API) call to perform a plurality of transactions, where the API call is issued by an application at a first command buffer, where the plurality of transactions include a first set of transactions and a second set of transactions. The method may further include creating a second command buffer and appending the second command buffer to the first command buffer, where creating further includes separating the first set transactions from the second set of transactions. The method may further include executing, via the second command buffer, the first set of transactions, prior to executing the first set of transactions.
Abstract:
A mechanism is described for facilitating adaptive resolution and viewpoint-prediction for immersive media in computing environments. An apparatus of embodiments, as described herein, includes one or more processors to receive viewing positions associated with a user with respect to a display, and analyze relevance of media contents based on the viewing positions, where the media content includes immersive videos of scenes captured by one or more cameras. The one or more processors are further to predict portions of the media contents as relevant portions based on the viewing positions and transmit the relevant portions to be rendered and displayed.
Abstract:
Position-based rendering apparatus and method for multi-die/GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; limiting geometry work to be performed on each tile by each graphics processor using the visibility data, each graphics processor to responsively generate rendered tiles; and wherein the rendered tiles are combined to generate a complete image frame.
Abstract:
Graphics processors for implementing multi-tile memory management are disclosed. In one embodiment, a graphics processor includes a first graphics device having a local memory, a second graphics device having a local memory, and a graphics driver to provide a single virtual allocation with a common virtual address range to mirror a resource to each local memory of the first and second graphics devices.
Abstract:
Apparatus and method for bottom-up BVH refit. For example, one embodiment of an apparatus comprises: a hierarchical acceleration data structure generator to construct an acceleration data structure comprising a plurality of hierarchically arranged nodes; traversal hardware logic to traverse one or more rays through the acceleration data structure; intersection hardware logic to determine intersections between the one or more rays and one or more primitives within the hierarchical acceleration data structure; a node unit comprising circuitry and/or logic to perform refit operations on nodes of the hierarchical acceleration data structure, the refit operations to adjust spatial dimensions of one or more of the nodes; and an early termination evaluator to determine whether to proceed with refit operations or to terminate refit operations for a current node based on refit data associated with one or more child nodes of the current node.
Abstract:
A mechanism is described for facilitating efficient processing of graphics commands at computing devices. A method of embodiments, as described herein, includes detecting a current object representing a bundled state of graphics commands in a command list to be processed at a graphics processor of a computing device, and evaluating the current object to determine a previous object bound to a first set of the graphics commands, where the first set of the graphics commands is associated with a first command state corresponding to the previous object. The method may further include copying a second set of the graphics commands to a command buffer associated with the command list, where the second set of the graphics commands represents a remainder of the graphics commands in the command list upon excluding the first set of the graphics commands. The method may further include facilitating the graphics processor to execute the second set of the graphics commands from the command buffer.
Abstract:
Embodiments described herein provided for an instruction and associated logic to enable a processing resource including a tensor accelerator to perform optimized computation of sparse submatrix operations. One embodiment provides a parallel processor comprising a processing cluster coupled with the cache memory. The processing cluster includes a plurality of multiprocessors coupled with a data interconnect, where a multiprocessor of the plurality of multiprocessors includes a tensor core configured to load tensor data and metadata associated with the tensor data from the cache memory, wherein the metadata indicates a first numerical transform applied to the tensor data, perform an inverse transform of the first numerical transform, perform a tensor operation on the tensor data after the inverse transform is performed, and write output of the tensor operation to a memory coupled with the processing cluster.
Abstract:
Position-based rendering apparatus and method for multi-die/GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; limiting geometry work to be performed on each tile by each graphics processor using the visibility data, each graphics processor to responsively generate rendered tiles; and wherein the rendered tiles are combined to generate a complete image frame.