摘要:
A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
摘要:
Some implementations provide systems, devices, and methods for implementing a cache replacement policy. A memory request is issued for attribute information associated with a node in an acceleration data structure. The attribute information associated with the node is inserted into a cache entry of the cache and an age associated with the cache entry is set to a value based on the attribute information, in response to the memory request causing a cache miss.
摘要:
Techniques for performing ray tracing operations are provided. The techniques include receiving a request to determine whether a ray intersects any primitive of a set of primitives, evaluating the ray against non-leaf nodes of a bounding volume hierarchy to determine whether to eliminate portions of the bounding volume hierarchy from consideration, evaluating the ray against at least one early-termination node not eliminated from consideration, and determining whether to terminate traversal of the bounding volume hierarchy early and to identify that the ray hits a primitive, based on the result of the evaluation of the ray against the at least one early-termination node.
摘要:
Shader resources may be specified for input to a shader using a hierarchical data structure which may be referred to as a descriptor set. The descriptor set may be bound to a bind point of the shader and may contain slots with pointers to memory containing shader resources. The shader may reference a particular slot of the descriptor set using an offset, and may change shader resources by referencing a different slot of the descriptor set or by binding or rebinding a new descriptor set. A graphics pipeline may be specified by creating a pipeline object which specifies a shader and a rendering context object, and linking the pipeline object. Part or all of the pipeline may be validated, cross-validated, or optimized during linking.
摘要:
Methods are provided for creating objects in a way that permits an API client to explicitly participate in memory management for an object created using the API. Methods for managing data object memory include requesting memory requirements for an object using an API and expressly allocating a memory location for the object based on the memory requirements. Methods are also provided for cloning objects such that a state of the object remains unchanged from the original object to the cloned object or can be explicitly specified.
摘要:
Systems, apparatuses, and methods for implementing a single-stream foveal display transport are disclosed. A system includes a transmitter sending an image over a display transport as a sequence of equi-sized rectangles to a receiver coupled to a display. The receiver then scales up the rectangles with different scale factors to cover display areas of different sizes. The pixel density within a rectangular region is uniform and scaling factors can take on integer or non-integer values. The rectilinear grid arrangement of the image results in simplified scaling operations for the display. In another scenario, the image is transmitted as a set of horizontal bands of equal size. Within each band, the same horizontal amount of transmitted pixels are redistributed across multiple rectangular regions of varied scales. The display stream includes embedded information and the horizontal and/or vertical distribution and scaling of rectangular regions, which can be adjusted for each transmitted image.
摘要:
Methods are provided for creating objects in a way that permits an API client to explicitly participate in memory management for an object created using the API. Methods for managing data object memory include requesting memory requirements for an object using an API and expressly allocating a memory location for the object based on the memory requirements. Methods are also provided for cloning objects such that a state of the object remains unchanged from the original object to the cloned object or can be explicitly specified.
摘要:
Shader resources may be specified for input to a shader using a hierarchical data structure which may be referred to as a descriptor set. The descriptor set may be bound to a bind point of the shader and may contain slots with pointers to memory containing shader resources. The shader may reference a particular slot of the descriptor set using an offset, and may change shader resources by referencing a different slot of the descriptor set or by binding or rebinding a new descriptor set. A graphics pipeline may be specified by creating a pipeline object which specifies a shader and a rendering context object, and linking the pipeline object. Part or all of the pipeline may be validated, cross-validated, or optimized during linking.
摘要:
Methods are provided for creating objects in a way that permits an API client to explicitly participate in memory management for an object created using the API. Methods for managing data object memory include requesting memory requirements for an object using an API and expressly allocating a memory location for the object based on the memory requirements. Methods are also provided for cloning objects such that a state of the object remains unchanged from the original object to the cloned object or can be explicitly specified.
摘要:
A method and apparatus are described for mapping a physical memory having different memory regions. A plurality of virtual non-uniform memory access (NUMA) nodes may be defined in system memory to represent memory segments of various performance characteristics. Memory segments of a high-bandwidth memory (HBM) system memory may be allocated to a first memory region of the physical memory having memory segments represented by a first one of the NUMA nodes. The physical memory may include a second memory region having memory segments represented by a second one of the NUMA nodes. Memory segments of system memory may be allocated to the second memory region. The physical memory may further include a third memory region having memory segments represented by a third one of the NUMA nodes. Memory segments of an interleaved uniform memory access (UMA) graphics memory may be allocated to the third memory region.