Abstract:
A hardware-based traversal coprocessor provides acceleration of tree traversal operations searching for intersections between primitives represented in a tree data structure and a ray. The primitives may include triangles used in generating a virtual scene. The hardware-based traversal coprocessor is configured to properly handle numerically challenging computations at or near edges and/or vertices of primitives and/or ensure that a single intersection is reported when a ray intersects a surface formed by primitives at or near edges and/or vertices of the primitives.
Abstract:
An issue control unit is configured to control the rate at which an instruction issue unit issues instructions to an execution pipeline in order to avoid spikes in power drawn by that execution pipeline. The issue control unit maintains a history buffer that reflects, for N previous cycles, the number of instructions issued during each of those N cycles. If the total number of instructions issued during the N previous cycles exceeds a threshold value, then the issue control unit throttles the instruction issue unit from issuing instructions during a subsequent cycle. In addition, the issue control unit increases the threshold value in proportion to the number of previously issued instructions and based on a variety of configurable parameters. Accordingly, the issue control unit maintains granular control over the rate with which the instruction issue unit “ramps up” to a maximum instruction issue rate.
Abstract:
A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
Abstract:
A system, method, and computer program product are provided for processing primitive-specific attributes. A portion of a graphics processor is determined to operate in a fast geometry shader mode and a vertex associated with a set of per-vertex attributes is determined to be a shared vertex. The shared vertex is determined to be a non-provoking vertex corresponding to a first primitive that is associated with a first set of per-primitive attributes and the shared vertex is determined to be a provoking vertex corresponding to a second primitive that is associated with a second set of per-primitive attributes. Only one set of the per-vertex attributes associated with the shared vertex is stored and only one of the second set of per-primitive attributes associated with the second primitive is stored.
Abstract:
A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
Abstract:
A system, method, and computer program product are provided for processing primitive-specific attributes. A portion of a graphics processor is determined to operate in a fast geometry shader mode and a vertex associated with a set of per-vertex attributes is determined to be a shared vertex. The shared vertex is determined to be a non-provoking vertex corresponding to a first primitive that is associated with a first set of per-primitive attributes and the shared vertex is determined to be a provoking vertex corresponding to a second primitive that is associated with a second set of per-primitive attributes. Only one set of the per-vertex attributes associated with the shared vertex is stored and only one of the second set of per-primitive attributes associated with the second primitive is stored.