摘要:
One embodiment of the present invention sets forth a technique controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified that each define to a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.
摘要:
One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
摘要:
Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh2+F1(xb)*xh+F0(xb), where xh=x−xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
摘要翻译:多用途算术功能单元可以执行平面属性插值和一元函数近似运算。 在一个实施例中,通过计算A * x + B * y + C来执行坐标(x,y)的平面内插操作,并且通过计算F2(xb)* xh2 + F1(xb)来执行操作数x的一元函数近似运算 )* xh + F0(xb),其中xh = x-xb。 共享乘法器和加法器电路有利地用于实现两类操作的乘积和求和运算。
摘要:
A multipurpose functional unit is configurable to support a number of operations including multiply-add and format conversion operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and logical test operations.
摘要:
Methods and apparatuses are presented for determining coefficients for a polynomial-based approximation of a function, by iteratively estimating a first coefficient, reducing the first coefficient to a lower precision to obtain a first limited-precision coefficient, analytically calculating a second coefficient by taking into account the first limited-precision coefficient, reducing the second coefficient to a lower precision to obtain a second limited-precision coefficient, iteratively estimating a third coefficient by taking into account at least one of the first limited-precision coefficient and the second limited-precision coefficient, and reducing the third coefficient to a lower precision to obtain a third limited-precision coefficient. In one embodiment of the invention, the polynomial-based approximation relates to a minimax approximation of the function approximated, and at least one of the steps for iteratively estimating the first coefficient and iteratively estimating the third coefficient involves use of a Remez exchange algorithm.
摘要:
One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
摘要:
Methods, apparatuses, and systems are presented for performing instructions using multiple execution units in a graphics processing unit involving issuing an instruction for P executions of the instruction wherein each execution uses different data, P being a positive integer, the instruction being issued based on a first clock having a first clock rate, operating Q execution units to achieve the P executions of the instruction, Q being a positive integer less than P and greater than one, each of the execution units being operated based on a second clock having a second clock rate higher than the first clock rate of the first clock, and wherein the second clock rate of the second clock is equal to the first clock rate of the first clock multiplied by the ratio P/Q.
摘要:
A multipurpose functional unit is configurable to support a number of operations including multiply-add and comparison testing operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and format conversion operations.
摘要:
A multipurpose functional unit is configurable to support a number of operations including multiply-add and comparison testing operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and format conversion operations.
摘要:
An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.