摘要:
One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
摘要:
One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.
摘要:
Instruction dispatch in a multithreaded microprocessor such as a graphics processor is not constrained by an order among the threads. Instructions for each thread are fetched, and a dispatch circuit determines which instructions in the buffer are ready to execute. The dispatch circuit may issue any ready instruction for execution, and an instruction from one thread may be issued prior to an instruction from another thread regardless of which instruction was fetched first. If multiple functional units are available, multiple instructions can be dispatched in parallel.
摘要:
One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. Threads that exit a program are identified as idle by a disable mask. Other threads that are disabled may be enabled once the divergent threads reach an instruction that enables the disabled threads. Use of the disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture.
摘要:
An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
摘要:
A graphics pipeline system and associated method are provided with an integrated clipping operation. First included is a transform module positioned on a single semiconductor platform for transforming graphics data from a first space to a second space. Also provided is a lighting module positioned on the same single semiconductor platform as the transform module. The lighting module is adapted for performing lighting operations on the graphics data. A clipping operation is also performed utilizing the single semiconductor platform.
摘要:
A graphics pipeline system is provided with an integrated scissor operation. First provided is a transform module adapted for being coupled to a buffer to receive graphics data therefrom. Such transform module is positioned on a single semiconductor platform for transforming the graphics data from a first space to a second space. Associated therewith is a lighting module coupled to the transform module and positioned on the same single semiconductor platform as the transform module for performing lighting operations on the graphics data received from the transform module. A scissor operation is performed on the same single semiconductor platform as the transform module and the lighting module.
摘要:
A method, apparatus and article of manufacture are provided for sequencing graphics processing in a transform or lighting operation. A plurality of mode bits are first received which are indicative of the status of a plurality of modes of process operations. A plurality of addresses are then identified in memory based on the mode bits. Such addresses are then accessed in the memory for retrieving code segments which each are adapted to carry out the process operations in accordance with the status of the modes. The code segments are subsequently executed within a transform or lighting module for processing vertex data.
摘要:
A method, apparatus and article of manufacture are provided for managing vertex data in a vertex buffer. First, vertex data is received and stored in the vertex buffer. Thereafter, the vertex data is outputted from the vertex buffer to a processing module. During operation, a plurality of command bits is passed from the vertex buffer for determining a manner in which the vertex data is inputted and processed in the input buffer of the processing module. Such command bits are received from a command bit source. Further, a plurality of mode bits indicative of a status of a plurality of modes of process operations is passed. Such mode bits are received from a mode bit source. The mode bits are adapted for determining a manner in which the vertex data is processed in the processing module.
摘要:
A graphics pipeline system is provided for graphics processing. Such system includes a transform module adapted for being coupled to a vertex attribute buffer for receiving vertex data. The transform module serves to transform the vertex data from object space to screen space. Coupled to the transform module is a lighting module which is positioned on the single semiconductor platform for performing lighting operations on the vertex data received from the transform module. Also included is a rasterizer coupled to the lighting module and positioned on the single semiconductor platform for rendering the vertex data received from the lighting module.