摘要:
A method and apparatus are provided for performing efficient conversion operations between floating point and fixed point values on a general purpose processor. This is achieved by providing an instruction for converting a fixed point value fx into a floating point value fl in a general purpose processor. Accordingly, the invention advantageously provides a general purpose processor with the ability to execute conversion operation between fixed-point and floating-point values with a single instruction compared with prior art general purpose processors that require multiple instructions to perform the same function. Thus, the general purpose processor of the present invention allows for more efficient and faster conversion operations between fixed-point and floating-point values.
摘要:
A processor performs precise trap handling for out-of-order and speculative load instructions. It keeps track of the age of load instructions in a shared scheme that includes a load buffer and a load annex. All precise exceptions are detected in a T phase of a load pipeline. Data and control information concerning load operations that hit in the data cache are staged in a load annex during the A1, A2, A3, and T pipeline stages until all exceptions in the same or earlier instruction packet are detected. Data and control information from all other load instructions is staged in the load annex after the load data is retrieved. Before the load data is retrieved, the load instruction is kept in a load buffer. If an exception occurs, any load in the same instruction packet as the instruction causing the exception is canceled. Any load instructions that are “younger” than the instruction that caused the exception are also canceled. The age of load instructions is determined by tracking the pipe stages of the instruction. When a trap occurs, any load instruction with a non-zero age indicator is canceled.
摘要:
A method and apparatus for performing single-instruction bit field extraction and for counting a number of leading zeros in a sequence of bits on a general purpose processor are provided. The fast bit extraction operations are accomplished by executing a first instruction for extracting an arbitrary number of bits of a sequence of bits stored in two or more source registers of the processor starting at an arbitrary offset and the storing the extracted bits in a destination register. Both the source and the destination registers are specified by the instruction. In addition, a second instruction is provided for counting the number of leading zeros in a sequence of bits stored in two or more source registers of the processor and then storing a binary value representing the number of leading zeros in a destination register. Again the source and the destination registers are specified by the second instruction. Both the first and the second instructions are pipelined to obtain an effective throughput of one instruction every cycle, respectively. As a result, bit extraction operations are performed very efficiently by the processor, thereby reducing the overall processing time required to compress and decompress multimedia data.
摘要:
A method and apparatus for performing fast clip-testing operations in a general purpose processor are provided. This is accomplished by executing a single instruction for comparing a first value x to a second value y and, as a result of the comparison, determining whether x is less than y and whether x is less than negative y. The values x and y are stored in respective source registers of the processor specified by the instruction. Finally, as a result of the determination, one or more binary values representing the results of the determination are inserted into a destination register of the processor also specified by the instruction. Accordingly, the invention advantageously provides a general purpose processor with the ability to execute a clip-testing function with a single instruction compared with prior art general purpose processors that require multiple instructions to perform the same function. Thus, the general purpose processor of the present invention allows for more efficient and faster clip-testing operations.
摘要:
Incoming geometry data are buffered in one or more buffers. The data are written to the buffers in an order which is not necessarily the order in which a processor or processors that construct images from the data need the data for fast processing. The data are provided to the processors in the order needed for fast processing. In some embodiments, fast processing involves starting critical path computations early. Examples of critical path computations are lighting computations which take more time than position computations. At least one processor has a pipelined instruction execution unit. The processor executes critical path computation instructions as long as a critical path instruction can be started without causing a pipeline stall. When no critical path instructions can be started without causing a stall, the processor starts a non-critical path instruction.
摘要:
When an atomic instruction executed by a computer processor locks a memory location, the locking is performed before the processor has determined whether the instruction is to be executed to completion or canceled. The memory location is unlocked whether or not the instruction will be canceled. Since the locking operation can occur before it is known whether the instruction will be canceled, the reading of the memory location can also occur early, before it is known whether the instruction will be canceled.
摘要:
A method and apparatus for efficiently performing graphic operations are provided. This is accomplished by providing a processor that supports any combination of the following instructions: parallel multiply-add, conditional pick, parallel averaging, parallel power, parallel reciprocal square root and parallel shifts.
摘要:
Two instruction executions circuits C1 and C2, possibly pipelined, share a write port to write instruction results to their destinations. When both circuits have results available for writing in the same clock cycle, the write port is given to circuit C1. Circuit C2 gets the write port only when there is a bubble in the write back stage of circuit C1. Circuit C2 executes instructions that occur infrequently in an average program. Examples are division, reciprocal square root, and power computation instructions. Circuit C1 executes instructions that occur more frequently. Circuits C1 and C2 are part of a functional unit of a VLIW processor.