摘要:
An instruction dispatch circuit is disclosed that improves instruction execution throughput for a processor. The instruction dispatch circuit comprises an instruction buffer with a plurality of instruction entries and a content addressable memory array having at least one cam entry corresponding to each instruction entry. Each cam entry stores at least one source tag for the corresponding instruction entry. The content addressable memory array matches to a result tag from an execution circuit over a result bus, wherein the execution circuit transfers the result tag over the result bus at least one clock cycle before transferring a corresponding result data value over the result bus. Each cam entry generates a cam match signal used to determine whether data dependent instruction are ready for dispatch.
摘要:
A mechanism for coordinating source data in a processor, wherein a decode circuit issues instructions comprising at least one immediate valid flag and at least one logical register source. The immediate valid flag indicates whether an immediate operand for the instruction is available on an immediate data bus, and the logical register source specifies a physical register or a committed state register. A speculative result data value and a speculative source valid flag are read from the physical register, and a committed result data value is read from the committed state register. The speculative result data value and the speculative source valid flag or the committed result data value and the committed source valid flag provide a source data value and a source data valid flag for scheduling an execution of the instruction.
摘要:
A method and apparatus for performing operations with a processor in a computer system. Load operations are performed by use of a dispatch pipeline and a memory execution pipeline. The dispatch pipeline dispatches the load operation for execution by the processor, while the memory execution pipeline controls the execution of the load operation to memory. The present invention reduces the latency involved in executing a load operation by coupling the execution of the two pipelines during execution of the load operation.
摘要:
A mechanism and method for renaming flags within a register alias table ("RAT") to increase processor parallelism and also providing and using flag masks associated with individual instructions. In order to reduce the amount of data dependencies between instructions that are concurrently processed, the flags used by these instructions are renamed. In general, a RAT unit provides register renaming to provide a larger physical register set than would ordinarily be available within a given macroarchitecture's logical register set (such as the Intel architecture or PowerPC or Alpha designs, for instance) to eliminate false data dependencies between instructions that reduce overall superscalar processing performance for the microprocessor. The renamed flag registers contain several flag bits and various flag bits may be updated or read by different instructions. Also, static and dynamic flag masks are associated with particular instructions and indicate which flags are capable of being updated by a particular instruction and also indicate which flags are actually updated by the instruction. Static flag masks are used in flag renaming and dynamic flag masks are used at retirement. The invention also discovers cases in which a flag register is required that is a superset of the previously renamed flag register portion.
摘要:
Maximum throughput or "back-to-back" scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the prospective determination of the availability of a source operand before the operand itself is actually computed as a result of the execution of another instruction. Storage addresses of the source operands of an instruction are stored in a content addressable memory (CAM). Before an instruction is executed and its result data written back, the storage location address of the result is provided to the CAM and associatively compared with the source operand addresses stored therein. A CAM match and its accompanying match bit indicate that the result of the instruction to be executed will provide a source operand to the dependent instruction waiting in the reservation station. Using a bypass mechanism, if the operand is computed after dispatch of the dependent instruction, then the source operand is provided directly from the execution unit computing the source operand to a source operand input of the execution unit executing the dependent instruction.
摘要:
A method and apparatus for performing store operations that includes calculating the address and obtaining the data for the store operation. The address represents the memory location to which the data is to be stored. Once the address is calculated and the data obtained, the store operation is committed to processor state. The store operation may be dispatched to memory to complete the execution of the store operation.
摘要:
In a pipelined processor, an apparatus for handling string operations. When a string operation is received by the processor, the length of the string as specified by the programmer is stored in a register. Next, an instruction sequencer issues an instruction that computes the register value minus a pre-determined number of iterations to be issued into the pipeline. Following the instruction, the pre-determined number of iterations are issued to the pipeline. When the instruction returns with the calculated number, the instruction sequencer then knows exactly how many iterations should be executed. Any extra iterations that had initially been issued are canceled by the execution unit, and additional iterations are issued as necessary. A loop counter in the instruction sequencer is used to track the number of iterations.
摘要:
A Branch Target Buffer Circuit in a computer processor that predicts branch instructions with a stream of computer instructions is disclosed. The Branch Target Buffer Circuit uses a Branch Target Buffer Cache that stores branch information about previously executed branch instructions. The branch information stored in the Branch Target Buffer Cache is addressed by the last byte of each branch instruction. When an Instruction Fetch Unit in the computer processor fetches a block of instructions it sends the Branch Target Buffer Circuit an instruction pointer. Based on the instruction pointer, the Branch Target Buffer Circuit looks in the Branch Target Buffer Cache to see if any of the instructions in the block being fetched is a branch instruction. When the Branch Target Buffer Circuit finds an upcoming branch instruction in the Branch Target Buffer Cache, the Branch Target Buffer Circuit informs an Instruction Fetch Unit about the upcoming branch instruction.
摘要:
Maximum throughput or “back-to-back” scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through a number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the early setting of a source valid bit during allocation when a source operand is a retired or immediate value. This allows the ready logic of a reservation station to begin scheduling the instruction for dispatch.
摘要:
An out-of-order execution processor comprising an execution unit, a storage unit and a scheduler is disclosed. The storage unit stores instructions awaiting availability of resources required for execution. The scheduler periodically determines whether resources required for executing each instruction are available, and if so, dispatches that instruction to the execution unit. The execution unit indicates future availability of hardware resources such as functional units and write back ports a number of clock cycles before actual availability of the hardware resources. The scheduler determines availability of resources required for execution of an instruction based on the indication of future availability of the hardware resources, and dispatched the instruction for execution. The out-of-order execution processor also includes means to determine future completion of execution of source instructions a number of clock cycles before actual completion of execution. The scheduler dispatches for execution a data-dependent instruction that requires an execution result of one of such source instructions for an operand. Once the execution result of the source instruction is available, a bypass multiplexor bypasses the execution result into the dispatched data-dependent instruction. The bypass multiplexor sends the data dependent instruction with fully assembled operands to the execution unit for execution.