摘要:
In a pipelined processor, an apparatus for handling string operations. When a string operation is received by the processor, the length of the string as specified by the programmer is stored in a register. Next, an instruction sequencer issues an instruction that computes the register value minus a pre-determined number of iterations to be issued into the pipeline. Following the instruction, the pre-determined number of iterations are issued to the pipeline. When the instruction returns with the calculated number, the instruction sequencer then knows exactly how many iterations should be executed. Any extra iterations that had initially been issued are canceled by the execution unit, and additional iterations are issued as necessary. A loop counter in the instruction sequencer is used to track the number of iterations.
摘要:
A method and apparatus for resolving Return From Subroutine instructions in a computer processor are disclosed. The method and apparatus resolve Return From Subroutine instructions in four stages. A first stage predicts Call Subroutine instructions and Return From Subroutine instructions within the instruction stream. The first stage stores a return address in a return register when a Call Subroutine instruction is predicted. The first stage predicts a return to the return address in the return register when a Return From Subroutine instruction is predicted. A second stage decodes each Call Subroutine and Return From Subroutine instruction in order to maintain a Return Stack Buffer that stores a stack of return addresses. Each time the second stage decodes a Call Subroutine instruction, a return address is pushed onto the Return Stack Buffer. Correspondingly, each time the second stage decodes a Return From Subroutine instruction, a return address is popped off of the Return Stack Buffer. The second stage verifies predictions made by the first stage and predicts return addresses for Return From Subroutine instructions that were not predicted by the first stage. A third stage executes Return From Subroutine instructions such that the predictions are verified. Finally, a fourth stage retires Return From Subroutine instructions and ensures that no instructions fetch after a mispredicted return address are committed into permanent state.
摘要:
A Branch Target Buffer Circuit in a computer processor that predicts branch instructions with a stream of computer instructions is disclosed. The Branch Target Buffer Circuit uses a Branch Target Buffer Cache that stores branch information about previously executed branch instructions. The branch information stored in the Branch Target Buffer Cache is addressed by the last byte of each branch instruction. When an Instruction Fetch Unit in the computer processor fetches a block of instructions it sends the Branch Target Buffer Circuit an instruction pointer. Based on the instruction pointer, the Branch Target Buffer Circuit looks in the Branch Target Buffer Cache to see if any of the instructions in the block being fetched is a branch instruction. When the Branch Target Buffer Circuit finds an upcoming branch instruction in the Branch Target Buffer Cache, the Branch Target Buffer Circuit informs an Instruction Fetch Unit about the upcoming branch instruction.
摘要:
Maximum throughput or “back-to-back” scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through a number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the early setting of a source valid bit during allocation when a source operand is a retired or immediate value. This allows the ready logic of a reservation station to begin scheduling the instruction for dispatch.
摘要:
An out-of-order execution processor comprising an execution unit, a storage unit and a scheduler is disclosed. The storage unit stores instructions awaiting availability of resources required for execution. The scheduler periodically determines whether resources required for executing each instruction are available, and if so, dispatches that instruction to the execution unit. The execution unit indicates future availability of hardware resources such as functional units and write back ports a number of clock cycles before actual availability of the hardware resources. The scheduler determines availability of resources required for execution of an instruction based on the indication of future availability of the hardware resources, and dispatched the instruction for execution. The out-of-order execution processor also includes means to determine future completion of execution of source instructions a number of clock cycles before actual completion of execution. The scheduler dispatches for execution a data-dependent instruction that requires an execution result of one of such source instructions for an operand. Once the execution result of the source instruction is available, a bypass multiplexor bypasses the execution result into the dispatched data-dependent instruction. The bypass multiplexor sends the data dependent instruction with fully assembled operands to the execution unit for execution.
摘要:
An out-of-order execution processor comprising an execution unit, a storage unit and a scheduler is disclosed. The storage unit stores instructions awaiting availability of resources required for execution. The scheduler periodically determines whether resources required for executing each instruction are available, and if so, dispatches that instruction to the execution unit. The execution unit indicates future availability of hardware resources such as functional units and write back ports a number of clock cycles before actual availability of the hardware resources. The scheduler determines availability of resources required for execution of an instruction based on the indication of future availability of the hardware resources, and dispatched the instruction for execution. The out-of-order execution processor also includes means to determine future completion of execution of source instructions a number of clock cycles before actual completion of execution. The scheduler dispatches for execution a data-dependent instruction that requires an execution result of one of such source instructions for an operand. Once the execution result of the source instruction is available, a bypass multiplexor bypasses the execution result into the dispatched data-dependent instruction. The bypass multiplexor sends the data dependent instruction with fully assembled operands to the execution unit for execution.
摘要:
An out-of-order execution processor comprising an execution unit, a storage unit and a scheduler is disclosed. The storage unit stores instructions awaiting availability of resources required for execution. The scheduler periodically determines whether resources required for executing each instruction are available, and if so, dispatches that instruction to the execution unit. The execution unit indicates future availability of hardware resources such as functional units and write back ports a number of clock cycles before actual availability of the hardware resources. The scheduler determines availability of resources required for execution of an instruction based on the indication of future availability of the hardware resources, and dispatched the instruction for execution. The out-of-order execution processor also includes means to determine future completion of execution of source instructions a number of clock cycles before actual completion of execution. The scheduler dispatches for execution a data-dependent instruction that requires an execution result of one of such source instructions for an operand. Once the execution result of the source instruction is available, a bypass multiplexor bypasses the execution result into the dispatched data-dependent instruction. The bypass multiplexor sends the data dependent instruction with fully assembled operands to the execution unit for execution.
摘要:
Maximum throughput or "back-to-back" scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through a number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the early setting of a source valid bit during allocation when a source operand is a retired or immediate value. This allows the ready logic of a reservation station to begin scheduling the instruction for dispatch.
摘要:
A method and apparatus for resolving Return From Subroutine instructions in a computer processor are disclosed. The method and apparatus resolve Return From Subroutine instructions in four stages. A first stage predicts Call Subroutine instructions and Return From Subroutine instructions within the instruction stream. The first stage stores a return address in a return register when a Call Subroutine instruction is predicted. The first stage predicts a return to the return address in the return register when a Return From Subroutine instruction is predicted. A second stage decodes each Call Subroutine and Return From Subroutine instruction in order to maintain a Return Stack Buffer that stores a stack of return addresses. Each time the second stage decodes a Call Subroutine instruction, a return address is pushed onto the Return Stack Buffer. Correspondingly, each time the second stage decodes a Return From Subroutine instruction, a return address is popped off of the Return Stack Buffer. The second stage verifies predictions made by the first stage and predicts return addresses for Return From Subroutine instructions that were not predicted by the first stage. A third stage executes Return From Subroutine instructions such that the predictions are verified. Finally, a fourth stage retires Return From Subroutine instructions and ensures that no instructions fetch after a mispredicted return address are committed into permanent state.
摘要:
A buffer is used to store information about the branch instructions within a pipelined microprocessor that can speculatively execute instructions. When a branch instruction in the microprocessor is decoded, the address of the instruction immediately following the branch instruction (the Next Linear Instruction Pointer or NLIP) and some processor state information is written into a Branch Instruction Pointer Table. The branch instruction then proceeds down the microprocessor pipeline. Eventually, the branch instruction is executed. The resolved branch outcome for the branch instruction is compared with a predicted branch outcome. If the branch prediction was correct, the microprocessor continues execution along the current path. However, if the branch prediction was wrong then the execution unit flushes the front-end microprocessor pipeline and restores the microprocessor state information that was stored in the Branch IP Table. If the branch was mispredicted as not taken, the execution unit instructs an Instruction Fetch Unit to resume execution at a final branch target address. Alternatively, if the branch was mispredicted as taken when the branch should not have been taken, the execution unit instructs the Instruction Fetch Unit to resume execution at the Next Linear Instruction Pointer (NLIP) address stored in the Branch IP Table.