摘要:
Maximum throughput or "back-to-back" scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the prospective determination of the availability of a source operand before the operand itself is actually computed as a result of the execution of another instruction. Storage addresses of the source operands of an instruction are stored in a content addressable memory (CAM). Before an instruction is executed and its result data written back, the storage location address of the result is provided to the CAM and associatively compared with the source operand addresses stored therein. A CAM match and its accompanying match bit indicate that the result of the instruction to be executed will provide a source operand to the dependent instruction waiting in the reservation station. Using a bypass mechanism, if the operand is computed after dispatch of the dependent instruction, then the source operand is provided directly from the execution unit computing the source operand to a source operand input of the execution unit executing the dependent instruction.
摘要:
Pipeline lengthening in functional units likely to be involved in a writeback conflict is implemented to avoid conflicts. Logic circuitry is provided for comparing the depths of two concurrently executing execution unit pipelines to determine if a conflict will develop. When it appears that two execution units will attempt to write back at the same time, the execution unit having a shorter pipeline will be instructed to add a stage to its pipeline, storing its result in a delaying buffer for one clock cycle. After the conflict has been resolved, the instruction to lengthen the pipeline of a given functional unit will be rescinded. Multistage execution units are designed to signal a reservation station to delay the dispatch of various instructions to avoid conflicts between execution units.
摘要:
Maximum throughput or “back-to-back” scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through a number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the early setting of a source valid bit during allocation when a source operand is a retired or immediate value. This allows the ready logic of a reservation station to begin scheduling the instruction for dispatch.
摘要:
An out-of-order execution processor comprising an execution unit, a storage unit and a scheduler is disclosed. The storage unit stores instructions awaiting availability of resources required for execution. The scheduler periodically determines whether resources required for executing each instruction are available, and if so, dispatches that instruction to the execution unit. The execution unit indicates future availability of hardware resources such as functional units and write back ports a number of clock cycles before actual availability of the hardware resources. The scheduler determines availability of resources required for execution of an instruction based on the indication of future availability of the hardware resources, and dispatched the instruction for execution. The out-of-order execution processor also includes means to determine future completion of execution of source instructions a number of clock cycles before actual completion of execution. The scheduler dispatches for execution a data-dependent instruction that requires an execution result of one of such source instructions for an operand. Once the execution result of the source instruction is available, a bypass multiplexor bypasses the execution result into the dispatched data-dependent instruction. The bypass multiplexor sends the data dependent instruction with fully assembled operands to the execution unit for execution.
摘要:
An out-of-order execution processor comprising an execution unit, a storage unit and a scheduler is disclosed. The storage unit stores instructions awaiting availability of resources required for execution. The scheduler periodically determines whether resources required for executing each instruction are available, and if so, dispatches that instruction to the execution unit. The execution unit indicates future availability of hardware resources such as functional units and write back ports a number of clock cycles before actual availability of the hardware resources. The scheduler determines availability of resources required for execution of an instruction based on the indication of future availability of the hardware resources, and dispatched the instruction for execution. The out-of-order execution processor also includes means to determine future completion of execution of source instructions a number of clock cycles before actual completion of execution. The scheduler dispatches for execution a data-dependent instruction that requires an execution result of one of such source instructions for an operand. Once the execution result of the source instruction is available, a bypass multiplexor bypasses the execution result into the dispatched data-dependent instruction. The bypass multiplexor sends the data dependent instruction with fully assembled operands to the execution unit for execution.
摘要:
An out-of-order execution processor comprising an execution unit, a storage unit and a scheduler is disclosed. The storage unit stores instructions awaiting availability of resources required for execution. The scheduler periodically determines whether resources required for executing each instruction are available, and if so, dispatches that instruction to the execution unit. The execution unit indicates future availability of hardware resources such as functional units and write back ports a number of clock cycles before actual availability of the hardware resources. The scheduler determines availability of resources required for execution of an instruction based on the indication of future availability of the hardware resources, and dispatched the instruction for execution. The out-of-order execution processor also includes means to determine future completion of execution of source instructions a number of clock cycles before actual completion of execution. The scheduler dispatches for execution a data-dependent instruction that requires an execution result of one of such source instructions for an operand. Once the execution result of the source instruction is available, a bypass multiplexor bypasses the execution result into the dispatched data-dependent instruction. The bypass multiplexor sends the data dependent instruction with fully assembled operands to the execution unit for execution.
摘要:
Maximum throughput or "back-to-back" scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through a number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the early setting of a source valid bit during allocation when a source operand is a retired or immediate value. This allows the ready logic of a reservation station to begin scheduling the instruction for dispatch.
摘要:
An instruction dispatch circuit is disclosed that improves instruction execution throughput for a processor. The instruction dispatch circuit comprises an instruction buffer with a plurality of instruction entries and a content addressable memory array having at least one cam entry corresponding to each instruction entry. Each cam entry stores at least one source tag for the corresponding instruction entry. The content addressable memory array matches to a result tag from an execution circuit over a result bus, wherein the execution circuit transfers the result tag over the result bus at least one clock cycle before transferring a corresponding result data value over the result bus. Each cam entry generates a cam match signal used to determine whether data dependent instruction are ready for dispatch.
摘要:
A method and circuitry for coordinating exceptions in a processor. The processor generates a result data value and an exception data value for each instruction wherein the exception data value specifies whether the corresponding instruction causes an exception. The processor commits the result data values to an architectural state of the processor in the sequential program order, and fetches an exception handler to processes the exception if the exception is indicated by one of the exception data values. The processor fetches an asynchronous event handler to processes an asynchronous event if the asynchronous event is detected while the result data values are committed to the architectural state of the processor.
摘要:
A method and apparatus for performing operations with a processor in a computer system. Load operations are performed by use of a dispatch pipeline and a memory execution pipeline. The dispatch pipeline dispatches the load operation for execution by the processor, while the memory execution pipeline controls the execution of the load operation to memory. The present invention reduces the latency involved in executing a load operation by coupling the execution of the two pipelines during execution of the load operation.