Abstract:
A method and system for operating a high frequency outprocessor with increased pipeline length. A new scheme is disclosed to reduce the pipeline by the detection and exploitation of so called nullno_dependencynull for an instruction. A nullno dependencynull signal tells that all required source data is available for the instruction at least one cycle before the source data valid bit(s) are inserted into the issue queue. Therefore, one or more stages of the pipeline are bypassed. Bypassing the pipeline stages for this nullno dependencynull conditions is especially important since a no dependency is found when the queue is empty. Furthermore, this bypass is very effective when the queue is relatively empty. Therefore, introducing such a bypass reduces effectively the performance drawback of a longer pipeline.
Abstract:
A method for operating a processor having an architecture of a larger bitlength with a program comprising instructions compiled to produce instruction results of at least one smaller bitlength having the steps of detecting when in program order a first smaller bitlength instruction is to be dispatched which does not have a target register address as one of its sources, and adding a so_extract_instruction into an instruction stream before the smaller bitlength instruction. The extract instruction includes the steps of dispatching the extract instruction together with the following smaller bitlength instruction from an instruction queue into a Reservation Station, issuing the extract instruction to an Instruction Execution Unit (IEU) as soon as all source operand data is available and an IEU is available according to respective issue scheme, executing the extract instruction by an available IEU, setting an indication that the result of the instruction needs to be written into the result field of the instruction following the extract instruction, and writing the extract instruction result into the result field of the first instruction, and into all fields of operands being dependent of the first instruction.
Abstract:
Several local clock buffers are disclosed, each including an input section and an output section. The input sections are substantially identical, and include control logic and gating logic. The control logic produces a gating signal dependent upon multiple control signals and a time-delayed global clock signal. The gating logic produces an intermediate clock signal dependent upon the global clock signal and the gating signal. The output section produces at least one local clock signal dependent upon the intermediate clock signal. In one embodiment, the output section produces a first local clock signal dependent upon the intermediate clock signal and a second local clock signal dependent upon the first local clock signal. In another embodiment, the gating logic produces the intermediate clock signal dependent upon the global clock and gating signals and a feedback signal. The output section produces the feedback signal and one or more local clock signals.
Abstract:
An improved method and system for operating an out of order processor at a high frequency enabled by an increased pipeline length. It is proposed to shorten the pipeline by a considerable number of stages by accepting that a write after read conflict may occur, when directly after renaming, during the nullread ROBnull pipeline stage, all the information (tag, validity and data) is read from an Reorder Buffer ROB entry, and is next written, in a following pipeline stage nullwrite RSnull, into a reservation station (RS) entry. In order to assure the correctness of processing in particular in cases of dependencies, e.g., write after read conflicts a separate inventional add in logic covers these cases. The logic detects the write after read conflict case of an Instructional Execution Unit (IEU) writing into the particular entry that is selected by the renaming logic during nullread ROBnull. Then, a separate issue process selects the entries for which a conflict is reported and writes the data into the respective entry of the RS. This increases performance because those conflict cases are rather seldom compared to the broad majority of instructions to be found in a statistically determined average instruction flow.