Abstract:
A macropipelined microprocessor chip adheres to strict read and write ordering by sequentially buffering operands in queues during instruction decode, then removing the operands in order during instruction execution. Any instruction that requires additional access to memory inserts its requests into the queued sequence (in a specifier queue) so that read and write ordering is preserved. A specifier queue synchronization counter captures synchronization points to coordinate memory request operations among the autonomous instruction decode unit, instruction execution unit, and memory subsystem. The synchronization method does not restrict the benefit of overlapped execution in the pipeline. Another feature is the treatment of a variable bit field operand type that does not restrict the location of operand data. In a pipelined processor that supports such an operand type, instruction execution flows differ greatly depending on whether the operand data resides in registers or in memory. An operand context queue (field queue) is therefore used to simplify context-dependent execution flow and increase overlap. The field queue allows the instruction decode unit to issue instructions with variable bit field operands normally, identifying and fetching operands sequentially and passing the operand context, which specifies register or memory residence, across the pipeline boundary to the autonomous execution unit. The mechanism creates opportunities to increase the overlap of pipelined functions and greatly simplifies the splitting of execution flows.
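The field-queue idea can be illustrated with a small software model. This is a minimal sketch, not the chip's logic. It assumes that the decode side pushes one hypothetical `FieldContext` record per variable bit field operand into a FIFO, and that the execute side pops records in the same order, choosing a register or memory extraction flow from the context alone. The 32-bit register width, little-endian byte order, and all names are assumptions made for illustration. The specifier queue synchronization counter is not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FieldContext:
    # Hypothetical operand-context record passed from decode to execute.
    in_register: bool  # True: field base is a GPR; False: a memory byte address
    location: int      # register number or memory byte address


def decode_field_operand(field_queue: deque, kind: str, location: int) -> None:
    """Decode side: record where the field's base operand lives, in program order."""
    field_queue.append(FieldContext(in_register=(kind == "reg"), location=location))


def execute_extract(field_queue: deque, gprs: list, memory: bytes,
                    pos: int, size: int) -> int:
    """Execute side: pop the next context and follow the matching flow."""
    ctx = field_queue.popleft()
    mask = (1 << size) - 1
    if ctx.in_register:
        # Register flow (simplified): field assumed to lie within one 32-bit GPR.
        return (gprs[ctx.location] >> pos) & mask
    # Memory flow: the bit position may reach well beyond the base byte.
    byte_addr = ctx.location + pos // 8
    bit = pos % 8
    nbytes = (bit + size + 7) // 8
    chunk = int.from_bytes(memory[byte_addr:byte_addr + nbytes], "little")
    return (chunk >> bit) & mask


# Usage: two field operands decoded in order, then executed in the same order.
fq = deque()
decode_field_operand(fq, "reg", 1)   # field based in R1
decode_field_operand(fq, "mem", 0)   # field based at memory address 0
gprs = [0] * 16
gprs[1] = 0xB0
memory = bytearray(16)
memory[2], memory[3] = 0xF0, 0x0F
reg_field = execute_extract(fq, gprs, memory, pos=4, size=4)   # 0xB
mem_field = execute_extract(fq, gprs, memory, pos=20, size=8)  # 0xFF
```

The execute side never re-examines the specifier. The popped context alone selects the flow, which is the simplification the field queue is meant to provide.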
Abstract:
A system for evaluating the performance of a computer system, which has an associated system memory and a processor that passes through a plurality of processor states during operation, includes an operating unit for receiving from a user a request to monitor specific processor states. Firmware causes the processor to enter the processor state requested by the user. Hardware identifies each occurrence of the desired processor state. Information relating to these occurrences is accumulated in the memory. The accumulated information is then read from memory, and a report is provided to the user.
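The sketch below models the accumulate-and-report half of that flow in software, under the following assumptions. A hypothetical stream of state names stands in for the hardware's detection of processor states. A `Counter` stands in for the accumulation area in system memory. The firmware step that drives the processor into the requested state is not modeled. State names such as `"tlb_miss"` are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Set


def accumulate(requested: Set[str], observed: Iterable[str]) -> Counter:
    """Count only the occurrences of the states the user asked to monitor."""
    counts = Counter({state: 0 for state in requested})  # stands in for system memory
    for state in observed:                                # one item per detected state
        if state in requested:
            counts[state] += 1
    return counts


def report(counts: Counter) -> str:
    """Read the accumulated information back and format a report for the user."""
    return "\n".join(f"{state}: {n}" for state, n in sorted(counts.items()))


trace = ["cache_miss", "idle", "cache_miss", "tlb_miss", "idle"]  # hypothetical trace
counts = accumulate({"cache_miss", "tlb_miss", "interrupt"}, trace)
print(report(counts))
```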
Abstract:
Pipelined CPUs achieve high performance by fine-tuning the pipe stages to execute typical instruction sequences. Atypical instruction sequences result in pipeline exceptions. The disclosed method provides graceful exception handling and recovery in a micropipelined memory interface. A memory reference restart command latch allows an implementation that requires no additional logic for conditionally writing states pending exception checking. Exception handling hardware is minimized because instructions that cause exceptions are never re-executed, and exception handling microcode executes in-line with the normal microcode flow.
Abstract:
A pipelined CPU executes instructions of variable length and references memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queueing between units of the CPU to allow flexibility in instruction execution times. A branch prediction method employs a branch history table that records the taken versus not-taken history of recently used branch opcodes, and uses an empirical algorithm to predict, from the history table, which way the next occurrence of each branch will go. Each entry of the branch history table stores a number of bits for a branch address, each bit indicating "taken" or "not-taken" for one occurrence of the branch. The table is indexed by branch address. A register stores the empirical algorithm. When a branch occurs, its history is fetched from the table and used to select the location in the register that holds the prediction for that particular pattern of branch history.
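A minimal sketch of the table-plus-register lookup follows. Several parameters are assumptions: the history length (4 bits), the table size (256 entries), indexing by low-order address bits, and the register contents. The source says only that the register holds an empirically chosen algorithm. The `majority_register` helper is therefore a hypothetical stand-in that predicts "taken" when at least half of the remembered outcomes were taken.

```python
HISTORY_BITS = 4      # assumed number of outcomes remembered per branch
TABLE_ENTRIES = 256   # assumed table size (a power of two)


def majority_register(history_bits: int) -> int:
    """Hypothetical register contents: bit p is 1 (predict taken) when history
    pattern p has at least half of its bits set."""
    reg = 0
    for pattern in range(1 << history_bits):
        if 2 * bin(pattern).count("1") >= history_bits:
            reg |= 1 << pattern
    return reg


class BranchPredictor:
    def __init__(self, prediction_register: int,
                 history_bits: int = HISTORY_BITS,
                 entries: int = TABLE_ENTRIES) -> None:
        self.register = prediction_register
        self.history_bits = history_bits
        self.entries = entries
        self.table = [0] * entries  # per-entry history, newest outcome in bit 0

    def _index(self, branch_address: int) -> int:
        return branch_address % self.entries  # low-order address bits

    def predict(self, branch_address: int) -> bool:
        history = self.table[self._index(branch_address)]
        return bool((self.register >> history) & 1)  # history selects a register bit

    def update(self, branch_address: int, taken: bool) -> None:
        i = self._index(branch_address)
        mask = (1 << self.history_bits) - 1
        self.table[i] = ((self.table[i] << 1) | int(taken)) & mask


bp = BranchPredictor(majority_register(HISTORY_BITS))
first = bp.predict(0x1000)   # history 0000 -> False
bp.update(0x1000, True)
bp.update(0x1000, True)
later = bp.predict(0x1000)   # history 0011 -> True
```

Because the prediction policy lives entirely in the register value, a different empirically chosen pattern can be loaded without changing the lookup logic.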
Abstract:
A pipelined processor has an instruction unit for decoding instructions and pre-processing operands prior to instruction execution, and an execution unit for executing the decoded instructions. The pre-processing of operands includes changes to general purpose registers, and these changes are recorded in an RLOG queue having read and write pointers. Instruction context for the RLOG queue entries is maintained in a separate RLOG base queue. When decoding begins for a new instruction, the RLOG base queue is loaded with the RLOG write pointer, which points to the first RLOG queue entry that would record a register change for that instruction. Each time an operand that changes a general purpose register is processed, the value of the change is recorded in the entry pointed to by the RLOG queue write pointer, and the write pointer is advanced. When the execution unit retires an instruction, its entries in the RLOG queue are discarded by advancing the RLOG queue read pointer to the pointer read from the RLOG base queue, and that pointer is removed from the RLOG base queue. During an unwind process in response to an exception, a micro-control unit successively reads a register change from the RLOG queue, checks whether the RLOG queue is empty, restores the register, and advances the RLOG queue read pointer, repeating until the RLOG queue becomes empty. It then resets the RLOG queue and the RLOG base queue.
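The RLOG bookkeeping can be sketched in software as follows. This is a minimal model built on several assumptions:

- The queue is a 16-entry circular buffer.
- Each entry holds a hypothetical `(register, change)` pair, where the change is an additive delta such as one produced by an autoincrement or autodecrement specifier.
- Unwinding restores a register by subtracting that delta.

On retirement, the sketch removes the retiring instruction's base entry and moves the read pointer to where the next instruction's entries begin. That is the next base entry, or the write pointer if no younger instruction has started decoding. This is one reading of the retirement step, not a statement of the hardware's exact pointer protocol.

```python
from collections import deque

RLOG_SIZE = 16  # assumed capacity of the circular RLOG queue


class RLog:
    """Register-change log (RLOG queue) plus per-instruction base pointers."""

    def __init__(self, size: int = RLOG_SIZE) -> None:
        self.size = size
        self.entries = [(0, 0)] * size  # (register number, change value)
        self.read_ptr = 0
        self.write_ptr = 0
        self.base = deque()             # RLOG base queue: one pointer per instruction

    def begin_instruction(self) -> None:
        # Decoding starts: remember where this instruction's changes will go.
        self.base.append(self.write_ptr)

    def log_change(self, reg: int, delta: int) -> None:
        nxt = (self.write_ptr + 1) % self.size
        if nxt == self.read_ptr:
            raise OverflowError("RLOG queue full")
        self.entries[self.write_ptr] = (reg, delta)
        self.write_ptr = nxt

    def retire(self) -> None:
        # Discard the oldest instruction's entries (one reading of the protocol).
        self.base.popleft()
        self.read_ptr = self.base[0] if self.base else self.write_ptr

    def unwind(self, gprs: list) -> None:
        # Exception: undo every logged change still in the queue, then reset.
        while self.read_ptr != self.write_ptr:      # queue not empty
            reg, delta = self.entries[self.read_ptr]
            gprs[reg] -= delta                      # restore the register
            self.read_ptr = (self.read_ptr + 1) % self.size
        self.read_ptr = self.write_ptr = 0
        self.base.clear()


gprs = [0] * 16
gprs[2], gprs[5] = 100, 200
rlog = RLog()
rlog.begin_instruction()            # instruction 1: autoincrement of R2 by 4
gprs[2] += 4
rlog.log_change(2, 4)
rlog.begin_instruction()            # instruction 2: autodecrement of R5 by 8
gprs[5] -= 8
rlog.log_change(5, -8)
rlog.retire()                       # instruction 1 retires; its change is kept
rlog.unwind(gprs)                   # instruction 2 faults; R5 is restored
```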
Abstract:
A pipelined CPU employs separate microinstruction pipelines for the execution unit and the memory management unit. Deadlocks can occur in a pipelined CPU when two consecutive instructions are data dependent. The later instruction may stall the pipeline while waiting for operands that an earlier instruction is to fetch. If the earlier instruction cannot produce the memory request for those operands because the pipeline is stalled, the result is a deadlock. With separate micro-pipelines, the earlier instruction is advanced independently of the rest of the pipeline when a deadlock occurs. The operands for the later instruction are then provided, and the deadlock is broken.
Abstract:
A method of specifying the operands for a microcoded CPU combines a set of microinstruction routines for generic operand modes with hardware primitives for selecting various specific types of operand treatment. Decoding a machine-level instruction produces an entry point into the microstore, which selects one of the generic operand modes. Decoding the instruction also produces control bits that are used directly to select the specific operand type or are used by the hardware primitives. In this way, branching is avoided in the microinstruction sequences used for operand specifying, while the amount of microcode needed is kept to a minimum.
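The split between generic microcode routines and decode-supplied control bits can be modeled as a dispatch table. In this minimal sketch, which illustrates the idea rather than reproducing the disclosed microcode, the specifier's high nibble selects an entry point. Mode numbers 5, 6, and 8 are hypothetical labels for register, register-deferred, and autoincrement modes. The opcode yields a control field giving the data width, which a hypothetical `read_mem` primitive consumes. The opcode table is also invented. No routine branches on the data width; the width arrives as a control input.

```python
from dataclasses import dataclass


@dataclass
class ControlBits:
    width: int  # operand data width in bytes, produced by instruction decode


def read_mem(memory: bytes, addr: int, width: int) -> int:
    """Hypothetical hardware primitive: little-endian read of `width` bytes."""
    return int.from_bytes(memory[addr:addr + width], "little")


# Generic operand-mode routines: straight-line, no branching on data type.
def mode_register(regs, memory, reg, ctl):
    return regs[reg] & ((1 << (8 * ctl.width)) - 1)


def mode_deferred(regs, memory, reg, ctl):
    return read_mem(memory, regs[reg], ctl.width)


def mode_autoincrement(regs, memory, reg, ctl):
    value = read_mem(memory, regs[reg], ctl.width)
    regs[reg] += ctl.width  # step size comes from the control bits
    return value


MICROSTORE = {0x5: mode_register, 0x6: mode_deferred, 0x8: mode_autoincrement}
OPCODE_WIDTH = {"MOVB": 1, "MOVW": 2, "MOVL": 4}  # hypothetical opcode table


def specify(opcode: str, specifier: int, regs: list, memory: bytes) -> int:
    mode, reg = specifier >> 4, specifier & 0xF
    entry = MICROSTORE[mode]                         # decode -> entry point
    ctl = ControlBits(width=OPCODE_WIDTH[opcode])    # decode -> control bits
    return entry(regs, memory, reg, ctl)


regs = [0] * 16
regs[3] = 4
memory = bytes(range(16))
word = specify("MOVW", 0x83, regs, memory)   # (R3)+ with a 2-byte width
```

Three routines serve every data width here. Adding a width means adding an opcode-table entry, not another routine.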
Abstract:
A data dependency scoreboard for a pipelined digital computer includes a source counter and a destination counter for each general purpose register (GPR). The source counter for a GPR is incremented each time a specifier is decoded that specifies that GPR as a source operand. It is decremented each time the execution unit reads a source operand from that GPR. The destination counter for a GPR is incremented each time a specifier is decoded that specifies that GPR as a destination operand. It is decremented each time the execution unit writes to that GPR. A data dependency conflict that stalls a complex specifier unit occurs in either of two cases: operand processing requires a write to a GPR whose source counter is greater than zero, or it requires a read of a GPR whose destination counter is greater than zero. For example, the source and destination counts for a GPR referenced by the complex specifier being processed are pipelined from the scoreboard through down counters in the complex specifier unit. These counts are updated in the complex specifier unit as the execution unit reads source operands from that GPR and writes to it.
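The counter rules can be sketched as follows. This minimal model assumes 16 GPRs and uses hypothetical method names. It covers the increment and decrement rules and the two stall conditions, but not the pipelining of counts through the complex specifier unit's down counters.

```python
NUM_GPRS = 16  # assumed register-file size


class Scoreboard:
    """Per-GPR source and destination counters for dependency checking."""

    def __init__(self, num_gprs: int = NUM_GPRS) -> None:
        self.src = [0] * num_gprs  # pending reads of each GPR by the execution unit
        self.dst = [0] * num_gprs  # pending writes of each GPR by the execution unit

    def decode_source(self, reg: int) -> None:
        self.src[reg] += 1         # a decoded specifier names reg as a source

    def decode_destination(self, reg: int) -> None:
        self.dst[reg] += 1         # a decoded specifier names reg as a destination

    def execute_read(self, reg: int) -> None:
        if self.src[reg] == 0:
            raise RuntimeError(f"no pending source read of R{reg}")
        self.src[reg] -= 1

    def execute_write(self, reg: int) -> None:
        if self.dst[reg] == 0:
            raise RuntimeError(f"no pending destination write of R{reg}")
        self.dst[reg] -= 1

    def must_stall(self, reads=(), writes=()) -> bool:
        """Conflict: writing a GPR with pending source reads,
        or reading a GPR with pending writes."""
        return (any(self.src[r] > 0 for r in writes)
                or any(self.dst[r] > 0 for r in reads))


sb = Scoreboard()
sb.decode_destination(3)                 # an earlier instruction will write R3
stall_before = sb.must_stall(reads=[3])  # True: R3 not yet written
sb.execute_write(3)
stall_after = sb.must_stall(reads=[3])   # False: the write has happened
```

Counters rather than single busy bits let several in-flight specifiers reference the same GPR. The conflict clears only when every pending access has completed.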