Abstract:
A Branch Target Buffer Circuit in a computer processor that predicts branch instructions within a stream of computer instructions is disclosed. The Branch Target Buffer Circuit uses a Branch Target Buffer Cache that stores branch information about previously executed branch instructions. The branch information stored in the Branch Target Buffer Cache is addressed by the last byte of each branch instruction. When an Instruction Fetch Unit in the computer processor fetches a block of instructions, it sends the Branch Target Buffer Circuit an instruction pointer. Based on the instruction pointer, the Branch Target Buffer Circuit looks in the Branch Target Buffer Cache to see whether any of the instructions in the block being fetched is a branch instruction. When the Branch Target Buffer Circuit finds an upcoming branch instruction in the Branch Target Buffer Cache, it informs the Instruction Fetch Unit about the upcoming branch instruction.
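As an illustration of the lookup described above, the following sketch models a small direct-mapped Branch Target Buffer Cache whose entries are addressed by the last byte of a branch instruction. The table size, fetch-block width, field names, and the sequential probe loop are assumptions made for readability; actual hardware would probe all byte addresses of the fetch block associatively in a single cycle.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define BTB_ENTRIES 256      /* illustrative table size, not from the patent */
    #define FETCH_BLOCK  16      /* assumed bytes fetched per cycle              */

    /* One cached record of a previously executed branch. */
    typedef struct {
        bool     valid;
        uint32_t tag;            /* address of the branch's last byte   */
        uint32_t target;         /* predicted target address            */
        bool     predict_taken;  /* simple one-bit direction prediction */
    } BtbEntry;

    static BtbEntry btb[BTB_ENTRIES];

    /* The cache is addressed by the last byte of each branch instruction,
     * so the lookup checks every byte address inside the fetch block. */
    static const BtbEntry *btb_lookup(uint32_t fetch_ip)
    {
        for (uint32_t a = fetch_ip; a < fetch_ip + FETCH_BLOCK; a++) {
            const BtbEntry *e = &btb[a % BTB_ENTRIES];
            if (e->valid && e->tag == a)
                return e;   /* upcoming branch: inform the Instruction Fetch Unit */
        }
        return NULL;        /* no known branch in this fetch block */
    }

When btb_lookup returns a hit, the fetch unit can use the predicted target and direction to redirect the next fetch instead of waiting for decode.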
Abstract:
A method of state recovery following a branch misprediction or an undetected branch instruction is disclosed. If, during execution of a branch instruction in an out-of-order unit, it is determined that the branch has been mispredicted, or if a taken branch has not been detected, a JEClear signal is asserted to flush the instruction fetch unit and decoder section and to change the instruction pointer to the actual target address. Within the out-of-order section, the instructions preceding the branch instruction are allowed to continue execution and proceed to in-order retirement. Simultaneously, instructions fetched at the actual target address are decoded, but are not allowed to issue from the decoder until the branch instruction has been retired from the out-of-order section; at that point all instructions within the out-of-order section are flushed, and the decoded instructions are allowed to issue from the decoder. The state recovery method advantageously provides efficient utilization of processor time.
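The recovery sequencing can be pictured with the sketch below. Only the JEClear signal name comes from the abstract; the structure fields and helper names are illustrative stand-ins for the corresponding hardware events, and the model leaves out the older instructions that keep executing and retiring in the meantime.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative machine state; only the fields the recovery sequence touches. */
    struct machine {
        bool     jeclear;          /* flush front end, redirect fetch           */
        bool     issue_stalled;    /* decoder may decode but must not issue     */
        uint32_t fetch_ip;
        bool     branch_retired;   /* the mispredicted branch has retired       */
    };

    /* Model stub: in hardware this clears every in-flight micro-op. */
    static void flush_out_of_order_section(void) { }

    /* Misprediction (or undetected taken branch) found at execute time. */
    void on_branch_resolved_wrong(struct machine *m, uint32_t actual_target)
    {
        m->jeclear = true;           /* flush the fetch unit and decoder        */
        m->fetch_ip = actual_target; /* restart fetch at the actual target      */
        m->issue_stalled = true;     /* decode the new path, but hold issue ... */
    }

    /* Called each cycle while older instructions continue toward retirement. */
    void per_cycle(struct machine *m)
    {
        if (m->issue_stalled && m->branch_retired) {
            /* ... until the branch retires: then flush the out-of-order section
             * and let the already-decoded correct-path instructions issue. */
            flush_out_of_order_section();
            m->issue_stalled = false;
        }
    }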
Abstract:
A four-stage branch instruction resolution system for a pipelined processor is disclosed. The first stage predicts the existence and outcome of branch instructions within an instruction stream so that an instruction fetch unit can fetch instructions continually. The second stage decodes all fetched instructions. If the decode stage determines that an instruction predicted to be a branch by the first stage is not in fact a branch instruction, the decode stage flushes the pipeline and restarts the processor at a corrected address. The decode stage verifies all branch predictions made by the branch prediction stage, and it also makes branch predictions for branches not predicted by that stage. The third stage executes all branch instructions to determine a final branch outcome and a final branch target address. The branch execution stage compares the final branch outcome and final branch target address with the predicted branch outcome and predicted branch target address to determine whether the processor must flush the front end of the microprocessor pipeline and restart at a corrected address. The final branch resolution stage retires all branch instructions; the retirement stage ensures that any instructions fetched after a mispredicted branch are not committed to permanent state.
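A sketch of the second-stage (decode-time) verification might look as follows. The field names, the fixed 4-byte instruction length, and the restriction to targets computable at decode are assumptions for illustration, not details taken from the disclosure.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative decode-time view of one instruction. */
    struct decoded {
        uint32_t pc;
        bool     is_branch;
        bool     predicted_branch;   /* stage-1 prediction attached at fetch    */
        uint32_t predicted_target;
        uint32_t decoded_target;     /* target computable at decode, 0 if none  */
    };

    /* Returns true if the front end must be flushed and refetched at *restart. */
    bool decode_verify(const struct decoded *d, uint32_t *restart)
    {
        if (d->predicted_branch && !d->is_branch) {
            *restart = d->pc + 4;    /* stale BTB entry; assumed 4-byte length  */
            return true;
        }
        if (d->is_branch && d->predicted_branch &&
            d->decoded_target != 0 && d->decoded_target != d->predicted_target) {
            *restart = d->decoded_target;   /* correct a wrong static target    */
            return true;
        }
        return false;                /* prediction stands until execution       */
    }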
Abstract:
A method and apparatus for performing error correction on data from an external memory is described. The present invention includes a method and apparatus for receiving data from an external memory source and determining whether the data has an error. The data is forwarded to the requesting unit while error correction is performed on the data, such that the two operations proceed in parallel. The present invention also includes a method and apparatus for subsequently correcting the data if a single-bit error exists; the corrected data is then forwarded to the requesting unit during the next cycle. If an error is detected, the present invention also produces an indication to the requesting device, and the device is flushed in response to the indication.
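The forward-while-checking control flow can be sketched as below. The function names and the separate forward/flush callbacks are illustrative, and the SEC-DED syndrome computation itself is left as a stub rather than a real implementation.

    #include <stdint.h>

    enum ecc_status { ECC_OK, ECC_SINGLE_BIT, ECC_UNCORRECTABLE };

    /* Stand-in for the SEC-DED check (syndrome logic omitted); a real
     * implementation locates and flips a single failing bit. */
    static enum ecc_status ecc_check(uint64_t data, uint8_t check_bits, uint64_t *fixed)
    {
        (void)check_bits;
        *fixed = data;                  /* placeholder: pretend the data is clean */
        return ECC_OK;
    }

    /* Cycle 1: forward the raw data immediately; the check runs in parallel. */
    void deliver(uint64_t raw, uint8_t check_bits,
                 void (*forward)(uint64_t), void (*flush_requester)(void))
    {
        uint64_t fixed;
        forward(raw);                               /* speculative forward         */
        switch (ecc_check(raw, check_bits, &fixed)) /* performed in parallel       */
        {
        case ECC_OK:
            break;                                  /* nothing to undo             */
        case ECC_SINGLE_BIT:
            flush_requester();                      /* indication: drop bad data   */
            forward(fixed);                         /* next cycle: corrected data  */
            break;
        case ECC_UNCORRECTABLE:
            flush_requester();                      /* indication only; no correction */
            break;
        }
    }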
Abstract:
A microprocessor for providing an extended linear address of more than 32 bits is disclosed. The extended linear address may be provided by concatenating a linear address with a segment selector extension, or by concatenating the values from two registers. Hierarchical translation of a linear address to a physical address is performed in which the number of levels in the hierarchy depends on whether the linear address is an extended linear address.
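Assuming, purely for illustration, a 32-bit linear address and a 16-bit segment selector extension, the concatenation might look like this; the field widths and function name are not taken from the disclosure.

    #include <stdint.h>

    /* Form an extended linear address by placing the segment selector
     * extension above the 32-bit linear address. */
    static inline uint64_t extend_linear(uint32_t linear, uint16_t selector_ext)
    {
        return ((uint64_t)selector_ext << 32) | linear;
    }

Whether the upper bits are non-zero is then what decides how many levels of the page-table hierarchy the translation walks.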
Abstract:
A parallel processor has a plurality of communication buses advantageously interconnecting the arithmetic processor elements, the memory controller elements, global controller circuitry, and input/output processors. The processor preferably has at least one central processing unit cluster, the cluster having at least one integer processor and one floating point processor. A plurality of I/F buses interconnect the integer and floating point processors of a cluster for communications therebetween. Integer load buses connect the integer processors of each cluster and selectively connect those processors to the memory controllers for transferring data from memory to the clusters and for providing inter-integer-processor data communications. A plurality of floating point load buses connect the floating point processors of the clusters to selected memory controllers for transferring data from the controllers to the floating point processors and for providing inter-floating-point-processor data communications. A plurality of physical address buses provide one-way communications for transferring memory addresses from the integer processors to the memories, and a plurality of storage buses connect the floating point processors to the memory controllers along a one-way communications path for transferring data to be stored in the memories. The hardware architecture provides advantageous communications between the elements of the data processing system, enabling wide-bandwidth communications and high instruction throughput.
Abstract:
A method and apparatus for storing an instruction word in compacted form on a storage medium, the instruction word having a plurality of instruction fields, features associating, with each instruction word, a mask word having a length in bits at least equal to the number of instruction fields in the instruction word. Each instruction field is associated with a bit of the mask word, and accordingly, using the mask word, only non-zero instruction fields need to be stored in memory. The instruction compaction method is advantageously used in a high-speed cache miss engine for refilling portions of an instruction cache after a cache miss occurs.
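A software model of the compaction might look as follows, assuming 32-bit instruction fields and an eight-field instruction word; the mask occupies the first stored word and is followed only by the non-zero fields. The field count, word width, and function names are illustrative.

    #include <stdint.h>
    #include <stddef.h>

    #define NUM_FIELDS 8   /* fields per instruction word; width is assumed */

    /* Compact: emit a mask with one bit per field, then only the non-zero
     * fields, in order.  Returns the number of 32-bit words written. */
    size_t compact(const uint32_t fields[NUM_FIELDS], uint32_t *out)
    {
        uint32_t mask = 0;
        size_t   n = 1;                      /* slot 0 is reserved for the mask */
        for (int i = 0; i < NUM_FIELDS; i++) {
            if (fields[i] != 0) {
                mask |= 1u << i;
                out[n++] = fields[i];
            }
        }
        out[0] = mask;
        return n;
    }

    /* Expand: place stored fields where the mask has a 1 bit, zero elsewhere. */
    size_t expand(const uint32_t *in, uint32_t fields[NUM_FIELDS])
    {
        uint32_t mask = in[0];
        size_t   n = 1;
        for (int i = 0; i < NUM_FIELDS; i++)
            fields[i] = (mask & (1u << i)) ? in[n++] : 0;
        return n;                            /* words consumed from storage */
    }

An instruction word with many zero fields therefore costs only the mask plus its few non-zero fields, which is the property the cache miss engine exploits when refilling the instruction cache.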
Abstract:
A data processor has a central processing unit and at least one pipelined memory controller circuit. The central processing unit addresses data in the memory using a virtual memory address table lookaside buffer and features data miss recovery circuitry wherein, after a memory access error condition has been detected, the instruction causing the error condition, and those instructions entering the memory pipeline after it, are replayed. The method and apparatus for replaying the instructions use first-in, first-out buffers for storing the virtual address data and instruction status data relating to each memory access instruction. That stored data is retrieved after an error condition is detected so that the instruction sequence, beginning at the data miss, can be replayed.
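The replay mechanism can be sketched with a small FIFO model. The pipeline depth, record fields, and reissue callback are assumptions made only to show how the oldest (faulting) access and every access that entered the pipeline after it are replayed.

    #include <stdint.h>

    #define PIPE_DEPTH 8   /* depth of the memory pipeline; illustrative */

    /* Record pushed as each memory access enters the pipeline. */
    struct mem_op {
        uint64_t vaddr;     /* virtual address of the access            */
        uint32_t status;    /* instruction status (opcode, destination) */
    };

    /* FIFO tracking every access currently in the pipeline, oldest first. */
    static struct mem_op fifo[PIPE_DEPTH];
    static unsigned head, tail;          /* head = oldest, tail = insert point */

    void on_issue(struct mem_op op) { fifo[tail++ % PIPE_DEPTH] = op; }
    void on_complete_ok(void)       { head++; }   /* oldest access finished */

    /* On a memory access error: the faulting access is the oldest entry.
     * Replay it and everything that entered the pipeline after it. */
    void on_miss(void (*reissue)(struct mem_op))
    {
        for (unsigned i = head; i != tail; i++)
            reissue(fifo[i % PIPE_DEPTH]);
        /* head and tail advance again as the replayed accesses re-enter */
    }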
Abstract:
In a parallel data processing system having a plurality of separately operating arithmetic processing units, a method and apparatus allow a plurality of branch instructions to be operated upon in a single machine cycle. The branch instructions have a hierarchical priority system associated with them, and the method and apparatus determine which branch, if any, should be taken. In particular, the method and apparatus determine simultaneously, during the parallel execution of the branch instructions, whether any branch test condition associated with a branch instruction is true, and independently determine the target address for each branch instruction and a fall-through instruction address to be used if no branch is taken.
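The final priority selection can be modeled as below, assuming slot 0 holds the highest-priority branch; the condition results, target addresses, and fall-through address are all computed independently and in parallel before this pick, and the structure and function names are illustrative.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* One branch instruction issued this cycle, already evaluated in parallel. */
    struct branch_slot {
        bool     condition_true;   /* outcome of this branch's test condition */
        uint32_t target;           /* its branch target address               */
    };

    /* Pure priority selection over the per-branch results. */
    uint32_t next_pc(const struct branch_slot *slots, size_t n, uint32_t fall_through)
    {
        for (size_t i = 0; i < n; i++)
            if (slots[i].condition_true)
                return slots[i].target;    /* highest-priority taken branch */
        return fall_through;               /* no branch taken this cycle    */
    }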