摘要:
A microprocessor includes a branch target address cache (BTAC), each entry thereof configured to store branch prediction information for at most N branch instructions. An execution unit executes a branch instruction previously fetched in a fetch quantum. Update logic determines whether the BTAC is already storing information for N branch instructions within the fetch quantum (N is at least two), updates the BTAC for the branch instruction if the BTAC is not already storing information for N branch instructions, determines whether a type of the branch instruction has a higher replacement priority than a type of the N branch instructions if the BTAC is already storing information for N branch instructions, and updates the BTAC for the branch instruction if the type of the branch instruction has a higher replacement priority than the type of the N branch instructions already stored in the BTAC.
摘要:
An apparatus extracts instructions from a stream of undifferentiated instruction bytes in a microprocessor having an instruction set architecture in which the instructions are variable length. Decoders generate an associated start/end mark for each instruction byte of a line from a first queue of entries each storing a line of instruction bytes. A second queue has entries each storing a line received from the first queue along with the associated start/end marks. Control logic detects a condition where the length of an instruction whose initial portion within a first line in the first queue is yet undeterminable because the instruction's remainder resides in a second line yet to be loaded into the first queue from the instruction cache; loads the first line and corresponding start/end marks into the second queue and refrains from shifting the first line out of the first queue, in response to detecting the condition; and extracts instructions from the first line in the second queue based on the corresponding start/end marks. The instructions exclude the yet undeterminable length instruction.
摘要:
A branch control apparatus in a microprocessor. The branch control apparatus includes an instruction buffer having a plurality of stages that buffer cache lines of instruction bytes received from an instruction cache. A multiplexer selects one of the bottom three stages in the instruction buffer to provide to instruction format logic. The multiplexer selects a stage based on a branch indicator, an instruction wrap indicator, and a carry indicator. The branch indicator indicates whether the processor previously branched to a target address provided by a branch target address cache. The branch indicator and target address are previously stored in association with the stage containing the branch instruction for which the target address is cached. The wrap indicator indicates whether the currently formatted instruction wraps across two cache lines. The carry indicator indicates whether the current instruction being formatted occupies the last byte of the currently formatted instruction buffer stage.
摘要:
An apparatus in a microprocessor that has an instruction set architecture in which instructions may include a length-modifying prefix used to select an address/operand size other than a default address/operand size, wherein the apparatus marks the start byte and the end byte of each instruction in a stream of instruction bytes. Decode logic decodes each instruction byte of a predetermined number of instruction bytes to determine whether the instruction byte specifies a length-modifying prefix and generates a start mark and an end mark for each of the instruction bytes based on an address/operand size. Operand/address size logic provides the default operand/address size to the decode logic to use to generate the start and end marks during a first clock cycle during which the decode logic decodes the predetermined number of instruction bytes. If during the first clock cycle and any of N subsequent clock cycles the decode logic indicates that one of the predetermined number of instruction bytes specifies a length-modifying prefix, the operand/address size logic provides to the decode logic on the next clock cycle the address/operand size specified by the length-modifying prefix to use to generate the start and end marks.
摘要:
An out-of-order execution in-order retire microprocessor includes a branch information table comprising N entries. Each of the N entries stores information associated with a branch instruction. The microprocessor also includes a reorder buffer comprising M entries. Each of the M entries stores information associated with an unretired instruction within the microprocessor. Each of the M entries includes a field that indicates whether the unretired instruction is a branch instruction and, if so, a tag identifying one of the N entries in the branch information table storing information associated with the branch instruction. N is significantly less than M such that the overall die space and power consumption is reduced over a processor in which each reorder buffer entry stores the branch information.
摘要:
An out-of-order execution in-order retire microprocessor includes a branch information table comprising N entries. Each of the N entries stores information associated with a branch instruction. The microprocessor also includes a reorder buffer comprising M entries. Each of the M entries stores information associated with an unretired instruction within the microprocessor. Each of the M entries includes a field that indicates whether the unretired instruction is a branch instruction and, if so, a tag identifying one of the N entries in the branch information table storing information associated with the branch instruction. N is significantly less than M such that the overall die space and power consumption is reduced over a processor in which each reorder buffer entry stores the branch information.
摘要:
In a microprocessor that has an instruction set architecture in which the instructions may include a variable number of prefix bytes, an apparatus for efficiently extracting instructions from a stream of undifferentiated instruction bytes. Decode logic determines which byte is an opcode byte for each instruction of a plurality of instructions within the stream of undifferentiated instruction bytes. The opcode byte is the first non-prefix byte of the instruction. The decode logic accumulates prefix information onto the opcode byte of the instruction for each instruction of the plurality of instructions. A queue holds the stream of undifferentiated instruction bytes and the accumulated prefix information. Extraction logic extracts the plurality of instructions from the queue in one clock cycle independent of the number of prefix bytes included in each of the plurality of instructions.
摘要:
An apparatus efficiently determines the length of an instruction within a stream of instruction bytes processed by a microprocessor having a variable instruction length instruction set architecture. The apparatus includes combinatorial logic associated with each instruction byte of the stream, each configured to receive the associated instruction byte and the next instruction byte of the stream and to generate in response thereto a first length, a second length, and a select control. A multiplexor associated with each of the combinatorial logic selects and outputs one of the following inputs based on the select control received from the combinatorial logic: a zero input and the second length received from the combinatorial logic associated with each of the next three instruction bytes of the stream. An adder associated with each of the combinatorial logic and multiplexor adds the first length and the output of the multiplexor to generate the length of the instruction.
摘要:
A microprocessor for predicting return instruction target addresses is disclosed. A branch target address cache stores a plurality of target address predictions and a corresponding plurality of override indicators for a corresponding plurality of return instructions, and provides a prediction of the target address of the return instruction from the target address predictions and provides a corresponding override indicator from the override indicators. Each has a true value when the return stack has mispredicted the target address of the corresponding return instruction for a most recent execution of the return instruction. A return stack also provides a prediction of the target address of the return instruction. Branch control logic causes the microprocessor to branch to the prediction of the target address provided by the BTAC, and not to the prediction of the target address provided by the return stack, when the override indicator is a true value.
摘要:
A branch prediction apparatus that employs dual call/return stacks to predict return addresses in a microprocessor. The apparatus includes a first call/return stack that provides a speculative return address based upon a return instruction hit in a speculative branch target address cache (BTAC) of an instruction cache fetch address prior to decoding of the instruction to know whether it is actually a return instruction. The speculative return address is one of multiple return addresses simultaneously stored in the first call/return stack each pushed thereupon in response to the BTAC indicating a call instruction was fetched and prior to decoding the call instruction. The speculative return address is provided early in the pipeline and the microprocessor speculatively branches to the speculative return address. Later in the pipeline, a second call/return stack provides a non-speculative return address after the instruction is decoded and verified to be a return instruction. A comparator compares the speculative and non-speculative return addresses, and if the two addresses mismatch, the microprocessor branches to the non-speculative return address.