Abstract:
An instruction cache employing a cache holding register is provided. When a cache line of instruction bytes is fetched from main memory, the instruction bytes are temporarily stored in the cache holding register as they are received from main memory. The instruction bytes are predecoded as they are received from the main memory. If a predicted-taken branch instruction is encountered, the instruction fetch mechanism within the instruction cache begins fetching instructions from the target instruction path. This fetching may be initiated before the complete cache line containing the predicted-taken branch instruction has been received. As long as instruction fetches from the target instruction path continue to hit in the instruction cache, these instructions may be fetched and dispatched into a microprocessor employing the instruction cache. Meanwhile, the remaining portion of the cache line containing the predicted-taken branch instruction is received by the cache holding register. To reduce the number of ports required on the instruction bytes storage used to store cache lines of instructions, the cache holding register retains the cache line until an idle cycle occurs in the instruction bytes storage. The same port ordinarily used for fetching instructions is then used to store the cache line into the instruction bytes storage. In one embodiment, the instruction cache prefetches the cache line that follows the cache line which misses. A second cache holding register is employed for storing the prefetched cache line.
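The port-sharing behavior described above can be illustrated with a minimal sketch. It assumes a single-ported line storage modeled as a dictionary and one holding register; the class name, method names, and the one-operation-per-cycle model are hypothetical and are not taken from the abstract.

```python
class InstructionCacheSketch:
    """Hypothetical model: single-ported line storage plus one holding register."""

    def __init__(self):
        self.storage = {}      # line address -> bytes (the instruction bytes storage)
        self.holding = None    # (line address, bytearray) while a fill is pending
        self.expected = 0      # number of bytes in a complete line

    def start_fill(self, line_addr, line_size):
        """Begin receiving a missed line from main memory."""
        self.holding = (line_addr, bytearray())
        self.expected = line_size

    def receive(self, chunk):
        """Bytes arriving from memory go to the holding register, not the storage."""
        self.holding[1].extend(chunk)

    def cycle(self, fetch_addr=None):
        """One clock. The single port serves a fetch or, when idle, a pending fill."""
        if fetch_addr is not None:
            return self.storage.get(fetch_addr)  # None models a miss
        if self.holding and len(self.holding[1]) == self.expected:
            addr, data = self.holding
            self.storage[addr] = bytes(data)     # idle cycle: write the line back
            self.holding = None
        return None
```

In this sketch, fetches along the branch target path keep the port busy while the rest of the missed line arrives; the line is written into storage only on the first cycle in which no fetch is presented.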
Abstract:
A predecode unit is configured to predecode variable byte-length instructions prior to their storage within an instruction cache of a superscalar microprocessor. The predecode unit generates three predecode bits associated with each byte of instruction code: a "start" bit, an "end" bit, and a "functional" bit. The start bit is set if the associated byte is the first byte of the instruction. Similarly, the end bit is set if the byte is the last byte of the instruction. The functional bits convey information regarding the location of an opcode byte for a particular instruction as well as an indication of whether the instruction can be decoded directly by the decode logic of the processor or whether the instruction is executed by invoking a microcode procedure controlled by an MROM unit. For fast path instructions, the functional bit is set for each prefix byte included in the instruction, and cleared for other bytes. For MROM instructions, the functional bit is cleared for each prefix byte and is set for other bytes. The type of instruction (either fast path or MROM) may thus be determined by examining the functional bit corresponding to the end byte of the instruction. If that functional bit is clear, the instruction is a fast path instruction. Conversely, if that functional bit is set, the instruction is an MROM instruction. After an MROM instruction is identified, the functional bits for the instruction may be inverted. Subsequently, the opcode for both fast path and MROM instructions may readily be located (by the alignment logic) by determining the first byte within the instruction that has a cleared functional bit.
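The encoding above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and it assumes that prefix bytes precede the opcode and that every instruction has at least one non-prefix byte, so the end byte is never a prefix.

```python
def predecode(instr_len, num_prefix, is_mrom):
    """Return (start, end, functional) bit lists for one instruction."""
    start = [1 if i == 0 else 0 for i in range(instr_len)]
    end = [1 if i == instr_len - 1 else 0 for i in range(instr_len)]
    # Fast path encoding: functional bit set on prefix bytes, clear elsewhere.
    functional = [1 if i < num_prefix else 0 for i in range(instr_len)]
    if is_mrom:
        # MROM encoding is the complement: clear on prefix bytes, set elsewhere.
        functional = [1 - bit for bit in functional]
    return start, end, functional


def classify(end, functional):
    """The functional bit of the end byte distinguishes the two types."""
    return "MROM" if functional[end.index(1)] else "fast path"


def opcode_index(end, functional):
    """Invert MROM functional bits, then take the first cleared bit."""
    if classify(end, functional) == "MROM":
        functional = [1 - bit for bit in functional]
    return functional.index(0)
```

For example, a four-byte fast path instruction with one prefix byte gets functional bits `[1, 0, 0, 0]`, while a three-byte MROM instruction with two prefix bytes gets `[0, 0, 1]`; after inversion of the latter, the first cleared bit marks the opcode in both cases.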
Abstract:
A branch prediction unit stores a set of branch prediction history bits and branch selectors corresponding to each of a group of contiguous instruction bytes stored in an instruction cache. While only one bit is used to represent branch prediction history, three distinct states are represented in conjunction with the absence of a branch prediction. This provides for the storage of fewer bits, while maintaining a high degree of branch prediction accuracy. Each branch selector identifies the branch prediction to be selected if a fetch address corresponding to that branch selector is presented. In order to minimize the number of branch selectors stored for a group of contiguous instruction bytes, the group is divided into multiple byte ranges. The largest byte range may include a number of bytes comprising the shortest branch instruction in the instruction set (exclusive of the return instruction). For example, the shortest branch instruction may be two bytes in one embodiment. Therefore, the largest byte range is two bytes in the example. Since the branch selectors as a group change value (i.e. indicate a different branch instruction) only at the end byte of a predicted-taken branch instruction, fewer branch selectors may be stored than the number of bytes within the group.
Abstract:
A branch prediction unit includes a branch prediction entry corresponding to a group of contiguous instruction bytes. The branch prediction entry stores branch predictions corresponding to branch instructions within the group of contiguous instruction bytes. Additionally, the branch prediction entry stores a set of branch selectors corresponding to the group of contiguous instruction bytes. The branch selectors identify which branch prediction is to be selected if the corresponding byte (or bytes) is selected by the offset portion of the fetch address. Still further, a predicted branch selector is stored. The predicted branch selector is used to select a branch prediction for forming the fetch address. In parallel, a selected branch selector is selected from the set of branch selectors. The predicted branch selector is verified using the selected branch selector. If the selected branch selector and the predicted branch selector mismatch, the correct branch prediction is generated and the predicted branch selector is updated to indicate the selected branch selector.
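A minimal sketch of the verify-and-update step follows. The entry layout is an assumption made for illustration: `predictions` is a list indexed by selector value (with index 0 standing for the sequential path), `selectors` holds one selector per byte offset, and the function name is hypothetical.

```python
def fetch_with_predicted_selector(entry, offset):
    """Return (next fetch address, mispredicted flag) for one lookup."""
    # Fast path: the stored predicted selector picks a prediction immediately.
    predicted = entry["predicted_selector"]
    speculative_address = entry["predictions"][predicted]

    # In parallel (modeled sequentially here), the offset picks the real selector.
    selected = entry["selectors"][offset]
    if selected == predicted:
        return speculative_address, False

    # Mismatch: use the correct prediction and retrain the predicted selector.
    entry["predicted_selector"] = selected
    return entry["predictions"][selected], True
```

The point of the arrangement is that the common case needs no selector lookup before the fetch address is formed; the lookup only confirms or corrects it.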
Abstract:
A device and method for comparing cancel tags, and for canceling data from a finite wrap-around data buffer. The data buffer stores tag values that are continuous, or sequential. A cancel tag is used to cancel all tags with a value "greater-than" the cancel tag. In comparing cancel tags of a wrap-around buffer, however, the comparator must take into account wrap-around conditions. When a wrap-around condition occurs, tags that have a lower value may be "greater-than" the cancel tag. The present invention advantageously adds an additional bit to the tags stored in the data buffer and the cancel tag. The additional bit is toggled whenever a wrap-around condition occurs. By comparing the additional bit of the tag to the additional bit of the cancel tag, a wrap-around condition can be detected without extensive additional circuitry. The comparison of the additional bit indicates whether the comparator should cancel tags that are greater-than or less-than the cancel tag. The cancel tag causes the buffer pointer to change state and point to the storage element associated with the cancel tag, and causes the tag generator to change state.
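The comparison rule can be sketched as follows, assuming each tag is an integer whose low `index_bits` bits are the buffer index and whose next bit is the additional wrap bit. The function name and this bit layout are assumptions for illustration, not details given in the abstract.

```python
def should_cancel(tag, cancel_tag, index_bits=3):
    """True if `tag` is logically greater than (younger than) `cancel_tag`."""
    mask = (1 << index_bits) - 1
    tag_wrap, tag_index = (tag >> index_bits) & 1, tag & mask
    cancel_wrap, cancel_index = (cancel_tag >> index_bits) & 1, cancel_tag & mask

    if tag_wrap == cancel_wrap:
        # No wrap-around between the two tags: ordinary greater-than applies.
        return tag_index > cancel_index
    # Wrap bits differ: a younger tag has wrapped, so it has a smaller index.
    return tag_index < cancel_index
```

With `index_bits=3` and a cancel tag of `0b0110`, the tags `0b0111`, `0b1000`, and `0b1001` are all cancelled even though the last two have numerically smaller indices, because their wrap bit differs from that of the cancel tag.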
Abstract:
A method for optimizing loop control of microcoded instructions includes identifying an instruction as a repetitive microcode instruction such as a move string instruction, for example, having a repeat prefix. The repetitive microcode instruction may include a loop of microcode instructions forming a microcode sequence. The microcode sequence is stored within a storage of a microcode unit. The method also includes storing a loop count value associated with the repetitive microcode instruction to a sequence control unit of the microcode unit. The method further includes determining a number of iterations to issue the microcode sequence for execution by an instruction pipeline based upon the loop count value. In response to receiving the repetitive microcode instruction, the method includes continuously issuing the microcode sequence for the number of iterations.
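As a rough illustration of the method, the sketch below models a sequence control unit that is loaded with a loop count and then issues the stored microcode sequence that many times back to back. The class, its methods, and the micro-op names in the usage note are hypothetical.

```python
class SequenceControl:
    """Hypothetical model of a sequence control unit inside a microcode unit."""

    def __init__(self, microcode_storage):
        self.storage = microcode_storage  # entry point -> list of micro-ops
        self.loop_count = 0

    def load_loop_count(self, value):
        """Store the loop count associated with the repetitive instruction."""
        self.loop_count = value

    def issue(self, entry_point):
        """Yield (iteration, micro-op) pairs for the full number of iterations."""
        sequence = self.storage[entry_point]
        # The iteration count is known up front, so the sequence is issued
        # continuously rather than re-evaluating a loop condition each pass.
        for iteration in range(self.loop_count):
            for micro_op in sequence:
                yield iteration, micro_op
```

For a storage of `{"rep_movs": ["load", "store", "update_pointers"]}` and a loop count of 3, `issue("rep_movs")` produces nine micro-ops in three consecutive passes.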
Abstract:
A branch prediction unit stores a set of branch selectors corresponding to each of a group of contiguous instruction bytes stored in an instruction cache. Each branch selector identifies a branch prediction to be selected if a fetch address corresponding to that branch selector is presented. The branch prediction unit additionally stores a set of return selectors corresponding to one or more branch predictions. The return selectors identify the type of branch selection. For example, the branch predictions may include a sequential branch prediction and a branch instruction branch prediction. The return selectors may identify whether the branch instruction branch prediction is associated with the return instruction or a non-return branch instruction.
Abstract:
An adder circuit in parallel with a zero flag generation circuit. In a preferred embodiment, an arithmetic logic unit (ALU) circuit in a microprocessor based computer system includes an adder circuit preferably adapted to receive first and second operands. The preferred adder circuit is further adapted to produce a result equal to the sum of the first and second operands. The ALU circuit further includes a zero flag generation circuit. The zero flag generation circuit is adapted to receive the first and second operands in parallel with the adder circuit and to produce a zero flag signal in response to the operands. The zero flag signal is indicative of whether the sum of the operands is equal to zero. In one embodiment, the zero flag generation circuit includes N half adders in parallel wherein each adder receives a bit from the first operand and a corresponding bit from the second operand. Each half adder produces a sum bit and a carry bit in response to the inputs. Preferably, the zero flag generation circuit further includes N-1 Exclusive OR (EXOR) gates. Each of the N-1 EXOR gates receives one bit of the N sum bits and a corresponding bit of the N carry bits as inputs. The N-1 outputs from the EXOR gates, together with an inverted least significant sum bit, are routed to a logic circuit. The logic circuit functions as an AND gate, producing an output signal indicative of whether each input signal is equal to 1.
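The idea of deriving the zero flag without waiting for the adder's carry chain can be sketched in software. The abstract does not give the exact gate wiring, so the sketch below uses a standard carry-free identity instead: the N-bit sum of `a` and `b` is zero exactly when bit 0 of `a XOR b` is 0 and, for every higher bit `i`, `a_i XOR b_i` equals `a_(i-1) OR b_(i-1)`. Using an OR of the lower bit pair as the carry-like term is an assumption of this sketch and may differ from the patented circuit; the function name is hypothetical.

```python
def zero_flag(a, b, n_bits):
    """Return True if (a + b) mod 2**n_bits == 0, using only per-bit logic."""
    def bit(value, i):
        return (value >> i) & 1

    # Bit 0: the inverted least significant sum bit must be 1.
    terms = [1 - (bit(a, 0) ^ bit(b, 0))]
    # Bits 1..N-1: each sum bit must match the carry-like term of the bit below.
    for i in range(1, n_bits):
        sum_bit = bit(a, i) ^ bit(b, i)
        carry_like = bit(a, i - 1) | bit(b, i - 1)  # assumption: OR, see lead-in
        terms.append(1 - (sum_bit ^ carry_like))    # XNOR: 1 when they agree
    # Final AND gate: the flag is set only if every term is 1.
    return all(terms)
```

Each term depends on at most two adjacent bit positions, so the flag is available after a fixed number of gate delays plus the final AND, independent of carry propagation through the adder.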
Abstract:
A data processor (200) includes an instruction cache (220) and a secondary cache (250). The instruction cache (220) has a plurality of cache lines. Each of the plurality of cache lines stores a first plurality of bits (222) corresponding to at least one instruction and a second plurality of bits (224, 226) associated with the execution of the at least one instruction. The secondary cache (250) is coupled to the instruction cache (220) and stores cache lines from the instruction cache (220) by storing the first plurality of bits (222) and a third plurality of bits (255, 257) corresponding to the second plurality of bits (224, 226). The third plurality of bits (255, 257) is fewer in number than the second plurality of bits (224, 226).