摘要:
In a computer system for use as a symetrical multiprocessor, a superscalar microprocessor apparatus allows dispatching and executing multi-cycle and complex instructions Some control signals are generated in the dispatch unit and dispatched with the instruction to the Fixed Point Unit (FXU). Multiple execution pipes correspond to the instruction dispatch ports and the execution unit is a Fixed Point Unit (FXU) which contains three execution dataflow pipes (X, Y and Z) and one control pipe (R). The FXU logic then execute these instructions on the available FXU pipes. This results in optimum performance with little or no other complications. The presented technique places the flexibility of how these instructions will be executed in the FXU, where the actual execution takes place, instead of in the instruction decode or dispatch units or cracking by the compiler.
摘要:
An embodiment of the invention is a processor for processing loop branch instructions. The processor includes an instruction unit for fetching and decoding instructions including at least one loop branch instruction. A branch prediction unit predicts target instructions to be fetched and decoded by the instruction unit in response to the loop branch instruction. An execution unit executes instructions from the instruction unit and maintains a counter indicating an iteration of a loop. The execution unit includes detection logic for detecting when the counter equals a threshold and notifies the branch prediction unit when the counter equals the threshold.
摘要:
Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction, performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wherein the operand forwarding is performed by executing the first source instruction together with the first dependent instruction; and wherein the result forwarding is performed by executing the second source instruction together with the second dependent instruction.
摘要:
In a computer system, a method and apparatus for dispatching and executing multi-cycle and complex instructions. The method results in maximum performance for such without impacting other areas in the processor such as decode, grouping or dispatch units. This invention allows multi-cycle and complex instructions to be dispatched to one port but executed in multiple execution pipes without cracking the instruction and without limiting it to a single execution pipe. Some control signals are generated in the dispatch unit and dispatched with the instruction to the Fixed Point Unit (FXU). The FXU logic then execute these instructions on the available FXU pipes. This method results in optimum performance with little or no other complications. The presented technique places the flexibility of how these instructions will be executed in the FXU, where the actual execution takes place, instead of in the instruction decode or dispatch units or cracking by the compiler.
摘要:
Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction, performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wherein the operand forwarding is performed by executing the first source instruction together with the first dependent instruction; and wherein the result forwarding is performed by executing the second source instruction together with the second dependent instruction.
摘要:
Threads of a computing environment are managed to improve system performance. Threads are migrated between processors to take advantage of single thread processing mode, when possible. As an example, inactive threads are migrated from one or more processors, potentially freeing-up one or more processors to execute an active thread. Active threads are migrated from one processor to another to transform multiple threading mode processors to single thread mode processors.
摘要:
A technique is provided for cache management of a cache. The processing circuit determines a miss count and a hit position field during a previous execution of an instruction requesting that a data element be stored in a cache. The miss count and the hit position field are stored for a data element corresponding to an instruction that requests storage of the data element. The processing circuit places the data element in a hierarchical order based on the miss count and/or the hit position field. The hit position field includes a hierarchical position related to the data element in the cache.
摘要:
Threads of a computing environment are managed to improve system performance. Threads are migrated between processors to take advantage of single thread processing mode, when possible. As an example, inactive threads are migrated from one or more processors, potentially freeing-up one or more processors to execute an active thread. Active threads are migrated from one processor to another to transform multiple threading mode processors to single thread mode processors.
摘要:
A method of implementing binary multiplication in a processing device includes obtaining a multiplicand and a multiplier from a storage device; in the event the multiplier is larger than a selected length, partitioning the multiplier into a plurality of multiplier subgroups; in the event the multiplicand is larger than a selected length, partitioning the multiplicand into a plurality of multiplicand subgroups and at least one of zeroing out of unused bits of the multiplicand subgroup and sign-extending a smaller portion of the multiplicand subgroup; establishing a plurality of multiplicand multiples based on at least one of a selected multiplicand subgroup of the plurality of multiplicand subgroups and the multiplicand; selecting one or more of the multiplicand multiples of the plurality of multiplicand multiples based on the each multiplier subgroup of the plurality of multiplier subgroups; and generating a first modular product based on the selected multiplicand multiples.
摘要:
A method for allowing a partial instruction to be executed in a fixed point unit pipeline during the instruction dispatch cycle creates a mask used to select which bits of the operands participate in a future logical operation of the fixed point unit back a cycle to the instruction dispatch stage of the fixed point unit. As an S/390 System improvement applicable to other computers, the mask is determined and created two cycles ahead of execution, or two cycles before the mask is actually used. Also, in the method used for moving the mask generation back by one cycle, mask generation overlaps the dispatch stage in the I-unit, and this provides a handshake between the I-unit and E-unit of the fixed point unit of the central processor unit of the computer system. The control setting selection process occurs in a predetermination cycle stage or e-1 (em1) stage for the mask generation and the register file read address. Speculative handshaking allows the E-1 stage to be created with no impact to the last stage of the I-Unit, such that no additional logic is needed and cycle time is not jeopardized. Also, the E-1 stage of an instruction overlaps with the execution stages of previous instructions.