摘要:
A processing system and method includes a predecoder configured to identify instructions that are combinable to form a single executable internal instruction. Instruction storage is configured to merge instructions that are combinable. An instruction execution unit is configured to execute the single, executable internal instruction on a hardware wide datapath.
摘要:
A method of performing instructions in a computer processor architecture includes determining that a load instruction is being dispatched. Destination related data of the load instruction is written into a mapper of the architecture. A determination that a compare immediate instruction is being dispatched is made. A determination that a branch conditional instruction is being dispatched is made. The branch conditional instruction is configured to wait until the load instruction produces a result before the branch conditional instruction issues and executes. The branch conditional instruction skips waiting for a finish of the compare immediate instruction.
摘要:
A method of performing instructions in a computer processor architecture includes determining that a load instruction is being dispatched. Destination related data of the load instruction is written into a mapper of the architecture. A determination that a compare immediate instruction is being dispatched is made. A determination that a branch conditional instruction is being dispatched is made. The branch conditional instruction is configured to wait until the load instruction produces a result before the branch conditional instruction issues and executes. The branch conditional instruction skips waiting for a finish of the compare immediate instruction.
摘要:
A computer processor includes a dispatch stage and a dependency skipping execution unit. The dispatch stage is configured to dispatch a plurality of instructions that include a general purpose instruction configured to produce first data, a dependent instruction configured to produce second data, and an indirect dependent instruction configured to produce third data. The dependency skipping execution unit is configured to monitor the plurality of instructions and to process the indirect dependent instruction in response to the general purpose instruction producing the first data. The indirect dependent instruction is issued independently from the second data produced by the indirect dependent instruction.
摘要:
Aspects of the invention include includes determining a first instruction in a processing pipeline, wherein the first instruction includes a compare instruction, determining a second instruction in the processing pipeline, wherein the second instruction includes a conditional branch instruction relying on the compare instruction, determining a predicted result of the compare instruction, and completing the conditional branch instruction using the predicted result prior to executing the compare instruction.
摘要:
Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two product having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.
摘要:
Embodiments of the present invention include methods, systems, and computer program products for implementing instruction fusion after register rename. A computer-implemented method includes receiving, by a processor, a plurality of instructions at an instruction pipeline. The processor can further performing a register rename within the instruction pipeline in response to the received plurality of instructions. The processor can further determine that two or more of the plurality of instructions can be fused after the register rename. The processor can further fuse the two or more instructions that can be fused based on the determination to create one or more fused instructions. The processor can further perform an execution stage within the instruction pipeline to execute the plurality of instructions, including the one or more fused instructions.
摘要:
Aspects of the invention include detecting that all instructions in a first group of in-flight instructions have a status of finished. The first group of in-flight instructions is associated with a first allocated entry in a global completion table (GCT) which tracks a dispatch order and status of groups of in-flight instructions. The GCT includes a plurality of allocated entries including the first allocated entry and a second allocated entry. A second group of in-flight instructions dispatched immediately prior to the first group is associated with the second allocated entry in the GCT. Based at least in part on the detecting, the first allocated entry is merged into the second allocated entry to create a single merged second allocated entry in the GCT that includes completion information for both the first group of in-flight instructions and the second group of in-flight instructions. The first allocated entry is then deallocated.
摘要:
Technical solutions are described for a load-store unit (LSU) that executes a plurality of instructions in an out-of-order (OoO) window using multiple LSU pipes. The execution includes selecting an instruction from the OoO window, the instruction using an effective address; and if the instruction is a load instruction: and if the processing unit is operating in single thread mode, creating an entry in a first partition of a load reorder queue (LRQ) if the instruction is issued on a first load pipe, and creating the entry in a second partition of the LRQ if the instruction is issued on a second load pipe. Further, if the processing unit is operating in a multi-thread mode, creating the entry in a first predetermined portion of the first partition of the LRQ if the instruction is issued on the first load pipe and by a first thread of the processing unit.
摘要:
Aspects of the invention include tracking relative ages of instructions in an issue queue of an OoO processor. The tracking includes grouping entries in the issue queue into a pool of blocks, each block containing two or more entries that are configured to be allocated and deallocated as a single unit, each entry configured to store an instruction. Blocks are selected in any order from the pool of block for allocation. The selected blocks are allocated and the relative ages of the allocated blocks are tracked based at least in part on an order that the blocks are allocated. Each allocated block is configured as a first-in-first-out (FIFO) queue of entries, configured to add instructions to the block in a sequential order, and configured to remove instructions from the block in any order including a non-sequential order. The relative ages of instructions within each allocated block are tracked.