Ultra-low precision floating-point fused multiply-accumulate unit

    公开(公告)号:US11455142B2

    公开(公告)日:2022-09-27

    申请号:US16432358

    申请日:2019-06-05

    IPC分类号: G06F7/544

    摘要: Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two product having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.

    Instruction fusion after register rename

    公开(公告)号:US11256509B2

    公开(公告)日:2022-02-22

    申请号:US15834413

    申请日:2017-12-07

    IPC分类号: G06F9/38 G06F9/30

    摘要: Embodiments of the present invention include methods, systems, and computer program products for implementing instruction fusion after register rename. A computer-implemented method includes receiving, by a processor, a plurality of instructions at an instruction pipeline. The processor can further performing a register rename within the instruction pipeline in response to the received plurality of instructions. The processor can further determine that two or more of the plurality of instructions can be fused after the register rename. The processor can further fuse the two or more instructions that can be fused based on the determination to create one or more fused instructions. The processor can further perform an execution stage within the instruction pipeline to execute the plurality of instructions, including the one or more fused instructions.

    Coalescing global completion table entries in an out-of-order processor

    公开(公告)号:US11204772B2

    公开(公告)日:2021-12-21

    申请号:US16738360

    申请日:2020-01-09

    IPC分类号: G06F9/30 G06F9/38

    摘要: Aspects of the invention include detecting that all instructions in a first group of in-flight instructions have a status of finished. The first group of in-flight instructions is associated with a first allocated entry in a global completion table (GCT) which tracks a dispatch order and status of groups of in-flight instructions. The GCT includes a plurality of allocated entries including the first allocated entry and a second allocated entry. A second group of in-flight instructions dispatched immediately prior to the first group is associated with the second allocated entry in the GCT. Based at least in part on the detecting, the first allocated entry is merged into the second allocated entry to create a single merged second allocated entry in the GCT that includes completion information for both the first group of in-flight instructions and the second group of in-flight instructions. The first allocated entry is then deallocated.

    Load-store unit with partitioned reorder queues with single cam port

    公开(公告)号:US11175925B2

    公开(公告)日:2021-11-16

    申请号:US15825453

    申请日:2017-11-29

    IPC分类号: G06F9/38

    摘要: Technical solutions are described for a load-store unit (LSU) that executes a plurality of instructions in an out-of-order (OoO) window using multiple LSU pipes. The execution includes selecting an instruction from the OoO window, the instruction using an effective address; and if the instruction is a load instruction: and if the processing unit is operating in single thread mode, creating an entry in a first partition of a load reorder queue (LRQ) if the instruction is issued on a first load pipe, and creating the entry in a second partition of the LRQ if the instruction is issued on a second load pipe. Further, if the processing unit is operating in a multi-thread mode, creating the entry in a first predetermined portion of the first partition of the LRQ if the instruction is issued on the first load pipe and by a first thread of the processing unit.

    Block based allocation and deallocation of issue queue entries

    公开(公告)号:US10922087B2

    公开(公告)日:2021-02-16

    申请号:US15826740

    申请日:2017-11-30

    IPC分类号: G06F9/30 G06F9/38

    摘要: Aspects of the invention include tracking relative ages of instructions in an issue queue of an OoO processor. The tracking includes grouping entries in the issue queue into a pool of blocks, each block containing two or more entries that are configured to be allocated and deallocated as a single unit, each entry configured to store an instruction. Blocks are selected in any order from the pool of block for allocation. The selected blocks are allocated and the relative ages of the allocated blocks are tracked based at least in part on an order that the blocks are allocated. Each allocated block is configured as a first-in-first-out (FIFO) queue of entries, configured to add instructions to the block in a sequential order, and configured to remove instructions from the block in any order including a non-sequential order. The relative ages of instructions within each allocated block are tracked.