Physical address proxy reuse management

    Publication Number: US12086063B2

    Publication Date: 2024-09-10

    Application Number: US17747556

    Filing Date: 2022-05-18

    Abstract: Each load/store queue entry holds a load/store physical address proxy (PAP) for use as a proxy for a load/store physical memory line address (PMLA). The load/store PAP comprises a set index and a way that uniquely identifies an L2 cache entry holding a memory line at the load/store PMLA when an L1 cache provides the load/store PAP during the load/store instruction execution. The microprocessor removes a line at a removal PMLA from an L2 entry, forms a removal PAP as a proxy for the removal PMLA that comprises a set index and a way, snoops the load/store queue with the removal PAP to determine whether the removal PAP is being used as a proxy for the removal PMLA, fills the removed entry with a line at a fill PMLA, and prevents the removal PAP from being used as a proxy for the removal PMLA and the fill PMLA concurrently.
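The mechanism in the abstract can be modeled with a minimal Python sketch (class and method names are illustrative, not from the patent): the PAP is simply the (set index, way) pair of the L2 entry holding the line, and a fill first snoops the load/store queue so the same PAP never names the removed line and the fill line concurrently.

```python
class LoadStoreQueue:
    """Illustrative model: each entry records the PAP it uses as a proxy."""

    def __init__(self):
        self.entries = []  # list of PAPs, each a (set_index, way) tuple

    def snoop(self, pap):
        # True if some in-flight load/store still uses this PAP as a proxy
        return pap in self.entries

    def drain(self, pap):
        # Resolve (here: simply remove) entries using the stale PAP, so it
        # cannot serve as a proxy for two different memory lines at once.
        self.entries = [e for e in self.entries if e != pap]


class L2Cache:
    def __init__(self, num_sets, num_ways):
        self.num_sets, self.num_ways = num_sets, num_ways
        self.lines = {}  # (set_index, way) -> physical memory line address

    def fill(self, set_index, way, fill_pmla, lsq):
        pap = (set_index, way)
        # Removing the old line invalidates the PAP as a proxy for it.
        if pap in self.lines and lsq.snoop(pap):
            lsq.drain(pap)
        self.lines[pap] = fill_pmla  # the PAP now names the fill line only
```

For example, if a queued load still holds PAP (3, 1) when the L2 entry at set 3, way 1 is refilled, the snoop detects the stale proxy before the fill completes.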

    Unforwardable load instruction re-execution eligibility based on cache update by identified store instruction

    Publication Number: US12079126B2

    Publication Date: 2024-09-03

    Application Number: US17747703

    Filing Date: 2022-05-18

    Abstract: A microprocessor includes a cache memory, a store queue, and a load/store unit. Each entry of the store queue holds store data associated with a store instruction. The load/store unit, during execution of a load instruction, makes a determination that an entry of the store queue holds store data that includes some but not all bytes of load data requested by the load instruction, cancels execution of the load instruction in response to the determination, and writes to an entry of a structure from which the load instruction is subsequently issuable for re-execution an identifier of a store instruction that is older in program order than the load instruction and an indication that the load instruction is not eligible to re-execute until the identified older store instruction updates the cache memory with store data.
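A minimal sketch of the forwarding decision (function and field names are assumptions, not from the patent): the youngest overlapping store either fully covers the load, so its data can be forwarded, or covers only some of the requested bytes, in which case the load is cancelled and tied to that store's identifier until the store updates the cache.

```python
def _bytes(addr, size):
    # The set of byte addresses touched by an access
    return set(range(addr, addr + size))

def execute_load(load_addr, load_size, store_queue):
    """store_queue: (store_id, addr, size) tuples, oldest first."""
    need = _bytes(load_addr, load_size)
    for store_id, addr, size in reversed(store_queue):  # youngest first
        have = _bytes(addr, size)
        if need <= have:
            return ("forward", store_id)   # store supplies all load bytes
        if need & have:
            # Some but not all bytes overlap: cancel, and record that the
            # load may not re-execute until this older store updates the
            # cache memory.
            return ("cancel", store_id)
    return ("cache", None)  # no overlap: read the cache normally
```

Checking the youngest store first mirrors the requirement that a load observe the most recent older store in program order.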

    Non-cacheable access handling in processor with virtually-tagged virtually-indexed data cache

    Publication Number: US12061555B1

    Publication Date: 2024-08-13

    Application Number: US18199784

    Filing Date: 2023-05-19

    Abstract: A load/store circuit performs a first lookup of a load virtual address in a virtually-indexed, virtually-tagged first-level data cache (VIVTFLDC) that misses and generates a fill request that causes translation of the load virtual address into a load physical address, receives a response that indicates the load physical address is in a non-cacheable memory region and is without data from the load physical address, allocates a VIVTFLDC data-less entry that includes an indication that the data-less entry is associated with a non-cacheable memory region, performs a second lookup of the load virtual address in the VIVTFLDC and determines the load virtual address hits on the data-less entry, determines from the hit data-less entry it is associated with a non-cacheable memory region, and generates a read request to read data from a processor bus at the load physical address rather than providing data from the hit data-less entry.
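A minimal Python sketch of the data-less-entry behavior described above (names are illustrative; the cacheable fill path is omitted to keep the focus on the non-cacheable case): the first miss allocates an entry that records only the region attribute, and every later hit on that entry triggers a fresh processor-bus read instead of serving data from the cache.

```python
class VIVTCache:
    """Illustrative model of data-less entries for non-cacheable regions."""

    def __init__(self):
        self.entries = {}  # vaddr -> {"noncacheable": bool, "data": ...}

    def load(self, vaddr, translate, bus_read):
        entry = self.entries.get(vaddr)
        if entry is None:
            # First lookup misses: the fill request translates the address.
            paddr, noncacheable = translate(vaddr)
            if noncacheable:
                # Allocate a data-less entry: it records the region
                # attribute but holds no data from the physical address.
                self.entries[vaddr] = {"noncacheable": True, "data": None}
                return bus_read(paddr)
            raise NotImplementedError("cacheable fill path omitted")
        if entry["noncacheable"]:
            # A later lookup hits the data-less entry; data always comes
            # from a processor-bus read, never from the entry itself.
            paddr, _ = translate(vaddr)
            return bus_read(paddr)
        return entry["data"]
```

This preserves the required semantics of non-cacheable regions (e.g. memory-mapped I/O): every load observes the device, not a stale cached copy.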

Method and apparatus for deskewing die to die communication between system on chip devices

    Publication Number: US20240118726A1

    Publication Date: 2024-04-11

    Application Number: US18219505

    Filing Date: 2023-07-07

    IPC Classification: G06F1/12 G06F1/10

    CPC Classification: G06F1/12 G06F1/10

    Abstract: A die-to-die (D2D) interface between chiplets of a system on a chip (SoC) in which each of the chiplets is subdivided into slices. The D2D interface includes a transmission interface coupled between first and second chiplets, which includes a first transmission path for a first slice and a second transmission path for a second slice. The first chiplet includes receive circuitry which further includes a write interface and a read interface. The write interface stores data received from the first transmission path into a first FIFO using a first clock signal received via the first transmission path, and stores data received from the second transmission path into a second FIFO using a second clock signal received via the second transmission path. The read interface reads data stored in the first and second FIFOs using the first clock signal. The first and second transmission paths may be subject to different delays.
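The deskew idea can be sketched in Python (names are illustrative): each slice's data is written into its own FIFO on the clock forwarded over its own path, and both FIFOs are read together on the first path's clock, so a bounded skew between the paths is absorbed by FIFO depth.

```python
from collections import deque

class DeskewReceiver:
    def __init__(self, depth=4):
        # One FIFO per slice; in hardware each is written on the clock
        # signal forwarded over its own transmission path.
        self.fifos = [deque(maxlen=depth) for _ in range(2)]

    def write(self, slice_idx, data):
        # Clocked by that slice's received clock
        self.fifos[slice_idx].append(data)

    def read(self):
        # Reads occur on the first slice's clock; a word is valid only
        # once both FIFOs hold data, hiding the inter-path skew.
        if self.fifos[0] and self.fifos[1]:
            return (self.fifos[0].popleft(), self.fifos[1].popleft())
        return None
```

If the second path lags the first by a cycle, the first read returns nothing and subsequent reads deliver the slices realigned.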

Microprocessor including a decode unit that performs pre-execution of load constant micro-operations

    Publication Number: US20240103864A1

    Publication Date: 2024-03-28

    Application Number: US17945492

    Filing Date: 2022-09-15

    IPC Classification: G06F9/30 G06F9/38

    Abstract: A microprocessor includes a decode unit that maps architectural instructions into micro-operations and dispatches them to a scheduler that issues them to execution units that execute them by reading source operands from a register file and writing execution results to the register file. An architectural instruction instructs the microprocessor to load a constant into an architectural destination register. The decode unit maps the architectural instruction into a load constant micro-operation (LCM) and writes the LCM constant directly to a register of the register file without dispatching the LCM to the scheduler, such that the LCM is not issued to the execution units. In the same clock cycle, the decode unit indicates the LCM constant is available for consumption, such that the LCM imposes zero execution latency on dependent micro-operations and dispatches to the scheduler micro-operations other than the LCM. The register file may include a decode unit-dedicated write port.
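A minimal sketch of the dispatch decision (class and field names are assumptions, not from the patent): a load constant micro-operation (LCM) never reaches the scheduler; the decode unit writes the constant straight into the register file and marks the destination available in the same cycle, so dependents see zero execution latency.

```python
from collections import namedtuple

Uop = namedtuple("Uop", "kind dest constant")

class DecodeUnit:
    """Illustrative model: an LCM bypasses the scheduler entirely."""

    def __init__(self):
        self.regfile = {}     # written via a dedicated decode-unit port
        self.ready = set()    # registers whose values dependents may read
        self.scheduler = []   # micro-ops awaiting issue to execution units

    def dispatch(self, uop):
        if uop.kind == "load_constant":
            # Write the constant directly and mark it available in the
            # same cycle: zero execution latency for dependent micro-ops.
            self.regfile[uop.dest] = uop.constant
            self.ready.add(uop.dest)
            return  # not dispatched, not issued, not executed
        self.scheduler.append(uop)
```

Only non-LCM micro-operations consume scheduler entries and execution-unit bandwidth in this model.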

    Folded instruction fetch pipeline

    Publication Number: US11880685B2

    Publication Date: 2024-01-23

    Application Number: US17835352

    Filing Date: 2022-06-08

    IPC Classification: G06F9/38

    CPC Classification: G06F9/3802 G06F9/3806

    Abstract: An instruction fetch pipeline includes first, second, and third sub-pipelines that respectively include: a TLB that receives a fetch virtual address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a predicted set index, and a data RAM that receives the predicted set index and a predicted way number that specifies a way of the entry from which a block of instructions was previously fetched. The predicted set index specifies the instruction cache set that includes the entry. The three sub-pipelines respectively initiate in parallel: a TLB access using the fetch virtual address to obtain a translation thereof into a fetch physical address that includes a tag, a tag RAM access using the predicted set index to read a set of tags, and a data RAM access using the predicted set index and the predicted way number to fetch the block of instructions.
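The three parallel accesses can be sketched in Python (names illustrative; the structures are modeled as plain dictionaries). In hardware the three sub-pipelines start simultaneously; they are sequential here only because Python is sequential, with the tag comparison verifying the set/way predictions afterward.

```python
def fetch(vaddr, pred_set, pred_way, tlb, tag_ram, data_ram):
    # The three accesses below are initiated in parallel by the three
    # sub-pipelines; shown sequentially only for illustration.
    phys_tag = tlb[vaddr]                 # 1: translate VA, yielding the tag
    tags = tag_ram[pred_set]              # 2: read all ways' tags in the set
    block = data_ram[pred_set][pred_way]  # 3: speculatively fetch the block
    # Verify the way prediction against the translated physical tag.
    return block if tags[pred_way] == phys_tag else None  # None: refetch
```

Folding the accesses this way removes the serial TLB-then-tag-then-data dependency from the common (correctly predicted) fetch path.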

    Microprocessor that prevents same address load-load ordering violations

    Publication Number: US11841802B2

    Publication Date: 2023-12-12

    Application Number: US17747815

    Filing Date: 2022-05-18

    IPC Classification: G06F12/0891 G06F9/38 G06F9/30

    Abstract: A microprocessor prevents same address load-load ordering violations. Each load queue entry holds a load physical memory line address (PMLA) and an indication of whether a load instruction has completed execution. The microprocessor fills a line specified by a fill PMLA into a cache entry and snoops the load queue with the fill PMLA, either before the fill or in an atomic manner with the fill with respect to the ability of the filled entry to be hit upon by any load instruction, to determine whether the fill PMLA matches load PMLAs in load queue entries associated with load instructions that have completed execution and there are other load instructions in the load queue that have not completed execution. The microprocessor, if the condition is true, flushes at least the other load instructions in the load queue that have not completed execution.
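The snoop condition can be sketched in Python (names and the dict-based queue are illustrative): when a fill arrives, if some completed load already read this line while other loads are still outstanding, the outstanding loads are flushed so they cannot observe the newly filled data out of order.

```python
def snoop_on_fill(fill_pmla, load_queue):
    """load_queue: dicts with 'pmla' and 'completed'. Returns entries to flush."""
    hit_completed = any(e["completed"] and e["pmla"] == fill_pmla
                        for e in load_queue)
    not_done = [e for e in load_queue if not e["completed"]]
    if hit_completed and not_done:
        # A completed load already observed the old copy of this line while
        # other loads have yet to execute; flushing the incomplete loads
        # prevents a same-address load-load ordering violation.
        return not_done
    return []
```

Per the abstract, the snoop must occur before the fill, or atomically with it, relative to the filled entry becoming hittable by loads.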