PRESERVING MEMORY ORDERING BETWEEN OFFLOADED INSTRUCTIONS AND NON-OFFLOADED INSTRUCTIONS

    公开(公告)号:US20230244492A1

    公开(公告)日:2023-08-03

    申请号:US18298723

    申请日:2023-04-11

    CPC classification number: G06F9/3836 G06F9/3001 G06F9/522 G06F9/3877

    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.

    PRESERVING MEMORY ORDERING BETWEEN OFFLOADED INSTRUCTIONS AND NON-OFFLOADED INSTRUCTIONS

    公开(公告)号:US20220206817A1

    公开(公告)日:2022-06-30

    申请号:US17137140

    申请日:2020-12-29

    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.

    MANAGING CACHED DATA USED BY PROCESSING-IN-MEMORY INSTRUCTIONS

    公开(公告)号:US20220188233A1

    公开(公告)日:2022-06-16

    申请号:US17473242

    申请日:2021-09-13

    Abstract: A system-on-chip configured for eager invalidation and flushing of cached data used by PIM (Processing-in-Memory) instructions includes: one or more processor cores; one or more caches and an I/O (input/output) die comprising logic to: receive a cache probe request, wherein the cache probe request including a physical memory address associated with a PIM instruction, and the PIM instruction is to be offloaded to a PIM device for execution; and issue, based on the physical memory address, a cache probe to one or more of the caches prior to receiving the PIM instruction for dispatch to the PIM device.

    OFFLOADING COMPUTATIONS FROM A PROCESSOR TO REMOTE EXECUTION LOGIC

    公开(公告)号:US20220206855A1

    公开(公告)日:2022-06-30

    申请号:US17136767

    申请日:2020-12-29

    Abstract: Offloading computations from a processor to remote execution logic is disclosed. Offload instructions for remote execution on a remote device are dispatched in the form of processor instructions like conventional instructions. In the processor, an offload instruction is inserted in an offload queue. The offload instruction may be inserted at the dispatch stage or the retire stage of the processor pipeline. Metadata for the offload instruction is added to the offload instruction in the offload queue. After retirement of the offload instruction, the processor transmits an offload request generated from the offload instruction.

    PUSHED PREFETCHING IN A MEMORY HIERARCHY
    10.
    发明公开

    公开(公告)号:US20240111678A1

    公开(公告)日:2024-04-04

    申请号:US17958120

    申请日:2022-09-30

    CPC classification number: G06F12/0862 G06F12/0811

    Abstract: Systems and methods for pushed prefetching include: multiple core complexes, each core complex having multiple cores and multiple caches, the multiple caches configured in a memory hierarchy with multiple levels; an interconnect device coupling the core complexes to each other and coupling the core complexes to shared memory, the shared memory at a lower level of the memory hierarchy than the multiple caches; and a push-based prefetcher having logic to: monitor memory traffic between caches of a first level of the memory hierarchy and the shared memory; and based on the monitoring, initiate a prefetch of data to a cache of the first level of the memory hierarchy.

Patent Agency Ranking