-
Publication number: US20230244492A1
Publication date: 2023-08-03
Application number: US18298723
Filing date: 2023-04-11
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JAGADISH B. KOTRA, JOHN KALAMATIANOS
CPC classification number: G06F9/3836, G06F9/3001, G06F9/522, G06F9/3877
Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
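The per-address lock described in this abstract can be illustrated with a small software model. This is only a sketch under assumptions: the class `AddressLockTable` and its method names are hypothetical, and the rule that a younger non-offload access waits while the lock is held is an interpretation of the abstract, not a detail it states.

```python
class AddressLockTable:
    """Toy model of locks placed on addresses used by offload instructions."""

    def __init__(self):
        self._locked = set()

    def process_offload(self, addr):
        # Processing an offload instruction places a lock on its address.
        self._locked.add(addr)

    def complete_cache_op(self, addr):
        # Completing the cache operation that targets the address
        # removes the lock.
        self._locked.discard(addr)

    def may_access(self, addr):
        # Assumption: a younger non-offload access waits while locked.
        return addr not in self._locked


table = AddressLockTable()
table.process_offload(0x1000)
blocked = not table.may_access(0x1000)   # lock held
table.complete_cache_op(0x1000)
released = table.may_access(0x1000)      # lock removed
```

A set keeps the model minimal; real hardware would presumably track locks in a bounded structure, which the abstract does not describe.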
-
Publication number: US20220206817A1
Publication date: 2022-06-30
Application number: US17137140
Filing date: 2020-12-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JAGADISH B. KOTRA, JOHN KALAMATIANOS
Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
-
Publication number: US20220188117A1
Publication date: 2022-06-16
Application number: US17123270
Filing date: 2020-12-16
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JOHN KALAMATIANOS, MICHAEL T. CLARK, MARIUS EVERS, WILLIAM L. WALKER, PAUL MOYER, JAY FLEISCHMAN, JAGADISH B. KOTRA
Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
-
Publication number: US20220188233A1
Publication date: 2022-06-16
Application number: US17473242
Filing date: 2021-09-13
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JOHN KALAMATIANOS, JAGADISH B. KOTRA, GAGANDEEP PANWAR
IPC: G06F12/0817
Abstract: A system-on-chip configured for eager invalidation and flushing of cached data used by PIM (Processing-in-Memory) instructions includes: one or more processor cores; one or more caches; and an I/O (input/output) die comprising logic to: receive a cache probe request, wherein the cache probe request includes a physical memory address associated with a PIM instruction, and the PIM instruction is to be offloaded to a PIM device for execution; and issue, based on the physical memory address, a cache probe to one or more of the caches prior to receiving the PIM instruction for dispatch to the PIM device.
-
Publication number: US20220027291A1
Publication date: 2022-01-27
Application number: US16938364
Filing date: 2020-07-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: SERGEY BLAGODUROV, JOHNATHAN ALSOP, JAGADISH B. KOTRA, MARKO SCRBAK, GANESH DASIKA
IPC: G06F13/16, G06F9/30, H04L12/733
Abstract: Arbitrating atomic memory operations, including: receiving, by a media controller, a plurality of atomic memory operations; determining, by an atomics controller associated with the media controller, based on one or more arbitration rules, an ordering for issuing the plurality of atomic memory operations; and issuing the plurality of atomic memory operations to a memory module according to the ordering.
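The receive, order, and issue flow in this abstract might be sketched as follows. The abstract does not say what the arbitration rules are, so the rule used here (higher priority first, ties broken by arrival order) is purely an assumption, as are the `AtomicOp` fields and the use of a plain list to stand in for the memory module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicOp:
    arrival: int       # order of receipt at the media controller
    requester: str
    kind: str          # e.g. "fetch_add" or "compare_swap"
    address: int
    priority: int = 0  # hypothetical field: higher means more urgent


def arbitrate(ops):
    # Assumed arbitration rule: priority first, then arrival order.
    return sorted(ops, key=lambda op: (-op.priority, op.arrival))


def issue(ops, memory_module):
    # Issue the operations to the memory module in the arbitrated order.
    for op in arbitrate(ops):
        memory_module.append(op)


received = [
    AtomicOp(0, "cpu", "fetch_add", 0x40),
    AtomicOp(1, "gpu", "compare_swap", 0x80, priority=2),
    AtomicOp(2, "cpu", "fetch_add", 0x40, priority=2),
]
issued = []
issue(received, issued)
```

Keeping the rule in a single sort key makes it easy to swap in a different policy without touching the issue path.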
-
Publication number: US20240004786A1
Publication date: 2024-01-04
Application number: US17855157
Filing date: 2022-06-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: VIGNESH ADHINARAYANAN, MAHZABEEN ISLAM, JAGADISH B. KOTRA, SERGEY BLAGODUROV
IPC: G06F3/06
CPC classification number: G06F3/0647, G06F3/0604, G06F3/0683
Abstract: Allocating memory for processing-in-memory (PIM) devices, including: allocating, in a first Dynamic Random Access Memory (DRAM) sub-array, a first data structure beginning in a first grain of the DRAM; allocating, in a second DRAM sub-array, a second data structure beginning in a second grain of the DRAM; and wherein the second DRAM sub-array is different from the first DRAM sub-array and the second grain is different from the first grain.
-
Publication number: US20220206855A1
Publication date: 2022-06-30
Application number: US17136767
Filing date: 2020-12-29
Applicant: ADVANCED MICRO DEVICES, INC.
IPC: G06F9/50
Abstract: Offloading computations from a processor to remote execution logic is disclosed. Offload instructions for remote execution on a remote device are dispatched in the form of processor instructions like conventional instructions. In the processor, an offload instruction is inserted in an offload queue. The offload instruction may be inserted at the dispatch stage or the retire stage of the processor pipeline. Metadata for the offload instruction is added to the offload instruction in the offload queue. After retirement of the offload instruction, the processor transmits an offload request generated from the offload instruction.
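The queue lifecycle in this abstract (insert, attach metadata, transmit after retirement) can be modelled in a few lines. This is a sketch only: `OffloadQueue`, its methods, and the request layout are hypothetical, and draining strictly from the head of the queue is an assumption the abstract does not confirm.

```python
from collections import OrderedDict


class OffloadQueue:
    """Toy model of an offload queue keyed by instruction id."""

    def __init__(self):
        self._entries = OrderedDict()

    def insert(self, inst_id, opcode):
        # Inserted at dispatch or retire in the abstract; not modelled here.
        self._entries[inst_id] = {
            "opcode": opcode, "metadata": {}, "retired": False,
        }

    def add_metadata(self, inst_id, **metadata):
        # Metadata is attached to the entry already in the queue.
        self._entries[inst_id]["metadata"].update(metadata)

    def retire(self, inst_id):
        self._entries[inst_id]["retired"] = True

    def drain(self):
        # Build offload requests for retired entries, stopping at the
        # first entry that has not retired (assumed in-order policy).
        requests = []
        while self._entries:
            inst_id, entry = next(iter(self._entries.items()))
            if not entry["retired"]:
                break
            del self._entries[inst_id]
            requests.append(
                {"id": inst_id, "opcode": entry["opcode"], **entry["metadata"]}
            )
        return requests
```

In this model a request is only generated once its instruction has retired, which matches the last sentence of the abstract.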
-
Publication number: US20210327494A1
Publication date: 2021-10-21
Application number: US17025157
Filing date: 2020-09-18
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JAGADISH B. KOTRA
IPC: G11C11/406, G11C11/408, G11C11/409
Abstract: Hardware-assisted Dynamic Random Access Memory (DRAM) row merging, including: identifying, by a memory controller, in a DRAM module, a plurality of rows storing identical data; storing, in a mapping table, data mapping one or more rows of the plurality of rows to another row; and excluding the one or more rows from a refresh of the DRAM module.
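The three steps in this abstract (find identical rows, record a mapping, skip the mapped rows during refresh) could look like the following in software. It is a sketch under assumptions: the function `merge_rows` is hypothetical, rows are compared by full contents, and the lowest-numbered row of each identical group is chosen as the row the others map to.

```python
def merge_rows(rows):
    """rows maps row index -> bytes; returns (mapping_table, refresh_list)."""
    first_seen = {}     # row contents -> index of the row that is kept
    mapping_table = {}  # duplicate row index -> index of the kept row
    for idx in sorted(rows):
        data = rows[idx]
        if data in first_seen:
            mapping_table[idx] = first_seen[data]
        else:
            first_seen[data] = idx
    # Rows present in the mapping table are excluded from refresh.
    refresh_list = [idx for idx in sorted(rows) if idx not in mapping_table]
    return mapping_table, refresh_list


dram_rows = {
    0: b"\x00" * 8,
    1: b"\xff" * 8,
    2: b"\x00" * 8,
    3: b"\x00" * 8,
}
mapping_table, refresh_list = merge_rows(dram_rows)
```

With these four rows, rows 2 and 3 map to row 0, so only rows 0 and 1 remain in the refresh list.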
-
Publication number: US20240126552A1
Publication date: 2024-04-18
Application number: US18393657
Filing date: 2023-12-21
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JOHN KALAMATIANOS, MICHAEL T. CLARK, MARIUS EVERS, WILLIAM L. WALKER, PAUL MOYER, JAY FLEISCHMAN, JAGADISH B. KOTRA
CPC classification number: G06F9/30181, G06F9/30043, G06F9/30098, G06F9/30138, G06F9/3834, G06F9/3877, G06F9/52
Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
-
Publication number: US20240111678A1
Publication date: 2024-04-04
Application number: US17958120
Filing date: 2022-09-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JAGADISH B. KOTRA, JOHN KALAMATIANOS, PAUL MOYER, GABRIEL H. LOH
IPC: G06F12/0862, G06F12/0811
CPC classification number: G06F12/0862, G06F12/0811
Abstract: Systems and methods for pushed prefetching include: multiple core complexes, each core complex having multiple cores and multiple caches, the multiple caches configured in a memory hierarchy with multiple levels; an interconnect device coupling the core complexes to each other and coupling the core complexes to shared memory, the shared memory at a lower level of the memory hierarchy than the multiple caches; and a push-based prefetcher having logic to: monitor memory traffic between caches of a first level of the memory hierarchy and the shared memory; and based on the monitoring, initiate a prefetch of data to a cache of the first level of the memory hierarchy.