-
Publication No.: US20230021492A1
Publication Date: 2023-01-26
Application No.: US17385783
Filing Date: 2021-07-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Shaizeen Aga , Nuwan Jayasena , John Kalamatianos
IPC: G06F12/0891 , G06F12/0811 , G06F12/02 , G06F13/16
Abstract: A technical solution to the technical problem of how to support memory-centric operations on cached data uses a novel memory-centric memory operation that invokes write back functionality on cache controllers and memory controllers. The write back functionality enforces selective flushing of dirty, i.e., modified, cached data that is needed for memory-centric memory operations from caches to the completion level of the memory-centric memory operations, and updates the coherence state appropriately at each cache level. The technical solution ensures that commands to implement the selective cache flushing are ordered before the memory-centric memory operation at the completion level of the memory-centric memory operation.
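The flush-then-operate ordering in this abstract can be sketched as a small Python model. It is a hypothetical illustration, not the patented design. Caches are listed from innermost to outermost, and main memory is assumed to be the completion level. Invalidating every cached copy is an assumed coherence update. `Line`, `Cache`, and `memory_centric_add` are invented names. The key property is that each writeback is queued ahead of the memory-centric operation that reads the same address.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    value: int
    dirty: bool  # True if modified relative to memory


@dataclass
class Cache:
    name: str
    lines: dict = field(default_factory=dict)  # addr -> Line


def memory_centric_add(caches, memory, addrs, operand):
    """Hypothetical sketch: selectively flush dirty lines for `addrs`, then
    add `operand` to each address at the completion level (memory).

    `caches` is ordered innermost (e.g. L1) to outermost. All commands go
    through one in-order queue, so each flush precedes the operation.
    """
    queue = []
    for addr in addrs:
        # The innermost dirty copy holds the most recent value.
        for cache in caches:
            line = cache.lines.get(addr)
            if line is not None and line.dirty:
                queue.append(("writeback", addr, line.value))
                break
        # Assumed coherence update: invalidate all cached copies, because
        # the memory-centric operation is about to change the value in memory.
        for cache in caches:
            cache.lines.pop(addr, None)
    for addr in addrs:
        queue.append(("add", addr, operand))

    # Drain the queue in order at the completion level.
    for op, addr, value in queue:
        if op == "writeback":
            memory[addr] = value
        else:
            memory[addr] = memory.get(addr, 0) + value
    return queue
```

Clean lines produce no writeback traffic. Only the dirty data needed by the operation is flushed, which matches the "selective" flushing that the abstract describes.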
-
Publication No.: US20220091974A1
Publication Date: 2022-03-24
Application No.: US17031518
Filing Date: 2020-09-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan Jayasena , Shaizeen Aga
IPC: G06F12/02 , G06F12/0815 , G06F12/084 , G06F12/0868 , G06F15/173
Abstract: A processing device and methods of controlling remote persistent writes are provided. Methods include receiving an instruction of a program to issue a persistent write to remote memory. The methods also include logging an entry in a local domain when the persistent write instruction is received and providing a first indication that the persistent write will be persisted to the remote memory. The methods also include executing the persistent write to the remote memory and providing a second indication that the persistent write to the remote memory is completed. The methods also include providing the first and second indications when it is determined not to execute the persistent write according to global ordering and providing the second indication without providing the first indication when it is determined to execute the persistent write to remote memory according to global ordering.
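The two-indication protocol can be modeled as a minimal Python sketch. Everything here is a hypothetical simplification. `PersistentWriteController` and the indication tags are invented names. The remote write runs synchronously, although real hardware would complete it asynchronously. One further assumption: the local log entry is made only on the relaxed (non-globally-ordered) path and is retired when the remote write finishes.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentWriteController:
    """Hypothetical model of the first/second indication scheme."""
    local_log: list = field(default_factory=list)
    remote_memory: dict = field(default_factory=dict)
    indications: list = field(default_factory=list)

    def persistent_write(self, addr, value, global_ordering):
        if not global_ordering:
            # Log in the local domain and issue an early first indication:
            # the write *will* persist, so the program may proceed.
            self.local_log.append((addr, value))
            self.indications.append(("will_persist", addr))
        # Execute the remote write, then issue the second indication.
        self.remote_memory[addr] = value
        self.indications.append(("persisted", addr))
        if not global_ordering:
            # Assumption: the log entry is retired once the remote write completes.
            self.local_log.remove((addr, value))
```

A globally ordered write therefore reports only completion. A relaxed write reports both the early promise and the later completion.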
-
Publication No.: US11262949B2
Publication Date: 2022-03-01
Application No.: US16885677
Filing Date: 2020-05-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Johnathan Alsop , Shaizeen Aga , Nuwan Jayasena
Abstract: An approach is provided for reducing command bus traffic between memory controllers and PIM-enabled memory modules using special PIM commands. The term “special PIM command” is used herein to describe embodiments and refers to a PIM command for which the corresponding module-specific command information is provided to memory modules via a non-command bus data path. A memory controller generates and issues a special PIM command to multiple PIM-enabled memory modules via a command bus and provides module-specific command information (e.g., address information) for the special PIM command to the PIM-enabled memory modules via the non-command bus data path that is shared by the PIM-enabled memory modules and the memory controller.
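The command-bus saving can be shown by counting bus transactions in a small Python sketch. The command tuples and the functions `issue_conventional`, `issue_special`, and `module_receive` are hypothetical. The only claim modeled is the one in the abstract: a single command goes on the command bus, and each module's address arrives over the shared non-command data path.

```python
def issue_conventional(module_addrs):
    """Baseline: one PIM command per module, each carrying its address."""
    command_bus = [("PIM_OP", module, addr) for module, addr in module_addrs.items()]
    return command_bus, []


def issue_special(module_addrs):
    """Special PIM command: a single command on the command bus; the
    module-specific addresses travel over the shared data path."""
    command_bus = [("SPECIAL_PIM_OP",)]
    data_bus = list(module_addrs.items())
    return command_bus, data_bus


def module_receive(module, data_bus):
    """Each module picks its own address out of the data-path payload."""
    return dict(data_bus)[module]
```

With four modules, the baseline uses four command-bus slots and the special command uses one. The per-module information moves to the data path instead.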
-
公开(公告)号:US20210117133A1
公开(公告)日:2021-04-22
申请号:US16658733
申请日:2019-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan Jayasena , Shaizeen Aga
Abstract: An approach is provided for implementing near-memory data reduction during store operations to off-chip or off-die memory. A Near-Memory Reduction (NMR) unit provides near-memory data reduction during write operations to a specified address range. The NMR unit is configured with a range of addresses to be reduced, and when a store operation specifies an address within that range, the NMR unit performs data reduction by adding the data value specified by the store operation to an accumulated reduction result. According to an embodiment, the NMR unit maintains a count of the number of updates to the accumulated reduction result, which is used to determine when data reduction has been completed.
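The NMR behavior described above can be sketched in a few lines of Python. The class name and the half-open `[base, limit)` range are hypothetical. The sketch also assumes that the expected number of updates is configured up front. Stores inside the range are summed into the accumulator, and all other stores write memory normally.

```python
class NearMemoryReductionUnit:
    """Hypothetical NMR unit: stores into [base, limit) are summed, not written."""

    def __init__(self, base, limit, expected_updates):
        self.base, self.limit = base, limit
        self.expected_updates = expected_updates  # assumption: known in advance
        self.result = 0  # accumulated reduction result
        self.count = 0   # number of updates folded into `result`

    def store(self, memory, addr, value):
        if self.base <= addr < self.limit:
            self.result += value
            self.count += 1
        else:
            memory[addr] = value  # ordinary store outside the reduction range

    def done(self):
        # Reduction is complete once the expected number of updates has arrived.
        return self.count >= self.expected_updates
```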
-
Publication No.: US12099866B2
Publication Date: 2024-09-24
Application No.: US17135381
Filing Date: 2020-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Jonathan Alsop , Shaizeen Aga , Nuwan Jayasena
CPC classification number: G06F9/485 , G06F3/0604 , G06F3/0659 , G06F3/0673 , G06F12/0284 , G06F12/0292 , G06F12/145
Abstract: An Address Mapping-Aware Tasking (AMAT) mechanism manages compute task data and issues compute tasks on behalf of threads that created the compute task data. The AMAT mechanism stores compute task data generated by host threads in a set of partitions, where each partition is designated for a particular memory module. The AMAT mechanism maintains address mapping data that maps address information to partitions. Threads push compute task data to the AMAT mechanism instead of generating and issuing their own compute tasks. The AMAT mechanism uses address information included in the compute task data and the address mapping data to determine partitions in which to store the compute task data. The AMAT mechanism then issues compute tasks to be executed near the corresponding memory modules (i.e., in PIM execution units or NUMA compute nodes) based upon the compute task data stored in the partitions.
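The partitioning step can be sketched as follows, under a hypothetical address mapping. The abstract does not specify the mapping, so this sketch assumes that fixed-size chunks of `interleave_bytes` are interleaved round-robin across modules. `AMAT`, `push`, and `issue` are illustrative names. Threads push (address, payload) pairs, and the mechanism emits one task per non-empty partition.

```python
from collections import defaultdict


class AMAT:
    """Hypothetical AMAT model: buckets task data by owning memory module."""

    def __init__(self, num_modules, interleave_bytes):
        # Assumed mapping: consecutive interleave_bytes-sized chunks are
        # spread round-robin across the memory modules.
        self.num_modules = num_modules
        self.interleave = interleave_bytes
        self.partitions = defaultdict(list)

    def partition_for(self, addr):
        return (addr // self.interleave) % self.num_modules

    def push(self, addr, payload):
        """Called by host threads instead of issuing their own tasks."""
        self.partitions[self.partition_for(addr)].append((addr, payload))

    def issue(self):
        """Emit one compute task per non-empty partition, tagged with its module."""
        tasks = [(module, list(self.partitions[module])) for module in sorted(self.partitions)]
        self.partitions.clear()
        return tasks
```

Because the bucketing uses the same mapping as the memory system, each issued task touches data local to a single module. That locality is what allows the task to run near the memory.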
-
Publication No.: US20240220315A1
Publication Date: 2024-07-04
Application No.: US18091443
Filing Date: 2022-12-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Suchita Pati , Shaizeen Aga , Nuwan Jayasena , Matthew David Sinclair
CPC classification number: G06F9/4881 , G06F9/52
Abstract: A processing system includes a scheduling mechanism for producing data for fine-grained reordering of workgroups of a kernel to produce blocks of data, such as for communication across devices to enable overlapping of a producer computation with an all-reduce communication across the network. This scheduling mechanism enables a first parallel processor to schedule and execute a set of workgroups of a producer operation to generate data for transmission to a second parallel processor in a desired traffic pattern. At the same time, the second parallel processor schedules and executes a different set of workgroups of the producer operation to generate data for transmission in a desired traffic pattern to a third parallel processor or back to the first parallel processor.
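One possible "desired traffic pattern" is a ring reduce-scatter, sketched below in Python. The ring is an assumption; the abstract does not name a specific pattern. In the formulation used here, device `d` forwards chunk `(d - s) mod N` at step `s`, so it should produce its local contribution to the chunks in that order. `ring_order` and `workgroup_schedule` are hypothetical names, and the sketch assumes workgroups divide evenly into chunks.

```python
def ring_order(device, num_devices):
    """Chunk order device `device` needs for a ring reduce-scatter
    (assumed pattern: at step s, device d sends chunk (d - s) mod N)."""
    return [(device - s) % num_devices for s in range(num_devices)]


def workgroup_schedule(device, num_devices, num_workgroups):
    """Reorder producer workgroups so each chunk is finished just before it is sent."""
    per_chunk = num_workgroups // num_devices  # assumption: divides evenly
    order = []
    for chunk in ring_order(device, num_devices):
        order.extend(range(chunk * per_chunk, (chunk + 1) * per_chunk))
    return order
```

Each device runs the same workgroups in a different order. For example, with four devices and eight workgroups, device 1 runs workgroups 2 and 3 (chunk 1) first. That lets its first send overlap with the remaining computation.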
-
Publication No.: US11900161B2
Publication Date: 2024-02-13
Application No.: US16828190
Filing Date: 2020-03-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Anirban Nag , Nuwan Jayasena , Shaizeen Aga
IPC: G06F9/50 , G06F12/1027 , G06F12/02 , G06F9/54 , G06F12/0882
CPC classification number: G06F9/5016 , G06F9/546 , G06F12/0246 , G06F12/0882 , G06F12/1027 , G06F2212/657
Abstract: Memory allocation for processing-in-memory operations, including: receiving, by an allocation module, a memory allocation request indicating a plurality of data structure operands for a processing-in-memory operation; determining a memory allocation pattern for the plurality of data structure operands, wherein the memory allocation pattern interleaves a plurality of component pages of a memory page across the plurality of data structure operands; and allocating the memory page based on the determined memory allocation pattern.
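An interleaved allocation pattern can be sketched with round-robin assignment. This is one hypothetical pattern: the component pages of a single memory page are handed out round-robin across the operands. The intent is that matching slices of, say, `A`, `B`, and `C` sit in neighboring component pages, where one PIM unit can reach them. `allocation_pattern` and `operand_address` are invented helpers, and component sizes are given in bytes.

```python
def allocation_pattern(num_component_pages, operands):
    """Assign the component pages of one memory page round-robin to operands."""
    layout = {op: [] for op in operands}
    for i in range(num_component_pages):
        layout[operands[i % len(operands)]].append(i)
    return layout


def operand_address(base, component_size, num_operands, operand_idx, offset):
    """Physical address of byte `offset` inside operand `operand_idx`
    under the round-robin interleaving above (hypothetical layout)."""
    page_in_operand, within = divmod(offset, component_size)
    component = page_in_operand * num_operands + operand_idx
    return base + component * component_size + within
```

Under this layout, operand `B` (index 1) owns component pages 1 and 4 of a six-page allocation. Byte offset `4096 + 8` in `B` therefore maps into component page 4.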
-
Publication No.: US11874739B2
Publication Date: 2024-01-16
Application No.: US17033398
Filing Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Sudhanva Gurumurthi , Vilas Sridharan , Shaizeen Aga , Nuwan Jayasena , Michael Ignatowski , Shrikanth Ganapathy , John Kalamatianos
CPC classification number: G06F11/1076 , G06F21/602 , H04L9/32
Abstract: A memory module includes one or more programmable ECC engines that may be programmed by a host processing element with a particular ECC implementation. As used herein, the term “ECC implementation” refers to ECC functionality for performing error detection and subsequent processing, for example using the results of the error detection to perform error correction and to encode corrupted data that cannot be corrected, etc. The approach allows an SoC designer or company to program and reprogram ECC engines in memory modules in a secure manner without having to disclose the particular ECC implementations used by the ECC engines to memory vendors or third parties.
-
Publication No.: US20220197647A1
Publication Date: 2022-06-23
Application No.: US17126977
Filing Date: 2020-12-18
Applicant: Advanced Micro Devices, Inc.
Inventor: Onur Kayiran , Mohamed Assem Ibrahim , Shaizeen Aga
Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.
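Both selection approaches can be sketched in a short Python model. The names are hypothetical. In the address-based rule, commands addressed below the split address use the base register and all others use the alternate register. In the data-based rule, a nonzero value in a selector register picks the alternate register. The abstract states neither exact rule, so both are assumptions.

```python
class PimRegisterSelector:
    """Hypothetical PIM register selection logic inside a memory module."""

    def __init__(self, split_address, num_registers=4):
        self.split_address = split_address  # maintained by the memory module
        self.registers = [0] * num_registers

    def select_by_address(self, command_addr, base_reg, alt_reg):
        # Assumed rule: addresses at or above the split use the alternate register.
        return base_reg if command_addr < self.split_address else alt_reg

    def select_by_data(self, selector_reg, base_reg, alt_reg):
        # Assumed rule: a nonzero value in selector_reg picks the alternate register.
        return alt_reg if self.registers[selector_reg] else base_reg

    def execute_add(self, command_addr, value, base_reg=0, alt_reg=1):
        """Run a PIM add, choosing the destination register locally."""
        dst = self.select_by_address(command_addr, base_reg, alt_reg)
        self.registers[dst] += value
        return dst
```

The command itself names only the base register, and the module picks the alternate register on its own. This local choice is why no extra command-bus information is needed.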
-
Publication No.: US11216373B2
Publication Date: 2022-01-04
Application No.: US16887713
Filing Date: 2020-05-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Shaizeen Aga , Nuwan Jayasena , Johnathan Alsop
IPC: G06F12/06 , G11C11/408
Abstract: A memory controller may be configured with command logic that is capable of sending a memory access command having incomplete address information via a command/address bus that connects the memory controller to memory modules. The memory controller may send the memory access command via the bus for accessing data stored at memory locations of the memory modules. The memory locations may correspond to different near-memory generated addresses, reflecting that the data is not address aligned across the memory modules. Nonetheless, because of the near-memory address generation, the memory controller can send a single memory access command having incomplete address information to access the data stored at the different addresses, instead of sending multiple memory access commands with complete address information on the bus. This conserves available bus bandwidth, reduces power consumption, and increases compute throughput.
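The idea can be sketched with one broadcast command that carries only a partial address, which each module completes locally. The sketch assumes a hypothetical per-module base-offset register. `PimModule` and `broadcast_read` are invented names. The abstract does not say how the near-memory address is formed, so "base offset plus partial address" is only one possible scheme.

```python
class PimModule:
    """Hypothetical memory module with near-memory address generation."""

    def __init__(self, module_id, base_offset):
        self.module_id = module_id
        self.base_offset = base_offset  # assumed per-module configuration register
        self.cells = {}

    def generate_address(self, partial_addr):
        # Complete the incomplete address near memory.
        return self.base_offset + partial_addr


def broadcast_read(modules, partial_addr):
    """One bus command; each module resolves its own, unaligned address."""
    commands_sent = 1
    values = {m.module_id: m.cells.get(m.generate_address(partial_addr)) for m in modules}
    return values, commands_sent
```

Three unaligned reads cost one command-bus transaction in this model. Without near-memory address generation, the same reads would need three commands with full addresses.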