-
Publication No.: US12124531B2
Publication date: 2024-10-22
Application No.: US18297230
Filing date: 2023-04-07
Applicant: Advanced Micro Devices, Inc.
Inventor: Shaizeen Aga, Nuwan Jayasena, Allen H. Rush, Michael Ignatowski
CPC classification number: G06F17/16, G06F7/5324, G06F15/8007
Abstract: A processing device including a plurality of clusters of processor cores and a method for use in the processing device is disclosed. Each processor core in a cluster of processor cores is in communication with the other processor cores in the cluster and at least one processor core of each cluster is in communication with at least a processor core of a different cluster of processor cores. Each processor core is configured to store a product of a portion of a first matrix and a first portion of a second matrix in the memory, and store a product of the portion of the first matrix and a second portion of the second matrix in the memory, where the second portion of the second matrix is received from a processor core in the cluster of processor cores.
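The data movement described in this abstract can be pictured with a small simulation. This is only a sketch under stated assumptions, not the claimed implementation: the ring-shaped exchange order, the names `matmul` and `cluster_multiply`, and the use of Python lists as "memory" are all hypothetical. Each simulated core keeps its portion of the first matrix, multiplies it by the portion of the second matrix it currently holds, stores the product, and then receives the next portion from a neighbouring core in the same cluster.

```python
def matmul(a, b):
    """Plain product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def cluster_multiply(a_rows, b_cols):
    """Simulate one cluster of cores (hypothetical ring exchange).

    a_rows[i]: portion (row block) of the first matrix kept by core i.
    b_cols[i]: portion (column block) of the second matrix core i starts with.
    Returns tiles where tiles[i][j] = a_rows[i] x b_cols[j].
    """
    n = len(a_rows)
    tiles = [[None] * n for _ in range(n)]
    held = list(range(n))  # which portion of the second matrix each core holds
    for _ in range(n):
        for core in range(n):
            j = held[core]
            # Store the product of the local portions in (simulated) memory.
            tiles[core][j] = matmul(a_rows[core], b_cols[j])
        # Assumption: every core receives the portion held by its ring neighbour.
        held = [held[(core + 1) % n] for core in range(n)]
    return tiles


# Two cores multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]].
tiles = cluster_multiply(
    a_rows=[[[1, 2]], [[3, 4]]],
    b_cols=[[[5], [7]], [[6], [8]]],
)
```

After as many steps as there are cores in the cluster, every core has seen every portion of the second matrix, so the stored tiles together make up the full product.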
-
Publication No.: US11726918B2
Publication date: 2023-08-15
Application No.: US17361145
Filing date: 2021-06-28
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Johnathan Alsop, Alexandru Dutu, Shaizeen Aga, Nuwan Jayasena
IPC: G06F12/0871, G06F12/02, G06F12/084, G06F12/0846
CPC classification number: G06F12/0871, G06F12/0238, G06F12/084, G06F12/0846
Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.
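The same-cache-line case from this abstract can be illustrated as follows. This is a minimal sketch under assumptions that do not come from the abstract: a 64-byte line size (`LINE_SIZE`), atomic accesses modelled as `(address, value)` pairs, and a hypothetical `coalesce` function standing in for whatever the cache does when the triggering event occurs. Accesses that fall in the same line are treated as candidates and grouped into one request for the memory-local compute unit.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


def coalesce(atomic_accesses):
    """Group (address, value) atomic accesses by cache line.

    Returns {line_index: [(offset_in_line, value), ...]}, i.e. one
    memory-local request per cache line instead of one per access.
    """
    requests = {}
    for address, value in atomic_accesses:
        line, offset = divmod(address, LINE_SIZE)
        requests.setdefault(line, []).append((offset, value))
    return requests


# Three atomic accesses become two memory-local requests.
pending = [(0, 1), (8, 2), (64, 3)]
requests = coalesce(pending)
```

The multicast variant mentioned in the abstract, which coalesces accesses in different cache lines, is not modelled here.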
-
Publication No.: US11640444B2
Publication date: 2023-05-02
Application No.: US17208526
Filing date: 2021-03-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Shaizeen Aga, Nuwan Jayasena, Allen H. Rush, Michael Ignatowski
Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
-
Publication No.: US20220276795A1
Publication date: 2022-09-01
Application No.: US17745278
Filing date: 2022-05-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Mahzabeen Islam, Shaizeen Aga, Nuwan Jayasena, Jagadish B. Kotra
Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
-
Publication No.: US11188406B1
Publication date: 2021-11-30
Application No.: US17218506
Filing date: 2021-03-31
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Johnathan Alsop, Shaizeen Aga
Abstract: Detecting execution hazards in offloaded operations is disclosed. A second offload operation is compared to a first offload operation that precedes the second offload operation. It is determined whether the second offload operation creates an execution hazard on an offload target device based on the comparison of the second offload operation to the first offload operation. If the execution hazard is detected, an error handling operation may be performed. In some examples, the offload operations are processing-in-memory operations.
-
Publication No.: US11099788B2
Publication date: 2021-08-24
Application No.: US16658733
Filing date: 2019-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan Jayasena, Shaizeen Aga
Abstract: An approach is provided for implementing near-memory data reduction during store operations to off-chip or off-die memory. A Near-Memory Reduction (NMR) unit provides near-memory data reduction during write operations to a specified address range. The NMR unit is configured with a range of addresses to be reduced and when a store operation specifies an address within the range of addresses, the NMR unit performs data reduction by adding the data value specified by the store operation to an accumulated reduction result. According to an embodiment, the NMR unit maintains a count of the number of updates to the accumulated reduction result that are used to determine when data reduction has been completed.
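The store-time reduction described in this abstract can be sketched in a few lines. The class name `NearMemoryReducer`, the half-open address range, and the `expected_updates` completion test are assumptions made for illustration; the abstract only says that stores inside the configured range are added to an accumulated result and that an update count is used to decide when the reduction is complete.

```python
class NearMemoryReducer:
    """Toy model of a near-memory reduction unit (hypothetical interface)."""

    def __init__(self, start, end, expected_updates):
        self.start = start                # first address in the reduced range
        self.end = end                    # one past the last address (assumed half-open)
        self.expected_updates = expected_updates
        self.accumulated = 0              # accumulated reduction result
        self.count = 0                    # number of updates seen so far
        self.memory = {}                  # ordinary stores outside the range

    def store(self, address, value):
        if self.start <= address < self.end:
            # Store hits the configured range: reduce instead of writing.
            self.accumulated += value
            self.count += 1
        else:
            self.memory[address] = value

    def done(self):
        # Assumption: reduction completes after a known number of updates.
        return self.count >= self.expected_updates


nmr = NearMemoryReducer(start=0x1000, end=0x2000, expected_updates=3)
for addr, val in [(0x1000, 5), (0x0010, 99), (0x1008, 7), (0x1ff8, 1)]:
    nmr.store(addr, val)
```

In this run the store to `0x0010` falls outside the range and is written normally, while the other three are summed.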
-
Publication No.: US11977782B2
Publication date: 2024-05-07
Application No.: US17855442
Filing date: 2022-06-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Abd ElMohsen Ibrahim, Meysam Taassori, Mahzabeen Islam, Shaizeen Aga
IPC: G06F3/06
CPC classification number: G06F3/0659, G06F3/0613, G06F3/0673
Abstract: An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without interference, even when the register-only PIM commands and the host memory commands are interleaved, and even for the same memory module, which improves resource utilization and performance. Further improvement of resource utilization and performance is achieved by extending a register-only phase by reordering register-only PIM commands before non-register-only PIM commands, subject to dependency constraints, and using shadow row buffers to provide local working copies of data from memory to near-memory compute elements.
-
Publication No.: US20240004585A1
Publication date: 2024-01-04
Application No.: US17855442
Filing date: 2022-06-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Abd ElMohsen Ibrahim, Meysam Taassori, Mahzabeen Islam, Shaizeen Aga
IPC: G06F3/06
CPC classification number: G06F3/0659, G06F3/0613, G06F3/0673
Abstract: An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without interference, even when the register-only PIM commands and the host memory commands are interleaved, and even for the same memory module, which improves resource utilization and performance. Further improvement of resource utilization and performance is achieved by extending a register-only phase by reordering register-only PIM commands before non-register-only PIM commands, subject to dependency constraints, and using shadow row buffers to provide local working copies of data from memory to near-memory compute elements.
-
Publication No.: US20230409238A1
Publication date: 2023-12-21
Application No.: US17845263
Filing date: 2022-06-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Shaizeen Aga, Nuwan Jayasena
IPC: G06F3/06
CPC classification number: G06F3/0659, G06F3/0673, G06F3/0604
Abstract: An approach is provided for processing near-memory processing commands, e.g., PIM commands, using PIM register definition data that defines multiple combinations of source and/or destination registers to be used to process PIM commands. A particular combination of source and/or destination registers to be used to process a PIM command is specified by the PIM command or determined by a near-memory processing element processing the PIM command. According to another implementation, the PIM register definition data specifies an initial combination of source and/or destination registers and one or more update functions for each PIM command. A near-memory processing element processes a PIM command using the initial combination of source and/or destination registers and uses the one or more update functions to update the combination of source and/or destination registers to be used the next time the PIM command is processed.
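The second implementation in this abstract, an initial register combination plus an update function, might look roughly like the sketch below. Everything concrete here is assumed for illustration: the class name `PimRegisterDefinition`, the `(source, destination)` tuple layout, and the modulo-4 update rule that cycles through a hypothetical four-entry register file.

```python
class PimRegisterDefinition:
    """Holds the register combination for one PIM command (hypothetical)."""

    def __init__(self, initial, update):
        self.current = initial   # (source_register, destination_register)
        self.update = update     # function producing the next combination

    def next_registers(self):
        """Return the combination to use now, then advance it for next time."""
        registers = self.current
        self.current = self.update(registers)
        return registers


# Assumed update rule: step source and destination through 4 registers.
definition = PimRegisterDefinition(
    initial=(0, 1),
    update=lambda regs: ((regs[0] + 1) % 4, (regs[1] + 1) % 4),
)
used = [definition.next_registers() for _ in range(5)]
```

Because the combination advances on each use, the same command can be issued repeatedly without encoding register numbers in every command.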
-
Publication No.: US11669271B2
Publication date: 2023-06-06
Application No.: US16848920
Filing date: 2020-04-15
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Anirban Nag, Nuwan Jayasena, Shaizeen Aga
CPC classification number: G06F3/0659, G06F3/0611, G06F3/0614, G06F3/0673, G06F9/4498, G06F9/4881
Abstract: Memory operations using compound memory commands, including: receiving, by a memory module, a compound memory command indicating one or more operations to be applied to each portion of a plurality of portions of contiguous memory in the memory module; generating, based on the compound memory command, a plurality of memory commands to apply the one or more operations to each portion of the plurality of portions of contiguous memory; and executing the plurality of memory commands.
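The expansion step in this abstract, one compound command becoming many ordinary memory commands, can be sketched as below. The field names in `CompoundCommand`, the string operation labels, and the `(operation, address)` output format are hypothetical; the abstract does not specify an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundCommand:
    ops: tuple          # operations to apply to every portion, in order
    base: int           # start address of the contiguous region
    portion_size: int   # size of each portion in bytes
    portions: int       # number of contiguous portions


def expand(cmd):
    """Generate one (operation, address) memory command per op per portion."""
    return [
        (op, cmd.base + index * cmd.portion_size)
        for index in range(cmd.portions)
        for op in cmd.ops
    ]


compound = CompoundCommand(ops=("read", "add"), base=0x100, portion_size=64, portions=2)
commands = expand(compound)
```

Here a single command sent to the memory module stands in for four individual commands, which is where the saving in command traffic would come from.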