Patent search: ap:("ADVANCED MICRO DEVICES INC.") AND inv:"Shaizeen Aga"

31.

Invention grant
Hardware-software collaborative address mapping scheme for efficient processing-in-memory systems [Active]

Publication (announcement) No.: US11487447B2

Publication (announcement) date: 2022-11-01

Application No.: US17006646

Application date: 2020-08-28

Applicant: Advanced Micro Devices, Inc.

Inventor: Mahzabeen Islam, Shaizeen Aga, Nuwan Jayasena, Jagadish B. Kotra

IPC: G06F12/00, G06F3/06, G06F12/02

Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
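The intra-bank interleaving idea in this abstract can be illustrated with a toy mapping. This is only a sketch under stated assumptions, not the mapping claimed in the patent: the function name `ibfs_location`, the equal-sized stripes, and the parameters `num_frames` and `row_elems` are all hypothetical. It shows how element i of several frames that are accessed together can land in the same row of one bank.

```python
def ibfs_location(frame, index, num_frames, row_elems):
    """Return (row, column) inside one bank for element `index` of frame `frame`.

    Hypothetical layout: each row is split into `num_frames` equal stripes,
    and stripe f of row r holds the r-th chunk of frame f.
    """
    if row_elems % num_frames != 0:
        raise ValueError("row_elems must be divisible by num_frames")
    stripe = row_elems // num_frames        # elements each frame gets per row
    row = index // stripe                   # same row for every frame
    column = frame * stripe + index % stripe
    return row, column


# Element 5 of frame 0 and element 5 of frame 1 share row 1,
# so reading both does not require opening a second row.
loc_a = ibfs_location(frame=0, index=5, num_frames=2, row_elems=8)
loc_b = ibfs_location(frame=1, index=5, num_frames=2, row_elems=8)
```

Because the row index depends only on the element index, an operation such as c[i] = a[i] + b[i] touches a single open row per i in this toy layout.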

32.

Invention application
APPROACH FOR ENFORCING ORDERING BETWEEN MEMORY-CENTRIC AND CORE-CENTRIC MEMORY OPERATIONS [Active]

Publication (announcement) No.: US20220317926A1

Publication (announcement) date: 2022-10-06

Application No.: US17219446

Application date: 2021-03-31

Applicant: Advanced Micro Devices, Inc.

Inventor: Shaizeen Aga, Nuwan Jayasena, Johnathan Alsop

IPC: G06F3/06

Abstract: Ordering between memory-centric memory operations, referred to hereinafter as “MC-Mem-Ops,” and core-centric memory operations, referred to hereinafter as “CC-Mem-Ops,” is enforced using inter-centric fences, referred to hereinafter as “IC-fences.” IC-fences are implemented by an ordering primitive or ordering instruction that causes a memory controller, a cache controller, etc., to enforce ordering of MC-Mem-Ops and CC-Mem-Ops throughout the memory pipeline and at the memory controller by not reordering MC-Mem-Ops (or sometimes CC-Mem-Ops) that arrive before the IC-fence to after the IC-fence. Processing of an IC-fence also causes the memory controller to issue an ordering acknowledgment to the thread that issued the IC-fence instruction. IC-fences are tracked at the core and designated as complete when the ordering acknowledgment is received. Embodiments include a completion level-specific cache flush operation which, when used with an IC-fence, provides proper ordering between cached CC-Mem-Ops and MC-Mem-Ops with reduced data transfer and completion times.

33.

Invention application
SYSTEM AND METHOD FOR COALESCED MULTICAST DATA TRANSFERS OVER MEMORY INTERFACES [Active]

Publication (announcement) No.: US20220317876A1

Publication (announcement) date: 2022-10-06

Application No.: US17218700

Application date: 2021-03-31

Applicant: Advanced Micro Devices, Inc.

Inventor: Johnathan Alsop, Nuwan Jayasena, Shaizeen Aga, Andrew McCrabb

IPC: G06F3/06

Abstract: Methods and apparatuses to control digital data transfer via a memory channel between a memory module and a processor are disclosed. At least one of the memory module or the processor coalesces a plurality of short data words into multicast coalesced block data comprising a single data block for transfer via the memory channel. Each of the plurality of short data words pertains to one of at least two partitioned memory submodules in the memory module. The multicast coalesced block data is communicated over the memory channel.

34.

Invention grant
Providing host-based error detection capabilities in a remote execution device [Active]

Publication (announcement) No.: US11409608B2

Publication (announcement) date: 2022-08-09

Application No.: US17136549

Application date: 2020-12-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Shrikanth Ganapathy, Ross V. La Fetra, John Kalamatianos, Sudhanva Gurumurthi, Shaizeen Aga, Vilas Sridharan, Michael Ignatowski, Nuwan Jayasena

IPC: G06F11/00, G06F11/14, G06F11/10

Abstract: Providing host-based error detection capabilities in a remote execution device is disclosed. A remote execution device performs a host-offloaded operation that modifies a block of data stored in memory. Metadata is generated locally for the modified portion of the block of data such that the local metadata generation emulates host-based metadata generation. Stored metadata for the block of data is updated with the locally generated metadata for the modified portion of the block of data. When the host performs an integrity check on the modified block of data using the updated metadata, the host does not distinguish between metadata generated by the host and metadata generated in the remote execution device.

35.

Invention grant
Device and method for accelerating matrix multiply operations [Active]

Publication (announcement) No.: US10956536B2

Publication (announcement) date: 2021-03-23

Application No.: US16176662

Application date: 2018-10-31

Applicant: Advanced Micro Devices, Inc.

Inventor: Shaizeen Aga, Nuwan Jayasena, Allen H. Rush, Michael Ignatowski

IPC: G06F17/16, G06F7/53, G06F15/80

Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.

36.

Invention grant
Device and method for accelerating matrix multiply operations as a sum of outer products [Active]

Publication (announcement) No.: US10902087B2

Publication (announcement) date: 2021-01-26

Application No.: US16176678

Application date: 2018-10-31

Applicant: Advanced Micro Devices, Inc.

Inventor: Shaizeen Aga, Nuwan Jayasena, Allen H. Rush, Michael Ignatowski

IPC: G06F17/16, G06F15/80, G06F7/53

Abstract: A processing device is provided which includes memory and a processor comprising a plurality of processor cores in communication with each other via first and second hierarchical communication links. Each processor core in a group of the processor cores is in communication with each other via the first hierarchical communication links. Each processor core is configured to store, in the memory, one of a plurality of sub-portions of data of a first matrix, store, in the memory, one of a plurality of sub-portions of data of a second matrix, determine an outer product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core of the group of processor cores, another sub-portion of data of the second matrix and determine another outer product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
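The title's phrase "sum of outer products" refers to a standard identity: the product of two matrices equals the sum, over the shared dimension k, of the outer product of column k of the first matrix with row k of the second. The sketch below shows only that arithmetic on a single core. It does not model the hierarchical communication links or the exchange of sub-portions between cores, and the function names `outer` and `matmul_outer` are illustrative.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector (lists of numbers)."""
    return [[a * b for b in row] for a in col]


def matmul_outer(A, B):
    """Compute A @ B by accumulating one outer product per shared index k."""
    n, inner, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for k in range(inner):
        col_k = [A[i][k] for i in range(n)]   # column k of A
        partial = outer(col_k, B[k])          # row k of B
        for i in range(n):
            for j in range(m):
                C[i][j] += partial[i][j]
    return C


product = matmul_outer([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# product == [[19, 22], [43, 50]]
```

Each partial product is independent of the others, which is what makes it natural to spread them across cores and add the results.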

37.

Invention application
NEAR-MEMORY DATA-DEPENDENT GATHER AND PACKING [Under examination, published]

Publication (announcement) No.: US20200081651A1

Publication (announcement) date: 2020-03-12

Application No.: US16123837

Application date: 2018-09-06

Applicant: Advanced Micro Devices, Inc.

Inventor: Shaizeen Aga, Nuwan Jayasena

IPC: G06F3/06, G06F11/10

Abstract: Methods, systems, and devices for near-memory data-dependent gathering and packing of data stored in a memory. A processing device extracts a function, a memory source address, and a memory destination address from a near-memory data-dependent gathering and packing primitive. A signal to perform gathering and packing operations based on the primitive is sent to near-memory processing circuitry of a memory device. The near-memory processing circuitry receives the signal, gathers data from the memory device based on the function and the memory source address, and packs the gathered data into the memory device based on the memory destination address.
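As a rough software analogy for the primitive described above, the sketch below treats memory as a Python list and the "function" as a predicate that selects elements. The name `gather_and_pack` and the explicit `length` parameter are assumptions made for illustration; the abstract names only a function, a source address, and a destination address.

```python
def gather_and_pack(memory, predicate, src, length, dst):
    """Gather the elements of memory[src:src+length] that satisfy `predicate`
    and pack them contiguously starting at memory[dst].

    Returns the number of elements packed. The caller is assumed to have
    reserved enough space at `dst`.
    """
    gathered = [x for x in memory[src:src + length] if predicate(x)]
    memory[dst:dst + len(gathered)] = gathered
    return len(gathered)


mem = [3, -1, 4, -5, 9, 0, 0, 0]
count = gather_and_pack(mem, lambda x: x > 0, src=0, length=5, dst=5)
# count == 3 and mem == [3, -1, 4, -5, 9, 3, 4, 9]
```

In the near-memory setting the point is that this filtering happens beside the memory device, so only the packed result, not the full source range, needs to cross the memory interface.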

38.

Invention grant
Approach for performing efficient memory operations using near-memory compute elements [Active]

Publication (announcement) No.: US12235756B2

Publication (announcement) date: 2025-02-25

Application No.: US17557568

Application date: 2021-12-21

Applicant: Advanced Micro Devices, Inc.

Inventor: Shaizeen Aga, Johnathan Alsop, Nuwan Jayasena

IPC: G06F12/06

Abstract: Near-memory compute elements perform memory operations and temporarily store at least a portion of address information for the memory operations in local storage. A broadcast memory command is then issued to the near-memory compute elements that causes the near-memory compute elements to perform a subsequent memory operation using their respective address information stored in the local storage. This allows a single broadcast memory command to be used to perform memory operations across multiple memory elements, such as DRAM banks, using bank-specific address information. In one implementation, the approach is used to process workloads with irregular updates to memory while consuming less command bus bandwidth than conventional approaches. Implementations include using conditional flags to selectively designate address information in local storage that is to be processed with the broadcast memory command.

39.

Invention publication
APPROACH FOR MANAGING NEAR-MEMORY PROCESSING COMMANDS FROM MULTIPLE PROCESSOR THREADS TO PREVENT INTERFERENCE AT NEAR-MEMORY PROCESSING ELEMENTS [Under examination, published]

Publication (announcement) No.: US20240004653A1

Publication (announcement) date: 2024-01-04

Application No.: US17853613

Application date: 2022-06-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Johnathan Alsop, Laurent S. White, Shaizeen Aga

IPC: G06F9/30

CPC classification number: G06F9/3009, G06F9/3004, G06F9/30101

Abstract: An approach is provided for managing near-memory processing commands (“PIM commands”) from multiple processor threads in a manner that prevents interference and maintains correctness at near-memory processing elements. A memory controller uses thread identification information and last command information to issue a PIM command sequence from a first processor thread, directed to a PIM-enabled memory element, while deferring the issuance of PIM command sequences from other processor threads, directed to the same PIM-enabled memory element. After the last PIM command in the PIM command sequence for the first processor thread has been issued, a PIM command sequence for another processor thread is issued, and so on. The approach allows multiple processor threads to concurrently issue fine-grained PIM commands to the same PIM-enabled memory element without having to be aware of address-to-memory element mapping, and without having to coordinate with other threads.
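The deferral behavior described in this abstract can be approximated with a small batch model. This is a simplified sketch, not the controller design in the application: it assumes all commands for one PIM-enabled memory element have already arrived, that each command carries a thread ID and a last-command flag, and that every sequence ends with a flagged command. The function name `issue_order` and the tuple layout are hypothetical.

```python
from collections import deque


def issue_order(arrivals):
    """Return the order in which commands are issued to one PIM element.

    `arrivals` is a list of (thread_id, op, is_last) tuples in arrival order.
    A thread's sequence is issued without interleaving; commands from other
    threads are deferred until the current sequence's last command is issued.
    """
    queues = {}  # thread_id -> pending commands, in that thread's own order
    for thread_id, op, is_last in arrivals:
        queues.setdefault(thread_id, deque()).append((op, is_last))

    issued = []
    while any(queues.values()):
        # Visit threads in order of first arrival; issue one full sequence each.
        for thread_id, queue in queues.items():
            while queue:
                op, is_last = queue.popleft()
                issued.append((thread_id, op))
                if is_last:
                    break
    return issued


arrivals = [
    (0, "load", False), (1, "load", False), (0, "add", False),
    (1, "mul", True), (0, "store", True),
]
order = issue_order(arrivals)
# order == [(0, "load"), (0, "add"), (0, "store"), (1, "load"), (1, "mul")]
```

Although the two threads' commands arrive interleaved, thread 1's sequence is held back until thread 0's last command has been issued, so neither thread needs to know that both target the same element.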

40.

Invention grant
Approach for supporting memory-centric operations on cached data [Active]

Publication (announcement) No.: US11847061B2

Publication (announcement) date: 2023-12-19

Application No.: US17385783

Application date: 2021-07-26

Applicant: Advanced Micro Devices, Inc.

Inventor: Shaizeen Aga, Nuwan Jayasena, John Kalamatianos

IPC: G06F12/08, G06F13/16, G06F12/02, G06F12/0891, G06F12/0811

CPC classification number: G06F12/0891, G06F12/0238, G06F12/0811, G06F13/1668

Abstract: A technical solution to the technical problem of how to support memory-centric operations on cached data uses a novel memory-centric memory operation that invokes write back functionality on cache controllers and memory controllers. The write back functionality enforces selective flushing of dirty, i.e., modified, cached data that is needed for memory-centric memory operations from caches to the completion level of the memory-centric memory operations, and updates the coherence state appropriately at each cache level. The technical solution ensures that commands to implement the selective cache flushing are ordered before the memory-centric memory operation at the completion level of the memory-centric memory operation.
