-
Publication No.: US20240193292A1
Publication date: 2024-06-13
Application No.: US18212858
Filing date: 2023-06-22
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jagadish B. Kotra , David Kaplan , Kishore Punniyamurthy , Alexander Toufic Freij
IPC: G06F21/62
CPC classification number: G06F21/6218 , G06F2221/2113 , G06F2221/2141
Abstract: A processing system receives graph object data and graph object metadata. The processing system stores the graph object metadata inline with the graph object data. The graph object metadata indicates access permissions for corresponding graph objects. Because the graph object metadata is stored inline with the graph object data, the graph object metadata is more easily retrieved and fewer system resources are consumed to determine access permissions of a requester as compared to a system where graph object metadata is stored separately from the graph object data.
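To illustrate the idea of keeping access metadata inline with graph object data, here is a minimal sketch. The `GraphObject` record, the `READ`/`WRITE` bit encoding, and the requester names are all assumptions made for illustration; the abstract does not specify a layout or encoding.

```python
from dataclasses import dataclass

# Hypothetical permission bits; the abstract does not specify an encoding.
READ, WRITE = 0b01, 0b10


@dataclass
class GraphObject:
    """A vertex or edge record whose access metadata travels with its data."""
    payload: str
    # Inline metadata: requester id -> permission bits, stored in the same record.
    permissions: dict


def read_object(obj, requester):
    """Return the payload if the inline metadata grants READ, else raise."""
    if obj.permissions.get(requester, 0) & READ:
        return obj.payload
    raise PermissionError(f"{requester} may not read this object")


vertex = GraphObject(payload="account:42",
                     permissions={"alice": READ | WRITE, "bob": 0})
```

Because the permission bits sit in the same record as the payload, the check needs no second lookup in a separate metadata store.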
-
Publication No.: US11119665B2
Publication date: 2021-09-14
Application No.: US16212388
Filing date: 2018-12-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. Das , Kishore Punniyamurthy
IPC: G06F3/06
Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access requests for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.
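The decision rule in this abstract can be sketched as a small function. The function name, the `"boost"`/`"throttle"` labels, and the input representation (a list of channel indices that stalled threads are waiting on) are assumptions for illustration, not details from the patent.

```python
def plan_channel_power(stalled_channels, num_channels):
    """Map each memory channel to a power action for one stalled wavefront.

    stalled_channels: channel indices that threads of the wavefront are
    waiting on; empty when the stall is not caused by an outstanding
    memory request.
    """
    waiting = set(stalled_channels)
    # Channels with outstanding requests are boosted; all others are throttled.
    # With no memory stall at all, every channel ends up throttled.
    return {ch: ("boost" if ch in waiting else "throttle")
            for ch in range(num_channels)}


non_memory_stall = plan_channel_power([], num_channels=4)
memory_stall = plan_channel_power([1, 1, 3], num_channels=4)
```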
-
Publication No.: US12197378B2
Publication date: 2025-01-14
Application No.: US17804949
Filing date: 2022-06-01
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jagadish B. Kotra , Kishore Punniyamurthy
Abstract: An apparatus configured for offloading system service tasks to a processing-in-memory (“PIM”) device includes an agent configured to: receive, from a host processor, a request to offload a memory task associated with a system service to the PIM device; determine at least one PIM command and at least one memory page associated with the host processor based upon the request; and issue the at least one PIM command to the PIM device for execution by the PIM device to perform the memory task upon the at least one memory page.
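The agent's three steps (receive a request, determine PIM commands and memory pages, issue the commands) might look roughly like the sketch below. Everything named here is hypothetical: the `zero_pages` task, the `PIM_FILL_ZERO` command string, the 4 KiB page size, and the `FakePimDevice` stand-in are illustrative assumptions only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class FakePimDevice:
    """Stand-in for a PIM device: records the commands it is asked to run."""

    def __init__(self):
        self.executed = []

    def execute(self, command, page):
        self.executed.append((command, page))


class OffloadAgent:
    # Hypothetical mapping from a system-service memory task to a PIM command.
    TASK_TO_COMMAND = {"zero_pages": "PIM_FILL_ZERO"}

    def __init__(self, device):
        self.device = device

    def offload(self, task, start_addr, length):
        """Translate one host request into per-page PIM commands and issue them."""
        command = self.TASK_TO_COMMAND[task]
        first = start_addr // PAGE_SIZE
        last = (start_addr + length - 1) // PAGE_SIZE
        pages = list(range(first, last + 1))
        for page in pages:
            self.device.execute(command, page)
        return pages


device = FakePimDevice()
agent = OffloadAgent(device)
touched_pages = agent.offload("zero_pages", start_addr=4096, length=8192)
```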
-
Publication No.: US20220206946A1
Publication date: 2022-06-30
Application No.: US17135657
Filing date: 2020-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Brandon K. Potter , Marko Scrbak , Sergey Blagodurov , Kishore Punniyamurthy , Nathaniel Morris
IPC: G06F12/0817
Abstract: Method and apparatus monitor eviction conflicts among cache directory entries in a cache directory and produce cache directory victim entry information for a memory manager. In some examples, the memory manager reduces future cache directory conflicts by changing a page level physical address assignment for a page of memory based on the produced cache directory victim entry information. In some examples, a scalable data fabric includes hardware control logic that performs the monitoring of the eviction conflicts among cache directory entries in the cache directory and produces the cache directory victim entry information.
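A toy model of the monitoring side is sketched below, assuming a direct-mapped directory, 64-byte lines, and 4 KiB pages (none of which the abstract specifies). `DirectoryMonitor` and `pages_to_remap` are hypothetical names; the actual physical-address reassignment by the memory manager is not modeled.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed
LINE_SIZE = 64    # assumed


class DirectoryMonitor:
    """Direct-mapped toy directory that logs which pages lose entries to conflicts."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.entries = {}              # set index -> line number currently tracked
        self.victim_pages = Counter()  # page number -> conflict evictions

    def track(self, addr):
        line = addr // LINE_SIZE
        index = line % self.num_sets
        victim = self.entries.get(index)
        if victim is not None and victim != line:
            # A different line occupied this set: record the victim's page.
            self.victim_pages[victim * LINE_SIZE // PAGE_SIZE] += 1
        self.entries[index] = line


def pages_to_remap(monitor, threshold):
    """Memory-manager side: pages whose entries were evicted at least `threshold` times."""
    return sorted(p for p, n in monitor.victim_pages.items() if n >= threshold)


monitor = DirectoryMonitor(num_sets=4)
for addr in (0, 4096, 0, 4096):  # two pages that collide in set 0
    monitor.track(addr)
```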
-
Publication No.: US20250103650A1
Publication date: 2025-03-27
Application No.: US18371010
Filing date: 2023-09-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Jagadish B. Kotra
IPC: G06F16/901
Abstract: Graph analytics systems are described. In accordance with the described techniques, a graph having vertices that include a first vertex and a second vertex that are associated with access control metadata is received. An updated graph is output based on a merging of the first vertex and the second vertex into a merged vertex of a group of vertices based on the first vertex and the second vertex being associated with access control metadata common to the first vertex and the second vertex and based on a reordering technique. A single copy of the access control metadata is stored for the first vertex and the second vertex.
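One way to picture the single-copy storage is to group vertices by their access control metadata, as in the sketch below. The grouping here stands in for the unspecified "reordering technique"; `group_by_acl` and the frozenset representation of the metadata are assumptions for illustration.

```python
from collections import defaultdict


def group_by_acl(vertex_acl):
    """Group vertices that share identical access control metadata.

    vertex_acl: vertex id -> frozenset of principals allowed to access it.
    Returns (new_order, groups), where new_order places same-ACL vertices
    next to each other and groups holds one ACL copy per group.
    """
    buckets = defaultdict(list)
    for vertex, acl in vertex_acl.items():
        buckets[acl].append(vertex)
    groups = [(acl, members) for acl, members in buckets.items()]
    new_order = [v for _, members in groups for v in members]
    return new_order, groups


acl_a = frozenset({"alice"})
acl_b = frozenset({"bob"})
order, groups = group_by_acl({0: acl_a, 1: acl_b, 2: acl_a})
```

Vertices 0 and 2 end up in one group that stores `acl_a` once, rather than once per vertex.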
-
Publication No.: US20250077409A1
Publication date: 2025-03-06
Application No.: US18240640
Filing date: 2023-08-31
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Kishore Punniyamurthy , Richard David Sodke , Furkan Eris , Sergey Blagodurov , Bradford Michael Beckmann , Brandon Keith Potter , Khaled Hamidouche
Abstract: A device includes a plurality of processing elements (PEs). A symmetric memory is allocated in each of the plurality of PEs. The device includes a switch connected to the plurality of PEs. The switch is to: receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset, compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
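The address computation can be sketched as "per-PE heap base plus shared offset". The `Switch` class below, its `put` method, and the use of `bytearray` objects as PE memories are illustrative assumptions; the abstract states only that both addresses are computed from the buffer offset.

```python
class Switch:
    """Toy switch that knows the symmetric-heap base address of every PE."""

    def __init__(self, heap_bases, memories):
        self.heap_bases = heap_bases  # PE id -> base address of its symmetric heap
        self.memories = memories      # PE id -> bytearray standing in for PE memory

    def resolve(self, pe, buffer_offset):
        """The same offset names the same buffer in every PE's symmetric memory."""
        return self.heap_bases[pe] + buffer_offset

    def put(self, src_pe, dst_pe, buffer_offset, length):
        """Copy `length` bytes between same-offset buffers with a simulated DMA."""
        src = self.resolve(src_pe, buffer_offset)
        dst = self.resolve(dst_pe, buffer_offset)
        self.memories[dst_pe][dst:dst + length] = self.memories[src_pe][src:src + length]
        return src, dst


memories = {0: bytearray(64), 1: bytearray(64)}
memories[0][24:28] = b"data"
switch = Switch(heap_bases={0: 16, 1: 32}, memories=memories)
addresses = switch.put(src_pe=0, dst_pe=1, buffer_offset=8, length=4)
```

The sending PE only needs to supply the offset; it does not need to know the remote PE's base address.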
-
Publication No.: US20240220336A1
Publication date: 2024-07-04
Application No.: US18147081
Filing date: 2022-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Khaled Hamidouche , Brandon K Potter , Rohit Shahaji Zambre
IPC: G06F9/54 , G06F9/50 , G06F15/173
CPC classification number: G06F9/54 , G06F9/5044 , G06F15/17356
Abstract: In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between respective processing elements of each of the clusters; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each of the clusters.
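The three stages can be modeled in a single process to show how data moves. This sketch assumes that stage one bundles messages by destination cluster, which the abstract does not state, and it models only data movement, not parallelism. The function name and the `(src, dst)` message keys are hypothetical.

```python
def three_stage_all_to_all(messages, num_clusters, pes_per_cluster):
    """Simulate a hierarchical all-to-all.

    messages: (src_rank, dst_rank) -> payload, with global ranks numbered
    cluster by cluster. Returns inbox[dst_rank][src_rank] -> payload.
    """
    def cluster_of(rank):
        return rank // pes_per_cluster

    # Stage 1 (intra-cluster): bundle messages by (source cluster, destination cluster).
    staged = {}
    for (src, dst), payload in messages.items():
        key = (cluster_of(src), cluster_of(dst))
        staged.setdefault(key, []).append((src, dst, payload))

    # Stage 2 (inter-cluster): one aggregated bundle moves per cluster pair.
    arrived = {}
    for (_, dst_cluster), bundle in staged.items():
        arrived.setdefault(dst_cluster, []).extend(bundle)

    # Stage 3 (intra-cluster): each destination cluster hands messages to its PEs.
    inbox = {rank: {} for rank in range(num_clusters * pes_per_cluster)}
    for bundle in arrived.values():
        for src, dst, payload in bundle:
            inbox[dst][src] = payload
    return inbox


messages = {(s, d): f"{s}->{d}" for s in range(4) for d in range(4)}
inbox = three_stage_all_to_all(messages, num_clusters=2, pes_per_cluster=2)
```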
-
Publication No.: US11880312B2
Publication date: 2024-01-23
Application No.: US17539189
Filing date: 2021-11-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , SeyedMohammad SeyedzadehDelcheh , Sergey Blagodurov , Ganesh Dasika , Jagadish B Kotra
IPC: G06F12/00 , G06F12/126 , G06F12/0855
CPC classification number: G06F12/126 , G06F12/0859 , G06F2212/1024 , G06F2212/6042
Abstract: A method includes storing a function representing a set of data elements stored in a backing memory and, in response to a first memory read request for a first data element of the set of data elements, calculating a function result representing the first data element based on the function.
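As a minimal sketch of serving a read from a stored function instead of backing memory, assume the data elements happen to form an arithmetic sequence. The linear form `base + stride * index`, the `fit_linear` helper, and the `FunctionBackedRegion` class are all assumptions; the abstract does not say which family of functions is used.

```python
def fit_linear(values):
    """Return (base, stride) if values form an arithmetic sequence, else None."""
    if len(values) < 2:
        return None
    stride = values[1] - values[0]
    if all(b - a == stride for a, b in zip(values, values[1:])):
        return values[0], stride
    return None


class FunctionBackedRegion:
    """Answers reads by evaluating value = base + stride * index."""

    def __init__(self, base, stride, length):
        self.base = base
        self.stride = stride
        self.length = length

    def read(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # The result is calculated; no backing-memory access is modeled here.
        return self.base + self.stride * index


data = [100, 104, 108, 112]
base, stride = fit_linear(data)
region = FunctionBackedRegion(base, stride, len(data))
```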
-
Publication No.: US11507522B2
Publication date: 2022-11-22
Application No.: US16706421
Filing date: 2019-12-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Sooraj Puthoor , Kishore Punniyamurthy , Onur Kayiran , Xianwei Zhang , Yasuko Eckert , Johnathan Alsop , Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
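The priority-vector mechanism can be sketched as two small functions. The indexing rule (lane id modulo vector length) and the promotion rule (element-wise addition, saturating at a maximum) are assumptions chosen for illustration; the abstract says only that the index is based on lane-specific information and that a promotion vector is applied.

```python
def assign_priorities(priority_vector, num_lanes):
    """Give each lane (work-item) the priority found at lane_id % len(vector)."""
    return [priority_vector[lane % len(priority_vector)]
            for lane in range(num_lanes)]


def promote(priority_vector, promotion_vector, max_priority=3):
    """Build a second priority vector by adding the promotion vector element-wise."""
    return [min(p + d, max_priority)
            for p, d in zip(priority_vector, promotion_vector)]


first_vector = [0, 1, 2, 3]
before_event = assign_priorities(first_vector, num_lanes=8)

# After a triggering event, later wavefronts index into the promoted vector.
second_vector = promote(first_vector, [1, 1, 0, 0])
after_event = assign_priorities(second_vector, num_lanes=4)
```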
-
Publication No.: US20240311182A1
Publication date: 2024-09-19
Application No.: US18185641
Filing date: 2023-03-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Sagnik Basu , Khaled Hamidouche , Brandon Keith Potter
IPC: G06F9/48
CPC classification number: G06F9/4881
Abstract: A device includes a communication scheduler to generate schedule trees for scheduling data communication among a plurality of nodes configured to perform a collective operation using data contributed from the plurality of nodes. The device includes data reduction logic to: identify one or more skewed nodes among the plurality of nodes, perform, according to a first set of schedule trees, a first operation to generate partial results based on data contributed from non-skewed nodes, and perform, according to a second set of schedule trees, a second operation to generate final results based on the partial results and data contributed from the one or more skewed nodes.
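The two-phase reduction can be illustrated with a sum. In this sketch a node counts as skewed when its arrival time exceeds a threshold, which is an assumed detection rule; the schedule trees themselves are not modeled, and `skew_aware_sum` is a hypothetical name.

```python
def skew_aware_sum(contributions, arrival_times, skew_threshold):
    """Reduce on-time contributions first, then fold in the late (skewed) ones.

    contributions: node -> value to be summed.
    arrival_times: node -> time at which the node's data becomes available.
    """
    skewed = {n for n, t in arrival_times.items() if t > skew_threshold}
    # First operation: partial result from the non-skewed nodes only.
    partial = sum(v for n, v in contributions.items() if n not in skewed)
    # Second operation: final result from the partial result plus skewed data.
    final = partial + sum(contributions[n] for n in skewed)
    return skewed, partial, final


contributions = {"n0": 1, "n1": 2, "n2": 3, "n3": 4}
arrival_times = {"n0": 1.0, "n1": 1.1, "n2": 9.0, "n3": 0.9}
skewed, partial, final = skew_aware_sum(contributions, arrival_times, skew_threshold=5.0)
```

The partial result can be computed while the skewed node is still late, so only the last addition waits on it.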