-
Publication No.: US20240004656A1
Publication Date: 2024-01-04
Application No.: US17853790
Filing Date: 2022-06-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Elliott David Binder , Onur Kayiran , Masab Ahmad
CPC classification number: G06F9/30145 , G06F9/3851 , G06F9/3887
Abstract: Methods and systems are disclosed for processing a vector by a vector processor. Techniques disclosed include receiving predicated instructions by a scheduler, each of which is associated with an opcode, a vector of elements, and a predicate. The techniques further include executing the predicated instructions. Executing a predicated instruction includes compressing, based on an index derived from the instruction's predicate, elements in the instruction's vector so that the selected elements are mapped contiguously; then, after the mapped elements are processed, decompressing the processed mapped elements by reverse-mapping them to their original positions based on the index.
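A minimal illustrative sketch (Python/NumPy; the helper name execute_predicated and the choice of operation are hypothetical, not taken from the claims) of the compress-process-decompress flow the abstract describes:

```python
import numpy as np

def execute_predicated(vector, predicate, op):
    """Sketch: compress active elements, process them contiguously,
    then reverse-map the results back to their original lanes."""
    # Index derived from the predicate: positions of active lanes.
    index = np.flatnonzero(predicate)
    # Compression: active elements become a contiguous (dense) vector.
    compressed = vector[index]
    # Process only the mapped (active) elements.
    processed = op(compressed)
    # Decompression: reverse-map results based on the same index;
    # inactive lanes keep their original values.
    result = vector.copy()
    result[index] = processed
    return result

# Example: square only the lanes whose predicate bit is set.
v = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([True, False, True, False])
print(execute_predicated(v, p, lambda x: x * x))  # [1. 2. 9. 4.]
```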
-
Publication No.: US11507522B2
Publication Date: 2022-11-22
Application No.: US16706421
Filing Date: 2019-12-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Sooraj Puthoor , Kishore Punniyamurthy , Onur Kayiran , Xianwei Zhang , Yasuko Eckert , Johnathan Alsop , Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
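A rough sketch of the per-work-item priority lookup and the event-driven promotion step; the additive promotion and the modulo indexing are assumptions made for illustration, since the abstract does not specify either calculation:

```python
def lane_priority(priority_vector, lane_info):
    # Index into the priority vector using lane-specific information
    # (e.g., a lane id or a value derived from the lane's target address).
    return priority_vector[lane_info % len(priority_vector)]

def promote(priority_vector, promotion_vector):
    # On a detected event, derive a second priority vector by applying
    # the promotion vector to the first (additive here, as an assumption).
    return [p + d for p, d in zip(priority_vector, promotion_vector)]

priorities = [0, 1, 2, 3]   # first priority vector
promotion  = [2, 0, 1, 0]   # given priority promotion vector

print(lane_priority(priorities, lane_info=6))   # priority for one work-item
promoted = promote(priorities, promotion)       # used for subsequent wavefronts
print(lane_priority(promoted, lane_info=6))
```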
-
Publication No.: US11360891B2
Publication Date: 2022-06-14
Application No.: US16355168
Filing Date: 2019-03-15
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Mohamed Assem Ibrahim , Onur Kayiran , Yasuko Eckert , Gabriel H. Loh
IPC: G06F12/0802 , G06F12/084 , G06F12/0846
Abstract: A method of dynamic cache configuration includes determining, for a first clustering configuration, whether a current cache miss rate exceeds a miss rate threshold. The first clustering configuration includes a plurality of graphics processing unit (GPU) compute units clustered into a first plurality of compute unit clusters. The method further includes clustering, based on the current cache miss rate exceeding the miss rate threshold, the plurality of GPU compute units into a second clustering configuration having a second plurality of compute unit clusters fewer than the first plurality of compute unit clusters.
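A short sketch of the re-clustering decision; halving the cluster count and round-robin assignment are illustrative assumptions, as the abstract only requires that the second configuration have fewer clusters than the first:

```python
def reconfigure_clusters(compute_units, num_clusters, miss_rate, threshold):
    """If the current cache miss rate exceeds the threshold, re-cluster the
    GPU compute units into fewer clusters (halved here as an assumption)."""
    if miss_rate > threshold and num_clusters > 1:
        num_clusters = max(1, num_clusters // 2)
    # Round-robin assignment of compute units to clusters.
    clusters = [[] for _ in range(num_clusters)]
    for i, cu in enumerate(compute_units):
        clusters[i % num_clusters].append(cu)
    return clusters

cus = list(range(16))  # 16 GPU compute units
print(len(reconfigure_clusters(cus, 8, miss_rate=0.35, threshold=0.25)))  # 4 clusters
print(len(reconfigure_clusters(cus, 8, miss_rate=0.10, threshold=0.25)))  # 8 clusters
```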
-
Publication No.: US20220091980A1
Publication Date: 2022-03-24
Application No.: US17031706
Filing Date: 2020-09-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Onur Kayiran , Yasuko Eckert , Mark Henry Oskin , Gabriel H. Loh , Steven E. Raasch , Maxim V. Kazakov
IPC: G06F12/0811 , G06F12/084 , G06F12/0877 , G06F13/16 , G06F11/30
Abstract: A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. During servicing of the miss request, the lower level cache merges identification information of multiple memory access requests targeting a same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until it is ready to issue that response. An intermediate router in the communication fabric then expands the merged memory access response into multiple memory access responses, which it sends to the corresponding compute units.
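A simplified sketch of the merge-then-expand flow; the class and function names (SharedCache, router_broadcast) and the dict-based response format are hypothetical placeholders for the hardware structures described:

```python
from collections import defaultdict

class SharedCache:
    """Sketch of the lower-level cache merging requester IDs that miss
    on the same cache line into a single merged response."""
    def __init__(self):
        self.pending = defaultdict(list)  # cache line -> requesting compute units

    def miss_request(self, line_addr, compute_unit_id):
        # Keep inserting identification info until the response is ready to issue.
        self.pending[line_addr].append(compute_unit_id)

    def issue_merged_response(self, line_addr, data):
        requesters = self.pending.pop(line_addr, [])
        return {"line": line_addr, "data": data, "requesters": requesters}

def router_broadcast(merged):
    # An intermediate router expands the merged response into one
    # response per requesting compute unit.
    return [{"cu": cu, "line": merged["line"], "data": merged["data"]}
            for cu in merged["requesters"]]

cache = SharedCache()
for cu in (0, 3, 7):  # three compute units miss on the same cache line
    cache.miss_request(0x1000, cu)
merged = cache.issue_merged_response(0x1000, data=b"\x00" * 64)
print(router_broadcast(merged))  # three per-compute-unit responses
```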
-
Publication No.: US11068458B2
Publication Date: 2021-07-20
Application No.: US16202082
Filing Date: 2018-11-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Ibrahim , Onur Kayiran , Yasuko Eckert
IPC: G06F16/22 , G06F16/901
Abstract: A portion of a graph dataset is generated for each computing node in a distributed computing system by, for each subject vertex in a graph, recording for the computing node an offset for the subject vertex, where the offset references a first position in an edge array for the computing node, and for each edge of a set of edges coupled with the subject vertex in the graph, calculating an edge value for the edge based on a connected vertex identifier identifying a vertex coupled with the subject vertex via the edge. When the edge value is assigned to the first position, the edge value is determined by a first calculation, and when the edge value is assigned to a position subsequent to the first position, the edge value is determined by a second calculation. In the computing node, the edge value is recorded in the edge array.
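A sketch of building one computing node's offsets and edge arrays. The abstract only states that the first position and subsequent positions use different calculations; the concrete choice below (first edge stores the connected vertex id, later edges store deltas from the previous neighbor) is an assumed delta-encoding scheme for illustration:

```python
def build_partition(graph, node_vertices):
    """Build this computing node's portion: an offsets array and an edge array."""
    offsets, edges = [], []
    for v in node_vertices:            # each subject vertex assigned to this node
        offsets.append(len(edges))     # offset references the first position in the edge array
        neighbors = sorted(graph[v])
        for i, u in enumerate(neighbors):
            if i == 0:
                edges.append(u)                      # first position: first calculation
            else:
                edges.append(u - neighbors[i - 1])   # subsequent positions: second calculation
    return offsets, edges

graph = {0: [2, 5, 9], 1: [3], 2: [0, 1]}
print(build_partition(graph, node_vertices=[0, 1, 2]))
# ([0, 3, 4], [2, 3, 4, 3, 0, 1])
```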
-
Publication No.: US10938709B2
Publication Date: 2021-03-02
Application No.: US16224739
Filing Date: 2018-12-18
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Ibrahim , Onur Kayiran , Yasuko Eckert , Jieming Yin
IPC: H04L12/761 , H04L12/781 , H04L12/715 , H04L12/931 , H04L12/729 , H04L12/733
Abstract: A method includes receiving, from an origin computing node, a first communication addressed to multiple destination computing nodes in a processor interconnect fabric, measuring a first set of one or more communication metrics associated with a transmission path to one or more of the multiple destination computing nodes, and, for each of the destination computing nodes, based on the set of communication metrics, selecting between a multicast transmission mode and a unicast transmission mode as the transmission mode for transmitting the first communication to that destination computing node.
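A brief sketch of the per-destination mode selection; using hop count as the measured metric and a fixed threshold is an assumption, since the abstract only refers to "one or more communication metrics":

```python
def choose_transmission_modes(destinations, path_metrics, hop_threshold):
    """Per destination, pick multicast or unicast based on a measured
    path metric (hop count here, as an assumption)."""
    modes = {}
    for dest in destinations:
        hops = path_metrics[dest]
        modes[dest] = "multicast" if hops <= hop_threshold else "unicast"
    return modes

destinations = ["node_a", "node_b", "node_c"]
metrics = {"node_a": 2, "node_b": 5, "node_c": 3}  # e.g., hop counts along each path
print(choose_transmission_modes(destinations, metrics, hop_threshold=3))
# {'node_a': 'multicast', 'node_b': 'unicast', 'node_c': 'multicast'}
```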
-