Patent search ap:("Advanced Micro Devices Inc.") AND inv:"Onur Kayiran"

21.

Invention publication
ACCELERATING PREDICATED INSTRUCTION EXECUTION IN VECTOR PROCESSORS (status: under examination, published)

Publication number: US20240004656A1

Publication date: 2024-01-04

Application number: US17853790

Application date: 2022-06-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Elliott David Binder, Onur Kayiran, Masab Ahmad

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/30145, G06F9/3851, G06F9/3887

Abstract: Methods and systems are disclosed for processing a vector by a vector processor. Techniques disclosed include receiving predicated instructions by a scheduler, each of which is associated with an opcode, a vector of elements, and a predicate. The techniques further include executing the predicated instructions. Executing a predicated instruction includes compressing, based on an index derived from a predicate of the instruction, elements in a vector of the instruction, where the elements in the vector are contiguously mapped, then, after the mapped elements are processed, decompressing the processed mapped elements, where the processed mapped elements are reverse mapped based on the index.
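The compress, process, decompress sequence in this abstract can be illustrated with a minimal software sketch. It is not the patented hardware design: the function name `execute_predicated`, the use of Python lists as vectors, and the choice to leave inactive lanes unchanged are all assumptions made for illustration.

```python
def execute_predicated(op, vector, predicate):
    """Apply `op` only to lanes whose predicate bit is set (illustrative sketch)."""
    # Index derived from the predicate: positions of the active lanes.
    index = [lane for lane, active in enumerate(predicate) if active]

    # Compress: map the active elements contiguously.
    packed = [vector[lane] for lane in index]

    # Process only the packed (contiguous) elements.
    processed = [op(x) for x in packed]

    # Decompress: reverse-map results to their original lanes using the index.
    # Assumption: inactive lanes keep their original values.
    result = list(vector)
    for slot, lane in enumerate(index):
        result[lane] = processed[slot]
    return result


if __name__ == "__main__":
    print(execute_predicated(lambda x: x * 10, [1, 2, 3, 4], [True, False, True, False]))
```

In this sketch only two of the four lanes are passed to `op`, which is the point of compressing before execution.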

22.

Granted invention patent
Memory request priority assignment techniques for parallel processors (status: in force)

Publication number: US11507522B2

Publication date: 2022-11-22

Application number: US16706421

Application date: 2019-12-06

Applicant: Advanced Micro Devices, Inc.

Inventor: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann

IPC: G06F13/18, G06F13/16

Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
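The priority-vector lookup and promotion step can be sketched as follows. The abstract does not say how the index is derived from lane-specific information or how the promotion vector is applied, so this sketch assumes the index is the lane ID modulo the vector length and that promotion is an element-wise add clamped to a hypothetical `MAX_PRIORITY`.

```python
MAX_PRIORITY = 3  # hypothetical upper bound on priority values


def assign_priorities(lane_ids, priority_vector):
    """Give each lane's memory request a priority by indexing into the vector."""
    # Assumption: the index is the lane ID modulo the vector length.
    return [priority_vector[lane % len(priority_vector)] for lane in lane_ids]


def promote(priority_vector, promotion_vector):
    """Build a second priority vector from the first (assumed: clamped add)."""
    return [min(p + d, MAX_PRIORITY) for p, d in zip(priority_vector, promotion_vector)]


if __name__ == "__main__":
    first = [0, 1, 2, 3]
    print(assign_priorities(range(4), first))
    # After a given event is detected, later wavefronts use the second vector.
    second = promote(first, [2, 2, 0, 0])
    print(assign_priorities(range(4), second))
```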

23.

Granted invention patent
Adaptive cache reconfiguration via clustering (status: in force)

Publication number: US11360891B2

Publication date: 2022-06-14

Application number: US16355168

Application date: 2019-03-15

Applicant: ADVANCED MICRO DEVICES, INC.

Inventor: Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert, Gabriel H. Loh

IPC: G06F12/0802, G06F12/084, G06F12/0846

Abstract: A method of dynamic cache configuration includes determining, for a first clustering configuration, whether a current cache miss rate exceeds a miss rate threshold. The first clustering configuration includes a plurality of graphics processing unit (GPU) compute units clustered into a first plurality of compute unit clusters. The method further includes clustering, based on the current cache miss rate exceeding the miss rate threshold, the plurality of GPU compute units into a second clustering configuration having a second plurality of compute unit clusters fewer than the first plurality of compute unit clusters.
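The threshold check and re-clustering described here can be sketched in a few lines. The abstract only says the second configuration has fewer clusters than the first; halving the cluster count and grouping compute units contiguously are assumptions, and both function names are hypothetical.

```python
def next_cluster_count(current_clusters, miss_rate, miss_rate_threshold):
    """Return the cluster count for the next configuration."""
    # Re-cluster only when the current miss rate exceeds the threshold.
    if miss_rate > miss_rate_threshold and current_clusters > 1:
        return current_clusters // 2  # assumption: halve the number of clusters
    return current_clusters


def cluster_compute_units(cu_ids, num_clusters):
    """Group compute units into contiguous clusters (assumed grouping policy)."""
    size = -(-len(cu_ids) // num_clusters)  # ceiling division
    return [cu_ids[i:i + size] for i in range(0, len(cu_ids), size)]


if __name__ == "__main__":
    cus = list(range(8))
    clusters = 4
    print(cluster_compute_units(cus, clusters))
    clusters = next_cluster_count(clusters, miss_rate=0.6, miss_rate_threshold=0.5)
    print(cluster_compute_units(cus, clusters))
```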

24.

Invention application
MEMORY ACCESS RESPONSE MERGING IN A MEMORY HIERARCHY (status: in force)

Publication number: US20220091980A1

Publication date: 2022-03-24

Application number: US17031706

Application date: 2020-09-24

Applicant: Advanced Micro Devices, Inc.

Inventor: Onur Kayiran, Yasuko Eckert, Mark Henry Oskin, Gabriel H. Loh, Steven E. Raasch, Maxim V. Kazakov

IPC: G06F12/0811, G06F12/084, G06F12/0877, G06F13/16, G06F11/30

Abstract: A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. During servicing of the miss request, the lower level cache merges identification information of multiple memory access requests targeting a same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until the lower level shared cache is ready to issue the merged memory access response. An intermediate router in the communication fabric broadcasts the merged memory access response into multiple memory access responses to send to corresponding compute units.
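The merge-then-broadcast flow can be modeled in software as a rough sketch. The class `SharedCache`, the function `router_broadcast`, and the dictionary layout of a response are hypothetical names chosen for illustration; the abstract does not specify message formats.

```python
from collections import defaultdict


class SharedCache:
    """Toy model of a lower-level shared cache that merges miss responses."""

    def __init__(self):
        # Cache line address -> IDs of compute units waiting on that line.
        self.pending = defaultdict(list)

    def record_miss(self, line_addr, compute_unit_id):
        # Keep inserting requester IDs until the response is ready to issue.
        self.pending[line_addr].append(compute_unit_id)

    def issue_merged_response(self, line_addr, data):
        # One merged response carries the data plus all requester IDs.
        requesters = self.pending.pop(line_addr, [])
        return {"line": line_addr, "data": data, "requesters": requesters}


def router_broadcast(merged):
    """Intermediate router: expand one merged response into per-unit responses."""
    return [
        {"dest": cu, "line": merged["line"], "data": merged["data"]}
        for cu in merged["requesters"]
    ]


if __name__ == "__main__":
    cache = SharedCache()
    for cu in (0, 2, 5):
        cache.record_miss(0x40, cu)
    merged = cache.issue_merged_response(0x40, data=b"line")
    for response in router_broadcast(merged):
        print(response)
```

Only one response crosses the fabric between the shared cache and the router in this model, regardless of how many compute units requested the line.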

25.

Granted invention patent
Mechanism for distributed-system-aware difference encoding/decoding in graph analytics (status: in force)

Publication number: US11068458B2

Publication date: 2021-07-20

Application number: US16202082

Application date: 2018-11-27

Applicant: Advanced Micro Devices, Inc.

Inventor: Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert

IPC: G06F16/22, G06F16/901

Abstract: A portion of a graph dataset is generated for each computing node in a distributed computing system by, for each subject vertex in a graph, recording for the computing node an offset for the subject vertex, where the offset references a first position in an edge array for the computing node, and for each edge of a set of edges coupled with the subject vertex in the graph, calculating an edge value for the edge based on a connected vertex identifier identifying a vertex coupled with the subject vertex via the edge. When the edge value is assigned to the first position, the edge value is determined by a first calculation, and when the edge value is assigned to position subsequent to the first position, the edge value is determined by a second calculation. In the computing node, the edge value is recorded in the edge array.
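A small sketch of offset-plus-edge-array difference encoding follows. The abstract does not state what the first and second calculations are, so this sketch assumes a common scheme: the first edge value is the neighbor ID minus the subject vertex ID, and each later value is the neighbor ID minus the previous neighbor ID. The function names and the dictionary-based adjacency input are also assumptions.

```python
def encode(adjacency):
    """Build per-vertex offsets and a difference-encoded edge array."""
    offsets, edges = {}, []
    for vertex in sorted(adjacency):
        offsets[vertex] = len(edges)  # first position for this vertex
        previous = None
        for neighbor in adjacency[vertex]:
            if previous is None:
                edges.append(neighbor - vertex)    # first calculation (assumed)
            else:
                edges.append(neighbor - previous)  # second calculation (assumed)
            previous = neighbor
    return offsets, edges


def decode(vertex, start, count, edges):
    """Recover the neighbor IDs of `vertex` from the edge array."""
    neighbors, current = [], vertex
    for value in edges[start:start + count]:
        current += value
        neighbors.append(current)
    return neighbors


if __name__ == "__main__":
    graph = {0: [2, 5, 6], 3: [1, 4]}
    offsets, edges = encode(graph)
    print(offsets, edges)
    print(decode(3, offsets[3], len(graph[3]), edges))
```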

26.

Granted invention patent
Mechanism for dynamic latency-bandwidth trade-off for efficient broadcasts/multicasts (status: in force)

Publication number: US10938709B2

Publication date: 2021-03-02

Application number: US16224739

Application date: 2018-12-18

Applicant: Advanced Micro Devices, Inc.

Inventor: Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert, Jieming Yin

IPC: H04L12/761, H04L12/781, H04L12/715, H04L12/931, H04L12/729, H04L12/733

Abstract: A method includes receiving, from an origin computing node, a first communication addressed to multiple destination computing nodes in a processor interconnect fabric, measuring a first set of one or more communication metrics associated with a transmission path to one or more of the multiple destination computing nodes, and for each of the destination computing nodes, based on the set of communication metrics, selecting between a multicast transmission mode and unicast transmission mode as a transmission mode for transmitting the first communication to the destination computing node.
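The per-destination mode selection can be sketched as below. The abstract does not name the communication metrics or the decision rule, so this sketch assumes a single metric (measured path utilization between 0 and 1) and a hypothetical threshold above which multicast is chosen to save bandwidth.

```python
def select_modes(path_utilization, threshold=0.5):
    """Pick a transmission mode for each destination node.

    path_utilization: dict mapping destination ID to measured utilization
    of the path toward it (assumed metric, in the range 0..1).
    """
    modes = {}
    for dest, utilization in path_utilization.items():
        # Assumption: a busy path favors multicast (fewer copies on the link),
        # while a lightly loaded path can afford separate unicast messages.
        modes[dest] = "multicast" if utilization > threshold else "unicast"
    return modes


if __name__ == "__main__":
    measured = {"node1": 0.8, "node2": 0.2, "node3": 0.55}
    print(select_modes(measured))
```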
