-
Publication number: US20220188493A1
Publication date: 2022-06-16
Application number: US17118442
Filing date: 2020-12-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Kevin Y. Cheng , Sooraj Puthoor , Onur Kayiran
IPC: G06F30/331 , G06F30/34 , G06F9/38
Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.
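The pattern-prediction step in this abstract can be illustrated in software. The following is a minimal sketch, not the FPGA circuitry itself: the names `classify_packet` and `predict_pattern`, the majority rule, and the 0.5 threshold are all assumptions made for illustration, and only the zero data pattern is modeled.

```python
from collections import Counter

def classify_packet(packet: bytes) -> str:
    """Label a packet payload; only an all-zero check is modeled here."""
    return "zero" if packet and not any(packet) else "other"

def predict_pattern(packets, threshold=0.5):
    """Return the label predicted for upcoming traffic, or None.

    Hypothetical rule: predict a label when it covers more than
    `threshold` of the observed packets.
    """
    if not packets:
        return None
    counts = Counter(classify_packet(p) for p in packets)
    label, count = counts.most_common(1)[0]
    return label if count / len(packets) > threshold else None

# Three of the four observed packets are all zeros.
observed = [bytes(8), bytes(8), b"\x01\x02", bytes(8)]
prediction = predict_pattern(observed)
```

In the described system, a prediction such as `"zero"` would be reported to the host, which could then signal the FPGA to load matching decompression circuitry.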
-
Publication number: US20210173796A1
Publication date: 2021-06-10
Application number: US16706421
Filing date: 2019-12-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Sooraj Puthoor , Kishore Punniyamurthy , Onur Kayiran , Xianwei Zhang , Yasuko Eckert , Johnathan Alsop , Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
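The priority-vector lookup can be sketched as follows. This is an illustrative model under stated assumptions: the lane ID is used directly as the lane-specific index, the promotion vector is modeled as a permutation of the first vector, and the function names and values are hypothetical.

```python
def assign_priorities(priority_vector, lane_ids):
    """Give each lane's memory request a priority by indexing the vector."""
    size = len(priority_vector)
    return [priority_vector[lane % size] for lane in lane_ids]

def promote(priority_vector, promotion_vector):
    """Build the second priority vector (promotion modeled as a permutation)."""
    return [priority_vector[src] for src in promotion_vector]

first_vector = [3, 2, 1, 0]      # assumed: higher value means more urgent
promotion_vector = [1, 2, 3, 0]  # hypothetical rotation applied on the event
lanes = [0, 1, 2, 3]

before = assign_priorities(first_vector, lanes)
second_vector = promote(first_vector, promotion_vector)
after = assign_priorities(second_vector, lanes)
```

After promotion, the lane that previously held the lowest priority holds the highest, which is one way to keep any single work-item from lagging behind the rest of its wavefront.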
-
Publication number: US12217059B1
Publication date: 2025-02-04
Application number: US18193177
Filing date: 2023-03-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander Toufic Freij , Gabriel H. Loh , Onur Kayiran
Abstract: The disclosed device includes a controller that sets an iteration counter for a loop based on an iteration value read from a loop iteration instruction for the loop. The controller also updates the iteration counter based on a number of times a loop heading instruction for the loop is decoded. When the iteration counter reaches an end value, the controller selects a not taken identifier for the loop to be fetched, to avoid a branch misprediction. Various other methods, systems, and computer-readable media are also disclosed.
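A toy software model of the counter behavior is given below. It assumes the counter counts down by one per decoded loop heading instruction and that the end value is zero; the class and method names are hypothetical and the abstract does not specify these details.

```python
class LoopController:
    """Toy model of the loop controller; not the patented circuit."""

    def __init__(self, end_value=0):
        self.end_value = end_value  # assumed: counter counts down to this value
        self.counter = None

    def on_loop_iteration_instruction(self, iteration_value):
        # Set the iteration counter from the value carried by the instruction.
        self.counter = iteration_value

    def on_loop_heading_decoded(self):
        # Update the counter once per decoded loop heading instruction.
        if self.counter is None:
            raise RuntimeError("iteration counter was never set")
        self.counter -= 1
        # At the end value, fetch the fall-through (not taken) path.
        return "not_taken" if self.counter == self.end_value else "taken"

controller = LoopController()
controller.on_loop_iteration_instruction(3)
decisions = [controller.on_loop_heading_decoded() for _ in range(3)]
```

Because the exit is selected from the counter rather than from branch history, the final iteration does not rely on a predictor guessing the loop exit.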
-
Publication number: US12079634B2
Publication date: 2024-09-03
Application number: US16794124
Filing date: 2020-02-18
Applicant: Advanced Micro Devices, Inc.
Inventor: Onur Kayiran , Jieming Yin , Yasuko Eckert
CPC classification number: G06F9/3887 , G06F8/41 , G06N10/00
Abstract: A technique for processing qubits in a quantum computing device is provided. The technique includes determining that, in a first cycle, a first quantum processing region is to perform a first quantum operation that does not use a qubit that is stored in the first quantum processing region, identifying a second quantum processing region that is to perform a second quantum operation at a second cycle that is later than the first cycle, wherein the second quantum operation uses the qubit, determining that between the first cycle and the second cycle, no quantum operations are performed in the second quantum processing region, and moving the qubit from the first quantum processing region to the second quantum processing region.
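The movement decision can be sketched as a check over a static schedule. The schedule layout (`{region: {cycle: set of qubits used}}`), the function name `find_move`, and the choice to treat "between" as strictly between the two cycles are assumptions made for this illustration.

```python
def find_move(schedule, home, qubit, first_cycle):
    """Return (target_region, second_cycle) if the qubit may be moved, else None.

    schedule maps region -> {cycle: set of qubits used by that cycle's operation}.
    """
    ops_here = schedule.get(home, {}).get(first_cycle)
    # The home region must run an operation that does not use the qubit.
    if ops_here is None or qubit in ops_here:
        return None
    best = None
    for region, ops in schedule.items():
        if region == home:
            continue
        # Later cycles in which this region's operation uses the qubit.
        uses = [c for c, qs in ops.items() if c > first_cycle and qubit in qs]
        if not uses:
            continue
        second_cycle = min(uses)
        # The target region must be idle strictly between the two cycles.
        busy = any(first_cycle < c < second_cycle for c in ops)
        if not busy and (best is None or second_cycle < best[1]):
            best = (region, second_cycle)
    return best

schedule = {
    "A": {0: {"q1"}},
    "B": {3: {"q0", "q2"}},
    "C": {1: {"q3"}, 2: {"q0"}},
}
move = find_move(schedule, home="A", qubit="q0", first_cycle=0)
```

Region C uses `q0` earlier than region B but is busy at cycle 1, so under these assumptions only region B qualifies as the destination.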
-
Publication number: US20210255871A1
Publication date: 2021-08-19
Application number: US16794124
Filing date: 2020-02-18
Applicant: Advanced Micro Devices, Inc.
Inventor: Onur Kayiran , Jieming Yin , Yasuko Eckert
Abstract: A technique for processing qubits in a quantum computing device is provided. The technique includes determining that, in a first cycle, a first quantum processing region is to perform a first quantum operation that does not use a qubit that is stored in the first quantum processing region, identifying a second quantum processing region that is to perform a second quantum operation at a second cycle that is later than the first cycle, wherein the second quantum operation uses the qubit, determining that between the first cycle and the second cycle, no quantum operations are performed in the second quantum processing region, and moving the qubit from the first quantum processing region to the second quantum processing region.
-
Publication number: US20180115496A1
Publication date: 2018-04-26
Application number: US15331002
Filing date: 2016-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Yasuko Eckert , Onur Kayiran , Nuwan S. Jayasena , Gabriel H. Loh , Dong Ping Zhang
IPC: H04L12/911 , H04L12/863
CPC classification number: H04L47/70 , G06F9/5066 , H04L47/50 , H04L67/10 , H04L67/2842 , Y02D10/22 , Y02D10/36
Abstract: Systems, apparatuses, and methods for implementing mechanisms to improve data locality for distributed processing units are disclosed. A system includes a plurality of distributed processing units (e.g., GPUs) and memory devices. Each processing unit is coupled to one or more local memory devices. The system determines how to partition a workload into a plurality of workgroups based on maximizing data locality and data sharing. The system determines which subset of the plurality of workgroups to dispatch to each processing unit of the plurality of processing units based on maximizing local memory accesses and minimizing remote memory accesses. The system also determines how to partition data buffer(s) based on data sharing patterns of the workgroups. The system maps to each processing unit a separate portion of the data buffer(s) so as to maximize local memory accesses and minimize remote memory accesses.
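A simple locality-driven dispatch policy is sketched below. It assumes the data buffer is already split into chunks with known owners and ignores load balancing; `dispatch`, `chunk_owner`, and the greedy "most local chunks wins" rule are hypothetical choices, not the method claimed in the application.

```python
def dispatch(workgroup_chunks, chunk_owner, num_units):
    """Map each workgroup to the unit that owns most of the chunks it touches."""
    assignment = {}
    for workgroup, chunks in workgroup_chunks.items():
        local_hits = [0] * num_units
        for chunk in chunks:
            local_hits[chunk_owner[chunk]] += 1
        # Ties go to the lowest-numbered unit.
        assignment[workgroup] = max(range(num_units), key=lambda u: local_hits[u])
    return assignment

def remote_accesses(assignment, workgroup_chunks, chunk_owner):
    """Count chunk accesses that land on a unit other than the assigned one."""
    return sum(
        chunk_owner[chunk] != assignment[workgroup]
        for workgroup, chunks in workgroup_chunks.items()
        for chunk in chunks
    )

chunk_owner = {0: 0, 1: 0, 2: 1, 3: 1}  # buffer chunk -> owning unit
workgroup_chunks = {"wg0": [0, 1], "wg1": [1, 2, 3], "wg2": [3]}

assignment = dispatch(workgroup_chunks, chunk_owner, num_units=2)
remote = remote_accesses(assignment, workgroup_chunks, chunk_owner)
```

Only `wg1` touches a chunk on a remote unit here, so the policy leaves a single remote access.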
-
Publication number: US20170293560A1
Publication date: 2017-10-12
Application number: US15268953
Filing date: 2016-09-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Yasuko Eckert , Nuwan Jayasena , Reena Panda , Onur Kayiran , Michael W. Boyer
IPC: G06F12/0862
CPC classification number: G06F12/0862 , G06F2212/1016 , G06F2212/6022 , G06F2212/6024
Abstract: A method and apparatus for performing memory prefetching includes determining whether to initiate prefetching. Upon a determination to initiate prefetching, a first memory row is determined as a suitable prefetch candidate, and it is determined whether a particular set of one or more cachelines of the first memory row is to be prefetched.
-
Publication number: US20240319964A1
Publication date: 2024-09-26
Application number: US18126107
Filing date: 2023-03-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Onur Kayiran , Lee Evan Eisen , Michael Estlick , Jay Fleischman , Matthew R. Poremba , Gabriel H. Loh
IPC: G06F7/503
CPC classification number: G06F7/503
Abstract: A processor includes one or more processor cores configured to perform accumulate top (ACCT) and accumulate bottom (ACCB) instructions. To perform such instructions, at least one processor core of the processor includes an ACCT data path that adds a first portion of a block of data to a first lane of a set of lanes of a top accumulator and adds a carry-out bit to a second lane of the set of lanes of the top accumulator. Further, the at least one processor core includes an ACCB data path that adds a second portion of the block of data to a first lane of a set of lanes of a bottom accumulator and adds a carry-out bit to a second lane of the set of lanes of the bottom accumulator.
-
Publication number: US11966328B2
Publication date: 2024-04-23
Application number: US17126977
Filing date: 2020-12-18
Applicant: Advanced Micro Devices, Inc.
Inventor: Onur Kayiran , Mohamed Assem Ibrahim , Shaizeen Aga
IPC: G06F12/06
CPC classification number: G06F12/06 , G06F2212/1041
Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.
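The two selection approaches can be illustrated with a pair of small functions. The comparison direction (addresses at or above the split address pick the alternate register), the nonzero test on a flag register, and all names are assumptions for this sketch; the abstract does not give these details.

```python
def select_by_address(address, split_address, default_reg, alternate_reg):
    """Address-based selection: compare the command address to the split address."""
    return alternate_reg if address >= split_address else default_reg

def select_by_register_data(local_registers, flag_reg, default_reg, alternate_reg):
    """Register data-based selection: decide from a value held in a local register."""
    return alternate_reg if local_registers[flag_reg] != 0 else default_reg

registers = {"r0": 0, "r1": 7}  # hypothetical local register contents

low = select_by_address(0x0FFF, 0x1000, "r2", "r3")
high = select_by_address(0x1000, 0x1000, "r2", "r3")
by_data = select_by_register_data(registers, "r1", "r2", "r3")
```

In both cases the decision is made from information available at the memory module, which matches the abstract's point that selection happens at or near the PIM execution unit.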
-
Publication number: US20240111489A1
Publication date: 2024-04-04
Application number: US17955634
Filing date: 2022-09-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Onur Kayiran , Michael Estlick , Masab Ahmad , Gabriel H. Loh
CPC classification number: G06F7/4981 , G06F7/506
Abstract: A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The plurality of adders add first and second X bit binary portion values of a first Y bit binary value and a second Y bit binary value. Y is a multiple of X. The plurality of adders further generate first carry bits. The plurality of carry bit generation circuits is coupled to the plurality of adders, respectively, and receive the first carry bits. The plurality of carry bit generation circuits generate second carry bits based on the first carry bits. The plurality of adders use the second carry bits to add the first and second X bit binary portions of the first and second Y bit binary values, respectively.
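The two-pass addition can be modeled in software as shown below. This is a sketch under assumptions: X = 16 and Y = 64 are example widths, the result wraps modulo 2^Y, and the propagate signal used to derive the second carry bits is an addition of this example that the abstract does not mention.

```python
def segmented_add(a, b, x_bits=16, y_bits=64):
    """Add two Y-bit values using X-bit segment adders in two passes."""
    if y_bits % x_bits != 0:
        raise ValueError("Y must be a multiple of X")
    n = y_bits // x_bits
    mask = (1 << x_bits) - 1
    a_parts = [(a >> (i * x_bits)) & mask for i in range(n)]
    b_parts = [(b >> (i * x_bits)) & mask for i in range(n)]

    # Pass 1: each adder sums its segment with no carry-in (first carry bits).
    partial = [x + y for x, y in zip(a_parts, b_parts)]
    first_carry = [p >> x_bits for p in partial]
    # Assumed helper signal: a segment of all ones forwards an incoming carry.
    propagate = [(p & mask) == mask for p in partial]

    # Carry generation: derive the carry into each segment (second carry bits).
    carry_in = [0] * n
    for i in range(1, n):
        forwarded = 1 if propagate[i - 1] and carry_in[i - 1] else 0
        carry_in[i] = first_carry[i - 1] | forwarded

    # Pass 2: the adders redo their segment sums with the second carry bits.
    result = 0
    for i in range(n):
        segment = (a_parts[i] + b_parts[i] + carry_in[i]) & mask
        result |= segment << (i * x_bits)
    return result

total = segmented_add(0x0000FFFFFFFF0001, 0xFFFF)
```

The result should match ordinary integer addition truncated to Y bits, which is a convenient check when experimenting with other segment widths.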