-
Publication number: US12217059B1
Publication date: 2025-02-04
Application number: US18193177
Filing date: 2023-03-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander Toufic Freij, Gabriel H. Loh, Onur Kayiran
Abstract: The disclosed device includes a controller that sets an iteration counter for a loop based on an iteration value read from a loop iteration instruction for the loop. The controller also updates the iteration counter based on the number of times a loop heading instruction for the loop is decoded. When the iteration counter reaches an end value, the controller selects a not-taken identifier for the loop to be fetched, to avoid a branch misprediction. Various other methods, systems, and computer-readable media are also disclosed.
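The counter behavior in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the counter is assumed to count down to an end value of zero, which the abstract does not specify.

```python
class LoopController:
    """Toy model of the described controller (all names are hypothetical)."""

    def __init__(self, end_value=0):
        # Assumption: the counter counts down toward end_value.
        self.end_value = end_value
        self.counter = None

    def on_loop_iteration_instruction(self, iteration_value):
        # Set the iteration counter from the value carried by the instruction.
        self.counter = iteration_value

    def on_loop_heading_decoded(self):
        # Update the counter once per decode of the loop heading instruction,
        # then report which path should be fetched next.
        self.counter -= 1
        return "not_taken" if self.counter == self.end_value else "taken"


controller = LoopController()
controller.on_loop_iteration_instruction(3)
decisions = [controller.on_loop_heading_decoded() for _ in range(3)]
```

In this model the final decode yields the not-taken path directly, so no prediction is needed for the loop exit.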
-
Publication number: US20240201948A1
Publication date: 2024-06-20
Application number: US18083273
Filing date: 2022-12-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Gabriel H. Loh
IPC: G06F7/483
CPC classification number: G06F7/483
Abstract: A processing device for encoding floating point numbers comprises circuitry and memory configured to store data comprising the floating point numbers. The circuitry is configured to, for a set of the floating point numbers, identify which of the floating point numbers represent a zero value and which represent a non-zero value, convert the floating point numbers which represent a non-zero value into a block floating point format value, and generate an encoded sparse block floating point format value. The circuitry is also configured to decode floating point numbers. For an encoded block floating point format value, the circuitry converts the encoded value to a set of non-zero floating point numbers based on a sparsity mask previously generated when the value was encoded, and generates a non-sparse set of floating point values.
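A rough software analogue of the encode and decode steps is sketched below. It is not the patented circuit: the function names, the 8-bit signed mantissa width, and the choice of the largest exponent as the shared block exponent are all assumptions made for illustration.

```python
import math


def encode_sparse_bfp(values, mantissa_bits=8):
    """Return (sparsity_mask, shared_exponent, mantissas) for a block of floats."""
    # Sparsity mask: True where the value is non-zero.
    mask = [v != 0.0 for v in values]
    nonzero = [v for v in values if v != 0.0]
    if not nonzero:
        return mask, 0, []
    # Assumption: the shared exponent is the largest exponent in the block.
    shared_exp = max(math.frexp(v)[1] for v in nonzero)
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Only non-zero values get a mantissa; zeros are implied by the mask.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in nonzero]
    return mask, shared_exp, mantissas


def decode_sparse_bfp(mask, shared_exp, mantissas, mantissa_bits=8):
    """Rebuild the non-sparse block, reinserting zeros where the mask is False."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    remaining = iter(mantissas)
    return [next(remaining) * scale if keep else 0.0 for keep in mask]


mask, exponent, mantissas = encode_sparse_bfp([0.0, 1.5, 0.0, -0.25])
restored = decode_sparse_bfp(mask, exponent, mantissas)
```

Values that are much smaller than the largest value in the block lose precision under a shared exponent, which is the usual trade-off of block floating point.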
-
Publication number: US20240111677A1
Publication date: 2024-04-04
Application number: US17957795
Filing date: 2022-09-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Gabriel H. Loh, Marko Scrbak, Akhil Arunkumar, John Kalamatianos
IPC: G06F12/0862, G06F12/0877
CPC classification number: G06F12/0862, G06F12/0877, G06F12/0811
Abstract: A method for performing prefetching operations is disclosed. The method includes storing a recorded access pattern indicating a set of accesses for a region; in response to an access within the region, fetching the recorded access pattern; and performing prefetching based on the access pattern.
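The three steps in this abstract (store a pattern per region, fetch it on an access, prefetch from it) can be sketched as follows. The region size, line size, class name, and method names are assumptions chosen for illustration; the abstract does not define them.

```python
REGION_SIZE = 4096  # assumed region size in bytes
LINE_SIZE = 64      # assumed cache line size in bytes


class PatternPrefetcher:
    """Toy model of region-based pattern prefetching (names are hypothetical)."""

    def __init__(self):
        # Maps a region index to the set of line offsets accessed in that region.
        self.patterns = {}

    def record(self, address):
        # Store the access in the recorded pattern for its region.
        region, offset = divmod(address, REGION_SIZE)
        self.patterns.setdefault(region, set()).add(offset // LINE_SIZE)

    def on_access(self, address):
        # Fetch the recorded pattern for the region and return the addresses
        # to prefetch, skipping the line that triggered the access.
        region, offset = divmod(address, REGION_SIZE)
        line = offset // LINE_SIZE
        pattern = self.patterns.get(region, set())
        return sorted(region * REGION_SIZE + l * LINE_SIZE
                      for l in pattern if l != line)


prefetcher = PatternPrefetcher()
for addr in (0x1000, 0x1040, 0x1100):
    prefetcher.record(addr)
to_prefetch = prefetcher.on_access(0x1000)
```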
-
Publication number: US11847062B2
Publication date: 2023-12-19
Application number: US17552703
Filing date: 2021-12-16
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
IPC: G06F12/0897, G06F12/0891
CPC classification number: G06F12/0897, G06F12/0891, G06F2212/1028
Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.
-
Publication number: US20230393855A1
Publication date: 2023-12-07
Application number: US17833504
Filing date: 2022-06-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Gabriel H. Loh, Yasuko Eckert, Bradford Beckmann, Michael Estlick, Jay Fleischman
CPC classification number: G06F9/3887, G06F9/3877, G06F9/30098, G06F9/3555
Abstract: An approach is provided for implementing register based single instruction, multiple data (SIMD) lookup table operations. According to the approach, an instruction set architecture (ISA) can support one or more SIMD instructions that enable vectors or multiple values in source data registers to be processed in parallel using a lookup table or truth table stored in one or more function registers. The SIMD instructions can be flexibly configured to support functions with inputs and outputs of various sizes and data formats. Various approaches are also described for supporting very large lookup tables that span multiple registers.
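As a loose illustration of a register-based lookup, the sketch below applies one truth table to every lane of a source vector. Lists stand in for registers, and the function name, parameter names, and index masking are assumptions, not the instruction semantics defined in the application.

```python
def simd_lookup(source_lanes, function_register, index_bits):
    """Apply the table in function_register to each lane (hypothetical model)."""
    entries = 1 << index_bits
    if len(function_register) != entries:
        raise ValueError("function register must hold 2**index_bits entries")
    index_mask = entries - 1
    # In hardware every lane would be looked up in parallel; a list
    # comprehension models that lane-wise behavior sequentially.
    return [function_register[lane & index_mask] for lane in source_lanes]


# A 2-input XOR expressed as a 4-entry truth table.
xor_table = [0, 1, 1, 0]
result = simd_lookup([0b00, 0b01, 0b10, 0b11], xor_table, index_bits=2)
```

Changing the contents of the table changes the function computed, with no change to the lookup itself, which is the flexibility the abstract points to.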
-
Publication number: US20230195643A1
Publication date: 2023-06-22
Application number: US17552703
Filing date: 2021-12-16
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
IPC: G06F12/0897, G06F12/0891
CPC classification number: G06F12/0897, G06F12/0891, G06F2212/1028
Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.
-
Publication number: US20220365975A1
Publication date: 2022-11-17
Application number: US17564413
Filing date: 2021-12-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Ganesh Dasika, Michael Ignatowski, Michael J. Schulte, Gabriel H. Loh, Valentina Salapura, Angela Beth Dalton
IPC: G06F16/901, G06F15/80
Abstract: An accelerator device includes a first processing unit to access a structure of a graph dataset, and a second processing unit coupled with the first processing unit to perform computations based on data values in the graph dataset.
-
Publication number: US20200183848A1
Publication date: 2020-06-11
Application number: US16214363
Filing date: 2018-12-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Gabriel H. Loh
IPC: G06F12/0877
Abstract: Systems, apparatuses, and methods for efficiently performing memory accesses in a computing system are disclosed. A computing system includes one or more clients, a communication fabric and a last-level cache implemented with low latency, high bandwidth memory. The cache controller for the last-level cache determines a range of addresses corresponding to a first region of system memory with a copy of data stored in a second region of the last-level cache. The cache controller sends a selected memory access request to system memory when the cache controller determines a request address of the memory access request is not within the range of addresses. The cache controller services the selected memory request by accessing data from the last-level cache when the cache controller determines the request address is within the range of addresses.
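The routing decision described here reduces to an address range check. The sketch below assumes a single contiguous range and uses hypothetical names; how the controller actually determines or stores the range is not stated in the abstract.

```python
class RangeMappedCacheController:
    """Toy model of range-based routing for a last-level cache (hypothetical)."""

    def __init__(self, range_base, range_size):
        # Assumption: one contiguous region of system memory is mirrored
        # in the last-level cache.
        self.range_base = range_base
        self.range_size = range_size

    def route(self, request_address):
        # Inside the range: service from the last-level cache at the
        # corresponding offset. Outside: send the request to system memory.
        if self.range_base <= request_address < self.range_base + self.range_size:
            return ("last_level_cache", request_address - self.range_base)
        return ("system_memory", request_address)


controller = RangeMappedCacheController(range_base=0x8000, range_size=0x1000)
hit = controller.route(0x8040)
miss = controller.route(0x9000)
```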
-
Publication number: US20190013051A1
Publication date: 2019-01-10
Application number: US16129252
Filing date: 2018-09-12
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Nuwan S. Jayasena, Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Lisa R. Hsu
CPC classification number: G11C5/06, G06F12/02, G06F12/10, G06F13/1694, G11C7/1006
Abstract: A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode.
-
Publication number: US20180115496A1
Publication date: 2018-04-26
Application number: US15331002
Filing date: 2016-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Yasuko Eckert, Onur Kayiran, Nuwan S. Jayasena, Gabriel H. Loh, Dong Ping Zhang
IPC: H04L12/911, H04L12/863
CPC classification number: H04L47/70, G06F9/5066, H04L47/50, H04L67/10, H04L67/2842, Y02D10/22, Y02D10/36
Abstract: Systems, apparatuses, and methods for implementing mechanisms to improve data locality for distributed processing units are disclosed. A system includes a plurality of distributed processing units (e.g., GPUs) and memory devices. Each processing unit is coupled to one or more local memory devices. The system determines how to partition a workload into a plurality of workgroups based on maximizing data locality and data sharing. The system determines which subset of the plurality of workgroups to dispatch to each processing unit of the plurality of processing units based on maximizing local memory accesses and minimizing remote memory accesses. The system also determines how to partition data buffer(s) based on data sharing patterns of the workgroups. The system maps to each processing unit a separate portion of the data buffer(s) so as to maximize local memory accesses and minimize remote memory accesses.