-
Publication number: US11726917B2
Publication date: 2023-08-15
Application number: US16927786
Filing date: 2020-07-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Susumu Mashimo , John Kalamatianos
IPC: G06F12/0862 , G06F12/0877 , G06F9/30 , G06K9/62
CPC classification number: G06F12/0862 , G06F9/30036 , G06F9/30047 , G06F9/30101 , G06F12/0877 , G06K9/6256 , G06F2212/6024
Abstract: A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.
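A minimal illustrative C++ sketch (not taken from the patent) of the delta-based training and prediction described above, assuming a two-delta history and a table keyed by that history; the class, member names, and sizes are invented for illustration.

    #include <cstdint>
    #include <deque>
    #include <map>
    #include <vector>

    class DeltaPrefetcher {
        std::deque<int64_t> history_;                    // recent consecutive deltas
        std::map<std::vector<int64_t>, int64_t> table_;  // delta history -> predicted next delta
        uint64_t last_addr_ = 0;
        bool have_last_ = false;
        static constexpr size_t kHistoryLen = 2;

    public:
        // Record an access, update the training table, and return a predicted
        // prefetch address (0 if no prediction is available yet).
        uint64_t access(uint64_t addr) {
            if (have_last_) {
                int64_t delta = static_cast<int64_t>(addr) - static_cast<int64_t>(last_addr_);
                if (history_.size() == kHistoryLen) {
                    // Train: the current delta history led to this new delta.
                    std::vector<int64_t> key(history_.begin(), history_.end());
                    table_[key] = delta;
                    history_.pop_front();
                }
                history_.push_back(delta);
            }
            last_addr_ = addr;
            have_last_ = true;

            // Predict: look up the latest delta history and extrapolate one address.
            if (history_.size() == kHistoryLen) {
                std::vector<int64_t> key(history_.begin(), history_.end());
                auto it = table_.find(key);
                if (it != table_.end())
                    return addr + it->second;
            }
            return 0;
        }
    };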
-
Publication number: US11726915B2
Publication date: 2023-08-15
Application number: US16821632
Filing date: 2020-03-17
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Yasuko Eckert , Maurice B. Steinman , Steven Raasch
IPC: G06F12/0817 , G06F12/084
CPC classification number: G06F12/0824 , G06F12/084
Abstract: A processing system includes a first set of one or more processing units including a first processing unit, a second set of one or more processing units including a second processing unit, and a memory having an address space shared by the first and second sets. The processing system further includes a distributed coherence directory subsystem having a first coherence directory to support a first subset of one or more address regions of the address space and a second coherence directory to support a second subset of one or more address regions of the address space. In some implementations, the first coherence directory is implemented in the system so as to have a lower access latency for the first set, whereas the second coherence directory is implemented in the system so as to have a lower access latency for the second set.
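A minimal illustrative C++ sketch of steering coherence lookups by address region, assuming a fixed 1 GiB region size and a two-directory split; the even/odd policy and all names are invented for illustration, not the patented implementation.

    #include <cstdint>
    #include <cstdio>

    enum class Directory { NearSet0, NearSet1 };

    // Each address region is served by the coherence directory that has the
    // lower access latency for the set of processing units mapped to it.
    Directory directory_for(uint64_t addr, uint64_t region_size = 1ull << 30) {
        uint64_t region = addr / region_size;
        // Illustrative policy: even-numbered regions go to the directory placed
        // near the first set of processing units, odd-numbered regions to the
        // directory placed near the second set.
        return (region % 2 == 0) ? Directory::NearSet0 : Directory::NearSet1;
    }

    int main() {
        uint64_t addr = 0x140000000ull; // falls in region 5 for a 1 GiB region size
        printf("address 0x%llx -> directory %d\n",
               static_cast<unsigned long long>(addr),
               static_cast<int>(directory_for(addr)));
    }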
-
Publication number: US11726837B2
Publication date: 2023-08-15
Application number: US17519290
Filing date: 2021-11-04
Applicant: Advanced Micro Devices, Inc.
Inventor: Karthik Rao , Shomit N. Das , Xudong An , Wei Huang
IPC: G06F9/50 , G06F9/48 , G06F9/38 , H04L67/12 , G06F1/3206 , G06F13/40 , G06F3/06 , H04N19/436
CPC classification number: G06F9/5094 , G06F9/3867 , G06F9/3877 , G06F9/4893 , G06F9/5011 , G06F9/5027 , G06F9/5055 , H04L67/12 , G06F1/3206 , G06F3/0613 , G06F9/5061 , G06F13/409 , H04N19/436
Abstract: In some examples, thermal aware optimization logic determines a characteristic (e.g., a workload or type) of a wavefront (e.g., multiple threads). For example, the characteristic indicates whether the wavefront is compute intensive, memory intensive, mixed, and/or another type of wavefront. The thermal aware optimization logic determines temperature information for one or more compute units (CUs) in one or more processing cores. The temperature information includes predictive thermal information indicating expected temperatures corresponding to the one or more CUs and historical thermal information indicating current or past temperatures of at least a portion of a graphics processing unit (GPU). The logic selects one or more of the CUs to process the threads of the wavefront based on the determined characteristic and the temperature information, and provides instructions to the selected CUs to execute the wavefront.
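A minimal illustrative C++ sketch of selecting compute units from predicted and historical temperatures, weighted by the wavefront characteristic; the weights, scoring, and names are assumptions for illustration, not the patented logic.

    #include <algorithm>
    #include <vector>

    enum class WavefrontType { ComputeIntensive, MemoryIntensive, Mixed };

    struct CuThermalInfo {
        int id;
        float predicted_temp_c;  // expected (predictive) temperature for this CU
        float current_temp_c;    // current or past measured temperature
    };

    // Pick the coolest CUs for the wavefront; compute-intensive work weights the
    // predicted (future) temperature more heavily than the measured one.
    std::vector<int> select_cus(std::vector<CuThermalInfo> cus,
                                WavefrontType type, size_t count) {
        float w_pred = (type == WavefrontType::ComputeIntensive) ? 0.7f : 0.4f;
        auto score = [&](const CuThermalInfo& c) {
            return w_pred * c.predicted_temp_c + (1.0f - w_pred) * c.current_temp_c;
        };
        std::sort(cus.begin(), cus.end(),
                  [&](const CuThermalInfo& a, const CuThermalInfo& b) {
                      return score(a) < score(b);
                  });
        std::vector<int> selected;
        for (size_t i = 0; i < count && i < cus.size(); ++i)
            selected.push_back(cus[i].id);
        return selected;
    }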
-
Publication number: US11720499B2
Publication date: 2023-08-08
Application number: US17134790
Filing date: 2020-12-28
Inventor: Fataneh Ghodrat , Stephen W. Somogyi , Zhenhong Liu
IPC: G06F12/0891 , G06T1/60 , G06T1/20 , G06F12/0831
CPC classification number: G06F12/0891 , G06F12/0833 , G06T1/20 , G06T1/60
Abstract: A graphics pipeline includes a texture cache having cache lines that are partitioned into a plurality of subsets. The graphics pipeline also includes one or more compute units that selectively generate a miss request for a first subset of the plurality of subsets of a cache line in the texture cache in response to a cache miss for a memory access request to an address associated with the first subset of the cache line. In some embodiments, the cache lines are partitioned into a first sector and a second sector. The compute units generate miss requests for the first sector, and bypass generating miss requests for the second sector, in response to cache misses for memory access requests received during a request cycle that target addresses in the first sector.
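A minimal illustrative C++ sketch of sector-granular miss requests, assuming a 128-byte line split into two sectors; the line and sector sizes and the names are invented for illustration.

    #include <cstdint>
    #include <unordered_map>

    constexpr uint64_t kLineSize = 128;             // bytes per cache line
    constexpr uint64_t kSectorSize = kLineSize / 2; // two sectors per line

    struct LineState {
        bool sector_valid[2] = {false, false};
    };

    std::unordered_map<uint64_t, LineState> texture_cache; // line address -> sector validity

    // Returns true if a miss request was issued for the sector covering 'addr'.
    bool access(uint64_t addr) {
        uint64_t line = addr / kLineSize;
        unsigned sector = static_cast<unsigned>((addr % kLineSize) / kSectorSize); // 0 or 1
        LineState& state = texture_cache[line];
        if (state.sector_valid[sector])
            return false;                  // hit within the requested sector
        state.sector_valid[sector] = true; // issue a miss request for this sector only;
                                           // the other sector is not fetched.
        return true;
    }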
-
Publication number: US11720328B2
Publication date: 2023-08-08
Application number: US17029836
Filing date: 2020-09-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin He , Shubh Shah , Michael Mantor
Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
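A minimal illustrative C++ sketch of computing at the instruction's precision and reducing the intermediate result to a smaller size before the next stage; the bfloat16-style truncation is an assumed stand-in for a narrower datapath, not the patented ALU design.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Crude float -> 16-bit truncation standing in for a narrower datapath
    // (keeps the sign, exponent, and top mantissa bits, bfloat16-style).
    uint16_t narrow_to_16(float x) {
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof(bits));
        return static_cast<uint16_t>(bits >> 16);
    }

    float widen(uint16_t h) {
        uint32_t bits = static_cast<uint32_t>(h) << 16;
        float x;
        std::memcpy(&x, &bits, sizeof(x));
        return x;
    }

    int main() {
        // Stage 1: multiply at the full (single) precision the instruction requests.
        float product = 1.5f * 2.25f;
        // Reduce the intermediate result to a smaller size before the next stage.
        uint16_t narrowed = narrow_to_16(product);
        // Stage 2: continue execution (here, an accumulate) on the narrowed value.
        float result = widen(narrowed) + 4.0f;
        printf("result = %f\n", result);
    }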
-
Publication number: US11720279B2
Publication date: 2023-08-08
Application number: US16701794
Filing date: 2019-12-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov
CPC classification number: G06F3/0659 , G06F3/061 , G06F3/0673 , G06F13/4221 , G06F2213/0026
Abstract: An apparatus and method for managing packet transfer between a memory fabric, whose physical layer interface has a higher data rate than that of the physical layer interface of another device, receive incoming packets from the memory fabric physical layer interface, where at least some of the packets include different instruction types. The apparatus and method determine a packet type of the incoming packet received from the memory fabric physical layer interface, and when the determined incoming packet type is one containing an atomic request, prioritize transfer of the incoming packet with the atomic request, over other packet types of incoming packets, to memory access logic that accesses local memory within the apparatus.
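A minimal illustrative C++ sketch of prioritizing atomic-request packets ahead of other packet types before they are handed to the local memory access logic; the packet fields and two-level priority scheme are assumptions for illustration.

    #include <cstdint>
    #include <queue>
    #include <vector>

    enum class PacketType { Read, Write, Atomic };

    struct Packet {
        PacketType type;
        uint64_t addr;
    };

    // Higher-priority packets are popped first; atomics outrank everything else.
    struct PacketCompare {
        bool operator()(const Packet& a, const Packet& b) const {
            auto prio = [](const Packet& p) { return p.type == PacketType::Atomic ? 1 : 0; };
            return prio(a) < prio(b); // priority_queue pops the largest element
        }
    };

    int main() {
        std::priority_queue<Packet, std::vector<Packet>, PacketCompare> ingress;
        ingress.push({PacketType::Read, 0x1000});
        ingress.push({PacketType::Atomic, 0x2000});
        ingress.push({PacketType::Write, 0x3000});
        // The atomic-request packet is forwarded to the memory access logic first.
        while (!ingress.empty()) {
            Packet p = ingress.top();
            ingress.pop();
            (void)p; // placeholder for forwarding 'p' to the local memory access path
        }
    }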
-
Publication number: US11720266B2
Publication date: 2023-08-08
Application number: US17591924
Filing date: 2022-02-03
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: William R. Alverson , Amitabh Mehra , Anil Harwani , Jerry A. Ahrens , Grant E. Ley , Jayesh Joshi
CPC classification number: G06F3/0632 , G06F3/0604 , G06F3/0673 , G11C29/10
Abstract: Automatic memory overclocking, including: increasing a memory frequency setting for a memory module until a memory stability test fails; determining an overclocked memory frequency setting including a highest memory frequency setting passing the memory stability test; and generating a profile including the overclocked memory frequency setting.
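A minimal illustrative C++ sketch of the overclocking loop described in the abstract; the starting frequency, step size, and stand-in stability test are assumptions for illustration.

    #include <cstdio>

    // Placeholder for a real memory stability test (e.g., a read/write stress run).
    bool memory_stability_test(int freq_mhz) {
        return freq_mhz <= 3600; // assume this module is stable up to 3600 MHz
    }

    int main() {
        const int step_mhz = 66;
        int freq_mhz = 3200;         // starting (stock) memory frequency setting
        int highest_stable = freq_mhz;

        // Increase the memory frequency setting until the stability test fails.
        while (memory_stability_test(freq_mhz)) {
            highest_stable = freq_mhz;
            freq_mhz += step_mhz;
        }

        // Generate a profile containing the overclocked (highest passing) setting.
        printf("profile: overclocked memory frequency = %d MHz\n", highest_stable);
    }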
-
Publication number: US20230244751A1
Publication date: 2023-08-03
Application number: US18297230
Filing date: 2023-04-07
Applicant: Advanced Micro Devices, Inc.
Inventor: Shaizeen Aga , Nuwan Jayasena , Allen H. Rush , Michael Ignatowski
CPC classification number: G06F17/16 , G06F7/5324 , G06F15/8007
Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
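A minimal illustrative C++ sketch of the partial-product scheme described above: a core multiplies its local sub-portions of the two matrices, then multiplies its block of the first matrix by a block of the second matrix received from another core; the block sizes and the two-core exchange are assumptions for illustration.

    #include <cstdio>
    #include <vector>

    using Block = std::vector<std::vector<double>>; // small dense sub-matrix

    Block multiply(const Block& a, const Block& b) {
        size_t n = a.size(), k = b.size(), m = b[0].size();
        Block c(n, std::vector<double>(m, 0.0));
        for (size_t i = 0; i < n; ++i)
            for (size_t p = 0; p < k; ++p)
                for (size_t j = 0; j < m; ++j)
                    c[i][j] += a[i][p] * b[p][j];
        return c;
    }

    int main() {
        Block a_local  = {{1, 2}, {3, 4}};  // this core's sub-portion of the first matrix
        Block b_local  = {{5, 6}, {7, 8}};  // this core's sub-portion of the second matrix
        Block b_remote = {{1, 0}, {0, 1}};  // sub-portion of the second matrix received from another core

        Block p0 = multiply(a_local, b_local);  // product with the local block
        Block p1 = multiply(a_local, b_remote); // product with the received block

        printf("p0[0][0]=%g p1[0][0]=%g\n", p0[0][0], p1[0][0]);
    }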
-
Publication number: US20230244492A1
Publication date: 2023-08-03
Application number: US18298723
Filing date: 2023-04-11
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: JAGADISH B. KOTRA , JOHN KALAMATIANOS
CPC classification number: G06F9/3836 , G06F9/3001 , G06F9/522 , G06F9/3877
Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device to which operations are offloaded may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
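A minimal illustrative C++ sketch of the per-address locking described above: the address targeted by an offload instruction is locked, and the lock is released once the cache operation for that address completes; the data structure and method names are assumptions for illustration.

    #include <cstdint>
    #include <unordered_set>

    class OffloadOrdering {
        std::unordered_set<uint64_t> locked_addrs_; // addresses with outstanding offloads

    public:
        // Processing an offload instruction places a lock on its memory address.
        void on_offload_issue(uint64_t addr) { locked_addrs_.insert(addr); }

        // Completing the cache operation targeting the address removes the lock.
        void on_cache_op_complete(uint64_t addr) { locked_addrs_.erase(addr); }

        // A non-offload access to a locked address must wait, preserving ordering.
        bool may_proceed(uint64_t addr) const {
            return locked_addrs_.count(addr) == 0;
        }
    };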
-
Publication number: US11715183B2
Publication date: 2023-08-01
Application number: US17830030
Filing date: 2022-06-01
Applicant: Advanced Micro Devices, Inc.
Inventor: Ying-Ru Chen
CPC classification number: G06T5/008 , G06T5/40 , G09G5/026 , G09G2320/0242 , G09G2320/066
Abstract: Methods and apparatuses are disclosed herein for performing tone mapping and/or contrast enhancement. In some examples, a block mapping curve is low-pass filtered with block mapping curves of surrounding blocks to form a smoothed block mapping curve. In some examples, overlapped curve mapping of block mapping curves, including smoothed block mapping curves, is performed, including weighting, based on a pixel location, block mapping curves of a group of blocks to generate an interpolated block mapping curve and applying the interpolated block mapping curve to a pixel to perform tone mapping and/or contrast enhancement.
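A minimal illustrative C++ sketch of the overlapped curve mapping step: a pixel's output blends the mapping curves of the surrounding blocks with weights based on the pixel's position, and the blended (interpolated) curve value is applied to the pixel; the bilinear weighting and 256-entry curves are assumptions for illustration.

    #include <array>
    #include <cstdint>

    using MappingCurve = std::array<uint8_t, 256>; // per-block tone mapping lookup curve

    // fx, fy in [0,1] give the pixel's position between the four block centers
    // (top-left, top-right, bottom-left, bottom-right).
    uint8_t map_pixel(uint8_t in,
                      const MappingCurve& tl, const MappingCurve& tr,
                      const MappingCurve& bl, const MappingCurve& br,
                      float fx, float fy) {
        // Weight each block's curve by the pixel's distance to that block (bilinear).
        float w_tl = (1 - fx) * (1 - fy), w_tr = fx * (1 - fy);
        float w_bl = (1 - fx) * fy,       w_br = fx * fy;
        float out = w_tl * tl[in] + w_tr * tr[in] + w_bl * bl[in] + w_br * br[in];
        return static_cast<uint8_t>(out + 0.5f); // interpolated curve applied to the pixel
    }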