-
Publication number: US12045169B2
Publication date: 2024-07-23
Application number: US17133581
Filing date: 2020-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Furkan Eris , Paul S. Keltcher , John Kalamatianos , Mayank Chhablani , Alok Garg
IPC: G06F12/0862 , G06F16/901 , G06N20/00
CPC classification number: G06F12/0862 , G06F16/9027 , G06N20/00
Abstract: Techniques for identifying a hardware configuration for operation are disclosed. The techniques include applying feature measurements to a trained model; obtaining output values from the trained model, the output values corresponding to different hardware configurations; and operating according to the output values, wherein the output values include one of a certainty score, a ranking, or a regression value.
-
Publication number: US20230305849A1
Publication date: 2023-09-28
Application number: US17704627
Filing date: 2022-03-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Chetana N. Keltcher , Alok Garg , Paul S. Keltcher
CPC classification number: G06F9/3802 , G06F9/30043
Abstract: Array of pointers prefetching is described. In accordance with described techniques, a pointer target instruction is detected by identifying that a destination location of a load instruction is used in an address compute for a memory operation and the load instruction is included in a sequence of load instructions having addresses separated by a step size. An instruction for fetching data of a future load instruction is injected in an instruction stream of a processor. The data of the future load instruction is stored in a temporary register. An additional instruction is injected in the instruction stream for prefetching a pointer target based on an address of the memory operation and the data of the future load instruction.
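The address arithmetic in this abstract can be illustrated with a minimal Python sketch. It assumes a dictionary stands in for memory, that the pointer target is the loaded pointer plus a constant field offset used by the dependent memory operation, and that the prefetch distance is a tunable number of iterations. The names `detect_stride` and `pointer_target_address` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional


def detect_stride(addresses: List[int]) -> Optional[int]:
    """Return the step size if consecutive load addresses share one nonzero stride."""
    if len(addresses) < 3:
        return None  # assumption: require three loads before trusting a stride
    step = addresses[1] - addresses[0]
    if step == 0:
        return None
    for prev, cur in zip(addresses, addresses[1:]):
        if cur - prev != step:
            return None
    return step


def pointer_target_address(memory: Dict[int, int], load_addr: int, step: int,
                           distance: int, field_offset: int) -> int:
    """Model the two injected instructions for an array of pointers.

    1. Load the pointer that a future iteration will read; the result plays
       the role of the temporary register.
    2. Add the offset used by the dependent memory operation to form the
       pointer-target address to prefetch.
    """
    future_load_addr = load_addr + distance * step
    temp_register = memory[future_load_addr]
    return temp_register + field_offset


# Illustrative array of pointers at 0x1000 with 8-byte elements.
memory = {0x1000: 0x5000, 0x1008: 0x6000, 0x1010: 0x7000, 0x1018: 0x8000}
step = detect_stride([0x1000, 0x1008, 0x1010])
assert step is not None
print(hex(pointer_target_address(memory, 0x1000, step, distance=2, field_offset=0x10)))
# prints 0x7010
```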
-
Publication number: US20230297381A1
Publication date: 2023-09-21
Application number: US17699855
Filing date: 2022-03-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Chetana N. Keltcher , Alok Garg , Paul S. Keltcher
CPC classification number: G06F9/3806 , G06F9/30043
Abstract: Load dependent branch prediction is described. In accordance with described techniques, a load dependent branch instruction is detected by identifying that a destination location of a load instruction is used in an operation for determining whether a conditional branch is taken or not taken. The load instruction is included in a sequence of load instructions having addresses separated by a step size. An instruction is injected in an instruction stream of a processor for fetching data of a future load instruction using an address of the load instruction offset by a distance based on the step size. An additional instruction is injected in the instruction stream of the processor for precomputing an outcome of a load dependent branch using an address computed based on an address of the operation and the data of the future load instruction.
-
Publication number: US11455252B2
Publication date: 2022-09-27
Application number: US16454027
Filing date: 2019-06-26
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Paul S. Keltcher , Mayank Chhablani , Alok Garg , Furkan Eris
IPC: G06F12/0862 , G06F16/22 , G06N20/20
Abstract: Techniques for generating a model for predicting when different hybrid prefetcher configurations should be used are disclosed. Techniques for using the model to predict when different hybrid prefetcher configurations should be used are also disclosed. The techniques for generating the model include obtaining a set of input data, and generating trees based on the training data. Each tree is associated with a different hybrid prefetcher configuration and the trees output certainty scores for the associated hybrid prefetcher configuration based on hardware feature measurements. To decide on a hybrid prefetcher configuration to use, a prefetcher traverses multiple trees to obtain certainty scores for different hybrid prefetcher configurations and identifies a hybrid prefetcher configuration to use based on a comparison of the certainty scores.
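The per-configuration tree traversal can be sketched in Python as follows. This is a toy model, not the patented predictor: it assumes binary trees that test one feature against a threshold at each node, treats "comparison of the certainty scores" as picking the maximum, and uses hypothetical configuration names and a hypothetical feature called `l2_miss_rate`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Internal node if `feature` is set; otherwise a leaf holding a certainty score."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    score: float = 0.0


def certainty(tree: Node, features: Dict[str, float]) -> float:
    """Walk one tree using hardware feature measurements and return its leaf score."""
    node = tree
    while node.feature is not None:
        nxt = node.left if features[node.feature] <= node.threshold else node.right
        assert nxt is not None, "internal nodes need two children"
        node = nxt
    return node.score


def choose_configuration(trees: Dict[str, Node], features: Dict[str, float]) -> str:
    """Traverse every per-configuration tree and pick the highest certainty score."""
    scores = {config: certainty(tree, features) for config, tree in trees.items()}
    return max(scores, key=scores.__getitem__)


# Hypothetical trees, one per hybrid prefetcher configuration.
trees = {
    "stride_only": Node("l2_miss_rate", 0.1, Node(score=0.8), Node(score=0.2)),
    "stride_plus_region": Node("l2_miss_rate", 0.1, Node(score=0.3), Node(score=0.9)),
}
print(choose_configuration(trees, {"l2_miss_rate": 0.25}))  # prints stride_plus_region
```

Keeping one tree per configuration means each score can be computed independently, which is presumably what lets hardware evaluate the trees in parallel.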
-
Publication number: US20210173702A1
Publication date: 2021-06-10
Application number: US16709527
Filing date: 2019-12-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Alok Garg , Scott Andrew McLelland , Marius Evers , Matthew T. Sobel
IPC: G06F9/48
Abstract: Systems, apparatuses, and methods for implementing scheduler queue assignment burst mode are disclosed. A scheduler queue assignment unit receives a dispatch packet with a plurality of operations from a decode unit in each clock cycle. The scheduler queue assignment unit determines if the number of operations in the dispatch packet for any class of operations is greater than a corresponding threshold for dispatching to the scheduler queues in a single cycle. If the number of operations for a given class is greater than the corresponding threshold, and if a burst mode counter is less than a burst mode window threshold, the scheduler queue assignment unit dispatches the extra number of operations for the given class in a single cycle. By operating in burst mode for a given operation class during a small number of cycles, processor throughput can be increased without starving the processor of other operation classes.
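A cycle-level toy model of the burst-mode decision is sketched below in Python. The abstract does not say when the burst mode counter resets, so this sketch assumes it resets on any cycle in which no class exceeds its threshold; the class names, thresholds, and the `BurstModeAssigner` name are all hypothetical.

```python
from typing import Dict


class BurstModeAssigner:
    """Toy model of per-cycle dispatch limits with a burst mode counter."""

    def __init__(self, thresholds: Dict[str, int], burst_window: int) -> None:
        self.thresholds = thresholds      # per-class ops allowed in a normal cycle
        self.burst_window = burst_window  # max burst cycles before falling back
        self.burst_counter = 0

    def dispatch(self, packet: Dict[str, int]) -> Dict[str, int]:
        """Return how many operations of each class are dispatched this cycle."""
        over = any(count > self.thresholds[cls] for cls, count in packet.items())
        if not over:
            self.burst_counter = 0  # assumption: reset once demand drops
            return dict(packet)
        if self.burst_counter < self.burst_window:
            self.burst_counter += 1
            return dict(packet)     # burst: dispatch the extra operations now
        # Window exhausted: cap each class at its normal threshold.
        return {cls: min(count, self.thresholds[cls]) for cls, count in packet.items()}


assigner = BurstModeAssigner({"alu": 4, "mem": 2}, burst_window=2)
results = [assigner.dispatch({"alu": 6, "mem": 1}) for _ in range(3)]
print(results)
# prints [{'alu': 6, 'mem': 1}, {'alu': 6, 'mem': 1}, {'alu': 4, 'mem': 1}]
```

Bounding the burst by a window is what keeps a single operation class from monopolizing dispatch for long stretches.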
-
Publication number: US09058277B2
Publication date: 2015-06-16
Application number: US13671801
Filing date: 2012-11-08
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sharad Dilip Bade , Alok Garg , John Kalamatianos , Paul Keltcher , Marius Evers , Chitresh Narasimhaiah
CPC classification number: G06F12/0862 , G06F9/3842 , G06F11/30 , G06F2212/6024 , G06F2212/6026 , Y02D10/13
Abstract: Methods and systems for prefetching data for a processor are provided. A system is configured for and a method includes selecting one of a first prefetching control logic and a second prefetching control logic of the processor as a candidate feature, capturing the performance metric of the processor over an inactive sample period when the candidate feature is inactive, capturing a performance metric of the processor over an active sample period when the candidate feature is active, comparing the performance metric of the processor for the active and inactive sample periods, and setting a status of the candidate feature as enabled when the performance metric in the active period indicates improvement over the performance metric in the inactive period, and as disabled when the performance metric in the inactive period indicates improvement over the performance metric in the active period.
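The active/inactive sampling comparison can be sketched as a short Python loop. It assumes a hypothetical `measure(feature, active)` callback that returns a higher-is-better metric (for example instructions per cycle) for one sample period, and it assumes a tie leaves the feature disabled, a case the abstract does not cover. The feature names in the usage line are illustrative only.

```python
from typing import Callable, Dict, List

# measure(feature, active) -> performance metric over one sample period;
# higher is assumed to be better (e.g. instructions per cycle).
Measure = Callable[[str, bool], float]


def tune_features(candidates: List[str], measure: Measure) -> Dict[str, bool]:
    """Enable each candidate only if its active-period metric beats its inactive one."""
    status: Dict[str, bool] = {}
    for feature in candidates:
        inactive_metric = measure(feature, False)  # sample period with feature off
        active_metric = measure(feature, True)     # sample period with feature on
        # Assumption: ties leave the feature disabled.
        status[feature] = active_metric > inactive_metric
    return status


fake_ipc = {("stride", False): 1.10, ("stride", True): 1.25,
            ("region", False): 1.10, ("region", True): 1.05}
print(tune_features(["stride", "region"], lambda f, a: fake_ipc[(f, a)]))
# prints {'stride': True, 'region': False}
```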
-
Publication number: US20240111674A1
Publication date: 2024-04-04
Application number: US17955618
Filing date: 2022-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Alok Garg , Neil N Marketkar , Matthew T. Sobel
IPC: G06F12/0811 , G06F12/0875 , G06F12/0884
CPC classification number: G06F12/0811 , G06F12/0875 , G06F12/0884
Abstract: Data reuse cache techniques are described. In one example, a load instruction is generated by an execution unit of a processor unit. In response to the load instruction, data is loaded by a load-store unit for processing by the execution unit and is also stored to a data reuse cache communicatively coupled between the load-store unit and the execution unit. Upon receipt of a subsequent load instruction for the data from the execution unit, the data is loaded from the data reuse cache for processing by the execution unit.
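The load path described above can be modeled as a small Python class. This is a behavioral sketch under stated assumptions: the capacity and least-recently-used replacement are hypothetical choices, the load-store unit is a plain callback, and the model ignores stores, so it performs none of the invalidation a real design would need to keep the reuse cache consistent.

```python
from collections import OrderedDict
from typing import Callable


class DataReuseCache:
    """Small LRU buffer between a load-store unit and an execution unit (toy model)."""

    def __init__(self, capacity: int, load_store_unit: Callable[[int], int]) -> None:
        self.capacity = capacity
        self.lsu = load_store_unit
        self.entries: "OrderedDict[int, int]" = OrderedDict()
        self.lsu_loads = 0  # loads that had to go through the load-store unit

    def load(self, addr: int) -> int:
        if addr in self.entries:             # reuse: serve the data directly
            self.entries.move_to_end(addr)
            return self.entries[addr]
        data = self.lsu(addr)                # first access goes through the LSU
        self.lsu_loads += 1
        self.entries[addr] = data            # and is also kept for reuse
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        return data


backing = {0x40: 7, 0x80: 9}
cache = DataReuseCache(capacity=2, load_store_unit=backing.__getitem__)
values = [cache.load(a) for a in (0x40, 0x40, 0x80, 0x40)]
print(values, cache.lsu_loads)  # prints [7, 7, 9, 7] 2
```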
-
Publication number: US20220206798A1
Publication date: 2022-06-30
Application number: US17698955
Filing date: 2022-03-18
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthew T. Sobel , Donald A. Priore , Alok Garg
Abstract: Systems, apparatuses, and methods for implementing scheduler queue assignment logic are disclosed. A processor includes at least a decode unit, scheduler queue assignment logic, scheduler queues, pickers, and execution units. The assignment logic receives a plurality of operations from a decode unit in each clock cycle. The assignment logic includes a separate logical unit for each different type of operation which is executable by the different execution units of the processor. For each different type of operation, the assignment logic determines which of the possible assignment permutations are valid for assigning different numbers of operations to scheduler queues in a given clock cycle. The assignment logic receives an indication of how many operations to assign in the given clock cycle, and then the assignment logic selects one of the valid assignment permutations for the number of operations specified by the indication.
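For a single operation type, the "valid assignment permutations" idea can be illustrated by enumerating every way to map N operations onto queues with limited free slots. This Python sketch is an assumption-laden software analogue: it enumerates serially what hardware would evaluate in parallel, models validity purely as free-slot capacity, and picks the first valid option, whereas the abstract does not state the selection policy.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Assignment = Tuple[int, ...]  # queue index chosen for each operation, in order


def valid_assignments(free_slots: List[int], max_ops: int) -> Dict[int, List[Assignment]]:
    """For one operation type, list every queue assignment that fits the free slots."""
    queues = range(len(free_slots))
    table: Dict[int, List[Assignment]] = {}
    for n in range(1, max_ops + 1):
        table[n] = [combo for combo in product(queues, repeat=n)
                    if all(combo.count(q) <= free_slots[q] for q in queues)]
    return table


def select_assignment(table: Dict[int, List[Assignment]], n_ops: int) -> Optional[Assignment]:
    """Pick one valid assignment for the indicated operation count (first one here)."""
    options = table.get(n_ops, [])
    return options[0] if options else None


table = valid_assignments(free_slots=[1, 0, 2], max_ops=2)
print(table[2])                     # prints [(0, 2), (2, 0), (2, 2)]
print(select_assignment(table, 2))  # prints (0, 2)
```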
-
Publication number: US11294678B2
Publication date: 2022-04-05
Application number: US15991088
Filing date: 2018-05-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthew T. Sobel , Donald A. Priore , Alok Garg
Abstract: Systems, apparatuses, and methods for implementing scheduler queue assignment logic are disclosed. A processor includes at least a decode unit, scheduler queue assignment logic, scheduler queues, pickers, and execution units. The assignment logic receives a plurality of operations from a decode unit in each clock cycle. The assignment logic includes a separate logical unit for each different type of operation which is executable by the different execution units of the processor. For each different type of operation, the assignment logic determines which of the possible assignment permutations are valid for assigning different numbers of operations to scheduler queues in a given clock cycle. The assignment logic receives an indication of how many operations to assign in the given clock cycle, and then the assignment logic selects one of the valid assignment permutations for the number of operations specified by the indication.
-
Publication number: US09916246B1
Publication date: 2018-03-13
Application number: US15238209
Filing date: 2016-08-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Carson Donahue Henrion , Michael K. Ciraula , Gregg Donley , Alok Garg , Eric Busta
IPC: G06F12/00 , G06F12/0811 , G06F12/0815 , G06F12/128
CPC classification number: G06F12/0811 , G06F12/0828 , G06F12/0833 , G06F2212/621 , G06F2212/69 , G06F2212/70
Abstract: A processing system includes a shadow tag memory, which stores a plurality of entries containing coherency information for the cachelines residing at the various levels of private caches. If a cache miss occurs at a private cache, or if coherency information for a cacheline requires updating, a probe is sent to the shadow tag memory maintained at the shared cache to determine whether the requested (or affected) cacheline is stored at another private cache. The probe includes a tag which can be divided into two or more portions. To more efficiently compare the probe tag to the shadow tag entries, the comparison is performed in multiple stages based on the portions of the probe tag.
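A two-stage probe-tag comparison can be sketched in Python as below. The split into exactly two portions and the choice to compare the low-order bits first are assumptions for illustration; the abstract only says the tag is divided into two or more portions and compared in stages. The point of the staging is that the second, wider comparison runs only on entries that survive the first.

```python
from typing import List, Optional


def staged_tag_lookup(shadow_tags: List[int], probe_tag: int, low_bits: int) -> Optional[int]:
    """Return the index of the shadow-tag entry matching probe_tag, or None.

    Stage 1 compares only the low-order portion of each tag; stage 2 compares
    the high-order portion, but only for entries that survived stage 1.
    """
    mask = (1 << low_bits) - 1
    probe_low, probe_high = probe_tag & mask, probe_tag >> low_bits
    survivors = [i for i, tag in enumerate(shadow_tags) if (tag & mask) == probe_low]
    for i in survivors:
        if (shadow_tags[i] >> low_bits) == probe_high:
            return i
    return None


tags = [0xABC1, 0x12C1, 0xABC2]
print(staged_tag_lookup(tags, 0x12C1, low_bits=8))  # prints 1
print(staged_tag_lookup(tags, 0x99C1, low_bits=8))  # prints None
```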
-