-
91.
Publication No.: US11625249B2
Publication date: 2023-04-11
Application No.: US17137140
Application date: 2020-12-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jagadish B. Kotra, John Kalamatianos
Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
-
Publication No.: US11586555B2
Publication date: 2023-02-21
Application No.: US17231957
Application date: 2021-04-15
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander D. Breslow, John Kalamatianos
IPC: G06F12/0895, H03M7/30
Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.
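The selection step in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `choose_set`, the `group_size` parameter, the model of a dictionary as a set of words, and the use of "number of words found in the dictionary" as a stand-in for achieved compression are all hypothetical and are not taken from the patent.

```python
def choose_set(base_index, line_words, dictionaries, group_size=4):
    """Pick the neighboring set whose dictionary covers the most words.

    Assumptions (not from the patent): each dictionary is a Python set of
    words, the candidate sets are base_index .. base_index + group_size - 1,
    and compression quality is approximated by the number of dictionary hits.
    Returns (full_index, offset); the offset is what a tag entry would record.
    """
    best_offset, best_hits = 0, -1
    for offset in range(group_size):
        dictionary = dictionaries[base_index + offset]
        hits = sum(1 for word in line_words if word in dictionary)
        if hits > best_hits:  # ties keep the smallest offset
            best_offset, best_hits = offset, hits
    return base_index + best_offset, best_offset


# Four neighboring sets, each with its own (hypothetical) dictionary.
dictionaries = [{0x00}, {0x01, 0x02}, {0xFF}, set()]
full_index, offset = choose_set(0, [0x01, 0x02, 0x01, 0x07], dictionaries)
```

In this example the second dictionary matches three of the four words, so the line is mapped to the set at offset 1, and that offset would be kept alongside the tag so the line can be found again on lookup.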
-
93.
Publication No.: US20230030679A1
Publication date: 2023-02-02
Application No.: US17386115
Application date: 2021-07-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, John Kalamatianos, Gagandeep Panwar
IPC: G06F12/02, G06F12/0817, G06F12/06, G06F9/30
Abstract: A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.
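A rough Python sketch of how ACB bit values might be assigned to a command stream follows. It is a simplification under assumptions that do not come from the patent: each command is represented only by its physical address (or `None` when it does not access memory), the function name `assign_acb_bits` is hypothetical, and the model ignores anything that could make a previously checked address stale again.

```python
def assign_acb_bits(command_addresses):
    """Return one ACB bit per command: 1 = bypass the address check.

    Assumed model (not from the patent): an address check is performed the
    first time a physical address is seen; later commands to the same address
    and commands that touch no memory (address None) bypass the check.
    """
    checked = set()
    bits = []
    for address in command_addresses:
        if address is None or address in checked:
            bits.append(1)  # nothing to check, or already checked earlier
        else:
            bits.append(0)  # first explicit access: perform the check
            checked.add(address)
    return bits


acb = assign_acb_bits([0x1000, None, 0x1000, 0x2000])
```

Here only the first and last commands are checked; the second touches no memory and the third repeats an address that was already checked.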
-
Publication No.: US11556162B2
Publication date: 2023-01-17
Application No.: US15923153
Application date: 2018-03-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Shijia Wei, Joseph L. Greathouse, John Kalamatianos
Abstract: A processor utilizes instruction based sampling to generate sampling data sampled on a per instruction basis during execution of an instruction. The sampling data indicates what processor hardware was used due to the execution of the instruction. Software receives the sampling data and generates an estimate of energy used by the instruction based on the sampling data. The sampling data may include microarchitectural events and the energy estimate utilizes a base energy amount corresponding to the instruction executed along with energy amounts corresponding to the microarchitectural events in the sampling data. The sampling data may include switching events associated with hardware blocks that switched due to execution of the instruction and the energy estimate for the instruction is based on the switching events and capacitance estimates associated with the hardware blocks.
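The first estimate described in this abstract (a base energy per instruction plus an energy amount per sampled microarchitectural event) can be sketched as below. The opcode names, event names, energy values, and the function name `estimate_energy` are all made-up placeholders for illustration, not figures or identifiers from the patent.

```python
from collections import Counter

# Hypothetical energy tables in picojoules; real values would come from
# characterization of the hardware, which the abstract does not specify.
BASE_ENERGY_PJ = {"add": 1.0, "load": 2.0}
EVENT_ENERGY_PJ = {"l1_miss": 5.0, "tlb_miss": 8.0}


def estimate_energy(opcode, events):
    """Base energy for the instruction plus energy for each sampled event."""
    counts = Counter(events)
    event_energy = sum(EVENT_ENERGY_PJ[name] * n for name, n in counts.items())
    return BASE_ENERGY_PJ[opcode] + event_energy


sample = estimate_energy("load", ["l1_miss", "l1_miss"])
```

With these placeholder tables, a load that missed the L1 cache twice is charged its base energy plus two miss penalties.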
-
Publication No.: US11513802B2
Publication date: 2022-11-29
Application No.: US17033883
Application date: 2020-09-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael W. Boyer, John Kalamatianos, Pritam Majumder
Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.
-
96.
Publication No.: US11455252B2
Publication date: 2022-09-27
Application No.: US16454027
Application date: 2019-06-26
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos, Paul S. Keltcher, Mayank Chhablani, Alok Garg, Furkan Eris
IPC: G06F12/0862, G06F16/22, G06N20/20
Abstract: Techniques for generating a model for predicting when different hybrid prefetcher configurations should be used are disclosed. Techniques for using the model to predict when different hybrid prefetcher configurations should be used are also disclosed. The techniques for generating the model include obtaining a set of input data, and generating trees based on the training data. Each tree is associated with a different hybrid prefetcher configuration and the trees output certainty scores for the associated hybrid prefetcher configuration based on hardware feature measurements. To decide on a hybrid prefetcher configuration to use, a prefetcher traverses multiple trees to obtain certainty scores for different hybrid prefetcher configurations and identifies a hybrid prefetcher configuration to use based on a comparison of the certainty scores.
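The decision step at the end of this abstract can be illustrated with a small Python sketch. Everything concrete here is an assumption made for the example: the nested-dict tree layout, the configuration names, the `l2_miss_rate` feature, the thresholds, and the scores. Only the overall shape (one tree per configuration, each yielding a certainty score from hardware measurements, highest score wins) follows the abstract.

```python
def certainty_score(tree, features):
    """Walk one decision tree down to a leaf and return its certainty score.

    Assumed layout: an inner node is a dict with "feature", "threshold",
    "left" and "right"; a leaf is a plain float.
    """
    node = tree
    while isinstance(node, dict):
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def pick_configuration(trees, features):
    """Return the configuration whose tree reports the highest certainty."""
    return max(trees, key=lambda name: certainty_score(trees[name], features))


# Hypothetical trees, one per hybrid prefetcher configuration.
trees = {
    "stride_only": {"feature": "l2_miss_rate", "threshold": 0.1,
                    "left": 0.9, "right": 0.2},
    "stride_plus_stream": {"feature": "l2_miss_rate", "threshold": 0.1,
                           "left": 0.3, "right": 0.8},
}
choice = pick_configuration(trees, {"l2_miss_rate": 0.25})
```

With a measured miss rate of 0.25, the second tree reports the higher certainty, so that configuration is selected.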
-
Publication No.: US11409608B2
Publication date: 2022-08-09
Application No.: US17136549
Application date: 2020-12-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shrikanth Ganapathy, Ross V. La Fetra, John Kalamatianos, Sudhanva Gurumurthi, Shaizeen Aga, Vilas Sridharan, Michael Ignatowski, Nuwan Jayasena
Abstract: Providing host-based error detection capabilities in a remote execution device is disclosed. A remote execution device performs a host-offloaded operation that modifies a block of data stored in memory. Metadata is generated locally for the modified block of data such that the local metadata generation emulates host-based metadata generation. Stored metadata for the block of data is updated with the locally generated metadata for the modified portion of the block of data. When the host performs an integrity check on the modified block of data using the updated metadata, the host does not distinguish between metadata generated by the host and metadata generated in the remote execution device.
-
Publication No.: US20220103191A1
Publication date: 2022-03-31
Application No.: US17125145
Application date: 2020-12-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Shrikanth Ganapathy, John Kalamatianos
IPC: H03M13/35, G06F11/10, G06F12/0895
Abstract: Systems, apparatuses, and methods for implementing masked fault detection for reliable low voltage cache operation are disclosed. A processor includes a cache that can operate at a relatively low voltage level to conserve power. However, at low voltage levels, the cache is more likely to suffer from bit errors. To mitigate the bit errors occurring in cache lines at low voltage levels, the cache employs a strategy to uncover masked faults during runtime accesses to data by actual software applications. For example, on the first read of a given cache line, the data of the given cache line is inverted and written back to the same data array entry. Also, the error correction bits are regenerated for the inverted data. On a second read of the given cache line, if the fault population of the given cache line changes, then the given cache line's error protection level is updated.
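Why inverting the data uncovers masked faults can be shown with a toy Python model. The model is an assumption for illustration only: faults are treated as stuck-at-0 or stuck-at-1 cells in an 8-bit entry, and visible faults are found by comparing what was written with what the cells hold, whereas a real cache would rely on its error correction bits. All names are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def stored_value(written, stuck_at_0, stuck_at_1):
    """Value the cells actually hold, given bitmasks of stuck cells."""
    return ((written & ~stuck_at_0) | stuck_at_1) & MASK


def visible_faults(written, stuck_at_0, stuck_at_1):
    """Bitmask of cells whose stored bit differs from the written bit."""
    return stored_value(written, stuck_at_0, stuck_at_1) ^ (written & MASK)


def fault_population_changes(data, stuck_at_0, stuck_at_1):
    """Compare fault counts for the original data and for its inverse."""
    first = bin(visible_faults(data, stuck_at_0, stuck_at_1)).count("1")
    inverted = ~data & MASK
    second = bin(visible_faults(inverted, stuck_at_0, stuck_at_1)).count("1")
    return first != second


# Bit 1 is stuck at 1. The original data already has a 1 there, so the fault
# is masked; the inverted data has a 0 there, which exposes it.
changed = fault_population_changes(0b10110010, 0b00000000, 0b00000010)
```

In this model a change in the fault count between the two reads is the signal that the line's error protection level should be revisited.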
-
Publication No.: US20210406183A1
Publication date: 2021-12-30
Application No.: US16927786
Application date: 2020-07-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Susumu Mashimo, John Kalamatianos
IPC: G06F12/0862, G06F12/0877, G06F9/30, G06K9/62
Abstract: A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.
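A minimal Python sketch of delta-based prefetch training and prediction is given below. It swaps in a simple frequency table for whatever values the patented prefetch training table holds, so it should be read as an illustration of the general idea rather than the claimed method. The class name, the `history` parameter, and the table layout are assumptions.

```python
from collections import Counter, defaultdict


class DeltaPrefetcher:
    """Toy model: map the last `history` deltas to the most common next delta."""

    def __init__(self, history=2):
        self.history = history
        self.table = defaultdict(Counter)  # delta pattern -> next-delta counts

    @staticmethod
    def deltas(addresses):
        """Differences between consecutive accessed addresses."""
        return [b - a for a, b in zip(addresses, addresses[1:])]

    def train(self, addresses):
        """Update the table from a recorded sequence of accessed addresses."""
        d = self.deltas(addresses)
        for i in range(len(d) - self.history):
            pattern = tuple(d[i:i + self.history])
            self.table[pattern][d[i + self.history]] += 1

    def predict(self, recent_addresses):
        """Return the predicted next address, or None if the pattern is unknown."""
        pattern = tuple(self.deltas(recent_addresses)[-self.history:])
        if len(pattern) < self.history or pattern not in self.table:
            return None
        next_delta, _ = self.table[pattern].most_common(1)[0]
        return recent_addresses[-1] + next_delta


prefetcher = DeltaPrefetcher(history=2)
prefetcher.train([0, 8, 16, 32, 40, 48, 64])  # deltas: 8, 8, 16, 8, 8, 16
prediction = prefetcher.predict([100, 108, 116])  # recent deltas: 8, 8
```

After training, the pattern (8, 8) has always been followed by a delta of 16, so the address after 116 is predicted to be 132.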
-
Publication No.: US20210382718A1
Publication date: 2021-12-09
Application No.: US16895825
Application date: 2020-06-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, John Kalamatianos
Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
-