Preserving memory ordering between offloaded instructions and non-offloaded instructions

    公开(公告)号:US11625249B2

    公开(公告)日:2023-04-11

    申请号:US17137140

    申请日:2020-12-29

    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.

    Flexible dictionary sharing for compressed caches

    公开(公告)号:US11586555B2

    公开(公告)日:2023-02-21

    申请号:US17231957

    申请日:2021-04-15

    Abstract: Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.

    DISPATCH BANDWIDTH OF MEMORY-CENTRIC REQUESTS BY BYPASSING STORAGE ARRAY ADDRESS CHECKING

    公开(公告)号:US20230030679A1

    公开(公告)日:2023-02-02

    申请号:US17386115

    申请日:2021-07-27

    Abstract: A technical solution to the technical problem of how to improve dispatch throughput for memory-centric commands bypasses address checking for certain memory-centric commands. Implementations include using an Address Check Bypass (ACB) bit to specify whether address checking should be performed for a memory-centric command. ACB bit values are specified in memory-centric instructions, automatically specified by a process, such as a compiler, or by host hardware, such as dispatch hardware, based upon whether a memory-centric command explicitly references memory. Implementations include bypassing, i.e., not performing, address checking for memory-centric commands that do not access memory and also for memory-centric commands that do access memory, but that have the same physical address as a prior memory-centric command that explicitly accessed memory to ensure that any data in caches was flushed to memory and/or invalidated.

    Per-instruction energy debugging using instruction sampling hardware

    公开(公告)号:US11556162B2

    公开(公告)日:2023-01-17

    申请号:US15923153

    申请日:2018-03-16

    Abstract: A processor utilizes instruction based sampling to generate sampling data sampled on a per instruction basis during execution of an instruction. The sampling data indicates what processor hardware was used due to the execution of the instruction. Software receives the sampling data and generates an estimate of energy used by the instruction based on the sampling data. The sampling data may include microarchitectural events and the energy estimate utilizes a base energy amount corresponding to the instruction executed along with energy amounts corresponding to the microarchitectural events in the sampling data. The sampling data may include switching events associated with hardware blocks that switched due to execution of the instruction and the energy estimate for the instruction is based on the switching events and capacitance estimates associated with the hardware blocks.

    Compressing micro-operations in scheduler entries in a processor

    公开(公告)号:US11513802B2

    公开(公告)日:2022-11-29

    申请号:US17033883

    申请日:2020-09-27

    Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.

    Multi-class multi-label classification using clustered singular decision trees for hardware adaptation

    公开(公告)号:US11455252B2

    公开(公告)日:2022-09-27

    申请号:US16454027

    申请日:2019-06-26

    Abstract: Techniques for generating a model for predicting when different hybrid prefetcher configurations should be used are disclosed. Techniques for using the model to predict when different hybrid prefetcher configurations should be used are also disclosed. The techniques for generating the model include obtaining a set of input data, and generating trees based on the training data. Each tree is associated with a different hybrid prefetcher configuration and the trees output certainty scores for the associated hybrid prefetcher configuration based on hardware feature measurements. To decide on a hybrid prefetcher configuration to use, a prefetcher traverses multiple trees to obtain certainty scores for different hybrid prefetcher configurations and identifies a hybrid prefetcher configuration to used based on a comparison of the certainty scores.

    MASKED FAULT DETECTION FOR RELIABLE LOW VOLTAGE CACHE OPERATION

    公开(公告)号:US20220103191A1

    公开(公告)日:2022-03-31

    申请号:US17125145

    申请日:2020-12-17

    Abstract: Systems, apparatuses, and methods for implementing masked fault detection for reliable low voltage cache operation are disclosed. A processor includes a cache that can operate at a relatively low voltage level to conserve power. However, at low voltage levels, the cache is more likely to suffer from bit errors. To mitigate the bit errors occurring in cache lines at low voltage levels, the cache employs a strategy to uncover masked faults during runtime accesses to data by actual software applications. For example, on the first read of a given cache line, the data of the given cache line is inverted and written back to the same data array entry. Also, the error correction bits are regenerated for the inverted data. On a second read of the given cache line, if the fault population of the given cache line changes, then the given cache line's error protection level is updated.

    Controlling Prediction Functional Blocks Used by a Branch Predictor in a Processor

    公开(公告)号:US20210382718A1

    公开(公告)日:2021-12-09

    申请号:US16895825

    申请日:2020-06-08

    Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.

Patent Agency Ranking