Device, method and system to predict an address collision by a load and a store

    Publication No.: US12086591B2

    Publication date: 2024-09-10

    Application No.: US17214698

    Filing date: 2021-03-26

    Applicant: Intel Corporation

    IPC classification: G06F9/22 G06F9/30 G06F9/38

    CPC classification: G06F9/30043 G06F9/3856

    Abstract: Techniques and mechanisms for determining a relative order in which a load instruction and a store instruction are to be executed. In an embodiment, a processor detects an address collision event wherein two instructions, corresponding to different respective instruction pointer values, target the same memory address. Based on the address collision event, the processor identifies respective instruction types of the two instructions as an aliasing instruction type pair. The processor further determines a count of decisions each to forego a reversal of an order of execution of instructions. Each decision represented in the count is based on instructions which are each of a different respective instruction type of the aliasing instruction type pair. In another embodiment, the processor determines, based on the count of decisions, whether a later load instruction is to be advanced in an order of instruction execution.
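
    The counting scheme in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name, the `threshold` parameter, and the policy of allowing a load to advance again once enough "do not reorder" decisions have accumulated are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict


class AliasPairPredictor:
    """Toy model: one counter per (load type, store type) aliasing pair."""

    def __init__(self, threshold=2):
        self.threshold = threshold           # hypothetical tuning knob
        self.aliasing_pairs = set()          # type pairs that have collided
        self.forgo_count = defaultdict(int)  # decisions to forgo reordering

    def record_collision(self, load_type, store_type):
        # Two instructions with different IPs targeted the same address:
        # mark their types as an aliasing pair and restart the count.
        pair = (load_type, store_type)
        self.aliasing_pairs.add(pair)
        self.forgo_count[pair] = 0

    def may_advance_load(self, load_type, store_type):
        # Decide whether a later load may be moved ahead of an earlier store.
        pair = (load_type, store_type)
        if pair not in self.aliasing_pairs:
            return True                      # no known aliasing: reorder freely
        if self.forgo_count[pair] >= self.threshold:
            return True                      # assumed policy: retry reordering
        self.forgo_count[pair] += 1          # one more decision to forgo
        return False


predictor = AliasPairPredictor(threshold=2)
predictor.record_collision("load_a", "store_b")
decisions = [predictor.may_advance_load("load_a", "store_b") for _ in range(3)]
```

    In this model the first two decisions after a collision keep the original order, and the third allows the load to advance; a real design could choose a different policy.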

    Apparatus and method for hardware-based memoization of function calls to reduce instruction execution

    Publication No.: US12020033B2

    Publication date: 2024-06-25

    Application No.: US17133899

    Filing date: 2020-12-24

    Applicant: Intel Corporation

    IPC classification: G06F9/38 G06F9/22

    Abstract: Apparatus and method for memoizing repeat function calls are described herein. An apparatus embodiment includes: uop buffer circuitry to identify a function for memoization based on retiring micro-operations (uops) from a processing pipeline; memoization retirement circuitry to generate a signature of the function which includes input and output data of the function; a memoization data structure to store the signature; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and to responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold. One or more instructions that are data-dependent on execution of the instance are then provided with the output data of the function from the memoization data structure.
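
    A software analogy may help show how a signature store and a confidence threshold interact. The sketch below is an illustration only: `MemoTable`, its method names, and the rule that confidence grows by one on each matching retirement and resets on a mismatch are assumptions, not details from the patent.

```python
class MemoTable:
    """Toy memoization structure keyed by (function id, input values)."""

    def __init__(self, threshold=1):
        self.threshold = threshold  # hypothetical confidence threshold
        self.entries = {}           # (func_id, inputs) -> [output, confidence]

    def retire(self, func_id, inputs, output):
        # Called when a function instance retires: record its signature.
        key = (func_id, inputs)
        entry = self.entries.get(key)
        if entry is not None and entry[0] == output:
            entry[1] += 1                    # same output again: more confident
        else:
            self.entries[key] = [output, 0]  # new or changed output: start over

    def predict(self, func_id, inputs):
        # Return the stored output only when confidence is above the threshold;
        # None means the function must be executed normally.
        entry = self.entries.get((func_id, inputs))
        if entry is not None and entry[1] > self.threshold:
            return entry[0]
        return None


table = MemoTable(threshold=1)
for _ in range(3):
    table.retire("square", (4,), 16)
skipped_result = table.predict("square", (4,))
```

    Here `inputs` is a tuple so that it can serve as a dictionary key. After three matching retirements the confidence is 2, which exceeds the threshold of 1, so dependents can be given the stored output instead of waiting for execution.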

    Data relocation for inline metadata

    Publication No.: US11972126B2

    Publication date: 2024-04-30

    Application No.: US17472272

    Filing date: 2021-09-10

    Applicant: Intel Corporation

    Abstract: Technologies disclosed herein provide one example of a system that includes processor circuitry to be communicatively coupled to a memory circuitry. The processor circuitry is to receive a memory access request corresponding to an application for access to an address range in a memory allocation of the memory circuitry and to locate a metadata region within the memory allocation. The processor circuitry is also to, in response to a determination that the address range includes at least a portion of the metadata region, obtain first metadata stored in the metadata region, use the first metadata to determine an alternate memory address in a relocation region, and read, at the alternate memory address, displaced data from the portion of the metadata region included in the address range of the memory allocation. The address range includes one or more bytes of an expected allocation region of the memory allocation.
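
    The read path described above can be modeled over a flat byte array. This is a minimal sketch, assuming that the metadata region holds nothing but the relocation address as a little-endian integer; the function name `read_range` and that encoding are hypothetical and chosen only for illustration.

```python
def read_range(memory, start, length, meta_start, meta_len):
    """Read `length` bytes at `start`, substituting displaced data for
    any bytes that fall inside the inline metadata region."""
    out = bytearray(memory[start:start + length])
    # Intersection of the requested range with the metadata region.
    lo = max(start, meta_start)
    hi = min(start + length, meta_start + meta_len)
    if lo < hi:
        # Assumption: the metadata bytes encode the relocation address.
        reloc = int.from_bytes(memory[meta_start:meta_start + meta_len], "little")
        for addr in range(lo, hi):
            # Fetch the displaced byte from the relocation region instead.
            out[addr - start] = memory[reloc + (addr - meta_start)]
    return bytes(out)


memory = bytearray(32)
memory[0:4] = b"ABCD"
memory[4:6] = (16).to_bytes(2, "little")  # metadata: relocation region at 16
memory[6:8] = b"GH"
memory[16:18] = b"EF"                     # data displaced by the metadata
data = read_range(memory, 0, 8, meta_start=4, meta_len=2)
```

    In this model the application sees the eight bytes it expects even though two of them physically live in the relocation region.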

    SPECULATIVE DECOMPRESSION WITHIN PROCESSOR CORE CACHES

    Publication No.: US20220197643A1

    Publication date: 2022-06-23

    Application No.: US17133618

    Filing date: 2020-12-23

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F12/0875

    Abstract: Methods and apparatus relating to speculative decompression within processor core caches are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into a plurality of cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the plurality of cachelines of the cache of the processor core in response to the second micro operation. The decompression instruction causes the DE circuitry to perform an out-of-order decompression of the plurality of cachelines. Other embodiments are also disclosed and claimed.

    Automatic predication of hard-to-predict convergent branches

    Publication No.: US10754655B2

    Publication date: 2020-08-25

    Application No.: US16021838

    Filing date: 2018-06-28

    Applicant: Intel Corporation

    Abstract: A processing device includes a branch IP table and branch predication circuitry coupled to the branch IP table. The branch predication circuitry is to: determine a dynamic convergence point in a conditional branch of a set of instructions; store the dynamic convergence point in the branch IP table; fetch a first and second speculative path of the conditional branch; while determining which of the first speculative path and the second speculative path is a taken path of the conditional branch and determining whether a dynamic convergence point is fetched corresponding to the stored dynamic convergence point, stall scheduling of instructions of the first speculative path and the second speculative path; and in response to determining that one of the first speculative path and the second speculative path is the taken path and the fetched dynamic convergence point corresponds to the stored convergence point, resume scheduling of the instructions of the taken path.

    MEMORY-EFFICIENT LAST LEVEL CACHE ARCHITECTURE

    Publication No.: US20180203799A1

    Publication date: 2018-07-19

    Application No.: US15408731

    Filing date: 2017-01-18

    Applicant: Intel Corporation

    IPC classification: G06F12/0811

    Abstract: A memory-efficient last level cache (LLC) architecture is described. A processor implementing an LLC architecture may include a processor core, a last level cache (LLC) operatively coupled to the processor core, and a cache controller operatively coupled to the LLC. The cache controller is to monitor a bandwidth demand of a channel between the processor core and a dynamic random-access memory (DRAM) device associated with the LLC. The cache controller is further to perform a first defined number of consecutive reads from the DRAM device when the bandwidth demand exceeds a first threshold value and perform a first defined number of consecutive writes of modified lines from the LLC to the DRAM device when the bandwidth demand exceeds the first threshold value.
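
    The batching behavior can be sketched as a simple scheduling function. This is an illustrative model only: the name `schedule`, the `burst` parameter standing in for the "first defined number", and the alternating policy used below the threshold are assumptions and do not come from the application.

```python
from collections import deque


def schedule(reads, writes, demand, threshold, burst):
    """Return the order in which DRAM operations are issued."""
    reads, writes = deque(reads), deque(writes)
    order = []
    if demand <= threshold:
        # Assumed low-demand policy: alternate single reads and writes.
        while reads or writes:
            if reads:
                order.append(("R", reads.popleft()))
            if writes:
                order.append(("W", writes.popleft()))
        return order
    # High demand: issue up to `burst` consecutive reads, then up to
    # `burst` consecutive writebacks of modified lines, and repeat.
    while reads or writes:
        for _ in range(min(burst, len(reads))):
            order.append(("R", reads.popleft()))
        for _ in range(min(burst, len(writes))):
            order.append(("W", writes.popleft()))
    return order


busy = schedule([1, 2, 3], [10, 20, 30], demand=5, threshold=3, burst=2)
```

    Grouping accesses of the same direction is a common way to reduce read/write turnaround on a DRAM channel, which is presumably why the batching is tied to high bandwidth demand.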

    DEVICE, METHOD AND SYSTEM TO PROVIDE A PREDICTED VALUE WITH A SEQUENCE OF MICRO-OPERATIONS

    Publication No.: US20230195465A1

    Publication date: 2023-06-22

    Application No.: US17558368

    Filing date: 2021-12-21

    Applicant: Intel Corporation

    IPC classification: G06F9/38 G06F9/30

    Abstract: Techniques and mechanisms for efficiently making value prediction information available for use in a processor. In an embodiment, execution of an instruction is to include a loading of some data to a first location (e.g., a first register). A decoder of the processor accesses reference information which indicates that the execution is to comprise multiple micro-operations (μops) including a LoadCheck μop and a Move μop. The LoadCheck μop loads a first value to the first location, and checks whether the loaded first value is the same as a previously-determined second value which represents a prediction of what the first value would be. The Move μop moves the second value to the first location. In another embodiment, the Move μop is scheduled for execution out-of-order with respect to the LoadCheck μop, resulting in an early availability of the second value for access in a register file by another μop.
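
    The two μops can be mimicked in software to show why the Move makes the predicted value available early. This is a sketch under assumptions: the function name, the dictionary-based register file and memory, and the choice to report a misprediction through a boolean are hypothetical; the abstract does not describe how a misprediction is handled.

```python
def run_predicted_load(memory, addr, predicted, regs, dest):
    """Model a load that is split into a Move uop and a LoadCheck uop."""
    # Move uop, executed first: the predicted value lands in the destination
    # register so that dependent uops can read it without waiting.
    regs[dest] = predicted
    early_value = regs[dest]

    # LoadCheck uop: load the real value into the same location and compare
    # it with the prediction.
    actual = memory[addr]
    regs[dest] = actual
    prediction_ok = (actual == predicted)

    # Assumption: the caller decides what to do when prediction_ok is False.
    return early_value, prediction_ok


regs = {}
early, ok = run_predicted_load({0x10: 7}, 0x10, predicted=7, regs=regs, dest="r1")
```

    When the prediction is right, consumers that read `early` already hold the correct value; when it is wrong, the register ends up with the loaded value and the returned flag is `False`.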