摘要:
Techniques and mechanisms for determining a relative order in which a load instruction and a store instruction are to be executed. In an embodiment, a processor detects an address collision event wherein two instructions, corresponding to different respective instruction pointer values, target the same memory address. Based on the address collision event, the processor identifies respective instruction types of the two instructions as an aliasing instruction type pair. The processor further determines a count of decisions each to forego a reversal of an order of execution of instructions. Each decision represented in the count is based on instructions which are each of a different respective instruction type of the aliasing instruction type pair. In another embodiment, the processor determines, based on the count of decisions, whether a later load instruction is to be advanced in an order of instruction execution.
摘要:
Techniques and mechanisms for determining a relative order in which a load instruction and a store instruction are to be executed. In an embodiment, a processor detects an address collision event wherein two instructions, corresponding to different respective instruction pointer values, target the same memory address. Based on the address collision event, the processor identifies respective instruction types of the two instructions as an aliasing instruction type pair. The processor further determines a count of decisions each to forego a reversal of an order of execution of instructions. Each decision represented in the count is based on instructions which are each of a different respective instruction type of the aliasing instruction type pair. In another embodiment, the processor determines, based on the count of decisions, whether a later load instruction is to be advanced in an order of instruction execution.
摘要:
An example system on a chip (SoC) includes a cache, a processor, and a predictor circuit. The cache may store data. The processor may be coupled to the cache and store a first data set at a first location in the cache and receive a first request from an application to write a second data set to the cache. The predictor circuit may be coupled to the processor and determine that a second location where the second data set is to be written to in the cache is nonconsecutive to the first location, where the processor is to perform a request-for-ownership (RFO) operation for the second data set and write the second data set to the cache.
摘要:
An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.
摘要:
A method and apparatus are described for efficient memory renaming prediction using virtual registers. For example, one embodiment of an apparatus comprises: a memory execution unit (MEU) to perform store and load operations to store data to memory and load data from memory, respectively; a plurality of memory rename (MRN) registers assigned to store and load operations, each MRN register to store data associated with a store operation so that the data is available for a subsequent load operation; and at least one MRN predictor comprising a data structure to allocate virtual memory rename (VMRN) registers to each of the MRN registers, the MRN predictor to query the data structure in response to a load and/or store operation using a value identifying the MRN register assigned to the load and/or store operation, respectively, to determine a current VMRN register associated with the load and/or store operation.
摘要:
Disclosed embodiments relate to predicting load data. In one example, a processor a pipeline having stages ordered as fetch, decode, allocate, write back, and commit, a training table to store an address, predicted data, a state, and a count of instances of unchanged return data, and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage to set the state of the first load instruction in the training table to the first state when the count reaches a first threshold.
摘要:
An example system on a chip (SoC) includes a cache, a processor, and a predictor circuit. The cache may store data. The processor may be coupled to the cache and store a first data set at a first location in the cache and receive a first request from an application to write a second data set to the cache. The predictor circuit may be coupled to the processor and determine that a second location where the second data set is to be written to in the cache is nonconsecutive to the first location, where the processor is to perform a request-for-ownership (RFO) operation for the second data set and write the second data set to the cache.
摘要:
A processor includes a core, a memory subsystem, a predictor module, and a memory rename module. The predictor module may include a first logic to identify a dependency between a store instruction and a load instruction, and a second logic to assign a memory renaming (MRN) register to the store instruction and the load instruction based on the identified dependency. Further, the memory rename module may include a third logic to copy, based on the assigned MRN register, information in a first logical register associated with the store instruction directly to a second logical register associated with the load instruction.