摘要:
An apparatus and method for detecting and eliminating loop-invariant instructions. For example, one embodiment of a method comprises: detecting a loop start; responsively setting the loop-invariant bit for each register entry in a register alias table; executing first N iterations of the loop and responsively clearing the loop-invariant bit of any register modified during the first N iterations of the loop; identifying one or more loop-invariant registers based on the status of the loop-invariant bit in the register alias table; identifying one or more loop-invariant instructions based on the loop-invariant registers; and propagating the identified loop-invariant instructions by storing the destination register's values in a physical register file for later reuse by other instructions.
摘要:
Mechanisms for handling multiple data errors that occur simultaneously are provided. A processing device may determine whether multiple data errors occur in memory locations that are within a range of memory locations. If the multiple memory locations are within the range of memory locations, the processing device may continue with a recovery process. If one of the multiple memory locations is outside of the range of memory locations, the processing device may halt the recovery process.
摘要:
A processor includes a core to execute instructions and a cache memory coupled to the core and having a plurality of entries. Each entry of the cache memory may include a data storage including a plurality of data storage portions, each data storage portion to store a corresponding data portion. Each entry may also include a metadata storage to store a plurality of portion modification indicators, each portion modification indicator corresponding to one of the data storage portions. Each portion modification indicator is to indicate whether the data portion stored in the corresponding data storage portion has been modified, independently of cache coherency state information of the entry. Other embodiments are described as claimed.
摘要:
An apparatus and method are described for at-retirement re-execution of faulting operations. For example, one embodiment of a processor comprises: an out-of-order engine to schedule and dispatch operations to an execution unit at least some of the operations comprising load operations to load data from a system memory and store operations to store data to the system memory; a first circuit to determine whether a current load/store operation is at retirement; a second circuit to cause logging circuitry and/or fault registers to be active when a load/store operation has been dispatched at retirement, wherein upon detection of a fault condition associated with the load/store operation, data associated with the fault is to be written to the logging circuitry and/or fault registers, the second circuit to cause the logging circuitry and/or fault registers to be inactive if the load/store operation has not be dispatched at retirement.
摘要:
A processor includes a core, a memory subsystem, a predictor module, and a memory rename module. The predictor module may include a first logic to identify a dependency between a store instruction and a load instruction, and a second logic to assign a memory renaming (MRN) register to the store instruction and the load instruction based on the identified dependency. Further, the memory rename module may include a third logic to copy, based on the assigned MRN register, information in a first logical register associated with the store instruction directly to a second logical register associated with the load instruction.
摘要:
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
摘要:
Methods and apparatus relating to a hardware move elimination and/or next page prefetching are described. In some embodiments, a logic may provide hardware move eliminations based on stored data. In an embodiment, a next page prefetcher is disclosed. Other embodiments are also described and claimed.
摘要:
In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.
摘要:
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
摘要:
In an embodiment, a processor includes at least a first core. The first core includes execution logic to execute operations, and a first event counter to determine a first event count associated with events of a first type that have occurred since a start of a first defined interval. The first core also includes a second event counter to determine a second event count associated with events of a second type that have occurred since the start of the first defined interval, and stall logic to stall execution of operations including at least first operations associated with events of the first type, until the first defined interval is expired responsive to the first event count exceeding a first combination threshold concurrently with the second event count exceeding a second combination threshold. Other embodiments are described and claimed.