Abstract:
A method and system for maintaining a best-case demand redispatch of an instruction, maximizing the time a rejected thread may execute in lookahead execution mode while incurring the smallest L1 cache miss penalty the memory subsystem supports. In response to a demand miss, a load/store unit sends a fetch request to the next level cache. The cache line of the demand miss is examined to identify the critical sector. Once the critical sector is identified, a best-case data return time is determined based on the fastest time the next level cache is able to return the critical sector of the cache line. The load/store unit then sends a speculative warning to the dispatch unit, timed to coincide with the best-case data return, wherein the speculative warning prepares the dispatch unit to resend the instruction for execution as soon as data is available to the processor core.
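As a rough illustration of the timing calculation described above, the following is a minimal C sketch, not the patented implementation. All names and constants (SECTOR_BYTES, L2_FIRST_LATENCY, SECTOR_GAP, redispatch_lead) are hypothetical, since the abstract gives no concrete line geometry or latencies.

    #include <stdio.h>

    /* Hypothetical parameters; the abstract gives no concrete numbers. */
    #define SECTOR_BYTES      32   /* sector size within a 128-byte cache line */
    #define L2_FIRST_LATENCY  10   /* cycles until L2 can return its first sector */
    #define SECTOR_GAP         2   /* extra cycles per sector ahead of the critical one */

    /* Identify which sector of the line holds the demanded bytes. */
    static int critical_sector(unsigned line_offset) {
        return line_offset / SECTOR_BYTES;
    }

    /* Best-case cycle at which the critical sector can arrive: immediate if
     * the L2 returns critical-sector-first, otherwise earlier sectors in the
     * line each add SECTOR_GAP cycles. */
    static int best_case_return(int sector, int critical_first) {
        return critical_first ? L2_FIRST_LATENCY
                              : L2_FIRST_LATENCY + sector * SECTOR_GAP;
    }

    int main(void) {
        unsigned miss_offset = 72;            /* byte offset of the demand miss */
        int sector = critical_sector(miss_offset);
        int data_ready = best_case_return(sector, 0);

        /* Warn the dispatch unit early enough that the redispatched
         * instruction reaches execution just as the data returns. */
        int redispatch_lead = 3;              /* assumed dispatch-to-execute depth */
        int warn_cycle = data_ready - redispatch_lead;

        printf("critical sector %d, data ready cycle %d, warn cycle %d\n",
               sector, data_ready, warn_cycle);
        return 0;
    }

The key point the sketch models is that the warning is scheduled against the best case, so the rejected thread keeps running in lookahead mode right up until the data can possibly arrive.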
Abstract:
In a system with multiple execution units, instructions are queued to allow efficient dispatching. One load/store unit (LSU) may have a store instruction pending to a real address and a second LSU may have a load instruction pending to the same real address. An SMT system has an atomic store quad word (SQW) instruction whose data path is only a double word wide, so the SQW requires two cycles to complete. The SMT system requires a method to prevent collisions in a store reorder queue (SRQ). The real address of a load word (LW) of one thread is compared to the real addresses in the SRQ of the second thread. If an SQW with a real address matching the real address of the LW has not committed both of its double words, the LW of the second thread is rejected.
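A minimal C sketch of the collision check follows, assuming a simple array-backed SRQ and 16-byte quad word granules; the srq_entry layout, the dwords_done field, and the addresses are all invented for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical SRQ entry: an atomic SQW occupies the queue until both
     * of its double words have committed. */
    typedef struct {
        uint64_t real_addr;     /* real address of the store */
        bool     is_sqw;
        int      dwords_done;   /* 0, 1, or 2 double words committed */
    } srq_entry;

    /* Reject the load word if the other thread's SRQ holds an SQW to the
     * same quad word that has committed only one of its two halves. */
    static bool reject_load(uint64_t lw_addr, const srq_entry *srq, int n) {
        uint64_t lw_quad = lw_addr & ~0xFULL;   /* 16-byte quad word granule */
        for (int i = 0; i < n; i++) {
            if (srq[i].is_sqw &&
                (srq[i].real_addr & ~0xFULL) == lw_quad &&
                srq[i].dwords_done < 2)
                return true;                    /* collision: reject and retry */
        }
        return false;
    }

    int main(void) {
        srq_entry other_thread_srq[] = {
            { 0x1000, true, 1 },   /* SQW halfway committed */
            { 0x2000, true, 2 },   /* SQW fully committed   */
        };
        printf("LW 0x1008 rejected: %d\n",
               reject_load(0x1008, other_thread_srq, 2));  /* 1: must retry */
        printf("LW 0x2008 rejected: %d\n",
               reject_load(0x2008, other_thread_srq, 2));  /* 0: safe */
        return 0;
    }

Rejecting, rather than stalling, preserves the atomicity of the SQW: the load never observes a quad word of which only one double word has been written.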
Abstract:
A load instruction that accesses the data cache may be off natural alignment, requiring a cache line crossing to complete the access. The illustrative embodiments provide a mechanism for loading data across multiple cache lines without the need for an accumulation register or collection point for the partial data access from a first cache line while waiting for a second cache line to be accessed. Because the accesses to separate cache lines are concatenated within the vector rename register without the need for an accumulator, an off-alignment load instruction is completely pipelineable and flushable with no cleanup consequences.
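The following C sketch is a software analogy of the concatenation, assuming toy 16-byte lines and a 16-byte vector register; LINE_BYTES, VEC_BYTES, and load_across_lines are invented names. Each line access writes its bytes directly into the correct slice of the rename register, so nothing is held in an intermediate accumulator between the two accesses.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define LINE_BYTES 16   /* toy cache line size for illustration */
    #define VEC_BYTES  16   /* width of the vector rename register */

    /* Two adjacent "cache lines" of backing data. */
    static uint8_t line_a[LINE_BYTES], line_b[LINE_BYTES];

    /* Load VEC_BYTES from an unaligned offset that crosses from line_a
     * into line_b.  Both partial accesses target the rename register
     * itself, so a flush between them leaves nothing to clean up. */
    static void load_across_lines(uint8_t vec_rename[VEC_BYTES], int offset) {
        int first = LINE_BYTES - offset;                 /* bytes from line_a */
        memcpy(vec_rename,         line_a + offset, first);              /* access 1 */
        memcpy(vec_rename + first, line_b,          VEC_BYTES - first);  /* access 2 */
    }

    int main(void) {
        for (int i = 0; i < LINE_BYTES; i++) {
            line_a[i] = (uint8_t)i;
            line_b[i] = (uint8_t)(i + LINE_BYTES);
        }
        uint8_t vr[VEC_BYTES];
        load_across_lines(vr, 10);                       /* off-alignment load */
        for (int i = 0; i < VEC_BYTES; i++)
            printf("%02x ", vr[i]);
        printf("\n");
        return 0;
    }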