Abstract:
A method and apparatus for using result-speculative data under run-ahead speculative execution is disclosed. In one embodiment, the uncommitted target data from instructions being run-ahead executed may be saved into an advance data table. This advance data table may be indexed by the lines in the instruction buffer containing the instructions for run-ahead execution. When the instructions are re-executed subsequent to the run-ahead execution, valid target data may be retrieved from the advance data table and supplied as part of a zero-clock bypass to support parallel re-execution. This may achieve parallel execution of dependent instructions. In other embodiments, the advance data table may be content-addressable-memory searchable on target registers and supply target data to general speculative execution.
Abstract:
A system may include a first portion and a processor. The first portion may include a first thermal sensor to provide first thermal data. The processor may include a core, a core thermal sensor to provide core thermal data, and an electrical power sensor to provide electrical data. The processor may also include a power management unit to selectively throttle the core based on the first thermal data, the core thermal data, and the electrical data.
Abstract:
The present invention provides a mechanism for supporting high bandwidth instruction fetching in a multi-threaded processor. A multi-threaded processor includes an instruction cache (I-cache) and a temporary instruction cache (TIC). In response to an instruction pointer (IP) of a first thread hitting in the I-cache, a first block of instructions for the thread is provided to an instruction buffer and a second block of instructions for the thread are provided to the TIC. On a subsequent clock interval, the second block of instructions is provided to the instruction buffer, and first and second blocks of instructions from a second thread are loaded into a second instruction buffer and the TIC, respectively.
Abstract:
A method and apparatus for improving dispersal performance of instruction threads is described. In one embodiment, the dispersal logic determines whether the instructions supplied to it include any NOP instructions. When a NOP instruction is detected, the dispersal logic places the NOP into a no-op port for execution. All other instructions are distributed to the proper execution pipes in a normal manner. Because the NOP instructions do not use the execution resources of other instructions, all instruction threads can be executed in one cycle.
Abstract:
Systems and techniques for virtual device sharing. A failure of one of a plurality of memory devices corresponding to a first rank in a memory system is detected. The memory system has a plurality of ranks, each rank having a plurality of memory devices used to store a cache line. A portion of the cache line corresponding to the failed memory device is stored in a memory device in a second rank in the memory system and the remaining portion of the cache line in the first rank of the memory system.
Abstract:
The apparatus and method described herein are for handling shared memory accesses between multiple processors utilizing lock-free synchronization through transactional-execution. A transaction demarcated in software is speculatively executed. During execution invalidating remote accesses/requests to addresses loaded from and to be written to shared memory are tracked by a transaction buffer. If an invalidating access is encountered, the transaction is re-executed. After a pre-determined number of times re-executing the transaction, the transaction may be re-executed non-speculatively with locks/semaphores.
Abstract:
The apparatus and method described herein are for handling shared memory accesses between multiple processors utilizing lock-free synchronization through transactional-execution. A transaction demarcated in software is speculatively executed. During execution invalidating remote accesses/requests to addresses loaded from and to be written to shared memory are tracked by a transaction buffer. If an invalidating access is encountered, the transaction is re-executed. After a pre-determined number of times re-executing the transaction, the transaction may be re-executed non-speculatively with locks/semaphores.
Abstract:
In a multi-threaded processor, a long latency data dependent thread is flushed from the execution pipelines. Once the stalled thread is flushed, the non-stalling threads in the pipeline can continue their execution. Several resources are used to reduce this unwanted impact of stalls on the non-stalling threads. Also, these resources ensure that the earlier stalled thread, now flushed, is re-executed when the data dependency is resolved.
Abstract:
A system and method for a simultaneous multithreaded processor that reduces the number of hardware components necessary as well as the complexity of design over current systems is disclosed. As opposed to requiring individual storage elements for saving instruction pointer information for each re-steer logic component within a processor pipeline, the present invention allows for instruction pointer information of an inactive thread to be stored in a single, ‘inactive thread’ storage element until the thread becomes active again.
Abstract:
An apparatus and method for fairly accessing a shared cache with multiple resources, such as multiple cores, multiple threads, or both are herein described. A resource within a microprocessor sharing access to a cache is assigned a static portion of the cache and a dynamic portion. The resource is blocked from victimizing static portions assigned to other resources, yet, allowed to victimize the static portion assigned to the resource and the dynamically shared portion. If the resource does not access the cache enough times over a period of time, the static portion assigned to the resource is reassigned to the dynamically shared portion.