Abstract:
An multi-threading processor is provided. The multi-threading processor includes a first instruction fetch unit to receive a first thread and a second instruction fetch unit to receive a second thread. A multi-thread scheduler coupled to the instruction fetch units and a execution unit. The multi-thread scheduler determines the width of the execution unit and the execution unit executes the threads accordingly.
Abstract:
In a multi-threaded processor, a long latency data dependent thread is flushed from the execution pipelines. Once the stalled thread is flushed, the non-stalling threads in the pipeline can continue their execution. Several resources are used to reduce this unwanted impact of stalls on the non-stalling threads. Also, these resources ensure that the earlier stalled thread, now flushed, is re-executed when the data dependency is resolved.
Abstract:
A system and method for a simultaneous multithreaded processor that reduces the number of hardware components necessary as well as the complexity of design over current systems is disclosed. As opposed to requiring individual storage elements for saving instruction pointer information for each re-steer logic component within a processor pipeline, the present invention allows for instruction pointer information of an inactive thread to be stored in a single, ‘inactive thread’ storage element until the thread becomes active again.
Abstract:
An apparatus and method for fairly accessing a shared cache with multiple resources, such as multiple cores, multiple threads, or both are herein described. A resource within a microprocessor sharing access to a cache is assigned a static portion of the cache and a dynamic portion. The resource is blocked from victimizing static portions assigned to other resources, yet, allowed to victimize the static portion assigned to the resource and the dynamically shared portion. If the resource does not access the cache enough times over a period of time, the static portion assigned to the resource is reassigned to the dynamically shared portion.
Abstract:
A method and apparatus for supporting temporal data and non-temporal data memory accesses in a cache is disclosed. In one embodiment, a specially selected way in a set is generally used for non-temporal data memory accesses. A non-temporal flag may be associated with this selected way. In one embodiment, cache lines from memory accesses including a non-temporal hint may be generally placed into the selected way, and the non-temporal flag then set. When a temporal data cache line is to be loaded into a set, it may overrule the normal replacement method when the non-temporal flag is set, and be loaded into that selected way.
Abstract:
A method and apparatus for using result-speculative data under run-ahead speculative execution is disclosed. In one embodiment, the uncommitted target data from instructions being run-ahead executed may be saved into an advance data table. This advance data table may be indexed by the lines in the instruction buffer containing the instructions for run-ahead execution. When the instructions are re-executed subsequent to the run-ahead execution, valid target data may be retrieved from the advance data table and supplied as part of a zero-clock bypass to support parallel re-execution. This may achieve parallel execution of dependent instructions. In other embodiments, the advance data table may be content-addressable-memory searchable on target registers and supply target data to general speculative execution.
Abstract:
A method and system for allowing a multi-threaded processor to share pages across different threads in a pre-validated cache using a translation look-aside buffer is disclosed. The multi-threaded processor searches a translation look-aside buffer in an attempt to match a virtual memory address. If no matching valid virtual memory address is found, a new translation is retrieved and the translation look-aside buffer is searched for a matching physical memory address. If a matching physical memory address is found, the old translation is overwritten with a new translation. The multi-threaded processor may execute switch on event multi-threading or simultaneous multi-threading. If simultaneous multi-threading is executed, then access rights for each thread is associated with the translation.
Abstract:
Methods and apparatus relating to improving the value of F-state by increasing a local caching agent's data forwarding are described. In one embodiment, the opportunity for forwarding from a local caching agent is improved by allowing the local caching agent to keep an F-state copy of the line while sending an S-state copy to the requestor (e.g., in response to a non-ownership read operation). Other embodiments are also disclosed.
Abstract:
An apparatus and method are described for store durability and ordering in a persistent memory architecture. For example, one embodiment of a method comprises: performing at least one store operation to one or more addresses identifying at least one persistent memory device, the store operations causing one or more memory controllers to store data in the at least one persistent memory device; sending a request message to the one or more memory controllers instructing the memory controllers to confirm that the store operations are successfully committed to the at least one persistent memory device; ensuring at the one or more memory controllers that at least all pending store operations received at the time of the request message will be committed to the persistent memory device; and sending a response message from the one or more memory controllers indicating that the store operations are successfully committed to the persistent memory device.
Abstract:
Methods and apparatus relating to directory based coherency to improve input/output write bandwidth in scalable systems are described. In one embodiment, a first agent receives a request to write data from a second agent via a link and logic causes the first agent to write the directory state to an Input/Output Directory Cache (IODC) of the first agent. Additionally, the logic causes the second agent to send data from a modified state to an exclusive state using write back to the first agent, while allowing the data to remain cached exclusively in the second agent and also enabling the deallocation of the IODC entry in the first agent. Other embodiments are also disclosed.