Abstract:
A system for validating branch predictions for clusters of branch instructions includes an address validation module and a condition validation module. The address validation module determines target addresses for the branches in the cluster. One of the determined target addresses is selected, using predicted branch directions. The selected target address is compared with a predicted target address, and resolved branch directions are compared with predicted branch directions. A misprediction is indicated if either comparison fails.
Abstract:
Methods and apparatus relating to improving the value of F-state by increasing a local caching agent's data forwarding are described. In one embodiment, the opportunity for forwarding from a local caching agent is improved by allowing the local caching agent to keep an F-state copy of the line while sending an S-state copy to the requestor (e.g., in response to a non-ownership read operation). Other embodiments are also disclosed.
Abstract:
An apparatus and method for reducing or eliminating writeback operations. For example, one embodiment of a method comprises: detecting a first operation associated with a cache line at a first requestor cache; detecting that the cache line exists in a first cache in a modified (M) state; forwarding the cache line from the first cache to the first requestor cache and storing the cache line in the first requestor cache in a second modified (M′) state; detecting a second operation associated with the cache line at a second requestor; responsively forwarding the cache line from the first requestor cache to the second requestor cache and storing the cache line in the second requestor cache in an owned (O) state if the cache line has not been modified in the first requestor cache; and setting the cache line to a shared (S) state in the first requestor cache.
Abstract:
In one embodiment, a method includes receiving a read request from a first caching agent, determining whether a directory entry associated with the memory location indicates that the information is not present in a remote caching agent, and if so, transmitting the information from the memory location to the first caching agent before snoop processing with respect to the read request is completed. Other embodiments are described and claimed.
Abstract:
In one embodiment, the present invention includes a method of assigning a location within a shared variable for each of multiple threads and writing a value to a corresponding location to indicate that the corresponding thread has reached a barrier. In such manner, when all the threads have reached the barrier, synchronization is established. In some embodiments, the shared variable may be stored in a cache accessible by the multiple threads. Other embodiments are described and claimed.
Abstract:
Methods and apparatus relating to dynamically routing data responses directly to a requesting processor core are described. In one embodiment, data returned in response to a data request is to be directly transmitted to a requesting agent based on information stored in a route back table. Other embodiments are also disclosed.
Abstract:
In one embodiment, the present invention includes a method of assigning a location within a shared variable for each of multiple threads and writing a value to a corresponding location to indicate that the corresponding thread has reached a barrier. In such manner, when all the threads have reached the barrier, synchronization is established. In some embodiments, the shared variable may be stored in a cache accessible by the multiple threads. Other embodiments are described and claimed.
Abstract:
An apparatus and method for fairly accessing a shared cache with multiple resources, such as multiple cores, multiple threads, or both are herein described. A resource within a microprocessor sharing access to a cache is assigned a static portion of the cache and a dynamic portion. The resource is blocked from victimizing static portions assigned to other resources, yet, allowed to victimize the static portion assigned to the resource and the dynamically shared portion. If the resource does not access the cache enough times over a period of time, the static portion assigned to the resource is reassigned to the dynamically shared portion.
Abstract:
Methods and apparatus to efficiently process non-ownership load requests hitting modified line (M-line) in cache of a different processor are described. In one embodiment, a first agent changes the state of a first data and forwards it to a second, requesting agent who stores the first data in an alternative modified state. Other embodiments are also described.
Abstract:
A method and system for allowing a multi-threaded processor to share pages across different threads in a pre-validated cache using a translation look-aside buffer is disclosed. The multi-threaded processor searches a translation look-aside buffer in an attempt to match a virtual memory address. If no matching valid virtual memory address is found, a new translation is retrieved and the translation look-aside buffer is searched for a matching physical memory address. If a matching physical memory address is found, the old translation is overwritten with a new translation. The multi-threaded processor may execute switch on event multi-threading or simultaneous multi-threading. If simultaneous multi-threading is executed, then access rights for each thread is associated with the translation.