Abstract:
A system supporting producer-consumer pre-fetch communications includes a first processor, wherein the first processor is a producer node, and a second processor, wherein the second processor is a consumer node. The system further includes a data subscribe mechanism for performing a data subscribe operation at the consumer node, wherein the data subscribe operation records that a memory address is subscribed at the consumer node; a data publish mechanism for performing a data publish operation at the producer node, wherein the data publish operation sends data of the memory address from the producer node to the consumer node if the memory address is subscribed at the consumer node; and a communication network coupled to the producer node and the consumer node for enabling communication between the producer node and the consumer node.
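The subscribe and publish operations can be illustrated with a minimal sketch. The class names, the `prefetched` buffer, and the direct lookup of the consumer's subscription table (standing in for the communication network) are assumptions made for illustration, not details taken from the abstract.

```python
class ConsumerNode:
    """Consumer with a subscription table and a local prefetch buffer (hypothetical)."""

    def __init__(self):
        self.subscribed = set()   # addresses recorded by the subscribe operation
        self.prefetched = {}      # address -> data pushed by the producer

    def subscribe(self, address):
        # Data subscribe operation: record that the address is subscribed here.
        self.subscribed.add(address)

    def receive(self, address, data):
        self.prefetched[address] = data


class ProducerNode:
    def __init__(self, memory):
        self.memory = memory      # address -> data held by the producer

    def publish(self, address, consumer):
        # Data publish operation: send the data only if the consumer subscribed.
        if address in consumer.subscribed:
            consumer.receive(address, self.memory[address])
            return True
        return False


consumer = ConsumerNode()
producer = ProducerNode({0x100: "payload", 0x200: "other"})
consumer.subscribe(0x100)
sent_subscribed = producer.publish(0x100, consumer)      # address is subscribed
sent_unsubscribed = producer.publish(0x200, consumer)    # address is not subscribed
```

In this sketch only address `0x100` reaches the consumer's prefetch buffer, because `0x200` was never subscribed.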
Abstract:
In shared-memory multiprocessor systems, cache interventions from different sourcing caches can result in different cache intervention costs. With location-aware cache coherence, when a cache receives a data request, the cache can determine whether sourcing the data from the cache will result in less cache intervention cost than sourcing the data from another cache. The decision can be made based on appropriate information maintained in the cache or collected from snoop responses from other caches. If the requested data is found in more than one cache, the cache that has or likely has the lowest cache intervention cost is generally responsible for supplying the data. The intervention cost can be measured by performance metrics that include, but are not limited to, communication latency, bandwidth consumption, load balance, and power consumption.
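As a rough illustration of the selection rule, the sketch below picks the sourcing cache with the lowest intervention cost among the caches that hold the requested data. The cost table, the cache names, and the function `choose_source` are hypothetical; a real design would derive costs from maintained state or snoop responses.

```python
def choose_source(requester, holders, cost):
    """Return the holder with the lowest intervention cost for the requester.

    cost maps (sourcing_cache, requesting_cache) to a scalar cost, which could
    stand for latency, bandwidth, load, or power (an assumed encoding).
    """
    if not holders:
        return None  # no cache holds the data; memory would have to supply it
    return min(holders, key=lambda cache: cost[(cache, requester)])


# Hypothetical intervention costs for supplying data to cache "D".
cost = {("A", "D"): 3, ("B", "D"): 1, ("C", "D"): 5}
source = choose_source("D", ["A", "C", "B"], cost)
no_source = choose_source("D", [], cost)
```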
Abstract:
A system and method for latency-aware thread scheduling in a non-uniform cache architecture are provided. Instructions may be provided to the hardware specifying in which banks to store data. Information as to which banks store which data may also be provided, for example, by the hardware. This information may be used to schedule threads on one or more cores. A selected bank in cache memory may be reserved strictly for selected data.
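One way to picture how bank-location information could drive scheduling is a greedy placement that puts each thread on the free core closest to the bank holding its data. The latency table, the one-bank-per-thread simplification, and the greedy policy are all assumptions for this sketch; the abstract does not prescribe a particular scheduling algorithm.

```python
def schedule(thread_banks, latency, cores):
    """Greedily place each thread on the free core with the lowest latency
    to the bank that stores the thread's data.

    Assumes there are at least as many cores as threads.
    """
    placement = {}
    free = set(cores)
    for thread, bank in thread_banks.items():
        best = min(sorted(free), key=lambda core: latency[(core, bank)])
        placement[thread] = best
        free.remove(best)
    return placement


# Hypothetical access latencies (in cycles) from each core to each bank.
latency = {(0, 0): 1, (0, 1): 4, (1, 0): 4, (1, 1): 1}
placement = schedule({"t0": 1, "t1": 0}, latency, cores=[0, 1])
```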
Abstract:
There is disclosed a method and apparatus for handling transaction buffer overflow in a multi-processor system as well as a transaction memory system in a multi-processor system. The method comprises the steps of: when overflow occurs in a transaction buffer of one processor, disabling peer processors from entering transactions, and waiting for any processor having a current transaction to complete its current transaction; re-executing the transaction resulting in the transaction buffer overflow without using the transaction buffer; and when the transaction execution is completed, enabling the peer processors for entering transactions.
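The overflow-handling steps can be sketched as a single-threaded simulation. Everything here is illustrative: the class names, the `entry_enabled` flag, and `finish_transaction` (which stands in for waiting until a peer completes its current transaction) are assumptions, and a real system would implement these steps in hardware or with proper synchronization.

```python
class Processor:
    def __init__(self, name, buffer_capacity):
        self.name = name
        self.buffer_capacity = buffer_capacity
        self.in_transaction = False

    def finish_transaction(self):
        # Stand-in for the peer completing its current transaction.
        self.in_transaction = False


class TMSystem:
    def __init__(self, processors):
        self.processors = processors
        self.entry_enabled = True   # may processors enter new transactions?
        self.log = []

    def run_transaction(self, proc, writes, memory):
        if not self.entry_enabled:
            raise RuntimeError("transaction entry is disabled")
        proc.in_transaction = True
        buffer = {}
        for addr, value in writes:
            if addr not in buffer and len(buffer) == proc.buffer_capacity:
                proc.in_transaction = False      # discard the buffered writes
                self._handle_overflow(proc, writes, memory)
                return "overflow-fallback"
            buffer[addr] = value
        memory.update(buffer)                    # commit buffered writes
        proc.in_transaction = False
        return "committed"

    def _handle_overflow(self, proc, writes, memory):
        self.entry_enabled = False               # 1. disable peers from entering
        self.log.append("disable")
        for peer in self.processors:             # 2. wait for current transactions
            if peer is not proc and peer.in_transaction:
                peer.finish_transaction()
                self.log.append(f"drain {peer.name}")
        for addr, value in writes:               # 3. re-execute without the buffer
            memory[addr] = value
        self.log.append("re-execute")
        self.entry_enabled = True                # 4. re-enable peers
        self.log.append("enable")


p0, p1 = Processor("P0", buffer_capacity=2), Processor("P1", buffer_capacity=2)
p1.in_transaction = True                         # P1 is mid-transaction
system = TMSystem([p0, p1])
memory = {}
result = system.run_transaction(p0, [(1, "a"), (2, "b"), (3, "c")], memory)
```

The third write exceeds the two-entry buffer, so the sketch falls back to unbuffered execution after draining `P1`.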
Abstract:
A method for reconfiguring a cache memory is provided. The method in one aspect may include analyzing one or more characteristics of an execution entity accessing a cache memory and reconfiguring the cache based on the one or more characteristics analyzed. Examples of analyzed characteristics may include but are not limited to a data structure used by the execution entity, an expected reference pattern of the execution entity, the type of the execution entity, heat and power consumption of the execution entity, etc. Examples of cache attributes that may be reconfigured may include but are not limited to associativity of the cache memory, amount of the cache memory available to store data, coherence granularity of the cache memory, line size of the cache memory, etc.
Abstract:
We present a triangle ordering mechanism that maintains triangle ordering of coherence messages in SMP systems. If cache A sends a multicast message to caches B and C, and if cache B sends a message to cache C after receiving and processing the multicast message from cache A, the triangle ordering mechanism ensures that cache C processes the multicast message from cache A before processing the message from cache B. The triangle ordering mechanism enables efficient snoopy cache coherence in SMP systems in which caches communicate with each other via message-passing networks. A modified version of the triangle ordering mechanism categorizes coherence messages into non-overlapping sequencing classes, and ensures triangle ordering for coherence messages in the same sequencing class. The modified triangle ordering mechanism can significantly reduce potential performance degradation due to false waiting.
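The ordering guarantee at cache C can be illustrated with a small sketch in which each message carries the identifiers of the multicast messages it must follow. The dependency tags and the `Receiver` class are assumptions chosen for illustration; the abstract does not state how the mechanism tracks ordering.

```python
class Receiver:
    """Models cache C: defers a message until the multicasts it depends on
    have been processed (hypothetical dependency-tag scheme)."""

    def __init__(self):
        self.processed_ids = set()
        self.pending = []
        self.order = []            # order in which messages were processed

    def deliver(self, msg_id, depends_on=()):
        self.pending.append((msg_id, frozenset(depends_on)))
        self._drain()

    def _drain(self):
        progressed = True
        while progressed:
            progressed = False
            for entry in list(self.pending):
                msg_id, deps = entry
                if deps <= self.processed_ids:   # all dependencies processed
                    self.pending.remove(entry)
                    self.processed_ids.add(msg_id)
                    self.order.append(msg_id)
                    progressed = True


cache_c = Receiver()
# B's message arrives first, but B sent it after processing multicast "A1".
cache_c.deliver("B1", depends_on={"A1"})
order_before = list(cache_c.order)   # nothing processed yet
cache_c.deliver("A1")                # the multicast from A arrives late
order_after = list(cache_c.order)
```

Even though the network delivers B's message first, cache C processes the multicast from A before it.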
Abstract:
A method and system for efficient context switching are provided. An execution entity that is to be context switched out is allowed to continue executing for a predetermined period of time before being context switched out. During the predetermined period of time in which the execution entity continues to execute, the hardware or an operating system tracks and records its footprint such as the addresses and page and segment table entries and the like accessed by the continued execution. When the execution entity is being context switched back in, its page and segment table and cache states are reloaded for use in its immediate execution.
Abstract:
With scope-based cache coherence, a cache can maintain scope information for a memory address. The scope information specifies caches in which data of the address is potentially cached, but not necessarily caches in which data of the address is actually cached. Appropriate scope information can be used as snoop filters to reduce unnecessary coherence messages and snoop operations in SMP systems. If a cache maintains scope information of an address, it can potentially avoid sending cache requests to caches outside the scope in case of a cache miss on the address. Scope information can be adjusted dynamically via a scope calibration operation to reflect changing data access patterns. A calibration prediction mechanism can be employed to predict when a scope calibration needs to be invoked.
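A minimal sketch of scope information used as a snoop filter follows. The `scope_table` layout and the fallback to a full broadcast when no scope is recorded are assumptions made for illustration.

```python
def snoop_targets(address, requester, all_caches, scope_table):
    """Return the caches that must be snooped on a miss for the address.

    If scope information exists, only caches inside the scope are snooped;
    otherwise the request goes to every other cache (assumed fallback).
    """
    scope = scope_table.get(address)
    candidates = all_caches if scope is None else scope
    return sorted(cache for cache in candidates if cache != requester)


all_caches = {"A", "B", "C", "D"}
scope_table = {0x40: {"A", "B"}}     # data at 0x40 may be cached only in A or B

filtered = snoop_targets(0x40, "A", all_caches, scope_table)
broadcast = snoop_targets(0x80, "A", all_caches, scope_table)
```

With scope information, the miss on `0x40` snoops one cache instead of three.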
Abstract:
Systems and methods for cache replacement monitoring (CRM) are provided. The system includes a monitored cache comprising a monitored cache line set, the monitored cache line set comprising at least one cache line capable of holding data of a monitored address; and a CRM mechanism operatively associated with the monitored cache. The CRM mechanism collects CRM information for the monitored address. The method includes the steps of collecting CRM information for a monitored address in a monitored cache; and recording the CRM information for the monitored address, when at least one of (1) the monitored address is cached in the monitored cache, (2) the monitored address is replaced in the monitored cache, (3) any cache line in a cache line set corresponding to the monitored address is cached in the monitored cache, and (4) any cache line in a cache line set corresponding to the monitored address is replaced in the monitored cache.
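The recording conditions for a monitored address can be sketched with a single cache line set. The LRU replacement policy, the log format, and the `MonitoredSet` class are assumptions for illustration; only conditions (1) and (2) from the abstract are modeled.

```python
from collections import OrderedDict


class MonitoredSet:
    """One cache line set with LRU replacement (assumed policy) and a CRM log."""

    def __init__(self, ways, monitored_address):
        self.ways = ways
        self.monitored = monitored_address
        self.lines = OrderedDict()   # address -> present, oldest entry first
        self.crm_log = []

    def access(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)      # hit: mark most recently used
            return
        if len(self.lines) == self.ways:
            victim, _ = self.lines.popitem(last=False)   # evict the LRU line
            if victim == self.monitored:
                # Condition (2): the monitored address is replaced.
                self.crm_log.append(("replaced", victim, address))
        self.lines[address] = True
        if address == self.monitored:
            # Condition (1): the monitored address is cached.
            self.crm_log.append(("cached", address))


cache_set = MonitoredSet(ways=2, monitored_address=0x10)
for addr in (0x10, 0x20, 0x30):
    cache_set.access(addr)
crm_log = cache_set.crm_log
```

Here the third access evicts the monitored address `0x10`, so the log records both its insertion and the address that displaced it.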