Abstract:
A method and device for determining an attribute associated with a locked load instruction and selecting a lock protocol based upon the attribute of the locked load instruction. Also disclosed is a method for concurrently executing the respective lock sequences associated with multiple threads of a processing device.
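A minimal sketch of the attribute-based selection, assuming the attribute distinguishes cache-line-aligned cacheable accesses (which can be locked in the local cache) from split-line or uncacheable accesses (which fall back to a global bus lock); the enum names and the cache-lock/bus-lock split are illustrative assumptions, not taken from the abstract.

```cpp
// Hypothetical attributes of a locked load; names are illustrative only.
enum class LockAttr { CacheableAligned, SplitLine, Uncacheable };

enum class LockProtocol { CacheLock, BusLock };

// Sketch: choose a lock protocol from the locked load's attribute.
// A cache-line-aligned cacheable access can be locked within the cache;
// split-line or uncacheable accesses require the heavier bus lock.
LockProtocol select_lock_protocol(LockAttr attr) {
    switch (attr) {
        case LockAttr::CacheableAligned: return LockProtocol::CacheLock;
        case LockAttr::SplitLine:
        case LockAttr::Uncacheable:      return LockProtocol::BusLock;
    }
    return LockProtocol::BusLock;  // conservative default
}
```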
Abstract:
Various embodiments of the invention concern methods and apparatuses for power- and time-efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, each load may be directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, cache accesses are reduced through direct loading from load value buffers and store buffers, thereby processing the loads efficiently.
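A sketch of the routing this classification enables, under the assumption that consumer reuse loads are served from the load value buffer and consumer forwarded loads from the store buffer; all names and the hybrid-load handling are illustrative, not the patented selection logic.

```cpp
// Hypothetical load classes matching the abstract's categories.
enum class LoadClass { Producer, ConsumerReuse, ConsumerForwarded, Hybrid };

enum class LoadSource { DataCache, LoadValueBuffer, StoreBuffer };

// Route each load to the cheapest structure that can supply its value: a
// consumer reuse load re-reads a value already captured in the load value
// buffer, a consumer forwarded load takes its value from an older store
// still in the store buffer, and a producer load reads the data cache
// (capturing its value into the load value buffer as a side effect). A
// hybrid is treated here like a reuse load that also refreshes the buffer.
LoadSource route_load(LoadClass cls) {
    switch (cls) {
        case LoadClass::ConsumerReuse:     return LoadSource::LoadValueBuffer;
        case LoadClass::ConsumerForwarded: return LoadSource::StoreBuffer;
        case LoadClass::Hybrid:            return LoadSource::LoadValueBuffer;
        case LoadClass::Producer:          break;
    }
    return LoadSource::DataCache;
}
```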
Abstract:
Responsive to receiving a write request for a cache line from an input/output device, a caching agent of a first processor determines that the cache line is managed by a home agent of a second processor. The caching agent sends an ownership request for the cache line to the second processor. The home agent of the second processor receives the ownership request, generates an entry in a directory cache for the cache line, the entry identifying the remote caching agent (the caching agent of the first processor) as having ownership of the cache line, and grants ownership of the cache line to the remote caching agent. Responsive to receiving the grant of ownership for the cache line from the home agent, an input/output controller of the first processor adds an entry for the cache line to an input/output write cache, the entry comprising a first indicator that the cache line is managed by the home agent of the second processor.
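A rough sketch of the two ends of this exchange, with illustrative structures standing in for the directory cache on the second processor and the input/output write cache on the first; everything here is an assumption about shape, not the patented implementation.

```cpp
#include <cstdint>
#include <unordered_map>

struct DirectoryEntry { int owner_id; };          // home agent's directory cache entry
struct IoWriteCacheEntry { bool remote_home; };   // I/O write cache entry on the first processor

std::unordered_map<uint64_t, DirectoryEntry> directory_cache;     // second processor
std::unordered_map<uint64_t, IoWriteCacheEntry> io_write_cache;   // first processor

// Home agent (second processor): record the remote caching agent as the
// line's owner, then grant ownership.
void home_grant_ownership(uint64_t line, int remote_agent_id) {
    directory_cache[line] = DirectoryEntry{remote_agent_id};
}

// I/O controller (first processor): on receiving the grant, cache the line
// with an indicator that its home agent lives on the second processor.
void io_on_ownership_granted(uint64_t line) {
    io_write_cache[line] = IoWriteCacheEntry{/*remote_home=*/true};
}
```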
Abstract:
Methods and apparatus implementing hardware/software co-optimization to improve performance and energy efficiency for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus includes multi-core processors with multi-level cache hierarchies, including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
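Intel's CLDEMOTE instruction (exposed as the _mm_cldemote intrinsic) is one concrete machine-level demotion instruction of the kind described. A producer-side sketch, assuming 64-byte cachelines and a compiler targeting the cldemote feature; on CPUs without the instruction it behaves as a NOP.

```cpp
#include <immintrin.h>
#include <cstddef>

// Producer-side sketch: after writing a buffer destined for a consumer on
// another core, proactively demote the written lines toward the shared LLC
// so the consumer hits in the LLC instead of snooping the producer's L1/L2.
void produce_and_demote(char* buf, size_t len) {
    for (size_t i = 0; i < len; ++i)
        buf[i] = static_cast<char>(i);        // producer writes; lines land in L1
    for (size_t i = 0; i < len; i += 64)      // assume 64-byte cachelines
        _mm_cldemote(buf + i);                // hint: demote each line to the LLC
}
```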
Abstract:
Methods and apparatus relating to allocation and/or write policies for a glueless, area-efficient directory cache for hotly contested cache lines are described. In one embodiment, a directory cache stores data corresponding to the caching status of a cache line. The caching status of the cache line is stored for each of a plurality of caching agents in the system. A write-on-allocate policy is used for the directory cache by using a special state (e.g., a snoop-all state) that indicates one or more snoops are to be broadcast to all agents in the system. Other embodiments are also disclosed.
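A sketch of the write-on-allocate policy, assuming an eight-agent system and a per-entry snoop-all flag; when and how the flag is cleared is an illustrative guess, not taken from the abstract.

```cpp
#include <bitset>
#include <cstdint>
#include <unordered_map>

constexpr int kNumAgents = 8;            // assumed system size

struct DirEntry {
    bool snoop_all = false;              // special state: broadcast snoops to all agents
    std::bitset<kNumAgents> present;     // per-agent caching status
};

std::unordered_map<uint64_t, DirEntry> dir_cache;

// Write-on-allocate: a newly allocated entry is immediately written in the
// snoop-all state, so until precise sharer information accumulates, any
// access to the line broadcasts snoops rather than risking a missed sharer.
DirEntry& allocate(uint64_t line) {
    DirEntry& e = dir_cache[line];
    e.snoop_all = true;
    return e;
}

// Later updates record precise sharers and drop the snoop-all state.
void record_sharer(uint64_t line, int agent) {
    DirEntry& e = dir_cache[line];
    e.present.set(agent);
    e.snoop_all = false;
}
```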
Abstract:
In a cache coherency protocol, a re-snoop may be utilized to resolve a data request conflict condition. The re-snoop may avoid a conflict resolution phase, which may reduce system inefficiencies.
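A minimal sketch of the re-snoop loop, with a stub standing in for the protocol's snoop primitive; the types and retry structure are assumptions about the idea, not the protocol's actual messages.

```cpp
#include <cstdint>

enum class SnoopResult { Data, Conflict };

// Stub standing in for the fabric's snoop primitive; a real implementation
// returns Conflict when another agent's request to the same line is in flight.
SnoopResult send_snoop(uint64_t /*line*/) { return SnoopResult::Data; }

// On a conflict, the requester simply re-issues the snoop once the competing
// request drains, instead of entering a dedicated conflict-resolution phase.
void request_line(uint64_t line) {
    while (send_snoop(line) == SnoopResult::Conflict) {
        /* re-snoop: retry rather than escalate */
    }
}
```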
Abstract:
A processing core using a lock scoreboard mechanism is provided. The lock scoreboard is adapted to manage a load-lock instruction and includes a plurality of scoreboard entries representing the different conditions that must be met before the load-lock instruction can be retired. During execution of the load-lock instruction, the retirement conditions are speculatively evaluated, and the scoreboard is updated and checked accordingly. If the scoreboard indicates that one or more retirement conditions are not met, the load-lock instruction is replayed; otherwise, it is permitted to retire. Scoreboard management functions routinely update the scoreboard's contents as retirement conditions are cleared, enabling rapid retirement of load-lock operations.
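A sketch of such a scoreboard, with an assumed set of retirement conditions (the patent's actual conditions are not listed in the abstract):

```cpp
#include <bitset>

// Illustrative retirement conditions; one scoreboard entry per condition.
enum Cond { LineOwned, NoPendingSnoops, OldestInFlight, kNumConds };

struct LockScoreboard {
    std::bitset<kNumConds> met;

    void set(Cond c)   { met.set(c); }    // condition satisfied/cleared
    void clear(Cond c) { met.reset(c); }  // condition invalidated, e.g. by a snoop

    // At retirement: if every condition is met the load-lock retires;
    // otherwise it is replayed and the scoreboard is re-checked later.
    bool try_retire() const { return met.all(); }
};
```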
Abstract:
An apparatus and method for reducing or eliminating writeback operations. For example, one embodiment of a method comprises: detecting a first operation associated with a cache line at a first requestor cache; detecting that the cache line exists in a first cache in a modified (M) state; forwarding the cache line from the first cache to the first requestor cache and storing the cache line in the first requestor cache in a second modified (M′) state; detecting a second operation associated with the cache line at a second requestor cache; responsively forwarding the cache line from the first requestor cache to the second requestor cache and storing the cache line in the second requestor cache in an owned (O) state if the cache line has not been modified in the first requestor cache; and setting the cache line to a shared (S) state in the first requestor cache.
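The transitions described above, sketched as a toy state machine; the states follow the abstract, while the M′ (forwarded-modified) bookkeeping is illustrative.

```cpp
enum class State { I, S, O, M, Mprime };

struct Line { State state = State::I; bool dirtied_locally = false; };

// First operation: a requestor pulls a line found Modified elsewhere and
// installs it in the M' state (forwarded-modified), with no memory writeback.
void on_forward_from_M(Line& requestor) {
    requestor.state = State::Mprime;
    requestor.dirtied_locally = false;
}

// Second operation: if the M' holder never modified the line, forward it to
// the next requestor in Owned state and drop the local copy to Shared,
// again avoiding a writeback to memory.
void on_second_request(Line& holder, Line& next_requestor) {
    if (holder.state == State::Mprime && !holder.dirtied_locally) {
        next_requestor.state = State::O;
        holder.state = State::S;
    }
}
```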
Abstract:
A request is received that references a first agent and requests a particular line of memory to be cached in an exclusive state. A snoop request is sent intended for one or more other agents. A snoop response is received that references a second agent, the snoop response including a writeback to memory of a modified cache line corresponding to the particular line of memory. A completion is sent addressed to the first agent, wherein the completion includes data of the particular line of memory based on the writeback.
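A sketch of the home-side flow, with stubs standing in for the protocol's snoop, memory-write, and completion messages; the payoff is that the completion reuses the writeback data, so memory need not be read again.

```cpp
#include <cstdint>
#include <optional>

struct CacheLineData { uint8_t bytes[64]; };

// Stub: snoop the other agents; returns writeback data when one of them
// held the line in Modified state, std::nullopt otherwise.
std::optional<CacheLineData> snoop_agents(uint64_t /*line*/) { return CacheLineData{}; }
void write_memory(uint64_t /*line*/, const CacheLineData&) {}   // stub
void send_completion(int /*agent*/, const CacheLineData&) {}    // stub

// The first agent requests the line in Exclusive state; the snoop finds a
// Modified copy at a second agent, whose response carries a writeback. The
// completion sent to the first agent carries the same writeback data.
void handle_exclusive_request(int first_agent, uint64_t line) {
    if (auto wb = snoop_agents(line)) {      // snoop response with writeback
        write_memory(line, *wb);             // commit the writeback to memory
        send_completion(first_agent, *wb);   // completion reuses the data
    }
}
```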
Abstract:
An apparatus and method are described for performing partial memory mirroring operations. For example, one embodiment of a processor comprises: a processor core for generating a read or write transaction having a system memory address; a home agent identified to service the read or write transaction based on the system memory address; and one or more target address decoders (TADs) associated with the home agent to determine whether the system memory address is within a mirrored or a non-mirrored memory region, wherein: if the system memory address is within a mirrored memory region, the one or more TADs identify multiple mirrored memory channels for the read or write transaction; and if the system memory address is not within a mirrored memory region, the one or more TADs identify a single memory channel for the read or write transaction.
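A sketch of the TAD decision, assuming a single mirrored range per TAD and a two-channel mirror pair; the range check and channel numbering are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Toy target address decoder: a mirrored address resolves to both channels
// of the mirror pair (so a write is duplicated and a read has a fallback
// copy); any other address resolves to a single channel.
struct TAD {
    uint64_t mirror_base, mirror_limit;   // mirrored region covered by this TAD

    std::vector<int> decode(uint64_t addr) const {
        if (addr >= mirror_base && addr < mirror_limit)
            return {0, 1};                // primary and secondary mirrored channels
        return {0};                       // single channel for non-mirrored memory
    }
};
```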