摘要:
A request is received that is to reference a first agent and to request a particular line of memory to be cached in an exclusive state. A snoop request is sent intended for one or more other agents. A snoop response is received that is to reference a second agent, the snoop response to include a writeback to memory of a modified cache line that is to correspond to the particular line of memory. A complete is sent to be addressed to the first agent, wherein the complete is to include data of the particular line of memory based on the writeback.
摘要:
A physical layer (PHY) is coupled to a serial, differential link that is to include a number of lanes. The PHY includes a transmitter and a receiver to be coupled to each lane of the number of lanes. The transmitter coupled to each lane is configured to embed a clock with data to be transmitted over the lane, and the PHY periodically issues a blocking link state (BLS) request to cause an agent to enter a BLS to hold off link layer flit transmission for a duration. The PHY utilizes the serial, differential link during the duration for a PHY associated task selected from a group including an in-band reset, an entry into low power state, and an entry into partial width state
摘要:
An apparatus and method for reducing or eliminating writeback operations. For example, one embodiment of a method comprises: detecting a first operation associated with a cache line at a first requestor cache; detecting that the cache line exists in a first cache in a modified (M) state; forwarding the cache line from the first cache to the first requestor cache and storing the cache line in the first requestor cache in a second modified (M′) state; detecting a second operation associated with the cache line at a second requestor; responsively forwarding the cache line from the first requestor cache to the second requestor cache and storing the cache line in the second requestor cache in an owned (O) state if the cache line has not been modified in the first requestor cache; and setting the cache line to a shared (S) state in the first requestor cache.
摘要:
Methods and apparatus relating to allocation and/or write policy for a glueless area-efficient directory cache for hotly contested cache lines are described. In one embodiment, a directory cache stores data corresponding to a caching status of a cache line. The caching status of the cache line is stored for each of a plurality of caching agents in the system. An write-on-allocate policy is used for the directory cache by using a special state (e.g., snoop-all state) that indicates one or more snoops are to be broadcasted to all agents in the system. Other embodiments are also disclosed.
摘要:
Methods and apparatus relating to allocation and/or write policy for a glueless area-efficient directory cache for hotly contested cache lines are described. In one embodiment, a directory cache stores data corresponding to a caching status of a cache line. The caching status of the cache line is stored for each of a plurality of caching agents in the system. An write-on-allocate policy is used for the directory cache by using a special state (e.g., snoop-all state) that indicates one or more snoops are to be broadcasted to all agents in the system. Other embodiments are also disclosed.
摘要:
Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
摘要:
In a cache coherency protocol a re-snoop may be utilized to resolve a data request conflict condition. The re-snoop may avoid a conflict resolution phase, which may reduce system inefficiencies.
摘要:
In a cache coherency protocol a re-snoop may be utilized to resolve a data request conflict condition. The re-snoop may avoid a conflict resolution phase, which may reduce system inefficiencies.
摘要:
Various embodiments of the invention concern methods and apparatuses for power and time efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, performance of the load may be efficiently directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, accesses to cache are reduced, through direct loading from load value buffers and store buffers, thereby efficiently processing the loads.
摘要:
A method and device for determining an attribute associated with a locked load instruction and selecting a lock protocol based upon the attribute of the locked load instruction. Also disclosed is a method for concurrently executing the respective lock sequences associated with multiple threads of a processing device.