Abstract:
A physical layer (PHY) is coupled to a serial, differential link that is to include a number of lanes. The PHY includes a transmitter and a receiver to be coupled to each lane of the number of lanes. The transmitter coupled to each lane is configured to embed a clock with data to be transmitted over the lane, and the PHY periodically issues a blocking link state (BLS) request to cause an agent to enter a BLS to hold off link layer flit transmission for a duration. The PHY utilizes the serial, differential link during the duration for a PHY associated task selected from a group including an in-band reset, an entry into low power state, and an entry into partial width state.
Abstract:
Method and apparatus for per-agent control and quality of service of shared resources in a chip multiprocessor platform is described herein. One embodiment of a system includes: a plurality of core and non-core requestors of shared resources, the shared resources to be provided by one or more resource providers, each of the plurality of core and non-core requestors to be associated with a resource-monitoring tag and a resource-control tag; a mapping table to store the resource monitoring and control tags associated with each non-core requestor; and a tagging circuitry to receive a resource request sent from a non-core requestor to a resource provider, the tagging circuitry to responsively modify the resource request to include the resource-monitoring and resource-control tags associated with the non-core requestor in accordance to the mapping table and send the modified resource request to the resource provider.
Abstract:
Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data, optimization circuitry to receive an incoming RAO instruction, scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction, the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
Abstract:
A method and apparatus for improving snooping performance is disclosed. In one embodiment, one or more content addressable matches are used to determine where and when an address conflict occurs. Depending upon the timing, a read request or a snoop request may be set for retry. In another embodiment, an age order matrix may be used to determine when several core snoop requests may be issued during a same time period, so that the snoops may be processed during this time period.
Abstract:
Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
Abstract:
A shared mesh comprises a mesh station. The mesh station is used to couple to at least a first core component and a second core component. The mesh station includes a logic unit. The mesh station is shared by at least the first core component and the second core component. A memory is coupled to the mesh station.
Abstract:
A speculative read request is received from a host device over a buffered memory access link for data associated with a particular address. A read request is sent for the data to a memory device. The data is received from the memory device in response to the read request and the received data is sent to the host device as a response to a demand read request received subsequent to the speculative read request.
Abstract:
An apparatus and method are described for store durability and ordering in a persistent memory architecture. For example, one embodiment of a method comprises: performing at least one store operation to one or more addresses identifying at least one persistent memory device, the store operations causing one or more memory controllers to store data in the at least one persistent memory device; sending a request message to the one or more memory controllers instructing the memory controllers to confirm that the store operations are successfully committed to the at least one persistent memory device; ensuring at the one or more memory controllers that at least all pending store operations received at the time of the request message will be committed to the persistent memory device; and sending a response message from the one or more memory controllers indicating that the store operations are successfully committed to the persistent memory device.
Abstract:
A processor may include a memory controller to interface with a system memory having a near memory and a far memory. The processor may include logic circuitry to cause memory controller to determine whether a write request is generated remotely or locally, and when the write request is generated remotely to instruct the memory controller to perform a read of near memory before performing a write, when the write request is generated locally and a cache line targeted by the write request is in the inclusive state to instruct the memory controller to perform the write without performing a read of near memory, and when the write request is generated locally and the cache line targeted by the write request is in the non-inclusive state to instruct the memory controller to read near memory before performing the write.
Abstract:
In accordance with embodiments disclosed herein, there is provided systems and methods for providing a virtual shared cache mechanism. A processing device includes a plurality of clusters allocated into a virtual private shared cache. Each of the clusters includes a plurality of cores and a plurality of cache slices co-located within the plurality of cores. The processing device also includes a virtual shared cache including the plurality of clusters such that the cache data in the plurality of cache slices is shared among the plurality of clusters.