Abstract:
In one embodiment, a system includes an input/output (I/O) domain and a compute domain. The I/O domain includes an I/O agent and an I/O domain caching agent. The compute domain includes a compute domain caching agent and a compute domain cache hierarchy. The I/O agent issues an ownership request to the compute domain caching agent to obtain ownership of a cache line in the compute domain cache hierarchy. In response to the ownership request, the compute domain caching agent places the cache line in the compute domain cache hierarchy in a placeholder state. The placeholder state reserves the cache line for performance of a write operation by the I/O agent. The compute domain caching agent writes data received from the I/O agent to the cache line in the compute domain cache hierarchy and transitions the cache line out of the placeholder state.
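As an illustrative sketch only (the class names, state names, and flow below are invented for exposition, not taken from the patent), the placeholder-state handshake might be modeled as:

    from enum import Enum, auto

    class LineState(Enum):
        INVALID = auto()
        MODIFIED = auto()
        PLACEHOLDER = auto()  # line reserved for a pending I/O write

    class ComputeDomainCachingAgent:
        def __init__(self):
            self.lines = {}  # address -> (state, data)

        def grant_ownership(self, addr):
            # Respond to the I/O agent's ownership request by reserving
            # the cache line in the placeholder state.
            self.lines[addr] = (LineState.PLACEHOLDER, None)

        def write_from_io(self, addr, data):
            state, _ = self.lines.get(addr, (LineState.INVALID, None))
            assert state is LineState.PLACEHOLDER, "line must be reserved first"
            # Write the I/O data, then transition out of the placeholder state.
            self.lines[addr] = (LineState.MODIFIED, data)

    class IOAgent:
        def __init__(self, caching_agent):
            self.caching_agent = caching_agent

        def push_write(self, addr, data):
            self.caching_agent.grant_ownership(addr)  # ownership request
            self.caching_agent.write_from_io(addr, data)

    agent = ComputeDomainCachingAgent()
    IOAgent(agent).push_write(0x1000, b"payload")
    print(agent.lines[0x1000])  # (LineState.MODIFIED, b'payload')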
Abstract:
Systems and methods are disclosed for virtualized caches. For example, an integrated circuit (e.g., a processor) for executing instructions includes a virtually indexed, physically tagged first-level (L1) cache configured to output, to an outer memory system, one or more bits of the virtual index of a cache access as one or more bits of a requestor identifier. For example, the L1 cache may be configured to operate as multiple logical L1 caches, each with a cache way of a size less than or equal to the virtual memory page size. For example, the integrated circuit may include an L2 cache of the outer memory system that is configured to receive the requestor identifier and implement a cache coherency protocol to disambiguate an L1 synonym occurring in multiple portions of the L1 cache associated with different requestor identifier values.
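As a hedged illustration of the index-bit export, the sketch below assumes a 4 KiB virtual memory page and an 8 KiB direct-mapped cache way, so exactly one virtual-index bit lies above the page offset; that bit selects one of two logical L1 halves and travels with the request as the requestor identifier. The constants and function name are assumptions:

    PAGE_BITS = 12  # assumed 4 KiB virtual memory page
    WAY_BITS = 13   # assumed 8 KiB way: one index bit above the page offset

    def requestor_id(vaddr):
        # Virtual-index bits above the page offset cannot be recovered
        # from the physical address, so they are exported with the
        # request as (part of) the requestor identifier.
        return (vaddr >> PAGE_BITS) & ((1 << (WAY_BITS - PAGE_BITS)) - 1)

    # Two virtual aliases (synonyms) of the same physical line that
    # differ in bit 12 land in different logical L1 halves and carry
    # different requestor identifiers, so the L2 coherence protocol can
    # tell the two cached copies apart.
    print(requestor_id(0x5000), requestor_id(0x8000))  # -> 1 0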
Abstract:
Apparatuses, systems, methods, and computer program products are disclosed for balanced caching. An input circuit (302) receives (702) a request for data of a non-volatile storage. A balancing circuit (304) determines (704) whether to execute the request by directly communicating with one or more of a cache (410) and the non-volatile storage (408), based on a first rate corresponding to the cache (410) and a second rate corresponding to the non-volatile storage (408). A data access circuit (306) executes (706, 708) the request based on the determination made by the balancing circuit (304).
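A minimal sketch of the rate-based routing decision follows; the concrete policy (comparing the two rates against a 1.5x ratio) and all names are illustrative assumptions, since the abstract does not fix a formula:

    class BalancingCircuit:
        """Decides whether a request should go to the cache, the
        non-volatile storage, or both, from two observed rates."""

        def __init__(self, cache_rate, storage_rate):
            self.cache_rate = cache_rate      # rate corresponding to the cache
            self.storage_rate = storage_rate  # rate corresponding to the storage

        def route(self):
            # Assumed policy: favor whichever side has more headroom;
            # use both when the rates are comparable.
            if self.cache_rate > 1.5 * self.storage_rate:
                return "cache"
            if self.storage_rate > 1.5 * self.cache_rate:
                return "storage"
            return "both"

    def execute(key, balancer, cache, storage):
        target = balancer.route()
        if target in ("cache", "both") and key in cache:
            return cache[key]
        return storage[key]  # miss or storage-directed: go to the storage

    cache = {"k1": "cached"}
    storage = {"k1": "cached", "k2": "cold"}
    b = BalancingCircuit(cache_rate=10_000, storage_rate=500)
    print(execute("k2", b, cache, storage))  # -> 'cold'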
Abstract:
An improved protocol is provided for data transfer between a request node and a home node of a data processing network in which a number of devices are coupled via an interconnect fabric; the protocol minimizes the number of response messages transported through the fabric. When congestion is detected in the interconnect fabric, the home node sends a combined response to a write request from the request node. The response is delayed until a data buffer is available at the home node and the home node has completed the associated coherence action. When the request node receives the combined response, the data to be written and the acknowledgment are coalesced into a single data message.
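The sketch below models the message-count saving under one set of assumptions: the message names echo CHI-style transactions (a separate buffer grant and completion versus one combined response), and the congestion test, buffer wait, and coherence action are stubs:

    from dataclasses import dataclass

    @dataclass
    class CombinedResponse:  # buffer grant + completion folded into one message
        txn_id: int

    @dataclass
    class CoalescedData:     # write data + acknowledgment in one data message
        txn_id: int
        payload: bytes

    class HomeNode:
        def handle_write_request(self, txn_id, congested):
            if congested:
                # Delay until a data buffer is free and the coherence
                # action has completed, then send a single combined
                # response instead of two separate messages.
                self.wait_for_buffer_and_coherence()
                return [CombinedResponse(txn_id)]
            return ["DBIDResp", "Comp"]  # uncongested path: separate messages

        def wait_for_buffer_and_coherence(self):
            pass  # stub: buffer allocation + snoop/invalidate completion

    class RequestNode:
        def on_response(self, resp, payload):
            if isinstance(resp, CombinedResponse):
                # The combined response lets the requester coalesce the
                # write data and the acknowledgment into one message.
                return CoalescedData(resp.txn_id, payload)
            return payload

    home, rn = HomeNode(), RequestNode()
    (resp,) = home.handle_write_request(7, congested=True)
    print(rn.on_response(resp, b"data"))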
Abstract:
A data processing apparatus comprises one or more interconnected processing elements, each configured to execute processing instructions of program tasks, to save context data relating to a program task following execution of that task by that processing element, and to load context data, previously saved by that processing element or another of the processing elements, at resumption of execution of a program task. Each processing element has respective associated format definition data defining one or more sets of data items for inclusion in the context data. The apparatus comprises format selection circuitry to communicate the format definition data of each processing element to the other processing elements and to determine, in response to the format definition data for each of the processing elements, a common set of data items for inclusion in the context data.
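One possible reading of the format negotiation, sketched under the assumption that the common set is the intersection of the item sets every processing element supports (the abstract does not state the combination rule, and all names below are invented):

    def common_context_format(format_definitions):
        # Each processing element contributes its format definition data:
        # the set of context-data items it can save and restore.  The
        # combination rule below (plain intersection) is an assumption.
        sets = [set(items) for items in format_definitions.values()]
        return set.intersection(*sets)

    formats = {
        "PE0": {"gpr", "fpr", "pc", "flags", "vector"},
        "PE1": {"gpr", "pc", "flags"},            # no FP or vector state
        "PE2": {"gpr", "fpr", "pc", "flags"},
    }
    print(sorted(common_context_format(formats)))  # -> ['flags', 'gpr', 'pc']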
Abstract:
Apparatuses, systems, and methods for coherently sharing data across a multi-node network are described. A coherency protocol for such data sharing can include identifying a memory access request from a requesting node for an I/O block of data in a shared I/O address space of a multi-node network, determining a logical ID and a logical offset of the I/O block, identifying an owner of the I/O block, negotiating permissions with the owner of the I/O block, and performing the memory access request on the I/O block.
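A toy model of the lookup-and-negotiate sequence; the directory structure, block size, and trivially permissive grant are assumptions for illustration:

    class Node:
        def __init__(self, name):
            self.name = name

        def grant(self, requester, logical_id, op):
            return True  # trivially permissive for the sketch

        def perform(self, logical_id, offset, op):
            return f"{op} on block {logical_id} at offset {offset}"

    class IOCoherencyDirectory:
        BLOCK_SIZE = 4096  # assumed I/O block size

        def __init__(self):
            self.owners = {}  # logical ID -> owning node

        def lookup(self, address):
            # Map a shared-I/O-space address to (logical ID, logical offset).
            return address // self.BLOCK_SIZE, address % self.BLOCK_SIZE

    def access(directory, requesting_node, address, op):
        logical_id, offset = directory.lookup(address)
        owner = directory.owners.setdefault(logical_id, requesting_node)
        # Negotiate permissions with the owner, then perform the access.
        if owner.grant(requesting_node, logical_id, op):
            return owner.perform(logical_id, offset, op)
        raise PermissionError(op)

    directory = IOCoherencyDirectory()
    print(access(directory, Node("node0"), 0x1234, "read"))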
Abstract:
Aspects disclosed herein include avoiding deadlocks in processor-based systems employing retry and in-order-response non-retry bus coherency protocols. In this regard, an interface bridge circuit is communicatively coupled to a first core device that implements a retry bus coherency protocol, and a second core device that implements an in-order-response non-retry bus coherency protocol. The interface bridge circuit receives a snoop command from the first core device, and forwards the snoop command to the second core device. While the snoop command is pending, the interface bridge circuit detects a potential deadlock condition between the first core device and the second core device. In response to detecting the potential deadlock condition, the interface bridge circuit is configured to send a retry response to the first core device. This enables the first core device to continue processing, thereby eliminating the potential deadlock condition.
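A sketch of the bridge behavior under an assumed deadlock heuristic (a colliding request from the non-retry core while a snoop is pending); the class and method names are invented:

    class CoreStub:
        def forward(self, snoop_id):
            print(f"snoop {snoop_id} forwarded to non-retry core")

        def send_retry(self, snoop_id):
            print(f"retry response sent for snoop {snoop_id}")

    class InterfaceBridge:
        def __init__(self, retry_core, nonretry_core):
            self.retry_core = retry_core        # retry-protocol side
            self.nonretry_core = nonretry_core  # in-order non-retry side
            self.pending_snoops = set()

        def on_snoop(self, snoop_id):
            self.pending_snoops.add(snoop_id)
            self.nonretry_core.forward(snoop_id)

        def on_conflicting_request(self, snoop_id):
            # Assumed heuristic: a request from the non-retry core that
            # collides with a pending snoop signals a potential deadlock.
            if snoop_id in self.pending_snoops:
                # Send a retry response so the retry-protocol core can
                # back off and continue processing, breaking the cycle.
                self.retry_core.send_retry(snoop_id)
                self.pending_snoops.discard(snoop_id)

    bridge = InterfaceBridge(CoreStub(), CoreStub())
    bridge.on_snoop(42)
    bridge.on_conflicting_request(42)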
Abstract:
A computer-implemented method for synchronizing local caches is disclosed. The method may include receiving a content update, which is an update to a data entry stored in the local caches of each of a plurality of remote servers. The method may include transmitting the content update to a first remote server to update a corresponding data entry in the local cache of the first remote server. Further, the method may include generating an invalidation command indicating the change to the corresponding data entry. The method may include transmitting the invalidation command from the first remote server to a message server. The method may include generating, by the message server, a plurality of partitions based on the received invalidation command. The method may include transmitting, from the message server to each of the remote servers, the plurality of partitions, so that the remote servers update their respective local caches.
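As a rough sketch, partitioning by a hash of the invalidated key is one plausible (assumed) scheme for the message server; the abstract does not specify how partitions are formed:

    class MessageServer:
        def __init__(self, num_partitions=4):
            self.num_partitions = num_partitions

        def make_partitions(self, invalidation):
            # Assumed scheme: hash the invalidated key into one of the
            # partitions; each remote server consumes every partition.
            parts = [[] for _ in range(self.num_partitions)]
            parts[hash(invalidation["key"]) % self.num_partitions].append(invalidation)
            return parts

    class RemoteServer:
        def __init__(self, name):
            self.name = name
            self.local_cache = {"user:42": "stale"}

        def apply_partitions(self, parts):
            for part in parts:
                for inv in part:
                    self.local_cache.pop(inv["key"], None)  # invalidate entry

    server = MessageServer()
    remotes = [RemoteServer(f"s{i}") for i in range(3)]
    parts = server.make_partitions({"key": "user:42"})
    for r in remotes:
        r.apply_partitions(parts)
    print(remotes[0].local_cache)  # -> {} (stale entry dropped)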
Abstract:
Techniques for caching may include: determining an update to a first data page of a first cache on a first node, wherein a second node includes a second cache and wherein the second cache includes a copy of the first data page; determining, in accordance with one or more criteria, whether to send the update from the first node to the second node; responsive to determining, in accordance with the one or more criteria, to send the update, sending the update from the first node to the second node; and responsive to determining not to send the update, sending an invalidate request from the first node to the second node, wherein the invalidate request instructs the second node to invalidate the copy of the first data page stored in the second cache of the second node.
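A compact sketch of the update-versus-invalidate choice; the single criterion used here (payload size against a threshold) is an illustrative stand-in for the unspecified "one or more criteria":

    class PeerNode:
        def __init__(self):
            self.cache = {7: b"old"}

        def apply_update(self, page, data):
            self.cache[page] = data     # refresh the peer's copy in place

        def invalidate(self, page):
            self.cache.pop(page, None)  # drop the copy; refetch on demand

    def propagate(page, data, peer, size_threshold=512):
        # Assumed criterion: small updates are cheap to ship, so send
        # the data; for large updates an invalidate request is cheaper.
        if len(data) <= size_threshold:
            peer.apply_update(page, data)
        else:
            peer.invalidate(page)

    peer = PeerNode()
    propagate(7, b"new", peer)
    print(peer.cache[7])  # -> b'new'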