Abstract:
Embodiments herein describe a coherency protocol for a distributed computing topology that permits large stalls on various interfaces. In one embodiment, the computing topology includes multiple boards, each of which contains multiple processors. When a particular core on a processor wants access to data that is not currently stored in its cache, the core can first initiate a request to search for the cache line in the caches of the other cores on the same processor. If the cache line is not found, the cache coherency protocol permits the processor to then broadcast a request to the other processors on the same board. If no processor on the same board has the data, the processor can then broadcast the request to the other boards in the system. The processors on those boards can then search their caches to identify the data.
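The escalating search described above can be sketched as follows. This is only an illustrative model, not the claimed implementation: the nested-list layout, the name `find_line`, and the scope labels are assumptions introduced here, and real hardware would use broadcast messages and responses rather than loops.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model (not from the source): a system is a list of boards,
# a board is a list of processors, a processor is a list of per-core caches,
# and each cache maps a line address to its data.
Cache = Dict[int, str]
Processor = List[Cache]
Board = List[Processor]


def find_line(system: List[Board], board: int, proc: int, core: int,
              addr: int) -> Optional[Tuple[str, str]]:
    """Return (scope, data) for the narrowest scope holding addr, else None."""
    home_board = system[board]
    home_proc = home_board[proc]

    # The requesting core's own cache.
    if addr in home_proc[core]:
        return ("core", home_proc[core][addr])

    # Step 1: caches of the other cores on the same processor.
    for c, cache in enumerate(home_proc):
        if c != core and addr in cache:
            return ("processor", cache[addr])

    # Step 2: broadcast to the other processors on the same board.
    for p, other_proc in enumerate(home_board):
        if p != proc:
            for cache in other_proc:
                if addr in cache:
                    return ("board", cache[addr])

    # Step 3: broadcast to the other boards in the system.
    for b, other_board in enumerate(system):
        if b != board:
            for other_proc in other_board:
                for cache in other_proc:
                    if addr in cache:
                        return ("system", cache[addr])

    return None
```

Each step widens the search only after the narrower scope has missed, which is the ordering the abstract describes.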
Abstract:
A computer system and a method implementing a remote access array are provided. A first drawer may include a first processor chip. A first main memory region may be operatively connected to the first processor chip. A first non-addressable memory region may be operatively connected to the first processor chip and may include the first remote access array. The first remote access array may be configured to track data portions that are pulled from the first main memory region and that are sent to an external node. The first remote access array may be backed up in the first main memory region. The first remote access array may include one or more entries and may be configured to scrub all of the entries in response to a multi-drawer working partition being shrunk to fit within the first drawer.
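A minimal sketch of the tracking and scrubbing behavior described above, under stated assumptions: the class name `RemoteAccessArray`, its methods, and the representation of a partition as a set of drawer IDs are all hypothetical, and the backup is modeled simply as a copy of the entries.

```python
from typing import Dict, Set


class RemoteAccessArray:
    """Tracks data portions pulled from local main memory and sent off-node."""

    def __init__(self, home_drawer: int) -> None:
        self.home_drawer = home_drawer
        # Hypothetical layout: address -> set of external node IDs holding a copy.
        self.entries: Dict[int, Set[int]] = {}

    def record_export(self, addr: int, external_node: int) -> None:
        """Note that the data at addr was sent to an external node."""
        self.entries.setdefault(addr, set()).add(external_node)

    def backup(self) -> Dict[int, Set[int]]:
        """Return a copy of the entries, standing in for the main-memory backup."""
        return {addr: set(nodes) for addr, nodes in self.entries.items()}

    def on_partition_resized(self, old_drawers: Set[int],
                             new_drawers: Set[int]) -> bool:
        """Scrub every entry if a multi-drawer partition now fits in the home drawer."""
        shrunk_to_home = len(old_drawers) > 1 and new_drawers == {self.home_drawer}
        if shrunk_to_home:
            self.entries.clear()
        return shrunk_to_home
```

Shrinking the partition to another multi-drawer set leaves the entries alone. The scrub happens only when the partition shrinks to the home drawer.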
Abstract:
Processing simultaneous data requests regardless of active requests in the same addressable index of a cache. In response to a cache miss in a given congruence class, if the number of other compartments in the given congruence class that have an active operation is less than a predetermined threshold, a Do Not Cast Out (DNCO) pending indication is set for each of the compartments that have an active operation, in order to block access to each of those compartments. If the number of other compartments in the given congruence class that have an active operation is not less than the predetermined threshold, another cache miss is blocked from occurring in the compartments of the given congruence class by setting a congruence class block pending indication for the given congruence class, in order to block access to each of the other compartments of the given congruence class.
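The threshold decision above can be illustrated with a short sketch. The function name `handle_miss`, the boolean list of per-compartment activity, and the returned dictionary are assumptions made for illustration; the abstract does not specify how the indications are stored.

```python
from typing import Dict, List, Set, Union


def handle_miss(active: List[bool],
                threshold: int) -> Dict[str, Union[Set[int], bool]]:
    """Decide which pending indications to set after a cache miss.

    active[i] is True when the i-th *other* compartment of the congruence
    class has an active operation (hypothetical representation).
    """
    active_compartments = {i for i, busy in enumerate(active) if busy}

    if len(active_compartments) < threshold:
        # Few active operations: protect only the busy compartments with
        # a Do Not Cast Out (DNCO) pending indication.
        return {"dnco_pending": active_compartments,
                "class_block_pending": False}

    # Too many active operations: block the whole congruence class so that
    # no further miss can occur in it.
    return {"dnco_pending": set(), "class_block_pending": True}
```

With a threshold of 3, two busy compartments get individual DNCO indications, while three busy compartments trigger the class-wide block.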
Abstract:
Embodiments include methods, systems and computer program products for maintaining ordered memory access with parallel access data streams associated with a distributed shared memory system. The computer-implemented method includes performing, by a first cache, a key check, the key check being associated with a first ordered data store. A first memory node signals to an input/output (I/O) controller that the first memory node is ready to begin pipelining of a second ordered data store into the first memory node. A second cache returns a key response to the first cache indicating that the pipelining of the second ordered data store can proceed. The first memory node sends to the I/O controller a ready signal indicating that the first memory node is ready to continue pipelining of the second ordered data store into the first memory node, wherein the ready signal is triggered by receipt of the key response.
Abstract:
An aspect includes interlocking operations in an address-sliced cache system. A computer-implemented method includes determining whether a dynamic memory relocation operation is in process in the address-sliced cache system. Based on determining that the dynamic memory relocation operation is in process, a key operation is serialized to maintain a sequenced order of completion of the key operation across a plurality of slices and pipes in the address-sliced cache system. Based on determining that the dynamic memory relocation operation is not in process, a plurality of key operation requests is allowed to launch across two or more of the slices and pipes in parallel in the address-sliced cache system while ensuring that only one instance of the key operations is in process across all of the slices and pipes at the same time.
Abstract:
In one embodiment, a computer-implemented method includes detecting a cache miss for a cache line. A resource is reserved on each of one or more remote computing nodes, responsive to the cache miss. A request for a state of the cache line on the one or more remote computing nodes is broadcast to the one or more remote computing nodes, responsive to the cache miss. A resource credit is received from a first remote computing node of the one or more remote computing nodes, responsive to the request. The resource credit indicates that the first remote computing node will not participate in completing the request. The resource on the first remote computing node is released, responsive to receiving the resource credit from the first remote computing node.
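The reserve, broadcast, and release sequence above might be modeled roughly as below. The class `MissRequester`, its method names, and the tuple message format are hypothetical. The sketch tracks reservations on the requester side only and does not model how the request is ultimately completed.

```python
from typing import Iterable, List, Set, Tuple


class MissRequester:
    """Requester-side bookkeeping for resources reserved on remote nodes."""

    def __init__(self, remote_nodes: Iterable[int]) -> None:
        self.remote_nodes: List[int] = list(remote_nodes)
        self.reserved: Set[int] = set()  # nodes where a resource is held

    def on_cache_miss(self, addr: int) -> List[Tuple[int, str, int]]:
        """Reserve a resource on every remote node and build the broadcast."""
        self.reserved.update(self.remote_nodes)
        # One state request per remote node (hypothetical message format).
        return [(node, "state_request", addr) for node in self.remote_nodes]

    def on_resource_credit(self, node: int) -> None:
        """A credit means the node will not participate: release its resource."""
        self.reserved.discard(node)
```

After a miss, every remote node holds a reservation. Each returned credit frees one of them early, leaving reservations only on the nodes that may still take part in completing the request.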
Abstract:
Maintaining store order with high throughput in a distributed shared memory system. A request is received for a first ordered data store and a coherency check is initiated. A signal is sent that pipelining of a second ordered data store can be initiated. If a delay condition is encountered during the coherency check for the first ordered data store, rejection of the first ordered data store is signaled. If a delay condition is not encountered during the coherency check for the first ordered data store, a signal is sent indicating a readiness to continue pipelining of the second ordered data store.
Abstract:
A computing device is provided and includes a plurality of nodes. Each node includes multiple chips and a node controller at which the multiple chips are assignable to logical partitions. Each of the multiple chips includes processors and a memory unit configured to handle local memory operations originating from the processors. The node controller includes a dynamic memory relocation (DMR) mechanism configured to move data having a DMR storage increment address relative to a local one of the memory units without interrupting a processing of the data by at least one of the logical partitions. During movement of the data by the DMR mechanism, the memory units are disabled from handling the local memory operations matching the DMR storage increment address and the node controller handles the local memory operations matching the DMR storage increment address.
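The routing rule in the last sentence above can be sketched as a small function. The name `route_local_op` and its parameters are assumptions, as is the modeling of the DMR storage increment as a contiguous address range given by `dmr_base` and `dmr_size`.

```python
def route_local_op(addr: int, dmr_active: bool,
                   dmr_base: int, dmr_size: int) -> str:
    """Pick the handler for a local memory operation at addr.

    Assumption: the DMR storage increment covers the half-open range
    [dmr_base, dmr_base + dmr_size).
    """
    in_increment = dmr_base <= addr < dmr_base + dmr_size
    if dmr_active and in_increment:
        # While the data is being moved, the node controller takes over
        # operations that match the increment being relocated.
        return "node_controller"
    # Otherwise the local memory unit handles the operation as usual.
    return "memory_unit"
```

Operations outside the increment, or any operation when no move is underway, stay with the memory unit.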
Abstract:
A cache includes a cache pipeline, a request receiver configured to receive off chip coherency requests from an off chip cache, and a plurality of state machines coupled to the request receiver. The cache also includes an arbiter, coupled between the plurality of state machines and the cache pipeline and configured to give priority to off chip coherency requests, as well as a counter configured to count the number of coherency requests sent from the cache pipeline to a lower level cache. The cache pipeline is halted from sending coherency requests when the counter exceeds a predetermined limit.
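A rough sketch of the counter-based halt and the arbiter priority described above. The names `CoherencyThrottle` and `arbitrate` are hypothetical, and decrementing the counter when a request completes is an assumption: the abstract only says the counter counts requests sent to the lower level cache.

```python
from typing import List, Optional, Tuple


class CoherencyThrottle:
    """Halts sending once the request counter exceeds a fixed limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0  # coherency requests sent to the lower level cache

    def try_send(self) -> bool:
        """Send one coherency request unless the pipeline is halted."""
        if self.count > self.limit:
            return False  # counter exceeds the limit: pipeline is halted
        self.count += 1
        return True

    def on_complete(self) -> None:
        """Assumption: a completed request frees one counter slot."""
        if self.count > 0:
            self.count -= 1


def arbitrate(requests: List[Tuple[str, bool]]) -> Optional[str]:
    """Pick the next request name, preferring off chip coherency requests.

    Each request is (name, is_off_chip); ties keep arrival order.
    """
    for name, is_off_chip in requests:
        if is_off_chip:
            return name
    return requests[0][0] if requests else None
```

With a limit of 2, three requests go through before the counter exceeds the limit and the fourth attempt is refused.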
Abstract:
Various embodiments of the present invention manage a hierarchical store-through memory cache structure. A store request queue is associated with a processing core in multiple processing cores. At least one blocking condition is determined to have occurred at the store request queue. Multiple non-store requests and a set of store requests associated with a remaining set of processing cores in the multiple processing cores are dynamically blocked from accessing a memory cache in response to the blocking condition having occurred.