摘要:
A method for avoiding data loss due to cancelled transactions within a non-uniform memory access (NUMA) data processing system is disclosed. A NUMA data processing system includes a node interconnect to which at least a first node and a second node are coupled. The first and the second nodes each includes a local interconnect, a system memory coupled to the local interconnect, and a node controller interposed between the local interconnect and a node interconnect. The node controller detects certain situations which, due to the nature of a NUMA data processing system, can lead to data loss. These situations share the common feature that a node controller ends up with the only copy of a modified cache line and the original transaction that requested the modified cache line may not be issued again with the same tag or may not be issued again at all. The node controller corrects these situations by issuing its own write transaction to the system memory for that modified cache line using its own tag, and then providing the data the modified cache line is holding. This ensures that the modified data will be written to the system memory.
摘要:
A non-uniform memory access (NUMA) data processing system includes a node interconnect to which at least a first processing node and a second processing node are coupled. The first and the second processing nodes each include a local interconnect, a processor coupled to the local interconnect, a system memory coupled to the local interconnect, and a node controller interposed between the local interconnect and the node interconnect. In order to reduce communication latency, the node controller of the first processing node speculatively transmits request transactions received from the local interconnect of the first processing node to the second processing node via the node interconnect. In one embodiment, the node controller of the first processing node subsequently transmits a status signal to the node controller of the second processing node in order to indicate how the request transaction should be processed at the second processing node.
摘要:
A non-uniform memory access (NUMA) computer system includes an interconnect to which multiple processing nodes (including first, second, and third processing nodes) are coupled. Each of the first, second, and third processing nodes includes at least one processor and a local system memory. The NUMA computer system further includes a transaction buffer, coupled to the interconnect, that stores communication transactions transmitted on the interconnect that are both initiated by and targeted at a processing node other than the third processing node. In response to a determination that a particular communication transaction originally targeting another processing node should be processed by the third processing node, buffer control logic coupled to the transaction buffer causes the particular communication transaction to be retrieved from the transaction buffer and processed by the third processing node. In one embodiment, the interconnect includes a broadcast fabric, and the transaction buffer and buffer control logic form a portion of the third processing node.
摘要:
A queue includes a data multiplexer having an output and at least two inputs and a plurality of data latches. The data latches include at least a first data latch and a second data latch, which each have a data input and a data output. The data output of the first data latch is coupled to a first input of the data multiplexer, and the output of the data multiplexer is coupled to the data input of the second data latch. A data value to be stored in the queue is received at a second input to the data multiplexer. In response to one or more control signals, the data value is latched into at least one of the first and second data latches, thereby storing the data value in the queue. Depending upon the design of the control logic, the queue can implement either first in, first out (FIFO) or last in, first out (LIFO) behavior.
摘要:
A non-uniform memory access (NUMA) computer system includes a node interconnect and a plurality of processing nodes that each contain at least one processor, a local interconnect, a local system memory, and a node controller coupled to both a respective local interconnect and the node interconnect. According to the method of the present invention, a communication transaction is transmitted on the node interconnect from a local processing node to a remote processing node. In response to receipt of the communication transaction by the remote processing node, a response including a coherency response field is transmitted on the node interconnect from the remote processing node to the local processing node. In response to receipt of the response at the local processing node, a request is issued on the local interconnect of the local processing node concurrently with a determination of a coherency response indicated by the coherency response field.
摘要:
A method for avoiding livelocks due to colliding invalidating transactions within a non-uniform memory access system is disclosed. A NUMA computer system includes at least two nodes coupled to an interconnect. Each of the two nodes includes a local system memory. In response to a request by a processor of a first node to invalidate a remote copy of a cache line also stored within its cache memory at substantially the same time when a processor of a second node is also requesting to invalidate said cache line, one of the two requests is allowed to complete. The allowed request is the first request to complete without retry at the point of coherency, typically the home node. Subsequently, the other one of the two requests is permitted to complete.
摘要:
A non-uniform memory access (NUMA) computer system includes a plurality of processing nodes coupled to a node interconnect. The plurality of processing nodes include at least a remote processing node, which contains a processor having an associated cache hierarchy, and a home processing node. The home processing node includes a shared system memory containing a plurality of memory granules and a coherence directory that indicates possible coherence states of copies of memory granules among the plurality of memory granules that are stored within at least one processing node other than the home processing node. If the processor within the remote processing node has a reservation for a memory granule among the plurality of memory granules that is not resident within the associated cache hierarchy, the coherence directory associates the memory granule with a coherence state indicating that the reserved memory granule may possibly be held non-exclusively at the remote processing node. In this manner, the coherence mechanism can be utilized to manage processor reservations even in cases in which a reserving processor's cache hierarchy does not hold a copy of the reserved memory granule.
摘要:
A method and system for providing an eviction protocol within a non-uniform memory access (NUMA) computer system are disclosed. A NUMA computer system includes at least two nodes coupled to an interconnect. Each of the two nodes includes a local system memory. In response to a request for evicting an entry from a sparse directory, an non-intervention writeback request is sent to a node having the modified cache line when the entry is associated with a modified cache line. After the data from the modified cache line has been written back to a local system memory of the node, the entry can then be evicted from the sparse directory. If the entry is associated with a shared line, an invalidation request is sent to all nodes that the directory entry indicates may hold a copy of the line. Once all invalidations have been acknowledged, the entry can be evicted from the sparse directory.
摘要:
A method for avoiding livelocks due to stale exclusive/modified directory entries within a non-uniform memory access (NUMA) computer system is disclosed. A NUMA computer system includes at least two nodes coupled to an interconnect. Each of the two nodes includes a local system memory. In response to an attempt by a processor of a first node to read a cache line at substantially the same time as a processor of a second node attempts to access the same cache line, wherein the cache line has been silently cast out from a cache memory within the second node even though a coherency directory within the node still indicates the cache line is held exclusively in the second node, the processor of the second node is allowed to access the cache line only if the second node is an owning node of the cache line. The processor of the first node is then allowed to access the cache line.
摘要:
A non-uniform memory access (NUMA) computer system includes first and second processing nodes that are each coupled to a node interconnect. The first processing node includes a system memory and first and second processors that each have a respective one of first and second cache hierarchies, which are coupled for communication by a local interconnect. The second processing node includes at least a system memory and a third processor having a third cache hierarchy. The first cache hierarchy and the third cache hierarchy are permitted to concurrently store an unmodified copy of a particular cache line in a Recent coherency state from which the copy of the particular cache line can be sourced by shared intervention. In response to a request for the particular cache line by the second cache hierarchy, the first cache hierarchy sources a copy of the particular cache line to the second cache hierarchy by shared intervention utilizing communication on only the local interconnect and without communication on the node interconnect.