Abstract:
In one embodiment, a processor comprises a coherence trap unit and a trap logic coupled to the coherence trap unit. The coherence trap unit is also coupled to receive data accessed in response to the processor executing a memory operation. The coherence trap unit is configured to detect that the data matches a designated value indicating that a coherence trap is to be initiated to coherently perform the memory operation. The trap logic is configured to trap to a designated software routine responsive to the coherence trap unit detecting the designated value. In some embodiments, a cache tag in a cache may track whether or not the corresponding cache line has the designated value, and the cache tag may be used to trigger a trap in response to an access to the corresponding cache line.
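The load path described above can be modeled in a few lines. This is a minimal Python sketch that assumes a hypothetical 64-bit designated value `TRAP_VALUE` and a per-line tag bit `trap_bit`. The names, the value, and the handler interface are illustrative and are not taken from any particular processor. The tag bit is set when the line is filled, so a later hit can trap without comparing the data again.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical designated value marking data whose access must be completed
# coherently by software; a real design picks its own encoding.
TRAP_VALUE = 0xDEADBEEFDEADBEEF


@dataclass
class CacheLine:
    data: int
    trap_bit: bool = False  # cache-tag bit: does this line hold TRAP_VALUE?


@dataclass
class Processor:
    # Designated software routine invoked on a coherence trap (hypothetical signature).
    trap_handler: Callable[[int], int]
    cache: Dict[int, CacheLine] = field(default_factory=dict)

    def fill(self, addr: int, data: int) -> None:
        # Coherence trap unit: compare the incoming data with the designated
        # value once, and record the result in the cache tag.
        self.cache[addr] = CacheLine(data, trap_bit=(data == TRAP_VALUE))

    def load(self, addr: int) -> int:
        line = self.cache[addr]
        if line.trap_bit:
            # Trap logic: divert to software instead of returning the data.
            return self.trap_handler(addr)
        return line.data
```

For example, with `cpu = Processor(trap_handler=lambda addr: 42)`, filling one line with ordinary data and another with `TRAP_VALUE` makes the first load return its data and the second return whatever the handler produces.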
Abstract:
In one embodiment, a memory controller for a node in a multi-node computer system comprises logic and a control unit. The logic is configured to determine if an address corresponding to a request received by the memory controller on an intranode interconnect is a remote address or a local address. A first portion of the memory in the node is allocated to store copies of remote data and the remaining portion stores local data. The control unit is configured to write writeback data to a location in the first portion. The writeback data corresponds to a writeback request from the intranode interconnect that has an associated remote address detected by the logic. The control unit is configured to determine the location responsive to the associated remote address and one or more indicators that identify the first portion in the memory.
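The controller's address check and placement step can be sketched in Python as follows. The sketch assumes that the indicators are a base offset `remote_base` and a slot count `remote_lines`, that line placement in the remote-copy portion is direct-mapped, and that local data sits at offset `addr - local_base`. All of these choices are hypothetical. The global address is stored next to the data so that a later lookup can tell which remote line occupies a slot.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

LINE_SIZE = 64  # bytes per coherence unit (assumed)


@dataclass
class MemoryController:
    local_base: int    # first global address homed in this node
    local_size: int    # bytes of local data, stored at node offsets 0..local_size-1
    remote_base: int   # indicator: node-memory offset of the remote-copy portion
    remote_lines: int  # indicator: number of line slots in that portion
    # node-memory offset -> (global address, data); the address serves as a tag
    memory: Dict[int, Tuple[int, bytes]] = field(default_factory=dict)

    def is_remote(self, addr: int) -> bool:
        return not (self.local_base <= addr < self.local_base + self.local_size)

    def location(self, addr: int) -> int:
        if not self.is_remote(addr):
            return addr - self.local_base
        # Hypothetical direct-mapped placement inside the remote-copy portion.
        slot = (addr // LINE_SIZE) % self.remote_lines
        return self.remote_base + slot * LINE_SIZE

    def writeback(self, addr: int, data: bytes) -> int:
        loc = self.location(addr)
        self.memory[loc] = (addr, data)  # keep the address to detect slot conflicts
        return loc
```

Because the mapping is direct-mapped, two remote lines can contend for one slot. The stored address makes that visible, but replacement policy is outside what the abstract describes.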
Abstract:
A node in a multi-node system includes a memory, an active device that includes a cache, an interface that sends and receives coherency messages on an inter-node network coupling the node to another node, and an address network that communicates address packets between the devices in the node. In response to receiving a coherency message from the other node requesting an access right to a coherency unit, the interface sends an address packet on the address network. The address packet is a first type of address packet if the coherency unit is in the modified global access state in the node and a second type of address packet otherwise. If the active device is the owner of the coherency unit, the active device responds to the first type of address packet and ignores the second type of address packet.
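The two decisions in this abstract can be expressed as a pair of small functions: the interface picks the packet type, and the owning active device decides whether to respond. This is a minimal Python sketch. Apart from the modified state, the global access states listed in the enum are hypothetical placeholders, and `FIRST`/`SECOND` simply name the two packet types described above.

```python
from enum import Enum, auto


class GlobalAccess(Enum):
    MODIFIED = auto()
    SHARED = auto()   # placeholder for any non-modified state (assumed)
    INVALID = auto()  # placeholder for any non-modified state (assumed)


class PacketType(Enum):
    FIRST = auto()   # coherency unit is in the modified global access state
    SECOND = auto()  # any other global access state


def interface_packet(state: GlobalAccess) -> PacketType:
    """Address packet the interface sends for a remote access-right request."""
    return PacketType.FIRST if state is GlobalAccess.MODIFIED else PacketType.SECOND


def owner_responds(is_owner: bool, packet: PacketType) -> bool:
    """An owning active device answers the first type and ignores the second."""
    return is_owner and packet is PacketType.FIRST
```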
Abstract:
A node includes several devices, among them a memory, an active device, and an interface configured to send and receive coherency messages on an inter-node network coupling the node to another node; the node also includes an address network and a data network. In response to receiving a coherency message requesting an access right to a coherency unit, the interface is configured to send a first type of address packet on the address network if the global access state of the coherency unit within the node is the modified state and a second type of address packet otherwise. The memory is configured to respond to receipt of the second type of address packet by sending a data packet on the data network, regardless of whether the memory currently has an ownership responsibility for the coherency unit.
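The memory's side of this rule can be sketched in Python as shown below. The second-type branch follows the abstract. The first-type branch, where memory supplies data only if it happens to be the owner, is an assumption, because the abstract does not describe how memory handles that packet type.

```python
from enum import Enum, auto
from typing import Optional


class PacketType(Enum):
    FIRST = auto()   # unit is in the modified global access state in this node
    SECOND = auto()  # any other global access state


def memory_response(packet: PacketType, memory_is_owner: bool, data: bytes) -> Optional[bytes]:
    """Payload of the data packet the memory sends, or None if it stays silent."""
    if packet is PacketType.SECOND:
        # Respond whether or not memory currently owns the coherency unit.
        return data
    # Assumed behavior for the first type: only an owning memory supplies data.
    return data if memory_is_owner else None
```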
Abstract:
A system may include multiple nodes, and each node may include a processing subsystem and an interface that are coupled by an address network and a data network. The nodes' interfaces may communicate over an inter-node network. Each processing subsystem may transition an access right to a coherency unit in response to a data packet on the data network and transition an ownership responsibility for the coherency unit in response to an address packet on the address network such that the access right transitions at a different time than the ownership responsibility transitions. An interface within a node may be configured to delay providing a data packet on the node's data network until the interface receives an indication that shared copies of the coherency unit in other nodes have been invalidated.
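The split between the two transitions, and the interface's hold on the data packet, can be modeled in Python as follows. This is a minimal sketch. It assumes the access right is encoded as "I", "S", or "M", and it counts one acknowledgment per remote sharer to represent the "indication" that shared copies have been invalidated. Both the encoding and the counting scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    owner: bool = False
    access: str = "I"  # assumed encoding: "I" none, "S" read, "M" write

    def on_address_packet(self, gains_ownership: bool) -> None:
        # Ownership responsibility changes when the address packet is observed...
        self.owner = gains_ownership

    def on_data_packet(self, new_access: str) -> None:
        # ...while the access right changes only when the data packet arrives.
        self.access = new_access


@dataclass
class Interface:
    pending_acks: int = 0                # remote shared copies not yet invalidated
    held: Optional[bytes] = None         # data packet withheld from the node
    delivered: List[bytes] = field(default_factory=list)  # sent on the data network

    def data_from_remote(self, data: bytes, sharers: int) -> None:
        self.held, self.pending_acks = data, sharers
        self._maybe_release()

    def invalidation_ack(self) -> None:
        self.pending_acks -= 1
        self._maybe_release()

    def _maybe_release(self) -> None:
        if self.held is not None and self.pending_acks == 0:
            self.delivered.append(self.held)
            self.held = None
```

Until the data packet is released, a device can already be the owner while its access right is still "I". That gap is the behavior the abstract describes.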
Abstract:
A system may include a plurality of nodes. Each node may include one or more active devices coupled to one or more memory subsystems. An active device included in one of the nodes includes a memory management unit configured to receive a virtual address generated within that active device and to responsively output a global address identifying a coherency unit. A portion of the global address identifies a translation function. A memory subsystem included in the node is configured to perform the translation function identified by the portion of the global address on an additional portion of the global address in order to obtain a local physical address of the coherency unit. Each active device included in the node is configured to use the portion of the global address identifying the translation function when determining whether a local copy of the coherency unit is currently stored in a cache associated with that active device.
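The following Python sketch shows how one field of the global address can select a translation function. It assumes a hypothetical layout in which bits 40 and 41 hold the function selector and the lower bits are the input to the selected function. The two example functions, identity and a fixed relocation, are purely illustrative. The cache tag keeps the selector bits, so the same offset under two different functions maps to two different cache entries.

```python
from typing import Callable, Dict, Tuple

FUNC_SHIFT = 40                      # assumed: bits 40-41 select the function
FUNC_MASK = 0x3
OFFSET_MASK = (1 << FUNC_SHIFT) - 1  # remaining bits feed the selected function

# Hypothetical translation functions a memory subsystem might implement.
TRANSLATIONS: Dict[int, Callable[[int], int]] = {
    0: lambda off: off,                # identity
    1: lambda off: off + 0x1000_0000,  # fixed relocation
}


def split(global_addr: int) -> Tuple[int, int]:
    """Return (function selector, offset) from a global address."""
    return (global_addr >> FUNC_SHIFT) & FUNC_MASK, global_addr & OFFSET_MASK


def to_local_physical(global_addr: int) -> int:
    """Memory subsystem: apply the function named in the global address."""
    func, offset = split(global_addr)
    return TRANSLATIONS[func](offset)


def cache_tag(global_addr: int, line_size: int = 64) -> int:
    """Active device: tag on the full global address, selector bits included."""
    return global_addr // line_size
```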
Abstract:
A system may include several nodes coupled by an inter-node network configured to convey coherency messages between the nodes. Each node may include several active devices coupled by an address network and a data network. The nodes implement a coherency protocol such that if an active device in one of the nodes has an ownership responsibility for a coherency unit, no active device in any of the other nodes has a valid access right to the coherency unit. For example, if a node receives from another node a coherency message requesting read access to a coherency unit, the node may respond by conveying a proxy address packet on its address network to the owning active device; receipt of the proxy address packet removes that device's ownership responsibility. In contrast, the active device's ownership responsibility may not be removed when a device within the same node requests read access to the coherency unit.
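The asymmetry between local and remote read requests, together with the cross-node invariant, can be checked with the Python sketch below. The "M"/"S"/"I" access encoding is an assumption. So is the choice to leave the former owner with a read access right after a proxy packet, since the abstract only says that ownership is removed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActiveDevice:
    owner: bool = True
    access: str = "M"  # assumed encoding: "M" write, "S" read, "I" none

    def local_read_request(self) -> None:
        # Read request from a device in the same node: keep ownership,
        # downgrade to a read-only access right (assumed).
        self.access = "S"

    def proxy_read_request(self) -> None:
        # Proxy address packet for a remote read: ownership leaves this device.
        self.owner = False
        self.access = "S"  # assumed; the abstract specifies only the ownership change


def invariant_holds(nodes: List[List[ActiveDevice]]) -> bool:
    """If a device in one node owns the unit, no device elsewhere has valid access."""
    for i, node in enumerate(nodes):
        if any(d.owner for d in node):
            for j, other in enumerate(nodes):
                if j != i and any(d.access != "I" for d in other):
                    return False
    return True
```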
Abstract:
A computer system includes a system memory and a plurality of active devices configured to access data associated with the system memory through an address network and a data network. Each of the active devices may be configured to cache data and may include a promise array. Transitions in ownership of a given block may occur at a different time than the time at which the access right to that block is changed. The promise array of an active device stores information identifying an unreceived data packet to be conveyed to another device in response to a pending transaction to a cache block for which the active device is an owner. Each active device may be configured to have at most one outstanding transaction for each cache block.
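The Python sketch below models a promise array as a dictionary keyed by block. It assumes that a foreign request always moves ownership to the requester, as a write request would, and that a device becomes owner as soon as it observes its own address packet. The method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ActiveDevice:
    name: str
    owned: Set[int] = field(default_factory=set)        # blocks this device owns
    outstanding: Set[int] = field(default_factory=set)  # blocks with a pending own transaction
    lines: Dict[int, bytes] = field(default_factory=dict)   # cached data
    promises: Dict[int, str] = field(default_factory=dict)  # promise array: block -> requester
    sent: List[Tuple[str, int, bytes]] = field(default_factory=list)

    def issue(self, block: int) -> None:
        if block in self.outstanding:
            raise RuntimeError("at most one outstanding transaction per block")
        self.outstanding.add(block)

    def own_address_packet(self, block: int) -> None:
        # Ownership arrives with the address packet, possibly before the data.
        self.owned.add(block)

    def foreign_request(self, block: int, requester: str) -> None:
        """Another device's request, assumed to take ownership (e.g. a write)."""
        if block not in self.owned:
            return  # not the owner: nothing to supply
        self.owned.discard(block)
        if block in self.outstanding:
            self.promises[block] = requester  # data not here yet: record a promise
        else:
            self.sent.append((requester, block, self.lines[block]))

    def data_packet(self, block: int, data: bytes) -> None:
        self.outstanding.discard(block)
        self.lines[block] = data
        requester = self.promises.pop(block, None)
        if requester is not None:
            self.sent.append((requester, block, data))  # fulfil the promise
```

In this model, one entry per block is enough. Ownership passes to the first foreign requester, so later requests go to the new owner and never add a second promise for the same block.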
Abstract:
In one embodiment, a node for a multi-node computer system comprises a coherence directory configured to store coherence states for coherence units in a local memory of the node and a coherence controller configured to receive a coherence request for a requested coherence unit. The requested coherence unit is included in a memory region that includes at least two coherence units, and the coherence controller is configured to read coherence states corresponding to two or more coherence units from the coherence directory responsive to the coherence request. The two or more coherence units are included in a previously accessed memory region, and the coherence controller is configured to provide the requested coherence unit with a predicted coherence state responsive to the coherence states in the previously accessed memory region.
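The prediction step can be sketched in Python as shown below. The sketch assumes four coherence units per region and takes the "previously accessed memory region" to be the region of the controller's last request. It also uses a hypothetical prediction policy: choose the most common state in that region. The abstract does not specify any particular policy.

```python
from collections import Counter
from typing import Dict, List, Optional

UNITS_PER_REGION = 4  # assumed: coherence units per memory region


class CoherenceController:
    def __init__(self, directory: Dict[int, str]) -> None:
        self.directory = directory               # unit index -> coherence state
        self.last_region: Optional[int] = None   # previously accessed region

    def region_states(self, region: int) -> List[str]:
        base = region * UNITS_PER_REGION
        return [self.directory.get(base + i, "I") for i in range(UNITS_PER_REGION)]

    def handle(self, unit: int) -> str:
        """Return the coherence state predicted for the requested unit."""
        region = unit // UNITS_PER_REGION
        if self.last_region is None:
            predicted = self.directory.get(unit, "I")  # no history: use the unit's own state
        else:
            # Hypothetical policy: most common state in the previously accessed region.
            states = self.region_states(self.last_region)
            predicted = Counter(states).most_common(1)[0][0]
        self.last_region = region
        return predicted
```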
Abstract:
A method for controlling a software lock acquirable by processors in a plurality of nodes of a multiprocessing system is disclosed. The method comprises a first processor of a first node of the plurality of nodes acquiring the lock, and the first processor selectively releasing the lock in a first state that allows other processors within the first node to acquire the lock but prevents processors in a remote node of the plurality of nodes from obtaining the lock. In another embodiment, a method comprises a first processor of a first node attempting to acquire the lock, the first processor determining whether another processor within the same node is remotely spinning on the lock, and the first processor remotely spinning on the lock in response to determining that no other processor within the same node is remotely spinning on the lock.
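Both embodiments can be captured in a single-threaded state model. This Python sketch is not a concurrent lock: a real implementation would update a shared lock word with atomic operations. The field `released_to_node` stands in for the "first state" and `remote_spinner` records the one processor per node that may spin remotely. Both names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeAwareLock:
    holder: Optional[int] = None            # processor id holding the lock
    released_to_node: Optional[int] = None  # node-local release: only this node may acquire
    remote_spinner: Dict[int, int] = field(default_factory=dict)  # node -> spinning cpu

    def try_acquire(self, cpu: int, node: int) -> bool:
        if self.holder is not None:
            return False
        if self.released_to_node is not None and self.released_to_node != node:
            return False  # released in the first state: remote nodes must wait
        self.holder, self.released_to_node = cpu, None
        if self.remote_spinner.get(node) == cpu:
            del self.remote_spinner[node]  # this node's remote spinner got the lock
        return True

    def release(self, keep_in_node: Optional[int] = None) -> None:
        """keep_in_node=None frees the lock globally; otherwise only that node may acquire."""
        self.holder, self.released_to_node = None, keep_in_node

    def should_spin_remotely(self, cpu: int, node: int) -> bool:
        """Spin remotely only if no other processor in this node already does."""
        spinner = self.remote_spinner.get(node)
        if spinner is None:
            self.remote_spinner[node] = cpu
            return True
        return spinner == cpu
```

When the releasing processor chooses between `release()` and `release(keep_in_node=...)`, it is making the "selectively releasing" decision. A practical policy would bound how many times the lock stays within one node so that remote nodes are not starved.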