摘要:
Multiprocessor systems conducting operations utilizing global shared memory must ensure that the memory is coherent. A hybrid system that combines hardware memory transactions with that of direct messaging provides memory coherence with minimal overhead requirement or bandwidth demands. Memory access transactions are intercepted and converted to direct messages which are then communicated to a target and/or remote node. Thereafter the message invokes a software handler which implements the cache coherence protocol. The handler uses additional messages to invalidate or fetch data in other caches, as well as to return data to the requesting processor. These additional messages are converted to appropriate hardware transactions by the destination system interface hardware.
摘要:
A method and system for detecting race conditions computing systems. A parallel computing system includes multiple processor cores is coupled to memory. An application with a code sequence in which parallelism to be exploited is executed on this system. Different processor cores may operate on a given memory line concurrently. Extra bits are associated with the memory data line and are used to indicate changes to corresponding subsections of data in the memory line. A memory controller may perform a comparison between check bits of a memory line to determine if more than one processor core modified the same section of data in a cache line and a race condition has occurred.
摘要:
The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.
摘要:
The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.
摘要:
The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
摘要:
The invention relates to a method for reducing cache flush time of a cache in a computer system. The method includes populating at least one of a plurality of directory entries of a dirty line directory based on modification of the cache to form at least one populated directory entry, and de-populating a pre-determined number of the plurality of directory entries according to a dirty line limiter protocol causing a write-back from the cache to a main memory, where the dirty line limiter protocol is based on a number of the at least one populated directory entry exceeding a pre-defined limit.
摘要:
The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
摘要:
The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
摘要:
In a multi-chip module (MCM), integrated circuits are coupled by optical waveguides. These integrated circuits receive optical signals from a set of tunable light sources. Moreover, a given integrated circuit includes: a transmitter that modulates at least one of the optical signals when transmitting information to at least another of the integrated circuits; and a receiver that receives at least one modulated optical signal having a given carrier wavelength associated with the given integrated circuit when receiving information from at least the other of the integrated circuits. Furthermore, control logic in the MCM provides a control signal to the set of tunable light sources to specify carrier wavelengths in the optical signals output by the set of tunable light sources, thereby defining routing of at least the one of the optical signals in the MCM during communication between at least a pair of the integrated circuits.
摘要:
In a multi-chip module (MCM), integrated circuits are coupled by optical waveguides. These integrated circuits receive optical signals from a set of light sources which have fixed carrier wavelengths. Moreover, a given integrated circuit includes: a transmitter that modulates at least one of the optical signals when transmitting information to at least another of the integrated circuits; and a receiver that receives at least one modulated optical signal having one of the carrier wavelengths when receiving information from at least the other of the integrated circuits. Furthermore, the MCM includes switchable drop filters optically coupled to the optical waveguides and associated integrated circuits, wherein the switchable drop filters pass adjustable bands of wavelengths to receivers in the integrated circuits. Additionally, control logic in the MCM provides a control signal to the switchable drop filters to specify the adjustable bands of wavelengths.