Abstract:
A system and method for verifying that a processor design having caches conforms to a specific memory model. The caches might not be kept coherent in real time. Specifically, the system and method use a checker that conforms to the memory model, a time-stamping scheme, and a store buffering scheme to identify one or more bugs in the processor design that violate the memory model and/or load an incorrect value in response to a load instruction.
Abstract:
A system and method for supporting targeted stores in a shared-memory multiprocessor. A targeted store enables a first processor to push a cache line to be stored in a cache memory of a second processor. This eliminates the need for multiple cache-coherence operations to transfer the cache line from the first processor to the second processor. More specifically, the disclosed embodiments provide a system that notifies a waiting thread when a targeted store is directed to monitored memory locations. During operation, the system receives a targeted store which is directed to a specific cache in a shared-memory multiprocessor system. In response, the system examines a destination address for the targeted store to determine whether the targeted store is directed to a monitored memory location which is being monitored for a thread associated with the specific cache. If so, the system informs the thread about the targeted store.
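The notification step described above can be sketched in software as follows. This is only an illustrative model, not the disclosed hardware: the `Cache` class, the `monitor` and `targeted_store` names, and the one-shot behavior of a monitor (it is cleared once it fires) are all assumptions made for the example.

```python
from collections import defaultdict


class Cache:
    """Toy per-processor cache that can monitor addresses for waiting threads."""

    def __init__(self, name):
        self.name = name
        self.lines = {}                     # address -> data held in this cache
        self.monitored = defaultdict(list)  # address -> ids of waiting threads
        self.notifications = []             # (thread_id, address) pairs delivered

    def monitor(self, thread_id, address):
        """Register a thread as waiting on a memory location."""
        self.monitored[address].append(thread_id)

    def receive_targeted_store(self, address, data):
        """Install the pushed line, then inform any thread monitoring it."""
        self.lines[address] = data
        # Assumption: a monitor is one-shot and is removed once it fires.
        waiters = self.monitored.pop(address, [])
        for thread_id in waiters:
            self.notifications.append((thread_id, address))
        return waiters


def targeted_store(dest_cache, address, data):
    """A first processor pushes a line directly into a second processor's cache."""
    return dest_cache.receive_targeted_store(address, data)
```

In this sketch, a store to an unmonitored address installs the line and notifies nobody, while a store to a monitored address returns the list of threads that were informed.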
Abstract:
A method and system for allocating data streams that includes receiving, at an allocator, a data stream. The data stream includes a memory address and data associated with the memory address. The method also includes examining, by the allocator, the data stream to make a determination that the data stream is a soft allocating data stream, and then sending, from the allocator based on the determination, a plurality of write probes to a plurality of caches, wherein each write probe of the plurality of write probes includes at least part of the memory address. Additionally, the method includes receiving, at the allocator in response to a write probe of the plurality of write probes, a cache line present acknowledgement from a cache of the plurality of caches, and directing, by the allocator in response to the cache line present acknowledgement, the data of the data stream to the cache.
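A minimal sketch of the probe-and-direct flow described above, under stated assumptions: caches are modeled as dictionaries, the part of the address carried by a write probe is taken to be the line address (the address with the low `line_bits` offset bits dropped), and the fallback to memory when no cache acknowledges is hypothetical, since the abstract does not describe that case.

```python
def allocate_stream(address, data, caches, memory, soft=True, line_bits=6):
    """Route a data stream and return the name of its destination.

    caches: list of {"name": str, "lines": {line_addr: data}} (toy model).
    memory: dict standing in for main memory.
    """
    # A write probe carries only part of the memory address (assumed: the line address).
    line_addr = address >> line_bits
    if soft:
        for cache in caches:
            # A hit models the "cache line present" acknowledgement.
            if line_addr in cache["lines"]:
                cache["lines"][line_addr] = data
                return cache["name"]
    # Assumption: with no acknowledgement, the data goes to memory instead.
    memory[line_addr] = data
    return "memory"
```

The point of the sketch is that a soft allocating stream is written into a cache only when that cache already holds the line, so the stream never evicts unrelated lines to make room.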
Abstract:
A memory system is described that provides error detection and correction after a failure of a memory component. Each block of data in the memory system includes an array of bits logically organized into R rows and C columns, including C-2 data-bit columns containing data bits, a row check bit column including row-parity bits for each of the R rows in the block, and an inner check bit column including X inner check bits. The inner check bits are defined to cover bits in the array according to a set of check vectors, wherein each check vector is associated with a different bit in the array and is an element of Res(P), a residue system. Moreover, each column is stored in a different memory component, and the check bits are generated from the data bits to provide block-level detection and correction for both memory errors and a failed memory component.
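As a simplified illustration of the row-parity part of this layout, the sketch below computes one parity bit per row and rebuilds a column whose memory component is already known to have failed. It deliberately omits the inner check bits and the Res(P) check vectors, which in the described system are what identify the failure; the function names and the assumption that row parity covers only the data columns are choices made for the example.

```python
from functools import reduce
from operator import xor


def row_parity(data_rows):
    """Return one parity bit per row: the XOR of that row's data bits."""
    return [reduce(xor, row, 0) for row in data_rows]


def rebuild_column(data_rows, parity, failed_col):
    """Reconstruct the bits of a known-failed column from the surviving columns.

    Each missing bit is the XOR of the row's parity bit with the row's other bits.
    """
    rebuilt = []
    for row, p in zip(data_rows, parity):
        others = [bit for col, bit in enumerate(row) if col != failed_col]
        rebuilt.append(reduce(xor, others, p))
    return rebuilt
```

Because each column lives in a different memory component, losing one component costs at most one bit per row, which is exactly what a single parity bit per row can restore once the failed column is known.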
Abstract:
A method for cache coherence, including: broadcasting, by a requester cache (RC) over a partially-ordered request network (RN), a peer-to-peer (P2P) request for a cacheline to a plurality of slave caches; receiving, by the RC and over the RN while the P2P request is pending, a forwarded request for the cacheline from a gateway; receiving, by the RC and after receiving the forwarded request, a plurality of responses to the P2P request from the plurality of slave caches; setting an intra-processor state of the cacheline in the RC, wherein the intra-processor state also specifies an inter-processor state of the cacheline; issuing, by the RC, a response to the forwarded request after setting the intra-processor state and after the P2P request is complete; and modifying, by the RC, the intra-processor state in response to issuing the response to the forwarded request.
Abstract:
A system includes a number of processors with each processor including a cache memory. The system also includes a number of directory controllers coupled to the processors. Each directory controller may be configured to administer a corresponding cache coherency directory. Each cache coherency directory may be configured to track a corresponding set of memory addresses. Each processor may be configured with information indicating the corresponding set of memory addresses tracked by each cache coherency directory. Directory redundancy operations in such a system may include identifying a failure of one of the cache coherency directories; reassigning the memory address set previously tracked by the failed cache coherency directory among the non-failed cache coherency directories; and reconfiguring each processor with information describing the reassignment of the memory address set among the non-failed cache coherency directories.
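The three redundancy operations above (identify the failure, reassign the address set, reconfigure the processors) can be sketched as follows. The dictionary-based model, the function names, and the round-robin reassignment policy are assumptions for illustration; the abstract does not specify how the failed directory's addresses are divided among the survivors.

```python
def reassign_on_failure(owner_of, failed, directories):
    """Return a new address-set -> directory map after `failed` goes down.

    owner_of: dict mapping an address-set id to the directory tracking it.
    """
    survivors = [d for d in directories if d != failed]
    if not survivors:
        raise RuntimeError("no surviving cache coherency directory")
    new_map = dict(owner_of)
    orphaned = sorted(s for s, d in owner_of.items() if d == failed)
    # Assumption: spread the orphaned address sets round-robin over survivors.
    for i, addr_set in enumerate(orphaned):
        new_map[addr_set] = survivors[i % len(survivors)]
    return new_map


def reconfigure(processors, new_map):
    """Give every processor its own copy of the updated directory map."""
    for proc in processors:
        proc["directory_map"] = dict(new_map)
```

After `reconfigure` runs, every processor routes coherence requests for the reassigned address sets to a surviving directory, and no request is sent to the failed one.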
Abstract:
The disclosed embodiments provide a memory system that provides error detection and correction. Each block of data in the memory system includes an array of bits logically organized into R rows and C columns, including C−M−1 data-bit columns containing data bits, a row check bit column including row-parity bits for each of the R rows in the block, and M inner check bit columns that collectively include MR inner check bits. These inner check bits are defined to cover bits in the array in accordance with a set of check vectors, wherein each check vector is associated with a different bit in the array and is an element of Res(P), a residue system comprising a set of polynomials with GF(2) coefficients modulo a polynomial P with GF(2) coefficients, wherein each column is associated with a different pin in a memory module interface, and wherein the check bits are generated from the data bits to facilitate block-level detection and correction for errors that arise during the transmission. During operation, the system transmits a block of data from the memory. Next, the system uses an error-detection circuit to examine the block of data, and determine whether an error has occurred during the transmission based on the examination.
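To make the residue system Res(P) concrete, the sketch below performs arithmetic on polynomials with GF(2) coefficients modulo a polynomial P. It illustrates only the underlying arithmetic, not the disclosed check-vector assignment. Polynomials are encoded as Python integers (bit i holds the coefficient of x^i), and the modulus x^3 + x + 1 used in the usage note is an example chosen here, not a value from the source.

```python
def gf2_mod(a, p):
    """Reduce polynomial a modulo p, both with GF(2) coefficients."""
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        # Cancel the leading term of a by XORing in a shifted copy of p.
        a ^= p << (a.bit_length() - 1 - deg_p)
    return a


def gf2_mulmod(a, b, p):
    """Multiply two GF(2) polynomials and reduce the product modulo p."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition over GF(2) is XOR
        a <<= 1
        b >>= 1
    return gf2_mod(result, p)
```

For example, with P = x^3 + x + 1 (encoded as `0b1011`), `gf2_mulmod(0b010, 0b100, 0b1011)` multiplies x by x^2 and reduces x^3 to x + 1, giving `0b011`.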
Abstract:
A method and apparatus are disclosed for enabling nodes in a distributed system to share one or more memory portions. A home node makes a portion of its main memory available for sharing, and one or more sharer nodes mirror that shared portion of the home node's main memory in their own main memory. To maintain memory coherency, a memory coherence protocol is implemented. Under this protocol, load and store instructions that target the mirrored memory portion of a sharer node are trapped, and store instructions that target the shared memory portion of a home node are trapped. With this protocol, valid data is obtained from the home node and updates are propagated to the home node. Thus, no “dirty” data is transferred between sharer nodes. As a result, the failure of one node will not cause the failure of another node or the failure of the entire system.
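A toy model of this home/sharer arrangement is sketched below. The method calls stand in for the trapped load and store instructions. The class names, the per-address `valid` set, and the choice to have a home-side store invalidate the sharers' mirrored copies are assumptions for illustration; the abstract says only that such stores are trapped.

```python
class HomeNode:
    """Owns the shared memory portion; all valid data lives here."""

    def __init__(self):
        self.shared = {}   # address -> value
        self.sharers = []  # sharer nodes mirroring this portion

    def store(self, addr, value, source=None):
        """Models a trapped store to the shared portion of the home node."""
        self.shared[addr] = value
        # Assumption: stale mirrored copies on other sharers are invalidated.
        for sharer in self.sharers:
            if sharer is not source:
                sharer.valid.discard(addr)


class SharerNode:
    """Mirrors the home node's shared portion in its own memory."""

    def __init__(self, home):
        self.home = home
        self.mirror = {}
        self.valid = set()
        home.sharers.append(self)

    def load(self, addr):
        """Models a trapped load: valid data is always obtained from the home node."""
        if addr not in self.valid:
            self.mirror[addr] = self.home.shared[addr]
            self.valid.add(addr)
        return self.mirror[addr]

    def store(self, addr, value):
        """Models a trapped store: the update is propagated to the home node."""
        self.home.store(addr, value, source=self)
        self.mirror[addr] = value
        self.valid.add(addr)
```

Sharers in this sketch never exchange data with each other, only with the home node, which mirrors the claim that no dirty data is transferred between sharer nodes.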