Abstract:
A method is provided for scheduling threads in a multi-processor system. In a first structure, thread ids are stored for threads associated with a context switch. Each thread id identifies one thread. In a second structure, entries are stored for groups of contiguous cache lines. Each entry is arranged such that a thread id in the first structure is capable of being associated with at least one contiguous cache line in at least one group, the thread identified by the thread id having accessed the at least one contiguous cache line. The entries are mined for patterns to locate multiples of a same thread id that repeat for at least two groups. Threads identified by the located multiples of the same thread id are mapped to at least one native thread, and are scheduled on the same processor with other threads associated with the at least two groups.
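The mining step above can be sketched in a few lines. This is only an illustration under stated assumptions: the second structure is modeled as a plain dict from a group id to the set of thread ids that accessed lines in that group, and the names `co_schedule_candidates` and `assign_processors`, as well as the round-robin choice of processor, are hypothetical and not taken from the abstract.

```python
from collections import defaultdict


def co_schedule_candidates(entries, min_groups=2):
    """Return {thread_id: groups} for thread ids that repeat in >= min_groups groups.

    entries: dict mapping a cache-line group id to the set of thread ids
    that accessed at least one line in that group (assumed representation).
    """
    groups_by_thread = defaultdict(set)
    for group_id, thread_ids in entries.items():
        for tid in thread_ids:
            groups_by_thread[tid].add(group_id)
    return {tid: groups for tid, groups in groups_by_thread.items()
            if len(groups) >= min_groups}


def assign_processors(entries, num_processors, min_groups=2):
    """Place each repeating thread, and the other threads of its groups, on one processor."""
    repeated = co_schedule_candidates(entries, min_groups)
    placement = {}
    next_cpu = 0
    for tid in sorted(repeated):
        if tid in placement:
            continue  # already co-located with an earlier repeating thread
        cpu = next_cpu % num_processors  # round-robin is an assumption
        next_cpu += 1
        for group_id in repeated[tid]:
            for other in entries[group_id]:
                placement.setdefault(other, cpu)
    return placement
```

Threads that touch only a single group are left unplaced by this sketch, so a real scheduler would still need a default policy for them.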
Abstract:
According to one aspect of the invention, there is provided a method for dynamically changing page types in a unified scalable shared-memory architecture. The method includes the step of assigning a default page type of a given page as simple cache only memory architecture (SCOMA). Upon n memory references, a first parameter of the given page is calculated. A second parameter of the given page is calculated, when the first parameter is greater than a first threshold. The page type of the given page is dynamically changed to cache-coherent non-uniform memory architecture (ccNUMA), when the second parameter is greater than a second threshold. The first and the second parameters are one of a page reference probability and one minus a page utilization, the second parameter being different than the first parameter. According to another aspect of the invention, the n memory references correspond to all pages. According to yet another aspect of the invention, the n memory references correspond only to the given page.
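The two-threshold decision can be illustrated as follows. This sketch assumes one of the two permitted orderings, with the page reference probability as the first parameter and one minus the page utilization as the second. The function name, the argument names, and the way utilization is estimated (lines touched divided by lines per page) are hypothetical.

```python
SCOMA = "SCOMA"
CCNUMA = "ccNUMA"


def page_type_after_window(page_refs, total_refs, lines_touched, lines_per_page,
                           ref_threshold, waste_threshold):
    """Decide the page type after a window of n = total_refs memory references."""
    # First parameter (assumed): page reference probability.
    ref_probability = page_refs / total_refs
    if ref_probability <= ref_threshold:
        return SCOMA  # keep the default type; second parameter is not computed

    # Second parameter (assumed): one minus page utilization.
    under_utilization = 1 - lines_touched / lines_per_page
    return CCNUMA if under_utilization > waste_threshold else SCOMA
```

For example, a page that receives half of all references but uses only 4 of its 32 lines would be switched to ccNUMA with thresholds of 0.2 and 0.5.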
Abstract:
A method of reducing false sharing in a shared memory system by enabling two caches to modify the same line at the same time. More specifically, with this invention a lock associated with a segment of shared memory is acquired, where the segment will then be used exclusively by the processor of the shared memory system that has acquired the lock. For each line of the segment, an invalidation request is sent to a number of caches of the system. When a cache receives the invalidation request, it invalidates each line of the segment that is in the cache. When each line of the segment is invalidated, an invalidation acknowledgement is sent to the global directory. For each line of the segment that has been updated or modified, the update data is written back to main memory. Then, an acquire signal is sent to the requesting processor which then has exclusive use of the segment.
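The acquire sequence described above can be sketched as a small single-threaded simulation. The `Cache` class, the `acquire_segment` function, the dict used for main memory, and the returned string are all hypothetical names chosen for illustration; a real system would exchange these messages over an interconnect.

```python
class Cache:
    """Toy cache: maps a line address to a (value, dirty) pair."""

    def __init__(self):
        self.lines = {}

    def invalidate(self, line, memory):
        """Drop the line, writing modified data back first; return an acknowledgement."""
        entry = self.lines.pop(line, None)
        if entry is not None:
            value, dirty = entry
            if dirty:
                memory[line] = value  # write the update back to main memory
        return True  # invalidation acknowledgement for the global directory


def acquire_segment(segment_lines, caches, memory):
    """Invalidate every line of the segment in the given caches, then grant the lock."""
    for line in segment_lines:
        acks = [cache.invalidate(line, memory) for cache in caches]
        if not all(acks):
            raise RuntimeError("missing invalidation acknowledgement")
    return "ACQUIRED"  # acquire signal sent to the requesting processor
```

After the call, no listed cache holds any line of the segment and main memory reflects every modification made before the lock was taken.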
Abstract:
A mechanism to dynamically migrate a home node of a global page to a more suitable node for improving performance of parallel applications running on an S-COMA and other DSM systems. More specifically, consultation counts are maintained at each client node of a shared memory system, where the consultation count indicates the number of times the client node has consulted the dynamic home node for lines of a page. This information is then used along with other information to decide on whether to change the dynamic home node to a more suitable node.
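One possible migration rule based on consultation counts is sketched below. The abstract does not state the decision rule or the "other information" it uses, so the thresholds `min_count` and `min_share` and the function name `choose_home` are purely hypothetical.

```python
def choose_home(current_home, consultation_counts, min_count=50, min_share=0.5):
    """Return the node that should be the dynamic home of a page.

    consultation_counts: dict mapping a client node to the number of times
    it consulted the current dynamic home node for lines of the page.
    """
    total = sum(consultation_counts.values())
    if total == 0:
        return current_home  # nobody consulted the home; nothing to gain

    candidate = max(consultation_counts, key=consultation_counts.get)
    count = consultation_counts[candidate]

    # Assumed rule: migrate only if one client clearly dominates the traffic.
    if count >= min_count and count / total > min_share:
        return candidate
    return current_home
```

Requiring both an absolute count and a dominant share is one way to avoid moving the home back and forth between clients with similar access rates.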
Abstract:
A cache coherence protocol for a multiprocessor system. Each processor in the system has an associated cache capable of storing multiple-word data lines. The system also includes a plurality of main memory modules, each having an associated distributed global directory storing directory information for lines stored in the associated main memory module. Each main memory module is connected to each processor by means of a multi-stage interconnection network. When a processor attempts to overwrite an individual word in a line stored in its associated cache, a write request signal is sent to the appropriate global directory, and each other processor whose cache stores a copy of the line is notified of the request. When each other processor has responded with an acknowledgement, the first processor is allowed to proceed with the write.
Abstract:
A method for assuring virtual atomic invalidation in a multilevel cache system wherein lower level cache locations store portions of a line stored at a higher level cache location. Upon receipt of an invalidation signal, the higher level cache location invalidates the line and places a HOLD bit on the invalidated line. Thereafter, the higher level cache sends invalidation signals to all lower level caches which store portions of the invalidated line. Each lower level cache invalidates its portion of the line and sets a HOLD bit on its portion of the line. The HOLD bits are reset after all line portion invalidations have been completed.
Abstract:
A method of maintaining cache coherency in a shared memory multiprocessor system having a plurality of nodes, where each node itself is a shared memory multiprocessor. With this invention, an additional shared owner state is maintained so that if a cache at the highest level of cache memory in the system issues a read or write request to a cache line that misses the highest cache level of the system, then the owner of the cache line places the cache line on the bus interconnecting the highest level of cache memories.
Abstract:
A method for storing redundant information in an array of data storage devices such that data is protected against two simultaneous storage device failures. The method assigns each data block to two different parity sets, each protected by a different parity block. The protected data blocks and the parity block each reside on a different data storage device.
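A simple layout that satisfies the description is a grid in which each data block belongs to one row parity set and one column parity set, with every block on its own device. The abstract does not specify this layout, so the grid arrangement and the names `xor_blocks`, `build_array`, `parity_sets`, and `recover` are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_array(data):
    """data: rows x cols grid of equal-length bytes. Returns {device: block}."""
    rows, cols = len(data), len(data[0])
    devices = {("D", r, c): data[r][c] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        devices[("P_row", r)] = xor_blocks(data[r])
    for c in range(cols):
        devices[("P_col", c)] = xor_blocks([data[r][c] for r in range(rows)])
    return devices


def parity_sets(rows, cols):
    """Each set lists its data devices followed by its parity device."""
    sets = [[("D", r, c) for c in range(cols)] + [("P_row", r)] for r in range(rows)]
    sets += [[("D", r, c) for r in range(rows)] + [("P_col", c)] for c in range(cols)]
    return sets


def recover(devices, failed, rows, cols):
    """Rebuild failed devices by repeatedly solving sets with exactly one loss."""
    survivors = {k: v for k, v in devices.items() if k not in failed}
    missing = set(failed)
    progress = True
    while missing and progress:
        progress = False
        for members in parity_sets(rows, cols):
            lost = [m for m in members if m in missing]
            if len(lost) == 1:
                others = [survivors[m] for m in members if m != lost[0]]
                survivors[lost[0]] = xor_blocks(others)
                missing.discard(lost[0])
                progress = True
    return survivors
```

Because every data block sits in two different parity sets, a set that has lost two members can wait until one of them is rebuilt through its other set.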
Abstract:
An optimization scheme for a directory-based cache coherence protocol for multistage interconnection network-based multiprocessors improves system performance by reducing network latency. The optimization scheme is scalable, targeting multiprocessor systems having a moderate number of processors. The modification of shared data is the dominant contributor to performance degradation in these systems. The directory-based cache coherence scheme uses an invalidation bus on the processor side of the network. The invalidation bus connects all the private caches in the system and processes the invalidation requests, thereby eliminating the need to send invalidations across the network. In operation, a processor which attempts to modify data places an address of the data to be modified on the invalidation bus while simultaneously sending a store request for the data modification to the global directory. In addition to the permission signal, the global directory sends the processor attempting to modify the data a count of the number of invalidation acknowledgments it should receive.
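The store path described above can be sketched as a sequential simulation. The data structures (a dict of per-cache line sets and a directory mapping an address to its sharers) and the function name `store` are hypothetical; in hardware the bus broadcast and the directory request would proceed in parallel.

```python
def store(writer, address, caches, directory):
    """Simulate one store by `writer`; return True if the ack count matches.

    caches:    dict mapping a cache id to the set of line addresses it holds
    directory: dict mapping a line address to the set of cache ids sharing it
    """
    # Global directory side: grant permission and report how many
    # invalidation acknowledgments the writer should expect.
    sharers = directory.get(address, set()) - {writer}
    expected_acks = len(sharers)

    # Invalidation bus side: every other private cache snoops the address
    # and acknowledges if it held a copy.
    received_acks = 0
    for cache_id, lines in caches.items():
        if cache_id != writer and address in lines:
            lines.discard(address)
            received_acks += 1

    directory[address] = {writer}  # the writer is now the only holder
    return received_acks == expected_acks
```

The writer may complete the store only once the number of acknowledgments seen on the bus equals the count supplied by the directory, which is what the returned boolean models.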
Abstract:
A protocol for achieving atomic multicast in a parallel or distributed computing environment. The protocol guarantees concurrency atomicity with a maximum of m-1 message passes among the m server nodes of the system. Under one embodiment of the protocol, an access component message is transferred to the server nodes storing data to be accessed. The first server node of the plurality generates a token to be passed among the accessed nodes. A node cannot process its request until it receives the token. A node may pass the token immediately upon ensuring that it is the current expected token.
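The token-passing order and the m-1 bound on message passes can be illustrated with a toy simulation. It models only a single request and a fixed node order. The function name `atomic_multicast`, the token layout, and the `processed` log are hypothetical, and the expected-token check mentioned in the abstract is not modeled.

```python
def atomic_multicast(request, accessed_nodes, processed):
    """Pass one token along accessed_nodes; return the number of token passes.

    accessed_nodes: ordered list of server nodes holding the data to access
    processed:      dict mapping each node to the list of requests it has handled
    """
    token = {"request": request}  # generated by the first accessed node
    holder = accessed_nodes[0]
    passes = 0
    for next_node in accessed_nodes[1:] + [None]:
        # A node handles its part of the request only while it holds the token.
        processed[holder].append(token["request"])
        if next_node is not None:
            holder = next_node  # forward the token to the next accessed node
            passes += 1
    return passes
```

With m accessed nodes the token is forwarded exactly m-1 times in this sketch, which matches the upper bound stated in the abstract.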