摘要:
A symmetrical multiprocessing system includes a plurality of nodes interconnected by a hierarchical bus. To allow for the transfer of data between nodes and to restrict the global transfer of local transactions, a plurality of address partitions are defined: global space, local space, remote read space, and remote read and write space. Process private and local data is accessed using local space. Global data is accessed using global space. In one embodiment, a kernel of the operating system is resident in the local space of each node. Because the memory space where the kernel resides is designated as local space, no other nodes can write to, or corrupt, the node's kernel.
摘要:
A symmetrical multiprocessing system includes a plurality of nodes interconnected by a hierarchical bus. To allow for the transfer of data between nodes and to restrict the global transfer of local transactions, a plurality of address partitions are defined: global space, local space, remote read space, and remote read and write space. Process private and local data is accessed using local space. Global data is accessed using global space. In one embodiment, a kernel of the operating system is resident in the local space of each node. Because the memory space where the kernel resides is designated as local space, no other nodes can write to, or corrupt, the node's kernel.
摘要:
A flexible scheme is provided for designating the appropriate write-back protocol best suited for each memory level within a multi-level-cache computer system. The skip-level memory hierarchy of the present invention includes a lower-level copy-back cache and a higher-level write-through cache. This greatly simplifies the implementation of the higher-level cache, since it may be implemented with a write-or-read access to its address tag. Although counterintuitive, a write-through higher-level cache in a distributed shared memory may also increase the efficiency of the computer system without unduly increasing the volume of network traffic within the computer system. This is because a write-through higher-level cache increases the probability of readily-available cached copies of updated data which are consistent with the home copies of the data, thereby reducing the number of fetches from remote home locations whenever the data is not found in the lower-level cache but is found in the higher-level cache.
摘要:
Memory space in the lower-level cache (LLC) of a computer system is allocated in cache-line sized units, while memory space in the higher-level cache (HLC) of the computer system is allocated in page sized units; with each page including two or more cache lines. Accordingly, during the execution of a program, cache-line-sized components of a page-sized block of data are incrementally stored in the cache lines of the LLCs. Subsequently, the system determines that it is time to review the allocation of cache resources, i.e., between the LLC and the HLC. The review trigger may be external to the processor, e.g., a timer interrupting the processor on a periodic basis. Alternatively, the review trigger may be from the LLC or the HLC, e.g., when the LLC is full, or when usage of the HLC drops below a certain percentage. A review of the allocation involves identifying components associated with their respective blocks of data and determining if the number of cached components identified with the blocks exceed a threshold. If the threshold is exceeded for cached components associated with a particular block, space is allocated in the HLC for storing components from the block. This scheme advantageously increases the likelihood of future cache hits by optimally using the HLC to store blocks of memory with a substantial number of uses components.
摘要:
A computer system includes a directory at each node which stores coherency information for the coherency units for which that node is the home node. In addition, the directory stores a data access state corresponding to each coherency unit which indicates the data access pattern observed for that coherency unit. The data access state may indicate migratory or non-migratory data access patterns. If the coherency unit has been observed to have a migratory data access pattern, then read/write access rights are granted. Conversely, if the coherency unit has been observed to have non-migratory data access patterns, then read access rights are granted. The home node further detects the migratory and non-migratory data access patterns and selects transitions between the migratory and non-migratory data access states independent of the cache hierarchies within the nodes which access the affected coherency unit. In one embodiment, a pair of counters are employed for each coherency unit. One of the counters is incremented when the coherency unit is in the migratory data access state and a data migration is detected. The other counter is incremented when the coherency unit is in the non-migratory data access state and a data migration is detected. When one of the counters overflows, the home node transitions the data access state of the coherency unit to the alternate data access state.
摘要:
A symmetrical multiprocessing system includes a plurality of nodes interconnected by a hierarchical bus. To allow for the transfer of data between nodes and to restrict the global transfer of local transactions, a plurality of address partitions are defined: global space, local space, remote read space, and remote read and write space. Process private and local data is accessed using local space. Global data is accessed using global space. In one embodiment, a kernel of the operating system is resident in the local space of each node.
摘要:
A portion of the global memory of a multiprocessing computer system is allocated to each node, called local memory space. Data from a remote node may be copied to local memory space of a node such that accesses to the data may be performed locally rather than globally. The global address of the data is translated to a local physical address for the node to which the data is copied. To reduce the size of the translation tables for converting between global addresses and local physical addresses, multiple pages of the address space are mapped to an entry in a translation table. To decrease the probability that an entry is not available for a page, the translation table may be implemented as a skewed-associative cache that implements an insertion algorithm that realigns the translations in the table to maximize the utilization of the available entries is implemented.
摘要:
An efficient cache allocation scheme is provided for both uniprocessor and multiprocessor computer systems having at least one cache. In one embodiment, upon the detection of a cache miss, a determination of whether the cache miss is "avoidable" is made. In other words, would the present cache miss have occurred if the data had been cached previously and if the data had remained in the cache. One example of an avoidable cache miss in a multiprocessor system having a distributed memory architecture is an excess cache miss. An excess cache miss is either a capacity miss or a conflict miss. A capacity miss is caused by the insufficient size of the cache. A conflict miss is caused by insufficient depth in the associativity of the cache. The determination of the excess cache miss involves tracking read and write requests for data by the various processors and storing some record of the read/write request history in a table or linked list. Data is cached only after an avoidable cache miss has occurred. By caching only after at least one avoidable cache miss instead of upon every (initial) access, cache space can be allocated in a highly efficient manner thereby minimizing the number of data fetches caused by cache misses.
摘要:
A portion of the global memory of a multiprocessing computer system is allocated to each node, called local memory space. Data from a remote node may be copies to local memory space of a node such that accesses to the data may be performed locally rather than globally. The copies data is referred to as a shadow page. The global address of the data is translated to a local physical address for the node to which the data is copied. To reduce the size of the translation tables for converting between global addresses and local physical addresses, the page to which shadow copies may be stored and which global addresses may be converted to local physical addresses may be restricted. Multiple page of local memory space may be allocated to one entry of a local physical address to global address (LPA2GA) table. When a page is allocated to store shadow pages, an entry in the LPA2GA table associated with that page is marked as unavailable. Accordingly, new translations may not be stored to that entry of the LPA2GA table and other pages associated with that entry may not be allocated to store shadow pages. In a similar manner, multiple pages of the global address space are mapped to an entry in a global address to local physical address(GA2LPA) translation table. When data corresponding to a page within the global address space is stored as a shadow page, the entry associated with the global address is marked as unavailable. Accordingly, other pages associated with that entry of the GA2LPA table may not be stored as shadow pages because the entry is not available. The local copy of the data is not stored and the node must access the data globally. To decrease the probability that an entry is not available for a page, the GA2LPA table may be implemented as a set associative table. To further increase the availability of entries in the GA2LPA table, a skewed-associative cache that implements an insertion algorithm that realigns the translations in the table to maximize the utilization of the available entries is implemented.
摘要:
A shared memory system for symmetric multiprocessing systems including a plurality of physical memory locations in which the locations are either allocated to one node of a plurality of processing nodes, equally distributed among the processing nodes, or unequally distributed among the processing nodes. The memory locations are configured to be accessed by the plurality of processing nodes by mapping all memory locations into a plurality of address partitions within a hierarchy bus. The memory locations are addressed by a plurality of address aliases within the bus while the properties of the address partitions are employed to control transaction access generated in the processing nodes to memory locations allocated locally and globally within the processing nodes.