Abstract:
A technique selectively imposes inter-reference ordering between memory reference operations issued by a processor of a multiprocessor system to addresses within a page pertaining to a page table entry (PTE) that is affected by a translation buffer (TB) miss flow routine. The TB miss flow is used to retrieve information contained in the PTE for mapping a virtual address to a physical address and, subsequently, to allow retrieval of data at the mapped physical address. The PTE that is retrieved in response to a memory reference (read) operation is not loaded into the TB until a commit-signal associated with that read operation is returned to the processor. Once the PTE and associated commit-signal are returned, the processor loads the PTE into the TB so that it can be used for a subsequent read operation directed to the data at the physical address.
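A minimal Python sketch of the ordering rule described above, under the assumption of a simple software model of the translation buffer (class and method names such as TranslationBuffer and miss_flow_response are illustrative, not from the patent): the PTE returned by the miss flow is staged until its commit-signal arrives, and only then loaded into the TB for use by the subsequent data read.

    class TranslationBuffer:
        """Toy TB model: PTEs retrieved by the miss flow are staged until their
        commit-signal is returned, then loaded so later reads can use them."""

        def __init__(self):
            self.entries = {}   # virtual page -> physical page
            self.pending = {}   # read-op id -> (virtual page, physical page)

        def miss_flow_response(self, op_id, vpage, ppage):
            # PTE data has come back, but the commit-signal has not: stage it.
            self.pending[op_id] = (vpage, ppage)

        def commit_signal(self, op_id):
            # Commit-signal for the PTE read: now it is safe to load the TB.
            vpage, ppage = self.pending.pop(op_id)
            self.entries[vpage] = ppage

        def translate(self, vpage):
            return self.entries.get(vpage)   # None -> would trigger a TB miss


    tb = TranslationBuffer()
    tb.miss_flow_response(op_id=1, vpage=0x4, ppage=0x9A)
    assert tb.translate(0x4) is None     # not visible before the commit-signal
    tb.commit_signal(op_id=1)
    assert tb.translate(0x4) == 0x9A     # subsequent data read can now be mapped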
Abstract:
An improved I/O processor (IOP) delivers high I/O performance while maintaining inter-reference ordering among memory reference operations issued by an I/O device, as specified by a consistency model, in a shared-memory multiprocessor system. The IOP comprises a retire controller that imposes inter-reference ordering among the operations based on receipt of a commit signal for each operation, wherein the commit signal for a memory reference operation indicates the apparent completion of the operation rather than its actual completion. In addition, the IOP comprises a prefetch controller coupled to an I/O cache for prefetching data into the cache without any ordering constraints (i.e., out of order). The ordered retirement functions of the IOP are separated from its prefetching operations, which enables the latter to be performed in an arbitrary order and thereby improves the overall performance of the system.
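The separation of unordered prefetching from ordered retirement might be modeled roughly as below; the IOP class, its queues, and the string-valued operations are assumptions made for illustration, not the patent's structures.

    from collections import deque

    class IOP:
        """Toy model: prefetching fills the I/O cache in any order, while the
        retire controller completes operations strictly in issue order, each
        gated by its commit signal (apparent, not actual, completion)."""

        def __init__(self):
            self.cache = {}          # address -> data, filled out of order
            self.retire_q = deque()  # operations in issue order
            self.committed = set()   # operations whose commit signal arrived

        def prefetch(self, addr, data):
            self.cache[addr] = data  # no ordering constraint on fills

        def issue(self, op):
            self.retire_q.append(op)

        def commit_signal(self, op):
            self.committed.add(op)
            return self._retire()

        def _retire(self):
            done = []
            # Retire only from the head of the queue, preserving inter-reference order.
            while self.retire_q and self.retire_q[0] in self.committed:
                done.append(self.retire_q.popleft())
            return done


    iop = IOP()
    iop.issue("read A"); iop.issue("read B")
    iop.prefetch("B", 0x22)                         # data for B may arrive first
    assert iop.commit_signal("read B") == []        # B cannot retire ahead of A
    assert iop.commit_signal("read A") == ["read A", "read B"]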
Abstract:
A mechanism optimizes the generation of a commit-signal by control logic of a multiprocessor system in response to a memory reference operation issued by a processor to a local node of the system, the system having a hierarchical switch interconnecting a plurality of nodes. The mechanism generally comprises a structure that indicates whether the memory reference operation affects processors of other nodes of the multiprocessor system. An ordering point of the local node generates an optimized commit-signal when the structure indicates that the memory reference operation does not affect those other processors.
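One way to picture the optimization, as a hedged sketch: the "structure" is modeled here as a directory of sharer nodes (an assumption), and the ordering point returns an immediate, optimized commit-signal only when no remote node holds the referenced block.

    def generate_commit_signal(op, directory):
        """Ordering point at the local node (illustrative only).

        'directory' stands in for the structure that records which nodes hold
        copies of the referenced block.  If no processor on a remote node is
        affected, an optimized commit-signal is returned immediately; otherwise
        the request must first be ordered through the hierarchical switch."""
        remote_sharers = directory.get(op["addr"], set()) - {op["home_node"]}
        if not remote_sharers:
            return "optimized-commit"        # local ordering point suffices
        return "commit-after-hierarchical-switch-ordering"


    directory = {0x100: {0}, 0x200: {0, 2}}  # block -> set of sharer nodes
    print(generate_commit_signal({"addr": 0x100, "home_node": 0}, directory))
    print(generate_commit_signal({"addr": 0x200, "home_node": 0}, directory))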
Abstract:
A technique reduces the latency of a memory barrier (MB) operation used to impose an inter-reference order between sets of memory reference operations issued by a processor to a multiprocessor system having a shared memory. The technique comprises issuing the MB operation immediately after issuing a first set of memory reference operations (i.e., the pre-MB operations) without waiting for responses to those pre-MB operations. Issuance of the MB operation to the system results in serialization of that operation and generation of an MB Acknowledgment (MB-Ack) command. The MB-Ack is loaded into a probe queue of the issuing processor and, according to the invention, functions to pull in all previously ordered invalidate and probe commands in that queue. By ensuring that the probes and invalidates are ordered before the MB-Ack is received at the issuing processor, the inventive technique provides the appearance that all pre-MB references have completed.
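A rough model of the probe-queue behavior follows; the command strings and the ProbeQueue class are illustrative stand-ins, but the key point matches the text: the MB-Ack sits behind previously ordered invalidates and probes, so delivering it forces those commands to be serviced first.

    from collections import deque

    class ProbeQueue:
        """Toy probe queue: the MB-Ack is enqueued behind all previously ordered
        probes and invalidates, so delivering it to the processor 'pulls in'
        (forces servicing of) everything ahead of it."""

        def __init__(self):
            self.q = deque()

        def enqueue(self, cmd):
            self.q.append(cmd)

        def deliver_mb_ack(self):
            serviced = []
            while self.q:
                cmd = self.q.popleft()
                serviced.append(cmd)
                if cmd == "MB-Ack":
                    break              # MB completes only after earlier probes
            return serviced


    pq = ProbeQueue()
    for cmd in ["Inval X", "Probe Y", "MB-Ack", "Probe Z"]:
        pq.enqueue(cmd)
    print(pq.deliver_mb_ack())   # ['Inval X', 'Probe Y', 'MB-Ack']; 'Probe Z' stays queued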
Abstract:
A technique reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory that is distributed among a plurality of processors that share a cache. According to the technique, each processor sharing a cache inherits a commit-signal that is generated by control logic of the multiprocessor system in response to a memory reference operation issued by another processor sharing that cache. The commit-signal facilitates serialization among the processors and shared memory entities of the multiprocessor system by indicating the apparent completion of the memory reference operation to those entities of the system.
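A small sketch of the inheritance idea, with hypothetical names: a commit-signal produced for one processor's reference is recorded for every processor sharing the cache, so each of them can order its later references against it.

    class SharedCacheNode:
        """Illustrative model: processors sharing one cache all inherit a
        commit-signal generated for a reference issued by any one of them."""

        def __init__(self, processor_ids):
            self.processors = processor_ids
            self.inherited = {p: [] for p in processor_ids}

        def commit_signal(self, issuing_proc, op):
            # Every sharer inherits the signal, not just the issuing processor.
            for p in self.processors:
                self.inherited[p].append(op)


    node = SharedCacheNode(["P0", "P1"])
    node.commit_signal("P0", "store A")
    assert node.inherited["P1"] == ["store A"]   # P1 may order later references on this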
Abstract:
A mechanism reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory. The mechanism comprises a commit-signal that is generated by control logic of the multiprocessor system in response to an issued memory reference operation. The commit-signal facilitates inter-reference ordering; moreover, it indicates the apparent completion of the memory reference operation rather than the actual completion of the operation. The apparent completion of an operation occurs substantially sooner than its actual completion, thereby improving performance of the multiprocessor system.
Abstract:
An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows a number of multi-processor nodes to be coupled to the switch and to operate at optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processor node, thereby improving the overall performance of memory. Memory performance is additionally improved by including, at each memory, a delayed write buffer which is used in conjunction with the directory to identify victims that are to be written to memory. An arb bus coupled to the output of the directory of each node provides a central ordering point for all messages that are transferred through the SMP system. The messages comprise a number of transactions, and each transaction is assigned to one of a number of different virtual channels, depending upon the processing stage of the message. The use of virtual channels thus helps to maintain data coherency by providing a straightforward method for maintaining system order. Using the virtual channels and the directory structure, cache coherency problems that would previously result in deadlock may be avoided.
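The virtual-channel idea might be sketched as below; the channel names, stage names, and three-channel split are assumptions chosen for illustration, the point being only that messages at different processing stages travel on independently queued channels, so one class of traffic cannot block the traffic needed to drain it.

    from collections import deque

    # Illustrative virtual-channel scheme: message classes at different processing
    # stages travel on independently flow-controlled channels.
    CHANNEL_OF_STAGE = {          # stage and channel names are assumptions
        "request":  "VC0",
        "probe":    "VC1",
        "response": "VC2",
    }

    channels = {vc: deque() for vc in CHANNEL_OF_STAGE.values()}

    def send(stage, msg):
        channels[CHANNEL_OF_STAGE[stage]].append(msg)

    send("request", "ReadBlk A")
    send("probe", "FwdRead A -> owner")
    send("response", "BlkData A")
    print({vc: list(q) for vc, q in channels.items()})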
Abstract:
Once a search query is received from a user, a standard index is searched based on the search query. The standard index forms part of a set of replicated standard indexes having multiple instances of the standard index. A signal is then determined based on the search of the standard index. When the signal meets predefined criteria, an extended index is searched. The extended index forms part of a set of extended indexes having at least one instance of the extended index; there are fewer instances of the extended index than instances of the standard index. Extended search results are then obtained from the extended index, and at least a portion of the extended search results is transmitted towards the user.
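A compact sketch of the two-tier lookup, with the signal, threshold, and matching logic all being assumptions (here the signal is simply the number of standard-index hits): the replicated standard index is always searched, and the scarcer extended index is consulted only when the signal meets the fallback criterion.

    def search(query, standard_index, extended_index, min_results=10):
        """Two-tier search sketch (threshold and scoring are assumptions).

        The widely replicated standard index is always consulted; the larger,
        less-replicated extended index is searched only when the signal derived
        from the standard results (here, simply their count) meets the criterion."""
        standard_results = [doc for doc in standard_index if query in doc]
        signal = len(standard_results)
        if signal >= min_results:
            return standard_results                   # standard tier suffices
        extended_results = [doc for doc in extended_index if query in doc]
        return standard_results + extended_results    # portion returned to the user


    standard = ["alpha server manual", "piranha overview"]
    extended = ["alpha ev7 virtual channel errata", "alpha commit-signal appnote"]
    print(search("alpha", standard, extended, min_results=2))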
Abstract:
A chip-multiprocessing system with scalable architecture, including on a single chip: a plurality of processor cores; a two-level cache hierarchy; an intra-chip switch; one or more memory controllers; a cache coherence protocol; one or more coherence protocol engines; and an interconnect subsystem. The two-level cache hierarchy includes first-level and second-level caches. In particular, the first-level caches include a pair of instruction and data caches for, and private to, each processor core. The second-level cache has a relaxed inclusion property and is logically shared by the plurality of processor cores. Each of the plurality of processor cores is capable of executing an instruction set of the ALPHA™ processing core. The scalable architecture of the chip-multiprocessing system is targeted at parallel commercial workloads. A showcase example of the chip-multiprocessing system, called the PIRANHA™ system, is a highly integrated processing node with eight simpler ALPHA™ processor cores. A method for scalable chip-multiprocessing is also provided.
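The cache-hierarchy structure might look roughly like the following sketch; the lookup policy and the decision not to fill the L2 on a memory miss are assumptions used only to illustrate private per-core L1s, a logically shared L2, and relaxed (non-strict) inclusion.

    class Core:
        def __init__(self, core_id):
            self.core_id = core_id
            self.l1_icache = {}      # private first-level instruction cache
            self.l1_dcache = {}      # private first-level data cache

    class ChipMultiprocessor:
        """Structural sketch of the described node: eight cores, each with
        private L1 I/D caches, sharing one on-chip L2 (inclusion is relaxed,
        so a block may live in an L1 without also residing in the L2)."""

        def __init__(self, num_cores=8):
            self.cores = [Core(i) for i in range(num_cores)]
            self.l2 = {}             # logically shared second-level cache

        def load(self, core_id, addr, memory):
            l1 = self.cores[core_id].l1_dcache
            if addr in l1:
                return l1[addr]                  # L1 hit
            if addr in self.l2:
                l1[addr] = self.l2[addr]         # fill L1 from the shared L2
                return l1[addr]
            data = memory[addr]                  # miss to off-chip memory
            l1[addr] = data                      # relaxed inclusion: L2 fill is optional
            return data


    cmp_node = ChipMultiprocessor()
    print(cmp_node.load(0, 0x40, memory={0x40: "cache line"}))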
Abstract:
The present invention relates generally to multiprocessor computer systems, and particularly to a multiprocessor system designed to be highly scalable, using efficient cache coherence logic and methodologies. More specifically, the present invention is a system and method including a plurality of processor nodes configured to execute a cache coherence protocol that avoids the use of negative acknowledgment messages (NAKs) and ordering requirements on the underlying transaction-message interconnect/network, and that services most 3-hop transactions with only a single visit to the home node.
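A hedged sketch of a 3-hop read under this kind of protocol, with the directory layout and node names as assumptions: the home node is visited exactly once (directory lookup, update, and forward), the owner replies directly to the requester, and there is no NAK/retry path.

    def three_hop_read(requester, addr, directory, caches, memory):
        """Illustrative 3-hop read flow (names and structures are assumptions):
        hop 1 requester -> home, hop 2 home -> owner (forwarded request),
        hop 3 owner -> requester.  The home is visited exactly once and never NAKs."""
        state = directory.setdefault(addr, {"owner": None, "sharers": set()})
        state["sharers"].add(requester)          # single home visit updates the directory
        if state["owner"] is None:
            return memory[addr]                  # home supplies the data itself
        return caches[state["owner"]][addr]      # owner replies directly to the requester


    directory = {0x80: {"owner": "node2", "sharers": set()}}
    caches = {"node2": {0x80: "dirty line"}}
    print(three_hop_read("node0", 0x80, directory, caches, memory={}))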