摘要:
A method and apparatus for providing cache coherence in a multiprocessor system which is configured into two or more nodes with memory local to each node and a tag and address crossbar system and a data crossbar system which interconnects all nodes. The disclosure is applicable to multiprocessor computer systems which utilize system memory distributed over more than one node and snooping of data states in each node which utilizes memory local to that node. Global snooping is used to provide a single point of serialization of data tags. A central crossbar controller examines cache state tags of a given address line for all nodes simultaneously and issues an appropriate reply back to a node requesting data while generating other data requests to any other node in the system for the purpose of maintaining cache coherence and supplying the requested data. The system utilizes memory local to each node by dividing such memory into local and remote categories which are mutually exclusive for any given cache line. The disclosure provides support for a third level remote cache for each node.
摘要:
A multinode multiprocessor computer system with distributed local clocks wherein a local clock may be synchronized with other clocks in the system without affecting the operation of the other clocks. A local clock to be synchronized is reset and counts an elapsed time since the reset. Simultaneously with resetting the local clock, a clock value from a clock on a source node is stored. The clock value from the source node is copied to the node to be synchronized and added to the elapsed time. The resulting summation is then stored in the local clock to be synchronized. As a result, the local clock is synchronized to the clock on the source node. In one system embodiment, the local clock includes a dynamic register and a base register and an adder adds the two portions together to generate an output of the local clock. For a node being synchronized, the dynamic portion is reset and allowed to count the elapsed time while the base portion is loaded with a clock value copied from the source node. In another system embodiment, a clock register stores both dynamic and base portions. For a node being synchronized, the clock register is reset and allowed to count the elapsed time. The base portion from the source node is then added to the clock register and stored in the clock register.
摘要:
A multiple-stage pipeline for transaction conversion is disclosed. A method is disclosed that converts a transaction into a set of concurrently performable actions. In a first pipeline stage, the transaction is decoded into an internal protocol evaluation (PE) command, such as by utilizing a look-up table (LUT). In a second pipeline stage, an entry within a PE random access memory (RAM) is selected, based on the internal PE command. This may be accomplished by converting the internal PE command into a PE RAM base address and an associated qualifier thereof. In a third pipeline stage, the entry within the PE RAM is converted to the set of concurrently performable actions, such as based on the PE RAM base address and its associate qualifier.
摘要:
Removing building blocks from partitions to which they have been bound is disclosed. A building block of a platform is removed from a partition of the platform by first halting activity by the partition on the building block. A first partition identifier of the building block indicates the partition of the building block. The building block joined the partition in a masterless manner. The resources of the building block are withdrawn from the partition, and the building block is deauthorized from the platform.
摘要:
A method and apparatus for maintaining processor consistency in a multiprocessor computer such as a multinode computer system are disclosed. A processor proceeds with write operations before its previous write operations complete, while processor consistency is maintained. A write operation begins with a request by the processor to invalidate copies of the data stored in other nodes. This current invalidate request is queued while acknowledging to the processor that the request is complete even though it has not actually completed. The processor proceeds to complete the write operation by changing the data. It can then execute subsequent operations, including other write operations. The queued request, however, is not transmitted to other nodes in the computer until all previous invalidate requests by the processor are complete. This ensures that the current invalidate request will not pass a previous invalidate request. The invalidate requests are added and removed from a processor's outstanding invalidate list as they arise and are completed. An invalidate request is completed by notifying the nodes in a linked list related to the current invalidate request that data shared by the node is now invalid.
摘要:
A masterless approach binds multiprocessor building blocks to partitions of a computer system using identifiers and indicators. A number of building blocks communicate among each other to determine a partition to which each building block is to be partitioned. For each unique partition to which one or more of the building blocks is to be partitioned, the building blocks communicate among each other to determine building block uniqueness, and then each of the building blocks joins the partition. The building blocks share with one another their logical port identifiers, which uniquely identify the building block within a partition. A commit indicator of each building block indicates that the building block has committed itself to the partition and that its identifiers cannot be changed. A partition protect indicator is set by one building block of a partition, preventing changes to the commit indicators of other building blocks wishing to join the partition except by that one building block, effectively protecting the partition. Building block protect indicators protect the building blocks themselves.
摘要:
A masterless approach for binding building blocks to partitions is disclosed. Other blocks are first sent a first physical port identifier indicating a block's physical location, and a first partition identifier indicating the block's partition. Second physical port identifiers and second partition identifiers are received from the other blocks. The first physical port identifier and the second physical port identifiers of a subset of the other blocks are then sent to the subset, the second partition identifiers of the subset being equal to the first partition identifier. The first physical port identifier and the second physical port identifiers of the subset are also received from each block of the subset. A first logical port identifier indicating the block's logical location is sent to the subset, and second logical port identifiers are received from the subset. The block joins the partition indicated by the first partition identifier.
摘要:
A system and method of partitioning a multiprocessor or multinode computer system containing two or more partitions each of which contain at least three nodes or processors and a central hardware device communicating with a requestor node or processor, a target node or processor and at least one additional node or processor in the partition. The multiprocessor system architecture allows for partitioning resources to define separate subsystems capable of running different operating systems simultaneously. The method operates with the central device, a tag and address crossbar system, which transmits requests for data from the requestor node to the target node, but not to any of the additional nodes or processors which are not defined as part of a given partition. The method provides steps of assignment of definitions to physical ports with the central device corresponding with desired partitioning of resources within the system. Data processed within the system is assigned tags which themselves are related to the defined system resources allocated to one or more desired partitions within the system.
摘要:
An apparatus and method is disclosed for allowing a multiprocessor computer system with shared memory distributed among multiple nodes to appear like a single-node environment. The single-node environment is implemented with a memory map that has a unique address for every memory location in the system. Overlapping address spaces in the multinode environment are also assigned unique representative addresses that are translated to actual addresses in conformance with the multinode environment. The apparatus and method allows a wide variety of operating systems to be run on the multinode environment. Additionally, industry standard BIOS and chip sets can be used.
摘要:
A method of invalidating shared cache lines such as on a sharing list by issuing an invalidate acknowledgement before actually invalidating a cache line. The method is useful in multiprocessor systems such as a distributed shared memory (DSM) or non-uniform memory access (NUMA) machines that include a number of interconnected processor nodes each having local memory and caches that store copies of the same data. In such a multiprocessor system using the Scalable Content Interface (SCI) protocol, an invalidate request is sent from the head node on the sharing list to a succeeding node on the list. In response to the invalidate request, the succeeding node issues an invalidate acknowledgement before the cache line is actually invalidated. After issuing the invalidate acknowledgement, the succeeding node initiates invalidation of the cache line. The invalidate acknowledgement can take the form of a response to the head node or a forwarding of the invalidate request to the next succeeding node on the list. To maintain processor consistency, a flag is set each time an invalidate acknowledgement is sent. The flag is cleared after the invalidation of the cache line is completed. Cacheable transactions received at the succeeding node while a flag is set are delayed until the flag is cleared.