摘要:
Improved method and apparatus for parallel processing. One embodiment provides a multiprocessor computer system that includes a first and second node controller, a number of processors being connected to each node controller, a memory connected to each controller, a first input/output system connected to the first node controller, and a communications network connected between the node controllers. The first node controller includes: a crossbar unit to which are connected a memory port, an input/output port, a network port, and a plurality of independent processor ports. A first and a second processor port connected between the crossbar unit and a first subset and a second subset, respectively, of the processors. In some embodiments of the system, the first node controller is fabricated onto a single integrated-circuit chip. Optionally, the memory is packaged on plugable memory/directory cards wherein each card includes a plurality of memory chips including a first subset dedicated to holding memory data and a second subset dedicated to holding directory data. Further, the memory port includes a memory data port including a memory data bus and a memory address bus coupled to the first subset of memory chips, and a directory data port including a directory data bus and a directory address bus coupled to the second subset of memory chips. In some such embodiments, the ratio of (memory data space) to (directory data space) on each card is set to a value that is based on a size of the multiprocessor computer system.
摘要:
A system and method for interconnecting a plurality of processing element nodes within a scalable multiprocessor system is provided. Each processing element node includes at least one processor and memory. A scalable interconnect network includes physical communication links interconnecting the processing element nodes in a cluster. A first set of routers in the scalable interconnect network route messages between the plurality of processing element nodes. One or more metarouters in the scalable interconnect network route messages between the first set of routers so that each one of the routers in a first cluster is connected to all other clusters through one or more metarouters.
摘要:
A generalized approach to particle interaction can confer advantages over previously described method in terms of one or more of communications bandwidth and latency and memory access characteristics. These generalizations can involve one or more of at least spatial decomposition, import region rounding, and multiple zone communication scheduling. An architecture for computation of particle interactions makes use various forms of parallelism. In one implementation, the parallelism involves using multiple computation nodes arranged according to a geometric partitioning of a simulation volume.
摘要:
A generalized approach to particle interaction can confer advantages over previously described method in terms of one or more of communications bandwidth and latency and memory access characteristics. These generalizations can involve one or more of at least spatial decomposition, import region rounding, and multiple zone communication scheduling. An architecture for computation of particle interactions makes use various forms of parallelism. In one implementation, the parallelism involves using multiple computation nodes arranged according to a geometric partitioning of a simulation volume.
摘要:
An address translation unit generates a physical address for access to a memory from a virtual address using either a translation lookaside buffer or a segmentation buffer. If the virtual address falls within a predetermined range, the address translation unit will use the segmentation buffer to generate the physical address. Upon generation of the physical address, the memory will either receive data from or provide data to a processor in accordance with the instructions being processed by the processor.