摘要:
A generalized approach to particle interaction can confer advantages over previously described method in terms of one or more of communications bandwidth and latency and memory access characteristics. These generalizations can involve one or more of at least spatial decomposition, import region rounding, and multiple zone communication scheduling. An architecture for computation of particle interactions makes use various forms of parallelism. In one implementation, the parallelism involves using multiple computation nodes arranged according to a geometric partitioning of a simulation volume.
摘要:
A computation system for computing interactions in a multiple-body simulation includes an array of processing modules arranged into one or more serially interconnected processing groups of the processing modules. Each of the processing modules includes storage for data elements and includes circuitry for performing pairwise computations between data elements each associated with a spatial location. Each of the pairwise computations makes use of a data element from the storage of the processing module and a data element passing through the serially interconnected processing modules. Each of the processing modules includes circuitry for selecting the pairs of data elements according to separations between spatial locations associated with the data elements.
摘要:
A computation system for computing interactions in a multiple-body simulation includes an array of processing modules arranged into one or more serially interconnected processing groups of the processing modules. Each of the processing modules includes storage for data elements and includes circuitry for performing pairwise computations between data elements each associated with a spatial location. Each of the pairwise computations makes use of a data element from the storage of the processing module and a data element passing through the serially interconnected processing modules. Each of the processing modules includes circuitry for selecting the pairs of data elements according to separations between spatial locations associated with the data elements.
摘要:
An address translation unit generates a physical address for access to a memory from a virtual address using either a translation lookaside buffer or a segmentation buffer. If the virtual address falls within a predetermined range, the address translation unit will use the segmentation buffer to generate the physical address. Upon generation of the physical address, the memory will either receive data from or provide data to a processor in accordance with the instructions being processed by the processor.
摘要:
A node controller (12) in a computer system (10) includes a processor interface unit (24), a memory directory interface unit (22), and a local block unit (28). In response to a memory location in a memory (17) associated with the memory directory interface unit (22) being altered, the processor interface unit (24) generates an invalidation request for transfer to the memory directory interface unit (22). The memory directory interface unit (22) provides the invalidation request and identities of processors (16) affected by the invalidation request to the local block unit (28). The local block unit (28) determines which ones of the identified processors (16) are present in the computer system (10) and generates an invalidation message for each present processor (16) for transfer thereto. Each of the present processors (16) process their invalidation message and generate an acknowledgment message for transfer to the processor interface unit (24) that generated the invalidation request. The local block unit (28) determines which ones of the identified processors (16) are not present in the computer system (10) and generates an acknowledgment message for each non-existent processor (16). Each acknowledgment message is transferred to the processor interface unit (24) which generated the invalidation request.
摘要:
A node controller (12) in a computer system (10) includes a processor interface unit (24), a memory directory interface unit (22), and a local block unit (28). In response to a memory location in a memory (17) associated with the memory directory interface unit (22) being altered, the processor interface unit (24) generates an invalidation request for transfer to the memory directory interface unit (22). The memory directory interface unit (22) provides the invalidation request and identities of processors (16) affected by the invalidation request to the local block unit (28). The local block unit (28) determines which ones of the identified processors (16) are present in the computer system (10) and generates an invalidation message for each present processor (16) for transfer thereto. Each of the present processors (16) process their invalidation message and generate an acknowledgment message for transfer to the processor interface unit (24) that generated the invalidation request. The local block unit (28) determines which ones of the identified processors (16) are not present in the computer system (10) and generates an acknowledgment message for each non-existent processor (16). Each acknowledgment message is transferred to the processor interface unit (24) which generated the invalidation request.
摘要:
A processor may operate in one of a plurality of operating states. In a Normal operating state, the processor is not involved with a memory transaction. Upon receipt of a transaction instruction to access a memory location, the processor transitions to a Transaction operating state. In the Transaction operating state, the processor performs changes to a cache line and data associated with the memory location. While in the Transaction operating state, any changes to the data and the cache line are not visible to other processors in the computing system. These changes become visible upon the processor entering a Commit operating state in response to receipt of a commit instruction. After changes become visible, the processor returns to the Normal operating state. If an abort event occurs prior to receipt of the commit instruction, the processor transitions to an Abort operating state where any changes to the data and cache line are discarded.
摘要:
A processor may operate in one of a plurality of operating states. In a Normal operating state, the processor is not involved with a memory transaction. Upon receipt of a transaction instruction to access a memory location, the processor transitions to a Transaction operating state. In the Transaction operating state, the processor performs changes to a cache line and data associated with the memory location. While in the Transaction operating state, any changes to the data and the cache line is not visible to other processors in the computing system. These changes become visible upon the processor entering a Commit operating state in response to receipt of a commit instruction. After changes become visible, the processor returns to the Normal operating state. If an abort event occurs prior to receipt of the commit instruction, the processor transitions to an Abort operating state where any changes to the data and cache line are discarded.
摘要:
A system and method for interconnecting a plurality of processing element nodes within a scalable multiprocessor system is provided. Each processing element node includes at least one processor and memory. A scalable interconnect network includes physical communication links interconnecting the processing element nodes in a cluster. A first set of routers in the scalable interconnect network route messages between the plurality of processing element nodes. One or more metarouters in the scalable interconnect network route messages between the first set of routers so that each one of the routers in a first cluster is connected to all other clusters through one or more metarouters.
摘要:
A distributed, shared memory computer architecture that is organized into a set of functionally independent processing nodes operating in a global, shared address space. Each node has one or more local processors, local memory and includes a common communication interface for communicating with other modules within the system via a message protocol. The common communication interface provides a single high-speed communications center within each node to operatively couple the node to one or more external processing nodes, an external routing module, an input/output (I/O) module. The common communication interface that facilitates the ability to incrementally add and swap the nodes of the system without disrupting the overall computing resources of the system.