摘要:
A digital computer comprises a plurality of processing elements, a communications router, and a control network. Each processing element performs data processing operations in connection with commands, at least some of the processing elements performing the data processing operations in connection with the commands in messages they receive over the control network. Each processing element also generates and receives data transfer messages, each including an address portion containing an address, for transfer to another processing element as identified by the address. At least one of the processing elements further generates the control network messages for transfer over the communications router. The communications router comprises router nodes interconnected in the form of a "fat-tree," and the control network comprises control network nodes interconnected in the form of a tree, with the processing elements being connected at the leaf nodes of the respective communications router and control network.
摘要:
A massively-parallel computer includes a plurality of processing nodes and at least one control node interconnected by a network. The network faciliates the transfer of data among the processing nodes and of commands from the control node to the processing nodes. Each processing node includes an interface for transmitting data over, and receiving data and commands from, the network, at least one memory module for storing data, a node processor and an auxiliary processor. The node processor receives commands received by the interface and processes data in response thereto, in the process generating memory access requests for facilitating the retrieval of data from or storage of data in the memory module. The node processor further controlling the transfer of data over the network by the interface. The auxiliary processor is connected to the memory module and the node processor. In response to memory access requests from the node processor, the auxiliary processor performs a memory access operation to store data received from the node processor in the memory module, or to retrieve data from the memory module for transfer to the node processor. In response to auxiliary processing instructions from the node processor, the auxiliary processor performs data processing operations in connection with data in the memory module.
摘要:
A massively-parallel computer includes a plurality of processing nodes and at least one control node interconnected by a network. The network faciliates the transfer of data among the processing nodes and of commands from the control node to the processing nodes. Each each processing node includes an interface for transmitting data over, and receiving data and commands from, the network, at least one memory module for storing data, a node processor and an auxiliary processor. The node processor receives commands received by the interface and processes data in response thereto, in the process generating memory access requests for facilitating the retrieval of data from or storage of data in the memory module. The node processor further controlling the transfer of data over the network by the interface. The auxiliary processor is connected to the memory module and the node processor. In response to memory access requests from the node processor, the auxiliary processor performs a memory access operation to store data received from the node processor in the memory module, or to retrieve data from the memory module for transfer to the node processor. In response to auxiliary processing instructions from the node processor, the auxiliary processor performs data processing operations in connection with data in the memory module.
摘要:
A digital computer comprising a plurality of processors interconnected by a network for transferring messages among the processors. At least one processor generates messages of a configuration type. The network comprises a plurality of nodes interconnected in a tree pattern in a series of levels from a lower leaf level to an upper physical root level, with the leaf nodes connected to the processors. Each of the nodes includes a root flag that can be set or cleared in response to a message of the configuration type to establish the node as a logical root. For each node, if the node is a logical root it transfers messages received from a node at a lower level in the tree back down the tree, but if the node is not a logical root it transfers messages received at a lower level node to a higher level node.
摘要:
A performance counter to monitor a plurality of events that may occur in a component within a computer system during a monitoring period or testing period. The monitoring results, which are provided upon completion of the performance testing, may be used to provide histogram representations of the component performance. In one embodiment, the performance counter comprises a first storage, a second storage, programmable control logic, and a counting mechanism. The first storage is configured to store information indicative of a plurality of events to be monitored and the monitoring period for each event. The second storage is configured to store counting results obtained during the testing period. A counting mechanism, which is coupled to the second storage, is configured to monitor the occurrence of the events in the component under test. The counting mechanism is coupled to the control logic and the control logic is coupled to the first storage.
摘要:
Protocol agents involved in the performance of global coherency activity detect errors with respect to the activity being performed. The errors are logged by a computer system such that diagnostic software may be executed to determine the error detected and to trace the error to the erring software or hardware. In particular, information regarding the first error to be detected is logged. Subsequent errors may receive more or less logging depending upon programmable configuration values. Additionally, those errors which receive full logging may be programmably selected via error masks. The protocol agents each comprise multiple independent state machines which independently process requests. If the request which a particular state machine is processing results in an error, the particular state machine may enter a freeze state. Information regarding the request which is collected by the state machine may thereby be saved for later access. A state machine freezes upon detection of the error if a maximum number of the multiple state machines are not already frozen and the aforementioned error mask indicates that full error logging is employed for the detected error. Therefore, at least a minimum number of the multiple state machines remain functioning even in the presence of a large number of errors. Still further, prior to entering the freeze state, the protocol state machines may transition through a recovery state in which resources not used for error logging purposes are freed from the erring request.
摘要:
An interrupt mechanism handles an interrupt transaction between a source processor and a target processor on separate nodes in a multi-processor system. The nodes are connected to a network through node interface controls between the node and the network. The transaction begins by initiating the interrupt transaction at the source processor. The interrupt mechanism detects if the target processor is at a remote node on a system bus across the network, and if it is the mechanism sends an ignore signal to the source processor. Then the mechanism suspends the interrupt transaction at the source processor if it detects the target processor is at a remote node. The mechanism performs an ACK/NACK (acknowledge/non-acknowledge) operation at the target processor and returning an ACK signal or a NACK signal to the source processor across the network. This ACK/NACK signal wakes-up the source processor. The source processor sends interrupt data to the target processor if an ACK signal is received and aborts the interrupt transaction if a NACK signal is received.
摘要:
When a processor within a computer system performs a synchronization operation, the system interface within the node delays subsequent transactions from the processor until outstanding coherency activity is completed. Therefore, the computer system may employ asynchronous operations. The synchronization operations may be used when needed to guarantee global completion of one or more prior asynchronous operations. In one embodiment, the synchronization operation is placed into a queue within the system interface. When the synchronization operation reaches the head of the queue, it may be initiated within the system interface. The system interface further includes a request agent comprising multiple control units, each of which may concurrently service coherency activity with respect to a different transaction. Furthermore, the system interface includes a synchronization control vector register which stores a bit for each control unit. Upon initiation of the synchronization operation within the system interface, bits corresponding to those control units which are performing coherency activity (i.e. those which are not idle) are set while other bits are cleared. As each control unit returns to the idle state, the corresponding bit is cleared as well. Once all the bits within the synchronization control vector register are cleared, the coherency activity which was outstanding when the synchronization operation was initiated is complete. The synchronization operation may then be completed.
摘要:
A computer system includes multiple processing nodes, each of which is divided into subnodes. Transactions from a particular subnode are performed in the order presented by that subnode. Therefore, when a first transaction from the subnode is delayed to allow performance of coherency activity with other processing nodes, subsequent transactions from that subnode are delayed as well. Additionally, coherency activity for the subsequent transactions may be initiated in accordance with a prefetch method assigned to the subsequent transactions. In this manner, the delay associated with the ordering constraints of the system may be concurrently experienced with the delay associated with any coherency activity which may need to be performed in response to the subsequent transactions. In order to respect the ordering constraints imposed by the computer system, a system interface within the processing nodes employs an early completion policy for prefetch operations. If prefetch coherency activity for a transaction completes prior to coherency activity for another transaction from the same subnode, the early completion policy assigned to that transaction is enacted. In a drop policy, the data corresponding to the transaction is discarded. A write policy is also defined in which data received in response to the prefetch coherency activity is stored in the local memory. Lastly, a clear policy may be enforced in which the coherency activity is indicated to be complete.