发明授权
US08407376B2 Recording a communication pattern and replaying messages in a parallel computing system 失效
记录通信模式并在并行计算系统中重播消息

Recording a communication pattern and replaying messages in a parallel computing system
摘要:
A parallel computer system includes a plurality of compute nodes. Each of the compute nodes includes at least one processor, at least one memory, and a direct memory address engine coupled to the at least one processor and the at least one memory. The system also includes a network interconnecting the plurality of compute nodes. The network operates a global message-passing application for performing communications across the network. Local instances of the global message-passing application operate at each of the compute nodes to carry out local processing operations independent of processing operations carried out at another one of the compute nodes. The direct memory address engines are configured to interact with the local instances of the global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of the memories. The local instances of the global message passing application are configured to record, in the injection FIFO in the corresponding one of the memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program. The local instances of the global message passing application are configured to replay the message descriptors during a subsequent iteration of the executing application program.
信息查询
0/0