Local rollback for fault-tolerance in parallel computing systems
    4.
    Type: Granted patent
    Status: In force

    Publication No.: US08103910B2

    Publication date: 2012-01-24

    Application No.: US12696780

    Filing date: 2010-01-29

    IPC classification: G06F11/00

    CPC classification: G06F15/17381 G06F9/30072

    Abstract: A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.
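
The control flow in this abstract can be illustrated with a small software sketch. It is not the patented hardware: the names `run_rollback_interval`, `SoftError`, `UnrecoverableInterval` and the `max_restarts` limit are assumptions made for illustration, and a plain dictionary copy stands in for speculative state held in the cache.

```python
class SoftError(Exception):
    """Hypothetical stand-in for a detected transient fault."""


class UnrecoverableInterval(Exception):
    """Raised when an error hits an interval that cannot be replayed locally."""


def run_rollback_interval(state, instructions, max_restarts=3):
    """Run one local rollback interval and return (new_state, restarts).

    Each instruction is a callable that mutates a working copy of the state
    and returns True if it did something that cannot be undone (the
    "unrecoverable condition" of the abstract). max_restarts is an assumed
    safety limit, not something the abstract specifies.
    """
    restarts = 0
    while True:
        working = dict(state)       # speculative copy; `state` is the rollback point
        unrecoverable = False
        try:
            for instr in instructions:
                if instr(working):
                    unrecoverable = True
        except SoftError:
            if unrecoverable or restarts >= max_restarts:
                # Local rollback is not safe here; a real system would fall
                # back to a coarser recovery mechanism.
                raise UnrecoverableInterval("local rollback not possible")
            restarts += 1
            continue                # discard `working` and replay the interval
        return working, restarts    # no error: the interval's results are kept


def make_flaky_increment(failures):
    """Build an instruction that raises SoftError `failures` times, then works."""
    remaining = [failures]

    def instr(s):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise SoftError("simulated transient fault")
        s["x"] += 1
        return False                # this instruction is safely replayable

    return instr


final, restarts = run_rollback_interval({"x": 0}, [make_flaky_increment(1)])
print(final, restarts)
```

In this sketch the interval is replayed once after the simulated fault, because no instruction reported an unrecoverable condition before the error occurred.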


    MULTIPLE NODE REMOTE MESSAGING
    7.
    Type: Patent application
    Status: In force

    Publication No.: US20090006546A1

    Publication date: 2009-01-01

    Application No.: US11768784

    Filing date: 2007-06-26

    IPC classification: G06F15/16

    CPC classification: G06F15/16

    Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes that a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).
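
The message flow described in this abstract can be modelled with a toy simulation. This is only a sketch under stated assumptions: `Descriptor`, `Node` and `deliver` are hypothetical names, a `deque` stands in for each node's injection FIFO, and a descriptor whose payload is a list of descriptors plays the role of the single remote message carrying remote message descriptors.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    dest: str        # name of the node the message is sent to
    payload: object  # plain data, or a list of Descriptors for the receiver to inject


@dataclass
class Node:
    name: str
    injection_fifo: deque = field(default_factory=deque)
    received: list = field(default_factory=list)


def deliver(nodes):
    """Drain every injection FIFO until no node has anything left to send."""
    progress = True
    while progress:
        progress = False
        for node in nodes.values():
            while node.injection_fifo:
                d = node.injection_fifo.popleft()
                target = nodes[d.dest]
                if isinstance(d.payload, list):
                    # Remote message descriptors: the receiver injects them
                    # into its own FIFO, so it ends up sending messages itself.
                    target.injection_fifo.extend(d.payload)
                else:
                    target.received.append((node.name, d.payload))
                progress = True


nodes = {name: Node(name) for name in "ABC"}

# Node A prepares one message for B that carries a descriptor telling B to send to C.
remote_descriptors = [Descriptor(dest="C", payload="data from B")]
nodes["A"].injection_fifo.append(Descriptor(dest="B", payload=remote_descriptors))

deliver(nodes)
print(nodes["C"].received)
```

Node A never sends anything to C directly; C's data arrives from B, which was driven entirely by the single message from A.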


    Multiple node remote messaging
    8.
    Type: Granted patent
    Status: In force

    Publication No.: US07788334B2

    Publication date: 2010-08-31

    Application No.: US11768784

    Filing date: 2007-06-26

    IPC classification: G06F15/167 G06F13/28

    CPC classification: G06F15/16

    Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes that a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).


    ATOMICITY: A MULTI-PRONGED APPROACH
    9.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20110219215A1

    Publication date: 2011-09-08

    Application No.: US13008546

    Filing date: 2011-01-18

    IPC classification: G06F9/30

    CPC classification: G06F9/524 G06F12/08

    Abstract: In a multiprocessor system with speculative execution, atomicity can be approached in several fashions. One approach is to have atomic instructions that achieve multiple functions and are guaranteed to complete. Another approach is to have blocks of code that are grouped to succeed or fail together. A system can incorporate more than one such approach. In implementing more than one approach, the system may prioritize one over another. When conflict detection is done through a directory lookup in cache memory, atomic instructions and atomicity related operations may be implemented in a cache data array access pipeline in that cache memory. This implementation may include feedback to the pipeline for implementing multiple functions within an atomic instruction and also for cascading atomic instructions.
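
The two approaches named in this abstract can be contrasted in a software analogy. The sketch below is an assumption-laden illustration, not the cache-pipeline implementation: the `Memory` class and its methods are hypothetical, a single lock stands in for the hardware's serialization, and speculation and directory-based conflict detection are not modelled.

```python
import threading


class Memory:
    """Toy shared memory offering both styles of atomicity."""

    def __init__(self):
        self._lock = threading.Lock()
        self.cells = {}

    def fetch_and_add(self, addr, delta):
        """Atomic-instruction style: load, add and store happen as one step
        that always completes. Returns the old value."""
        with self._lock:
            old = self.cells.get(addr, 0)
            self.cells[addr] = old + delta
            return old

    def run_block(self, block):
        """Block style: every write in `block` becomes visible together,
        or none does. Returns True on success, False if the block failed."""
        with self._lock:
            draft = dict(self.cells)   # writes go to a private draft first
            try:
                block(draft)
            except Exception:
                return False           # discard the draft; memory is unchanged
            self.cells = draft         # publish all writes at once
            return True


def failing_block(draft):
    draft["a"] = 99
    raise ValueError("simulated conflict")


mem = Memory()
old = mem.fetch_and_add("counter", 5)
ok = mem.run_block(lambda draft: draft.update(a=1, b=2))
failed = mem.run_block(failing_block)
print(old, ok, failed, mem.cells)
```

The failing block's write to `a` never becomes visible, while the atomic add cannot fail part-way, which is the distinction the abstract draws between the two approaches.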
