1. Method of performing checkpoint/restart of a parallel program
    Granted invention patent (legal status: lapsed)

    Publication No.: US06393583B1

    Publication date: 2002-05-21

    Application No.: US09181985

    Filing date: 1998-10-29

    IPC classification: H02H3/05

    CPC classification: G06F11/1438 G06F11/1458

    Abstract: A checkpoint of a parallel program is taken in order to provide a consistent state of the program in the event the program is to be restarted. Each process of the parallel program is responsible for taking its own checkpoint; however, the timing of when each process takes its checkpoint is the responsibility of a coordinating process. During checkpointing, various data is written to a checkpoint file. This data includes, for instance, in-transit message data, a data section, file offsets, signal state, executable information, stack contents and register contents. The checkpoint file can be stored in either local or global storage. When it is stored in global storage, migration of the program is facilitated. When a parallel program is to be restarted, each process of the program initiates its own restart. The restart logic restores the process to the state at which the checkpoint was taken.
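
    The division of labor in the abstract, where a coordinating process decides when checkpoints happen but each process writes its own, can be sketched in Python as a two-phase exchange. This is a minimal illustration under stated assumptions, not the patented protocol: threads stand in for processes, `queue.Queue` objects stand in for the message-passing layer, and the names `coordinate` and `worker` and the `checkpoint`/`done`/`resume` tags are hypothetical. Real processes would also have to capture in-transit messages, which this toy omits because its workers never message one another.

```python
import queue
import threading

# Hypothetical message tags; the abstract does not define a message format.
CHECKPOINT, DONE, RESUME = "checkpoint", "done", "resume"


def worker(rank, inbox, coordinator_inbox, checkpoints):
    """One process of the parallel program. It takes its own checkpoint,
    but only when the coordinator says so."""
    state = {"rank": rank, "counter": rank * 10}  # stand-in for real process state
    inbox.get()                          # block until the coordinator sends CHECKPOINT
    checkpoints[rank] = dict(state)      # the process saves its own state
    coordinator_inbox.put((DONE, rank))  # report completion to the coordinator
    inbox.get()                          # block until the coordinator sends RESUME


def coordinate(num_procs):
    """Coordinating process: decides when checkpoints are taken and lets the
    processes resume only after all of them report completion."""
    coordinator_inbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_procs)]
    checkpoints = {}
    threads = [
        threading.Thread(target=worker,
                         args=(rank, inboxes[rank], coordinator_inbox, checkpoints))
        for rank in range(num_procs)
    ]
    for t in threads:
        t.start()
    for box in inboxes:                  # phase 1: ask every process to checkpoint
        box.put(CHECKPOINT)
    done = set()
    while len(done) < num_procs:         # collect one DONE reply per process
        tag, rank = coordinator_inbox.get()
        if tag == DONE:
            done.add(rank)
    for box in inboxes:                  # phase 2: every checkpoint is written
        box.put(RESUME)
    for t in threads:
        t.join()
    return done, checkpoints
```

    Holding every process until all `done` replies arrive means no process resumes while another is still writing its checkpoint, which is one simple way to keep the saved states mutually consistent, provided messages already in flight are captured as well.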

2. System of performing checkpoint/restart of a parallel program
    Granted invention patent (legal status: in force)

    Publication No.: US06401216B1

    Publication date: 2002-06-04

    Application No.: US09181981

    Filing date: 1998-10-29

    IPC classification: G06F11/00

    CPC classification: G06F11/1458

    Abstract: A checkpoint of a parallel program is taken in order to provide a consistent state of the program in the event the program is to be restarted. Each process of the parallel program is responsible for taking its own checkpoint; however, the timing of when each process takes its checkpoint is the responsibility of a coordinating process. During checkpointing, various data is written to a checkpoint file. This data includes, for instance, in-transit message data, a data section, file offsets, signal state, executable information, stack contents and register contents. The checkpoint file can be stored in either local or global storage. When it is stored in global storage, migration of the program is facilitated. When a parallel program is to be restarted, each process of the program initiates its own restart. The restart logic restores the process to the state at which the checkpoint was taken.
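
    One way to picture the checkpoint file is as a record with one field per item the abstract lists. The sketch below assumes one JSON file per process, named `ckpt.<rank>.json`, in a caller-chosen directory; the class `CheckpointImage`, its field names and encodings, and the helper functions are all hypothetical. A real implementation would capture the data section, stack, and registers with low-level, platform-specific mechanisms rather than hex strings.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class CheckpointImage:
    """Hypothetical contents of one process's checkpoint file: one field per
    item listed in the abstract. Names and encodings are illustrative only."""
    rank: int
    in_transit_messages: list = field(default_factory=list)  # sent, not yet received
    data_section: str = ""                                   # hex dump of the data segment
    file_offsets: dict = field(default_factory=dict)         # file path -> offset
    signal_state: dict = field(default_factory=dict)         # signal name -> disposition
    executable: str = ""                                     # program being run
    stack: str = ""                                          # hex dump of the stack
    registers: dict = field(default_factory=dict)            # register name -> value


def checkpoint_path(storage_dir, rank):
    # storage_dir may be node-local or on a shared (global) file system.
    return os.path.join(storage_dir, f"ckpt.{rank}.json")


def write_checkpoint(image, storage_dir):
    """Each process writes its own file; write-then-rename keeps the previous
    checkpoint intact if the process dies mid-write."""
    path = checkpoint_path(storage_dir, image.rank)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(asdict(image), f)
        f.flush()
        os.fsync(f.fileno())             # make sure the bytes reach the disk
    os.replace(tmp_path, path)           # swap the new file into place
    return path


def read_checkpoint(storage_dir, rank):
    with open(checkpoint_path(storage_dir, rank)) as f:
        return CheckpointImage(**json.load(f))


def round_trip_demo():
    """Write a sample image to a temporary directory and read it back."""
    image = CheckpointImage(
        rank=3,
        in_transit_messages=[{"source": 1, "tag": 7, "payload": "6869"}],
        data_section="deadbeef",
        file_offsets={"results.dat": 4096},
        signal_state={"SIGUSR1": "handler"},
        executable="/usr/local/bin/solver",
        stack="00ff",
        registers={"pc": 4198400, "sp": 140737488347136},
    )
    with tempfile.TemporaryDirectory() as storage_dir:
        write_checkpoint(image, storage_dir)
        return read_checkpoint(storage_dir, image.rank) == image
```

    Writing to a temporary file and then calling `os.replace` is an added precaution, not something the abstract describes: if the process fails mid-write, the older checkpoint survives. Passing a directory on a shared file system as `storage_dir` corresponds to the abstract's global-storage case.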

3. Program products for performing checkpoint/restart of a parallel program
    Granted invention patent (legal status: lapsed)

    Publication No.: US06338147B1

    Publication date: 2002-01-08

    Application No.: US09182555

    Filing date: 1998-10-29

    IPC classification: G06F11/00

    CPC classification: G06F11/1458

    Abstract: A checkpoint of a parallel program is taken in order to provide a consistent state of the program in the event the program is to be restarted. Each process of the parallel program is responsible for taking its own checkpoint; however, the timing of when each process takes its checkpoint is the responsibility of a coordinating process. During checkpointing, various data is written to a checkpoint file. This data includes, for instance, in-transit message data, a data section, file offsets, signal state, executable information, stack contents and register contents. The checkpoint file can be stored in either local or global storage. When it is stored in global storage, migration of the program is facilitated. When a parallel program is to be restarted, each process of the program initiates its own restart. The restart logic restores the process to the state at which the checkpoint was taken.
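
    Restart can be illustrated with a toy process whose only state is a running total and the offset of an open input file. In this hypothetical sketch (`Process`, `restart`, and `restart_demo` are illustrative names, not taken from the patent), the process checkpoints after four steps, keeps working, and then fails; a fresh instance restores itself from its own checkpoint file by reopening the input and seeking to the saved offset, so the work done after the checkpoint is simply redone. Stack, register, and signal state are out of scope here.

```python
import json
import os
import tempfile


class Process:
    """Toy stand-in for one process of a parallel program. Its state is a
    running total plus the offset of an open input file."""

    def __init__(self, rank, input_path):
        self.rank = rank
        self.input = open(input_path, "rb")
        self.total = 0

    def step(self):
        self.total += self.input.read(1)[0]     # consume one byte of input

    def checkpoint(self, storage_dir):
        state = {"total": self.total, "input_path": self.input.name,
                 "offset": self.input.tell()}
        with open(os.path.join(storage_dir, f"ckpt.{self.rank}.json"), "w") as f:
            json.dump(state, f)

    def close(self):
        self.input.close()

    @classmethod
    def restart(cls, storage_dir, rank):
        """The process initiates its own restart from its own checkpoint file."""
        with open(os.path.join(storage_dir, f"ckpt.{rank}.json")) as f:
            state = json.load(f)
        proc = cls(rank, state["input_path"])   # reopen the same input file
        proc.total = state["total"]             # restore in-memory data
        proc.input.seek(state["offset"])        # restore the file offset
        return proc


def restart_demo():
    with tempfile.TemporaryDirectory() as storage_dir:
        input_path = os.path.join(storage_dir, "input.bin")
        with open(input_path, "wb") as f:
            f.write(bytes(range(1, 11)))        # input bytes 1..10
        proc = Process(0, input_path)
        for _ in range(4):
            proc.step()                         # total = 1+2+3+4 = 10
        proc.checkpoint(storage_dir)
        for _ in range(3):
            proc.step()                         # progress lost in the "failure"
        proc.close()
        restarted = Process.restart(storage_dir, 0)
        for _ in range(6):
            restarted.step()                    # redo from byte 5 through byte 10
        restarted.close()
        return restarted.total
```

    `restart_demo()` returns 55, the same total an uninterrupted pass over the ten input bytes produces. Because `restart` needs only `storage_dir` and `rank`, a shared directory would let the restarted process run on a different node, provided the input file is reachable there too; this mirrors the migration benefit the abstract attributes to global storage.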