Fault Tolerant System for Execution of Parallel Jobs
    1.
    Patent application
    Fault Tolerant System for Execution of Parallel Jobs (status: in force)

    Publication No.: US20080077925A1

    Publication date: 2008-03-27

    Application No.: US11535083

    Filing date: 2006-09-26

    IPC classification: G06F9/46

    CPC classification: G06F9/4843

    Abstract: The present invention provides a fault-tolerant system and method for parallel job execution. In the proposed solution, the job state and the state transition control are decoupled. The job execution infrastructure maintains the state information for all executing jobs, and the job control units, one per job, control the state transitions of their jobs. Because the control units are stateless, the system and method allow jobs to continue executing uninterrupted even when the corresponding control units fail.

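The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names (`ExecutionInfrastructure`, `JobControlUnit`), the three job states, and the transition table are all hypothetical, chosen only to show how a control unit that holds no job state can be replaced after a failure without losing the job's progress.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "submitted"
    RUNNING = "running"
    DONE = "done"


# Hypothetical transition table; the abstract does not list concrete states.
NEXT_STATE = {
    JobState.SUBMITTED: JobState.RUNNING,
    JobState.RUNNING: JobState.DONE,
}


class ExecutionInfrastructure:
    """Stateful side: owns the state of every executing job."""

    def __init__(self):
        self._states = {}

    def submit(self, job_id):
        self._states[job_id] = JobState.SUBMITTED

    def get_state(self, job_id):
        return self._states[job_id]

    def set_state(self, job_id, state):
        self._states[job_id] = state


class JobControlUnit:
    """Per-job controller that keeps no job state of its own."""

    def __init__(self, infrastructure, job_id):
        self.infrastructure = infrastructure
        self.job_id = job_id

    def step(self):
        """Apply one transition; return False if the job is already finished."""
        current = self.infrastructure.get_state(self.job_id)
        if current not in NEXT_STATE:
            return False
        self.infrastructure.set_state(self.job_id, NEXT_STATE[current])
        return True


infra = ExecutionInfrastructure()
infra.submit("job-1")

first = JobControlUnit(infra, "job-1")
first.step()   # SUBMITTED -> RUNNING
del first      # simulate the control unit failing

# A fresh control unit resumes from the state held by the infrastructure.
replacement = JobControlUnit(infra, "job-1")
replacement.step()   # RUNNING -> DONE
final_state = infra.get_state("job-1")
```

Because `JobControlUnit` reads the current state from the infrastructure on every step, discarding one instance and creating another does not interrupt the job in this sketch; a real system would additionally need to detect the failure and start the replacement.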

    Fault tolerant system for execution of parallel jobs
    2.
    Granted patent
    Fault tolerant system for execution of parallel jobs (status: in force)

    Publication No.: US08291419B2

    Publication date: 2012-10-16

    Application No.: US11535083

    Filing date: 2006-09-26

    CPC classification: G06F9/4843

    Abstract: The present invention provides a fault-tolerant system and method for parallel job execution. In the proposed solution, the job state and the state transition control are decoupled. The job execution infrastructure maintains the state information for all executing jobs, and the job control units, one per job, control the state transitions of their jobs. Because the control units are stateless, the system and method allow jobs to continue executing uninterrupted even when the corresponding control units fail.
