Abstract:
A system and method for modifying execution scripts associated with a job scheduler may include monitoring the execution of a task to determine when the task has failed. Details of the failed task may be identified and used to attempt recovery from the failure. After any recovery tasks are initiated, their execution may be monitored, and either one or more supplementary recovery tasks may be identified and executed, or the original task may be rerun at an appropriate execution point based on the initial point of failure. Thus, when a task fails, an iterative process may begin in which an attempt is made to roll back the various effects of the failed task and, depending on the success of the rollback, either the initial task is rerun at the point of failure or further recovery tasks are executed.
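The iterative process described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Step`, `StepFailed`, `run_with_recovery`, and `max_attempts` are hypothetical, and the sketch assumes that each step carries at most one primary rollback action and one supplementary recovery action, and that a step with no successful recovery is abandoned rather than rerun.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class StepFailed(Exception):
    """Hypothetical signal that a step or a recovery action has failed."""


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    # Primary recovery task: undoes the partial effects of `run`.
    rollback: Optional[Callable[[], None]] = None
    # Supplementary recovery task, tried only if `rollback` itself fails.
    supplementary: Optional[Callable[[], None]] = None


def run_with_recovery(steps: List[Step], max_attempts: int = 3) -> List[str]:
    """Run steps in order; on failure, recover and rerun from the failed step."""
    log: List[str] = []
    index = 0      # current execution point within the task
    attempts = 0   # failed attempts at the current step
    while index < len(steps):
        step = steps[index]
        try:
            step.run()
        except StepFailed:
            log.append(f"failed:{step.name}")
        else:
            log.append(f"ok:{step.name}")
            index += 1
            attempts = 0
            continue

        attempts += 1
        if attempts >= max_attempts:
            log.append(f"abandoned:{step.name}")
            break

        # Monitor each recovery task; fall through to the supplementary
        # one only when the primary rollback fails.
        recovered = False
        for label, action in (("rollback", step.rollback),
                              ("supplementary", step.supplementary)):
            if action is None:
                continue
            try:
                action()
            except StepFailed:
                log.append(f"{label}-failed:{step.name}")
            else:
                log.append(f"{label}-ok:{step.name}")
                recovered = True
                break

        if not recovered:
            log.append(f"abandoned:{step.name}")
            break
        # `index` is unchanged, so the loop reruns the task at the
        # point of failure instead of restarting from the first step.
    return log
```

Calling `run_with_recovery` with a list of `Step` objects returns a log of outcomes. Because the loop index is not advanced after a failure, earlier steps that already succeeded are not repeated, which is one simple way to model rerunning at an appropriate execution point.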