Confined recovery in a distributed computing system
Abstract:
Executing a confined recovery in a distributed system having a plurality of worker systems including a failed worker system at a current superstep. The confined recovery includes determining states of the partitions of the worker systems during the supersteps preceding the current superstep, and determining a recovery initiation superstep preceding the current superstep in which all messages for recovery initiation superstep are available. The recovery initiation superstep is determined responsive to determining the states of the partitions. Additionally, a recovery set of partitions is determined for which messages in supersteps after the recovery initiation superstep are not available. The worker systems having the partitions in the recovery set are instructed to execute the defined function for the partitions in the recovery set starting at the recovery initiation superstep to recover the lost exchanged messages.
Information query
Patent Agency Ranking
0/0