Method and apparatus for providing distributed checkpointing
Abstract:
Methods and apparatus presented herein provide distributed checkpointing in a multi-node system, such as a network of servers in a data center. When checkpointing of application state data is needed in a node, the methods and apparatus determine whether checkpoint memory space is available in the node for checkpointing the application state data. If not enough checkpoint memory space is available in the node, the methods and apparatus request and find additional checkpoint memory space from other nodes in the system. In this manner, the methods and apparatus can checkpoint the application state data into available checkpoint memory spaces distributed among a plurality of nodes. This allows for high bandwidth and low latency checkpointing operations in the multi-node system.
Public/Granted literature
Information query
Patent Agency Ranking
0/0