Abstract:
A fault-tolerant computer system comprises a plurality of processing nodes and a plurality of storage nodes interconnected by a network. The processing nodes perform processing operations in connection with user-generated processing requests. The processing nodes, in connection with processing a processing request, generate storage and retrieval requests for transmission to the storage nodes to enable storage of data thereon and retrieval of data therefrom. The storage nodes store data in at least one replicated partition group comprising a plurality of replicated partitions distributed across the storage nodes. A storage node, on receiving a retrieval request from a processing node, provides the requested data to the processing node. In addition, on receiving a storage request from a processing node, a storage node initiates an update operation to update all of the replicated partitions in the replicated partition group. Following correction of a malfunction or failure of a storage node, partitions maintained by the malfunctioning or failed storage node can be recovered by use of the other members of the replicated partition group.
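The store, retrieve, and recover behavior described above can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: the names `StorageNode`, `ReplicatedPartitionGroup`, `store`, `retrieve`, and `recover` are hypothetical, the network is omitted, and the sketch assumes that an update skips replicas on a failed node and that recovery copies the whole partition from one surviving member of the group.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node holding one replica of a partition."""
    name: str
    online: bool = True
    partition: dict = field(default_factory=dict)


class ReplicatedPartitionGroup:
    """Hypothetical replicated partition group spread across storage nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def store(self, key, value):
        # A storage request updates every replica in the group.
        # Assumption: replicas on a failed node are skipped and repaired later.
        for node in self.nodes:
            if node.online:
                node.partition[key] = value

    def retrieve(self, key):
        # A retrieval request is served by the first online replica.
        for node in self.nodes:
            if node.online:
                return node.partition[key]
        raise RuntimeError("no replica available")

    def recover(self, node):
        # After the fault is corrected, rebuild the partition from another
        # online member of the group (assumption: a full copy).
        peer = next(n for n in self.nodes if n.online and n is not node)
        node.partition = dict(peer.partition)
        node.online = True


# Usage: node b fails, misses an update, and is then recovered from its peers.
a, b, c = StorageNode("a"), StorageNode("b"), StorageNode("c")
group = ReplicatedPartitionGroup([a, b, c])
group.store("k1", "v1")
b.online = False
group.store("k2", "v2")
group.recover(b)
```

After `recover(b)`, node `b` holds the same contents as the surviving replicas, including the update it missed while it was offline.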