Abstract:
A cluster system comprises a plurality of nodes that provide data-access service to a shared storage, each node having at least one failover partner node that takes over the services of the node if the node fails. Each node may produce write logs for the shared storage and send them, periodically at predetermined time intervals, to a global device that stores the write logs from each node. The global device may detect failure of a node by monitoring the time intervals at which write logs are received from each node. Upon detecting a node failure, the global device may provide the write logs of the failed node to one or more partner nodes, which perform the logged writes on the shared storage. Write logs may be transmitted only between the nodes and the global device, which reduces data exchanges between nodes and conserves the I/O resources of the nodes.
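The failure-detection and hand-off behavior described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the source: the `GlobalDevice` class, its method names, the `grace` multiplier that decides how late a node may be, and the choice to hand the logs to the first listed partner are all assumptions made for illustration.

```python
from collections import defaultdict


class GlobalDevice:
    """Hypothetical model of the global device that collects write logs."""

    def __init__(self, interval_s, partners, grace=2.0):
        self.interval_s = interval_s      # predetermined send interval (seconds)
        self.partners = partners          # node -> list of failover partner nodes
        self.grace = grace                # assumed tolerance: intervals allowed to pass
        self.logs = defaultdict(list)     # node -> write logs not yet handed off
        self.last_seen = {}               # node -> time of the last log delivery

    def receive(self, node, write_logs, now):
        """Store a periodic delivery; its arrival also serves as a liveness signal."""
        self.logs[node].extend(write_logs)
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Nodes whose last delivery is older than the allowed window."""
        deadline = self.interval_s * self.grace
        return [n for n, t in self.last_seen.items() if now - t > deadline]

    def fail_over(self, node):
        """Hand the failed node's stored logs to one partner (assumed: the first)."""
        logs = self.logs.pop(node, [])
        self.last_seen.pop(node, None)
        partner = self.partners[node][0]
        return partner, logs


dev = GlobalDevice(interval_s=1.0, partners={"A": ["B"], "B": ["A"]})
dev.receive("A", [("blk1", b"x")], now=0.0)
dev.receive("B", [("blk2", b"y")], now=0.0)
dev.receive("B", [], now=2.5)             # B keeps reporting, A goes silent
for failed in dev.failed_nodes(now=3.0):
    partner, logs = dev.fail_over(failed)
    print(f"{partner} replays {len(logs)} write log(s) of {failed}")
```

In this sketch the nodes never exchange logs with each other; all traffic goes through the device, which mirrors the abstract's point about conserving node I/O resources.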
Abstract:
A disk-less quorum device in a clustered storage system includes non-volatile memory to store status information regarding the cluster and each storage controller in the cluster. The quorum device maintains a bitmap, shared by the controllers in the cluster, in the non-volatile memory. The bitmap indicates the status of a write operation to any data block or parity block. A “dirty” data unit in the bitmap indicates that a write operation has been submitted but is not yet finished. Upon submitting a write request (to update a data block or a parity block) to the storage facility, a controller sets the corresponding data unit “dirty” in the bitmap. After receiving an acknowledgement from the storage facility indicating that the operation has been completed, the controller clears the corresponding data unit. If a controller fails during a write operation, another controller can use the bitmap to re-establish data consistency.
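The dirty-bit protocol in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `QuorumBitmap`, `write_block`, the one-bit-per-block layout, the dictionary standing in for the storage facility, and the `crash_before_ack` flag used to simulate a controller failure are all hypothetical.

```python
class QuorumBitmap:
    """Hypothetical shared bitmap; one bit per data or parity block (assumed layout)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def set_dirty(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def clear(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def is_dirty(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def dirty_blocks(self):
        """Blocks with a write submitted but not yet acknowledged."""
        return [b for b in range(self.num_blocks) if self.is_dirty(b)]


def write_block(bitmap, storage, block, data, crash_before_ack=False):
    """Mark the block dirty, submit the write, and clear the bit after the ack."""
    bitmap.set_dirty(block)
    if crash_before_ack:
        return False          # simulated controller failure: the bit stays dirty
    storage[block] = data     # stand-in for the storage facility completing the write
    bitmap.clear(block)
    return True


bitmap = QuorumBitmap(16)
storage = {}
write_block(bitmap, storage, 3, b"a")
write_block(bitmap, storage, 9, b"b", crash_before_ack=True)
# A surviving controller only needs to re-check the blocks still marked dirty.
print(bitmap.dirty_blocks())
```

Because the bitmap lives in the quorum device's non-volatile memory rather than in either controller, the surviving controller can read it after its peer fails and limit the consistency check to the blocks that were mid-write.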