摘要:
A method and apparatus for detecting split brain in a distributed system is provided. After determining that a rogue instance is no longer an active member of the cluster, a recovery instance detects activity associated with a redo log that is updated by the rogue instance to store log records that describe changes made by the rogue instance to data associated with the cluster.
摘要:
A method and system is provided for reducing delay to applications connected to a database server that guarantees no data loss during failure or disaster. After storing a log record persistently in a local primary log, the log writer returns control to the application which continues running concurrently with the database server sending the session's log records to a standby database. A separate back channel is used by the standby to communicate, out-of-band to the primary, the location of the last log record stored persistently to the standby log. An application waiting for a transaction to commit may wait until the transaction's commit record has been persisted.Also described is a technique for reducing application delay when there is contention between nodes of a multi-node cluster for updating the same block. The technique provides for an asynchronous ping protocol that guarantees zero data loss during failure or disaster.
摘要:
A method and system is provided for measuring, guaranteeing, and reducing replication data lag time between a primary system and one or more standby systems. Each standby system determines the lag time between the generation of a consistent version of data on the primary system and the time that the consistent version is applied on the standby system. Applications can request and be guaranteed to receive data from a standby system that is identical to the state on the primary system at the time of the query, or lag the primary state only by a maximum tolerable amount. A standby system may also publish a service that guarantees a maximum lag time and withdraw the service offer when the actual lag time exceeds the guaranteed lag time.Implications for implementing synchronous and asynchronous replication as well as performance optimizations are also discussed.