Abstract:
A two-phase commit protocol for a distributed transaction processing system employs the presumed-commit configuration, with the exception that the new presumed-commit protocol coordinator needs to force-write only a "commit" log record for committed transactions, rather than the two log records previously force-written. To provide the information the coordinator needs to answer inquiries from subordinate processes following a crash or loss of communications, a technique for circumscribing the set of indeterminate transactions is employed. The transactions are numbered in increasing order, each identified by a transaction ID (T_ID). The commit protocol is not allowed to begin unless the transaction ID of the committing transaction is within some preselected range of numbers starting from the highest-numbered stably recorded transaction ID. That is, if the transaction number is too far removed from the highest TID of a stably stored log record (one written to disk storage and able to survive a crash), then log records are written to disk until this condition holds. This may require writing a log record for the committing transaction to disk. Most committing transactions can, however, proceed without waiting for a disk write (forced log), and so performance is improved. A technique is disclosed for circumscribing the set of indeterminate transactions (those for which it is not known whether they committed, aborted, or never started) so that the information kept about them is small. This information must be "permanently" retained, but the coordinator can store some of it in a cache (volatile memory) to answer inquiries.
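The transaction-ID window described above can be illustrated with a minimal Python sketch. It models only the gating rule (force a log write when the committing TID is too far above the highest stably recorded TID) and is not the patented protocol itself. The names `Coordinator`, `window`, `begin_commit`, and `max_tid_possibly_committing` are hypothetical, and the "disk write" is simulated by updating a counter.

```python
class Coordinator:
    """Toy model of the TID-window gate (all names are illustrative)."""

    def __init__(self, window: int) -> None:
        self.window = window            # preselected range size (assumed parameter)
        self.highest_stable_tid = 0     # highest TID assumed to be safely on disk
        self.forced_writes = 0          # how many simulated forced writes occurred

    def _force_log(self, tid: int) -> None:
        # Stand-in for a synchronous (forced) disk write of a log record.
        self.highest_stable_tid = max(self.highest_stable_tid, tid)
        self.forced_writes += 1

    def begin_commit(self, tid: int) -> bool:
        """Start the commit protocol for `tid`.

        Returns True if a forced write was needed first, False if the
        transaction was already inside the allowed window.
        """
        if tid > self.highest_stable_tid + self.window:
            self._force_log(tid)
            return True
        return False

    def max_tid_possibly_committing(self) -> int:
        # Because of the gate, no TID above this bound can have begun
        # the commit protocol before a crash.
        return self.highest_stable_tid + self.window


coordinator = Coordinator(window=3)
needed_force = [coordinator.begin_commit(t) for t in (1, 3, 5, 8, 9)]
```

In this run only TIDs 5 and 9 trigger a simulated forced write; the others fall inside the window and proceed without waiting, which is the performance effect the abstract describes.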
Abstract:
Systems and methods are described that facilitate idempotent execution of commands generated by a client for execution by a database server. Each command transmitted to the server includes a command ID generated by the client. The server attempts to execute each command and subsequently stores the command ID associated therewith in a repository along with an indication of whether the command executed successfully. When a new command is received by the server, it determines if the command ID associated therewith has already been stored in the repository. If the command ID associated with the new command has not already been stored in the repository, then the server executes the new command. If the command ID associated with the new command has already been stored in the repository and a previously-received command associated with the command ID has been executed successfully, then the server will not execute the new command.
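The check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the in-memory dictionary standing in for the repository, the class name `IdempotentServer`, and the status strings are assumptions. The abstract does not say what happens when a stored command ID is marked as failed, so the sketch assumes such a command is executed again.

```python
from typing import Callable, Dict


class IdempotentServer:
    """Toy server that executes each client command ID at most once successfully."""

    def __init__(self) -> None:
        # Hypothetical repository: command ID -> True if it executed successfully.
        self.repository: Dict[str, bool] = {}

    def handle(self, command_id: str, command: Callable[[], None]) -> str:
        # A stored ID with a success flag means the command must not run again.
        if self.repository.get(command_id, False):
            return "skipped"
        try:
            command()
            succeeded = True
        except Exception:
            succeeded = False
        # Record the ID together with the success indication after the attempt.
        self.repository[command_id] = succeeded
        return "executed" if succeeded else "failed"


balance = {"value": 0}

def deposit() -> None:
    balance["value"] += 10

server = IdempotentServer()
first = server.handle("cmd-1", deposit)    # runs the command
second = server.handle("cmd-1", deposit)   # retry with the same ID is ignored
```

A client that retries after a lost reply can resend the same command ID without the deposit being applied twice.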
Abstract:
A method and system for increasing server cluster availability by requiring at a minimum only one node and a quorum replica set of replica members to form and operate a cluster. Replica members maintain cluster operational data. A cluster operates when one node possesses a majority of replica members, which ensures that any new or surviving cluster includes consistent cluster operational data via at least one replica member from the immediately prior cluster. Arbitration provides exclusive ownership by one node of the replica members, including at cluster formation, and when the owning node fails. Arbitration uses a fast mutual exclusion algorithm and a reservation mechanism to challenge for and defend the exclusive reservation of each member. A quorum replica set algorithm brings members online and offline with data consistency, including updating unreconciled replica members, and ensures consistent read and update operations.
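The majority rule can be illustrated with a short Python sketch. It covers only the counting argument (any two majorities of the same replica set share at least one member) and does not model arbitration or reservations. The function names and the replica labels are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set


def has_quorum(owned: Iterable[str], replica_set: Set[str]) -> bool:
    """True when the node owns a strict majority of the replica members."""
    return len(set(owned) & replica_set) > len(replica_set) // 2


def majorities_overlap(replica_set: Set[str]) -> bool:
    """Check that every pair of majority subsets shares at least one member."""
    members = sorted(replica_set)
    majorities = [
        set(subset)
        for size in range(1, len(members) + 1)
        for subset in combinations(members, size)
        if has_quorum(subset, replica_set)
    ]
    return all(a & b for a in majorities for b in majorities)


replicas = {"r1", "r2", "r3"}          # illustrative replica members
can_operate = has_quorum({"r1", "r2"}, replicas)
cannot_operate = has_quorum({"r3"}, replicas)
```

The overlap is what lets a new or surviving cluster find at least one replica member carrying the operational data of the immediately prior cluster.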
Abstract:
A method and system for increasing server cluster availability by requiring at a minimum only one node and a quorum replica set of replica members to form and operate a cluster. Replica members, independent from the nodes, maintain cluster operational data. A cluster operates when one node possesses a majority of replica members, which ensures that any new or surviving cluster includes consistent cluster operational data via at least one replica member from the immediately prior cluster. Arbitration provides exclusive ownership by one node of the replica members, including at cluster formation, and when the owning node fails. Arbitration uses a fast mutual exclusion algorithm and a reservation mechanism to challenge for and defend the exclusive reservation of each member. A quorum replica set algorithm brings members online and offline with data consistency, including updating unreconciled replica members, and ensures consistent read and update operations.
Abstract:
This invention concerns a database computer system and method for making applications recoverable from system crashes. The application state (i.e., address space) is treated as a single object which can be atomically flushed in a manner akin to flushing individual pages in database recovery techniques. To enable this monolithic treatment of the application, executions performed by the application are mapped to logical loggable operations which can be posted to the stable log. Any modifications to the application state are accumulated and the application state is periodically flushed to stable storage using an atomic procedure. The application recovery integrates with database recovery, and effectively eliminates or at least substantially reduces the need for checkpointing applications. In addition, optimization techniques are described to make the read, write, and recovery phases more efficient.
Abstract:
This invention concerns a database computer system and method for making applications recoverable from system crashes. The application state (i.e., address space) is treated as a single object which can be atomically flushed in a manner akin to flushing individual pages in database recovery techniques. To enable this monolithic treatment of the application, executions performed by the application are mapped to logical loggable operations which can be posted to the stable log. Any modifications to the application state are accumulated and the application state is flushed from time to time to stable storage using an atomic procedure. Applications are recovered by replaying the logged state transition operations, in the same manner that most database systems replay state transformation operations to recover database pages. This application recovery integrates with database recovery, and effectively eliminates or at least substantially reduces the need for checkpointing applications. In addition, optimization techniques are described to make the read, write, and recovery phases more efficient.
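The flush-and-replay idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the described system: the application state is a plain dictionary, the "stable log" and "stable storage" are in-memory stand-ins, and the operation names `set` and `incr` as well as the class `RecoverableApp` are hypothetical.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple


def _op_set(state: Dict[str, Any], key: str, value: Any) -> None:
    state[key] = value


def _op_incr(state: Dict[str, Any], key: str, delta: int) -> None:
    state[key] = state.get(key, 0) + delta


# Logical, loggable operations (illustrative set).
OPS: Dict[str, Callable[..., None]] = {"set": _op_set, "incr": _op_incr}


class RecoverableApp:
    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self.stable_log: List[Tuple[str, tuple]] = []   # stand-in for the stable log
        # Stand-in for stable storage: (state copy, log position at flush time).
        self.stable_snapshot: Tuple[Dict[str, Any], int] = ({}, 0)

    def execute(self, op: str, *args: Any) -> None:
        self.stable_log.append((op, args))   # log the operation, then apply it
        OPS[op](self.state, *args)

    def flush(self) -> None:
        # Replace the snapshot in one assignment to mimic an atomic flush.
        self.stable_snapshot = (copy.deepcopy(self.state), len(self.stable_log))

    def recover(self) -> Dict[str, Any]:
        # Start from the last flushed state and replay later logged operations.
        snapshot, position = self.stable_snapshot
        state = copy.deepcopy(snapshot)
        for op, args in self.stable_log[position:]:
            OPS[op](state, *args)
        self.state = state
        return state


app = RecoverableApp()
app.execute("set", "x", 1)
app.flush()
app.execute("incr", "x", 4)
app.state = {}                 # simulate losing volatile state in a crash
recovered = app.recover()
```

Only the operations logged after the last flush are replayed, which is why occasional flushes shorten recovery without requiring a separate application checkpoint.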
Abstract:
In a computer system, input strings to be translated are composed of characters selected from a first alphabet. According to a predetermined criterion, a list of sub-strings is selected from the input strings to form entries in a dictionary. The entries of the dictionary are arranged according to a collating order of the first alphabet. An interval including the sub-strings of the input strings is partitioned into an all-inclusive and disjoint set of ranges. The sub-strings of the interval are arranged according to the collating order of the first alphabet, and each sub-string of a particular range has a common prefix, the common prefix being selected from the list of sub-strings. A unique encoding is assigned to each common prefix, the corresponding set of unique encodings being composed of characters selected from a second alphabet. The input strings are parsed, one at a time, into a plurality of tokens, each token corresponding to a sub-string selected from the dictionary. In an output string, a corresponding one of the set of unique encodings is placed for each token.
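The parsing step can be illustrated with a simplified Python sketch. It uses greedy longest-match tokenization against a small dictionary whose entries are numbered in collating order. It does not reproduce the range partitioning described above, and the dictionary entries, the integer codes, and the function names are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def build_dictionary(entries: Iterable[str]) -> Dict[str, int]:
    """Assign integer codes to entries in sorted (collating) order."""
    return {sub: code for code, sub in enumerate(sorted(set(entries)))}


def encode(text: str, dictionary: Dict[str, int]) -> List[int]:
    """Parse `text` into tokens by greedy longest match and emit their codes."""
    max_len = max(len(sub) for sub in dictionary)
    output: List[int] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in dictionary:
                output.append(dictionary[piece])
                i += n
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {i}")
    return output


# Illustrative dictionary: every single character plus a few longer sub-strings.
dictionary = build_dictionary(["a", "b", "c", "ab", "abc"])
codes = encode("abcab", dictionary)
```

Including every single character of the first alphabet in the dictionary guarantees that the greedy parse always finds a token.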
Abstract:
The concurrency-control mechanism in a database-management system achieves high concurrency by using a lock-mode set larger than that conventionally employed for multi-granularity locking. In a system of key-valued locking in which locks on key-value ranges are acquired separately from the locks on the key values with which they are associated, the IX lock mode conventionally acquired on a range by update, insert, and delete operations is replaced with three separate lock modes respectively associated with those operations and invoked by them for range locking. In key-valued-locking systems in which ranges are locked commonly with the key values associated with them, the mode set is further expanded so that each mode represents a different combination of range and key-value locks.
Abstract:
The present invention includes an approach to index tree structure changes which provides high concurrency while being usable with many recovery schemes and with many varieties of index trees. The present invention permits multiple concurrent structure changes. In addition, all update activity and structure change activity above the data level executes in short independent atomic actions which do not impede normal database activity. Only data node splitting executes in the context of a database transaction. This feature makes the approach usable with diverse recovery mechanisms, while only impacting concurrency in a modest way. Even this impact can be avoided by re-packaging the atomic actions, at the cost of requiring more from the recovery system.