Abstract:
The present invention includes an approach to index tree structure changes that provides high concurrency while remaining usable with many recovery schemes and with many varieties of index trees. The present invention permits multiple concurrent structure changes. In addition, all update activity and structure change activity above the data level executes in short, independent atomic actions that do not impede normal database activity. Only data node splitting executes in the context of a database transaction. This feature makes the approach usable with diverse recovery mechanisms while affecting concurrency only modestly. Even this impact can be avoided by repackaging the atomic actions, at the cost of requiring more from the recovery system.
Abstract:
A request to modify an object in storage that is associated with one or more computing devices may be obtained, the storage organized based on a latch-free B-tree structure. A storage address of the object may be determined, based on accessing a mapping table that includes map indicators mapping logical object identifiers to physical storage addresses. A prepending of a first delta record to a prior object state of the object may be initiated, the first delta record indicating an object modification associated with the obtained request. Installation of a first state change associated with the object modification may be initiated via a first atomic operation on a mapping table entry that indicates the prior object state of the object. For example, the latch-free B-tree structure may include a B-tree-like index structure over records as the objects, and logical page identifiers as the logical object identifiers.
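The abstract describes an update as a delta record prepended to the prior page state and installed by a single atomic operation on that page's mapping-table entry. The Python sketch below illustrates that pattern under stated assumptions: the names `MappingTable`, `BasePage`, `DeltaRecord`, `update`, and `lookup`, and the `"insert"`/`"delete"` op codes, are hypothetical; page consolidation and storage management are omitted; and because Python exposes no hardware compare-and-swap, a lock emulates a single-word CAS.

```python
import threading
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass(frozen=True)
class BasePage:
    """Consolidated page state: key -> record (hypothetical layout)."""
    records: Dict[Any, Any]


@dataclass(frozen=True)
class DeltaRecord:
    """One modification, prepended in front of the prior page state."""
    op: str              # hypothetical op codes: "insert" or "delete"
    key: Any
    value: Any
    next: "PageState"    # the prior object state this delta points at


PageState = Union[BasePage, DeltaRecord]


class MappingTable:
    """Maps logical page IDs to the current head of each page's delta chain."""

    def __init__(self) -> None:
        self._slots: List[Optional[PageState]] = []
        # Assumption: Python has no hardware CAS, so this lock only emulates
        # an atomic compare-and-swap on one mapping-table entry.
        self._cas_lock = threading.Lock()

    def allocate(self, page: PageState) -> int:
        with self._cas_lock:
            self._slots.append(page)
            return len(self._slots) - 1

    def get(self, pid: int) -> PageState:
        return self._slots[pid]

    def compare_and_swap(self, pid: int, expected: PageState, new: PageState) -> bool:
        with self._cas_lock:
            if self._slots[pid] is expected:
                self._slots[pid] = new
                return True
            return False


def update(table: MappingTable, pid: int, op: str, key: Any, value: Any = None) -> None:
    """Prepend a delta and install it with CAS, retrying if another writer won."""
    while True:
        prior = table.get(pid)                       # current state via mapping table
        delta = DeltaRecord(op, key, value, prior)   # prepend to the prior state
        if table.compare_and_swap(pid, prior, delta):
            return                                   # state change installed


def lookup(table: MappingTable, pid: int, key: Any) -> Optional[Any]:
    """Walk the delta chain newest-first, falling back to the base page."""
    node = table.get(pid)
    while isinstance(node, DeltaRecord):
        if node.key == key:
            return node.value if node.op == "insert" else None
        node = node.next
    return node.records.get(key)
```

If the CAS fails, another writer installed its delta first; the losing writer re-reads the entry and prepends again, so no latch is ever held on the page itself.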
Abstract:
A data structure, added to a modified form of the B-link-tree data structure, tracks delete states for nodes. The index delete state (D_X) indicates whether it is safe to access an index node directly without re-traversing the B-tree. The D_X state is maintained globally, outside the tree structure. The data delete state (D_D) indicates whether it is safe to post an index term for a new leaf node. A D_D state is maintained in each level-1 node for its leaf nodes. Delete states indicate either that a specific node has not been deleted or that it may have been deleted. Delete states remove the need for atomic node splits and for chains of latches during deletes, without requiring re-traversal. This property of not requiring re-traversal is exploited to simplify the tree modification operations.
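The abstract does not say how a delete state is represented. One simple way to get the "definitely not deleted / may have been deleted" answer it describes is a counter that every covered delete increments: a caller snapshots it, and an unchanged value later proves no covered node was deleted in between. The Python sketch below assumes that counter representation; `DeleteState`, `Level1Node`, `try_post_index_term`, and `delete_leaf` are hypothetical names, and the fallback path after a refused post is not modeled.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Dict


class DeleteState:
    """Counter-style delete state (assumed representation).

    Unchanged value since a snapshot: no covered node was deleted.
    Changed value: a node *may* have been deleted, so the answer is conservative.
    """

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def snapshot(self) -> int:
        return self._value

    def record_delete(self) -> None:
        with self._lock:
            self._value += 1

    def unchanged_since(self, snap: int) -> bool:
        return self._value == snap


# D_X: a single global index delete state, kept outside the tree structure.
DX = DeleteState()


@dataclass
class Level1Node:
    """Hypothetical level-1 index node that holds the D_D state for its leaves."""
    index_terms: Dict[Any, int] = field(default_factory=dict)  # separator key -> leaf id
    dd: DeleteState = field(default_factory=DeleteState)
    latch: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def can_access_index_node_directly(remembered_dx: int) -> bool:
    """A remembered index node is safe to visit if no index node may have been deleted."""
    return DX.unchanged_since(remembered_dx)


def try_post_index_term(parent: Level1Node, remembered_dd: int, key: Any, leaf_id: int) -> bool:
    """Post the index term for a new leaf only if no leaf may have been deleted."""
    with parent.latch:  # one latch on the parent, not a chain of latches
        if not parent.dd.unchanged_since(remembered_dd):
            return False  # caller must recheck; that path is not modeled here
        parent.index_terms[key] = leaf_id
        return True


def delete_leaf(parent: Level1Node, key: Any) -> None:
    """Remove a leaf's index term and bump D_D so stale posts are refused."""
    with parent.latch:
        parent.index_terms.pop(key, None)
        parent.dd.record_delete()
```

A counter can only err toward "may have been deleted", which is the safe direction: a refused post costs a recheck, while a wrongly accepted one could point an index term at a deleted leaf.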
Abstract:
Persistent components are provided that survive both process and server failures, without requiring the application programmer to take any action to make components recoverable. Application interactions with a stateful component are transparently intercepted and stably logged to persistent storage. A “virtual” component isolates an application from component failures by permitting the component to be mapped to an arbitrary “physical” component. Component failures are detected and masked from the application. A virtual component is re-mapped to a new physical component, and the operations required to recreate the component and reinstall its state up to the point of the last logged interaction are replayed from the log automatically.
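The interception, logging, re-mapping, and replay steps can be sketched in Python as follows. This is a minimal illustration under assumptions not in the abstract: `StableLog` is an in-memory list standing in for stable storage, `ComponentFailure` is a hypothetical failure signal, `Counter` is a toy component, and replay assumes the component's methods are deterministic.

```python
from typing import Any, Callable, List, Tuple


class ComponentFailure(Exception):
    """Hypothetical signal that the physical component has failed."""


class StableLog:
    """Stand-in for stable storage: an append-only list of completed calls."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, tuple, dict]] = []

    def append(self, method: str, args: tuple, kwargs: dict) -> None:
        self.entries.append((method, args, kwargs))


class VirtualComponent:
    """Intercepts calls, logs them, and re-maps to a new physical component on failure."""

    def __init__(self, factory: Callable[[], Any], log: StableLog) -> None:
        self._factory = factory
        self._log = log
        self._physical = factory()

    def _recover(self) -> None:
        # Re-map to a fresh physical component, then replay the logged
        # interactions to reinstall state up to the last logged call.
        self._physical = self._factory()
        for method, args, kwargs in self._log.entries:
            getattr(self._physical, method)(*args, **kwargs)

    def call(self, method: str, *args: Any, **kwargs: Any) -> Any:
        try:
            result = getattr(self._physical, method)(*args, **kwargs)
        except ComponentFailure:
            self._recover()  # the failure is masked from the caller
            result = getattr(self._physical, method)(*args, **kwargs)
        self._log.append(method, args, kwargs)
        return result


class Counter:
    """Toy stateful component; `crashed` simulates a process failure."""

    def __init__(self) -> None:
        self.total = 0
        self.crashed = False

    def add(self, n: int) -> int:
        if self.crashed:
            raise ComponentFailure("component process is gone")
        self.total += n
        return self.total
```

With two logged `add` calls, crashing the physical `Counter` and calling `add` again rebuilds a new `Counter`, replays both logged calls, and then completes the new call, so the caller sees the same running total it would have seen without the failure.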
Abstract:
Lazy timestamping in a transaction time database is performed using volatile reference counting and checkpointing. Volatile reference counting provides a low-cost way to garbage collect persistent timestamp information about a transaction by identifying exactly when all record versions of the transaction have been timestamped and those versions are persistent. A volatile timestamp (VTS) table is created in volatile memory and stores timestamp, reference count, transaction ID, and LSN information. Active portions of a persistent timestamp (PTS) table are kept in the VTS table, so that timestamp processing can access the VTS table faster and more efficiently. The reference count information is stored only in the VTS table, for faster access. When the reference count decrements to zero, all record versions that the transaction updated are known to have been timestamped. A checkpointing component facilitates checkpoint processing to verify that timestamped records have been written to the persistent database, so that PTS table entries for transactions with zero reference counts can be garbage collected.
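The reference-counting and checkpoint logic can be sketched in Python. This is an illustration under assumptions: the `TimestampTables` class and its methods are hypothetical, the PTS table is simulated by a dictionary, and the VTS entry's LSN is taken to be the LSN of the most recent stamping of one of the transaction's versions, with a checkpoint reporting a `durable_lsn` up to which all changes are on disk. The abstract does not specify these details.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VTSEntry:
    timestamp: int
    ref_count: int  # versions of this transaction not yet timestamped
    lsn: int = 0    # assumed: LSN of the latest stamping of one of its versions


class TimestampTables:
    def __init__(self) -> None:
        self.vts: Dict[int, VTSEntry] = {}  # volatile: txn_id -> entry
        self.pts: Dict[int, int] = {}       # persistent (simulated): txn_id -> timestamp

    def commit(self, txn_id: int, timestamp: int, versions_written: int) -> None:
        """Record the commit timestamp; versions still carry only the txn ID."""
        self.pts[txn_id] = timestamp
        self.vts[txn_id] = VTSEntry(timestamp, versions_written)

    def stamp_version(self, txn_id: int, stamp_lsn: int) -> int:
        """Lazily stamp one version, reading the timestamp from the VTS table."""
        entry = self.vts[txn_id]
        entry.ref_count -= 1
        entry.lsn = max(entry.lsn, stamp_lsn)
        return entry.timestamp

    def checkpoint(self, durable_lsn: int) -> List[int]:
        """Drop PTS entries whose versions are all stamped and durably written."""
        done = [t for t, e in self.vts.items()
                if e.ref_count == 0 and e.lsn <= durable_lsn]
        for t in done:
            del self.pts[t]
            del self.vts[t]
        return done
```

A zero reference count alone is not enough: the stamped versions must also reach the persistent database, otherwise a crash could leave versions that still need the PTS entry to be timestamped. That is why the sketch collects an entry only at a checkpoint that covers its last stamping.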
Abstract:
A method and system increase server cluster availability by requiring, at a minimum, only one node and a quorum replica set of replica members to form and operate a cluster. Replica members, independent of the nodes, maintain cluster operational data. A cluster operates when one node possesses a majority of replica members, which ensures that any new or surviving cluster has consistent cluster operational data through at least one replica member from the immediately prior cluster. Arbitration gives one node exclusive ownership of the replica members, both at cluster formation and when the owning node fails. Arbitration uses a fast mutual exclusion algorithm and a reservation mechanism to challenge for and defend the exclusive reservation of each member. A quorum replica set algorithm brings members online and offline while preserving data consistency, including updating unreconciled replica members, and ensures consistent read and update operations.
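The majority rule works because any two majorities of the same replica set share at least one member, so a newly formed cluster always holds a copy written by the previous one. The Python sketch below illustrates that rule and reconciliation of lagging members. It is a simplification under stated assumptions: arbitration (the fast mutual exclusion algorithm and reservation challenge/defense) is reduced to an `owner` field, and `ReplicaMember`, `sequence`, `try_form_cluster`, `update`, `read`, and `release` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReplicaMember:
    name: str
    online: bool = True
    owner: Optional[str] = None  # simplified stand-in for an arbitration reservation
    sequence: int = 0            # hypothetical version number of this copy
    data: Dict[str, str] = field(default_factory=dict)


def is_majority(count: int, total: int) -> bool:
    return 2 * count > total


def try_form_cluster(node: str, members: List[ReplicaMember]) -> bool:
    """Reserve a majority of members, then reconcile lagging copies."""
    reservable = [m for m in members if m.online and m.owner in (None, node)]
    if not is_majority(len(reservable), len(members)):
        return False
    for m in reservable:
        m.owner = node
    latest = max(reservable, key=lambda m: m.sequence)
    for m in reservable:
        if m.sequence < latest.sequence:  # unreconciled member: bring it up to date
            m.data = dict(latest.data)
            m.sequence = latest.sequence
    return True


def _owned(node: str, members: List[ReplicaMember]) -> List[ReplicaMember]:
    return [m for m in members if m.online and m.owner == node]


def update(node: str, members: List[ReplicaMember], key: str, value: str) -> bool:
    owned = _owned(node, members)
    if not is_majority(len(owned), len(members)):
        return False
    new_seq = max(m.sequence for m in owned) + 1
    for m in owned:
        m.data[key] = value
        m.sequence = new_seq
    return True


def read(node: str, members: List[ReplicaMember], key: str) -> Optional[str]:
    owned = _owned(node, members)
    if not is_majority(len(owned), len(members)):
        return None
    return max(owned, key=lambda m: m.sequence).data.get(key)


def release(node: str, members: List[ReplicaMember]) -> None:
    """Models the owning node failing and its reservations lapsing."""
    for m in members:
        if m.owner == node:
            m.owner = None
```

For example, with members `a`, `b`, `c`: if `c` is offline during an update, and the owning node then fails while `a` goes offline, a new node can still reserve `b` and `c`. Member `b` carries the latest write from the prior cluster, and reconciliation copies it to `c`.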