Abstract:
A method and apparatus for I/O forwarding in a cache coherent shared disk computer system is provided. According to the method, a requesting node transmits a request for requested data to a managing node. The managing node receives the read request from the requesting node and grants a lock on the requested data. The managing node then forwards data that identifies the requested data to a disk controller. The disk controller receives the data that identifies the requested data from the managing node and reads a data item, based on the data that identifies the requested data, from a shared disk. After reading the data item from the shared disk, the disk controller transmits the data item to the requesting node. In one embodiment, an I/O destination handle is generated that identifies a read request and a buffer cache address to which the data item should be copied. The I/O destination handle is transmitted to the disk controller to facilitate transmission and processing of the data item from the disk controller to the requesting node. As a result of forwarding data that identifies the requested data directly from the managing node to the disk controller ("I/O forwarding"), the duration of a stall is reduced, contention on resources of the system is reduced and a context switch is eliminated.
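The message flow described above can be sketched as follows. This is a minimal single-process illustration, not the patented implementation: the class names, method names, and the `IODestinationHandle` fields are hypothetical, and network messages are modeled as direct method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IODestinationHandle:
    """Hypothetical handle: identifies the read request and the target buffer."""
    request_id: int
    buffer_address: int


class DiskController:
    def __init__(self, disk):
        self.disk = disk  # block_id -> bytes, standing in for the shared disk

    def read_and_send(self, block_id, handle, requester):
        # Read the data item and send it straight to the requesting node,
        # bypassing the managing node on the return path.
        requester.receive(handle, self.disk[block_id])


class ManagingNode:
    def __init__(self, controller):
        self.controller = controller
        self.locks = {}  # block_id -> name of the node holding the lock

    def handle_read(self, requester, block_id, handle):
        self.locks[block_id] = requester.name  # grant the lock
        # "I/O forwarding": pass the block identity and handle to the controller.
        self.controller.read_and_send(block_id, handle, requester)


class RequestingNode:
    def __init__(self, name):
        self.name = name
        self.buffer_cache = {}  # buffer address -> data
        self.pending = {}       # request_id -> block_id
        self._next_request_id = 0

    def read(self, manager, block_id, buffer_address):
        self._next_request_id += 1
        handle = IODestinationHandle(self._next_request_id, buffer_address)
        self.pending[handle.request_id] = block_id
        manager.handle_read(self, block_id, handle)

    def receive(self, handle, data):
        # The handle tells the node which request completed and where to copy.
        self.pending.pop(handle.request_id)
        self.buffer_cache[handle.buffer_address] = data
```

In this sketch the requesting node never issues its own disk read after the lock grant, which is the round trip that forwarding is meant to remove.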
Abstract:
A method and apparatus are provided for transferring a resource from the cache of one database server to the cache of another database server without first writing the resource to disk. When a database server (Requestor) desires to modify a resource, the Requestor asks for the current version of the resource. The database server that has the current version (Holder) directly ships the current version to the Requestor. Upon shipping the version, the Holder loses permission to modify the resource, but continues to retain the resource in memory. When the retained version of the resource, or a later version thereof, is written to disk, the Holder can discard the retained version of the resource. Otherwise, the Holder does not discard the retained version. Using this technique, single-server failures are recovered without having to merge the recovery logs of the various database servers that had access to the resource.
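A rough sketch of the Holder and Requestor behavior follows. It assumes that versions are plain integers and that a server is told when some version of a resource reaches disk; the `CachedCopy` and `DatabaseServer` names and all method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedCopy:
    version: int
    data: str
    writable: bool


class DatabaseServer:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # resource_id -> CachedCopy

    def ship_to(self, resource_id, requestor):
        """Holder side: send the current version without writing it to disk."""
        held = self.cache[resource_id]
        requestor.cache[resource_id] = CachedCopy(held.version, held.data, True)
        held.writable = False  # permission is lost, but the copy is retained

    def modify(self, resource_id, new_data):
        copy = self.cache[resource_id]
        if not copy.writable:
            raise PermissionError(f"{self.name} may not modify {resource_id}")
        copy.version += 1
        copy.data = new_data

    def on_disk_write(self, resource_id, written_version):
        """Discard a retained copy once it, or a later version, is on disk."""
        copy = self.cache.get(resource_id)
        if copy and not copy.writable and written_version >= copy.version:
            del self.cache[resource_id]
```

The retained copy is what lets a single-server failure be recovered from one server's log in this scheme: until a version at least as new is on disk, the Holder still has the state that the log alone would otherwise have to reconstruct.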
Abstract:
A method and system are provided for reconfiguring a multiple node system after an epoch change in a manner that reduces the overhead and system unavailability typically incurred during reconfiguration. A resource-to-master mapping is established using the combination of a resource-to-bucket hash function and a bucket-to-node hash function. The resource-to-bucket hash function is not changed in response to an epoch change. The bucket-to-node hash function does change in response to epoch changes. Techniques are disclosed for adjusting the dynamic bucket-to-node hash function after an epoch change in a manner that load balances among the new number of nodes in the system. Further, the changes to the bucket-to-node assignments are performed in a way that reduces the number of resources that have to be remastered. In one embodiment, only those resources that lose their masters during an epoch change are assigned new masters during an initial reconfiguration. Load balancing is then gradually achieved by migrating resources after the system has been made available. The old masters of resources forward access requests to new masters of resources once they have transferred the master resource objects for the requested resources. In addition, techniques are disclosed for migrating resources from a node in anticipation of a planned shutdown of the node.
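The two-level mapping can be illustrated with a short sketch. The bucket count, the use of CRC32 as the static hash, and the least-loaded reassignment policy are all assumptions chosen for illustration; the abstract does not specify them.

```python
import zlib

NUM_BUCKETS = 12  # assumed fixed bucket count


def resource_to_bucket(resource_name):
    """Static hash: stays the same across epoch changes."""
    return zlib.crc32(resource_name.encode("utf-8")) % NUM_BUCKETS


def initial_assignment(nodes):
    """Dynamic bucket-to-node map, here a simple round-robin."""
    return {b: nodes[b % len(nodes)] for b in range(NUM_BUCKETS)}


def reassign_after_epoch_change(bucket_to_node, live_nodes):
    """Remaster only buckets whose master left; surviving buckets stay put."""
    live = list(live_nodes)
    load = {n: 0 for n in live}
    orphaned = []
    for bucket, node in bucket_to_node.items():
        if node in load:
            load[node] += 1
        else:
            orphaned.append(bucket)
    new_map = dict(bucket_to_node)
    for bucket in sorted(orphaned):
        # Assumed policy: give each orphaned bucket to the least-loaded node.
        target = min(live, key=lambda n: (load[n], n))
        new_map[bucket] = target
        load[target] += 1
    return new_map


def master_of(resource_name, bucket_to_node):
    return bucket_to_node[resource_to_bucket(resource_name)]
```

Because only the bucket-to-node map changes, a resource whose bucket kept its master needs no remastering at all. Further rebalancing could be done later by moving individual buckets while the system is available.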
Abstract:
Techniques are provided for providing a data item to a transaction in a multi-versioning system in which the data item may exist on multiple versions of a data block, and where versioning is performed at the granularity of the data block. According to one aspect of the invention, the technique involves locating, within volatile memory, a first version of a data block that includes a first version of the data item. It is then determined whether the first version of the data item is usable by the transaction without respect to whether the first version of the data block is generally usable by the transaction. If the first version of the data item is usable by the transaction, then the data item is established as a candidate that can be provided to the transaction. Thus, the data item within a block may be considered a candidate to be provided to a transaction even when the version of the data block on which the data item resides would otherwise disqualify the data block from being seen by that transaction. If the first version of the data item is not usable by the transaction, then a version of the data item that is usable by the transaction is obtained from a second version of the data block that is different from the first version.
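The item-level check can be sketched as below. The abstract does not say how usability is decided, so this sketch assumes each item carries the logical time of its last change and that a transaction may see any item changed at or before its snapshot time. `BlockVersion`, `read_item`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BlockVersion:
    block_time: int                    # time of the latest change in the block
    items: Dict[str, Tuple[str, int]]  # key -> (value, time item last changed)


def read_item(key: str, snapshot_time: int, cached: BlockVersion,
              older_versions: List[BlockVersion]) -> Optional[str]:
    """Return a value of `key` that the transaction is allowed to see."""
    value, item_time = cached.items[key]
    # Judge the item on its own, even if cached.block_time is too new
    # for the block as a whole to be visible to this transaction.
    if item_time <= snapshot_time:
        return value
    # Otherwise fall back to other versions of the block, newest first.
    for block in sorted(older_versions, key=lambda b: b.block_time,
                        reverse=True):
        value, item_time = block.items[key]
        if item_time <= snapshot_time:
            return value
    return None
```

With this check, an unrelated change elsewhere in the block does not by itself force a lookup in an older version of the block.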
Abstract:
Techniques are provided for determining which data item version to supply to a query. According to the techniques, the determination is made by associating a new field, which indicates the time a data item version was current, with each data item version; associating a new field with each query, which indicates the last change that the query must see made by the transaction to which the query belongs; and determining which data item version to use to answer the query based, in part, on a comparison between the values of the two new fields.
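One possible reading of the two-field comparison is sketched here. The field names `current_at` and `last_visible_change`, the use of integer logical times, and the "newest version not later than the query's limit" rule are assumptions; the abstract only states that the choice is based in part on comparing the two fields.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ItemVersion:
    value: str
    current_at: int  # hypothetical: time at which this version was current


@dataclass(frozen=True)
class Query:
    last_visible_change: int  # hypothetical: last change the query must see


def pick_version(versions: Sequence[ItemVersion],
                 query: Query) -> Optional[ItemVersion]:
    """Pick the newest version that is not later than the query's limit."""
    eligible = [v for v in versions
                if v.current_at <= query.last_visible_change]
    return max(eligible, key=lambda v: v.current_at, default=None)
```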
Abstract:
Techniques are provided for managing caches in a system with multiple caches that may contain different copies of the same data item. Specifically, techniques are provided for coordinating the write-to-disk operations performed on such data items to ensure that older versions of the data item are not written over newer versions, and to reduce the amount of processing required to recover after a failure. Various approaches are provided in which a master is used to coordinate with the multiple caches to cause a data item to be written to persistent storage. Techniques are also provided for transferring data items and locks associated with the data items from one node to another.
Abstract:
Techniques are provided for managing caches in a system with multiple caches that may contain different copies of the same data item. Specifically, techniques are provided for coordinating the write-to-disk operations performed on such data items to ensure that older versions of the data item are not written over newer versions, and to reduce the amount of processing required to recover after a failure. Various approaches are provided in which a master is used to coordinate with the multiple caches to cause a data item to be written to persistent storage. Techniques are also provided for managing checkpoints associated with the caches, where the checkpoints are used to determine the position at which to begin processing recovery logs in the event of a failure.
Abstract:
A method and an apparatus for tracking dependencies between transactions are provided. Every time a data item is updated, a record is made of the transaction that updated the data item. Before another transaction locks a data item previously locked by the transaction, the entry is updated to indicate that the transaction committed and the commit time of the transaction. These entries are contained in a list head that is maintained on the same block as the data item, and a list tail that is stored separately from the data block that contains the data item. A depends-on time is maintained for each transaction. Whenever the transaction updates a data item, the depends-on time is set to the greater of the current depends-on time and the commit time of the most recently committed transaction that updated the version of the data item. Whether a transaction depends on a committed transaction is then determined based on a simple comparison between the depends-on time associated with the transaction and the commit time of the committed transaction.
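The depends-on bookkeeping reduces to a running maximum, as in the sketch below. It assumes integer commit times and keeps only the latest commit time per data item rather than the list head and tail described above. The direction of the final comparison (`>=`) is also an assumption, since the abstract only calls it a simple comparison.

```python
class DataItem:
    def __init__(self):
        # Commit time of the most recently committed transaction that
        # updated this item (0 means no committed updater yet).
        self.last_commit_time = 0


class Transaction:
    def __init__(self, name):
        self.name = name
        self.depends_on_time = 0
        self.commit_time = None
        self.updated = []

    def update(self, item):
        # Greater of the current depends-on time and the commit time of
        # the latest committed updater of the item.
        self.depends_on_time = max(self.depends_on_time,
                                   item.last_commit_time)
        self.updated.append(item)

    def commit(self, commit_time):
        self.commit_time = commit_time
        for item in self.updated:
            item.last_commit_time = commit_time


def depends_on(txn, committed):
    """Assumed test: txn may depend on `committed` if its depends-on time
    is at or after that transaction's commit time."""
    return txn.depends_on_time >= committed.commit_time
```

Under this assumed comparison the check is conservative: it can report a dependency on a transaction that committed earlier but touched unrelated data, in exchange for needing only one number per transaction.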
Abstract:
Techniques are provided for lock management. The techniques are based on an enhanced lock management system that generates a semantic response in response to lock requests for a resource. The semantic response communicates both the underlying cause blocking the request, and information that may be used by the requester to obtain notification of when the underlying cause should no longer lead to denial of the lock request. The semantic response may be generated by the master of the resource, who provides the semantic response to the local lock manager of the lock requester. The semantic response may be retained by the local lock manager so that the semantic response can be provided to subsequent lock requesters, without need for interacting with another lock manager on another node.
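A small sketch of a semantic response and its local retention follows. The response fields, the `notify_key` format, and the single-holder lock model are hypothetical simplifications; a real lock manager would also handle lock modes, queues, and inter-node messaging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticResponse:
    granted: bool
    cause: str = ""       # why the request is blocked
    notify_key: str = ""  # event the requester can wait on


class Master:
    """Master of the resources, possibly on another node."""

    def __init__(self):
        self.holders = {}  # resource -> id of the holding transaction
        self.calls = 0     # counts requests that reach the master

    def request(self, resource, txn):
        self.calls += 1
        holder = self.holders.get(resource)
        if holder is None or holder == txn:
            self.holders[resource] = txn
            return SemanticResponse(True)
        return SemanticResponse(False, cause=f"held by {holder}",
                                notify_key=f"commit:{holder}")


class LocalLockManager:
    def __init__(self, master):
        self.master = master
        self.retained = {}  # resource -> retained blocking response

    def request(self, resource, txn):
        cached = self.retained.get(resource)
        if cached is not None:
            return cached  # answered locally, no trip to the master
        response = self.master.request(resource, txn)
        if not response.granted:
            self.retained[resource] = response
        return response

    def on_notify(self, notify_key):
        # The blocking cause is gone, so drop responses that named it.
        self.retained = {r: s for r, s in self.retained.items()
                         if s.notify_key != notify_key}
```

In this sketch the second blocked requester on the same node gets the same cause and notification key without another round trip to the master.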