摘要:
A virtual machine monitor (VMM) is configured to enforce deterministic execution of virtual machines in a multiprocessor machine. The VMM is configured to ensure that any communication by physical processors via shared memory is deterministic. When such VMMs are implemented in a distributed environment of multiprocessor machines coupled via a logical communication link, non-deterministic server applications running on virtual machines using the VMM may be replicated.
摘要:
A distributed computing system can be operated in a fault tolerant manner using a set of computing devices. A set of computing devices can tolerate a number of failures by implementing identical replicas of a state machine and selecting proposals. The set of computing devices participating in the distributed computing system by hosting replicas can be modified by adding or removing a computing device from the set, or by specifying particular computing devices for participation. Changing the participating computing devices in the set increases fault tolerance by replacing defective devices with operational devices, or by increasing the amount of redundancy in the system.
摘要:
Synchronized devices comprising a distributed system attempt to agree on a compatible sequence of commands to execute. Each device in the distributed system may act as a proposer, acceptor, or a learner. Each proposer proposes a command for each device to execute. The acceptors either accept or reject the proposed commands. The learners keep track of the proposed commands and determine, using a transactional substrate, whether the acceptors have a accepted sequences of commands that commute with respect to one another. Once the learners have determined that a quorum of acceptors have accepted sequences of commands that commute with respect to one another the accepted commands are executed by each device in the distributed system.
摘要:
A replicated state machine with N replica servers may be configured to tolerate a count of F faults. A first operation (of a first ordering type) executes when a first quorum of correctly functioning replicas is available. A second operation (also of the first operation type) executes when a second quorum of correctly functioning replicas is available. A third operation (of a second ordering type) executes when a third quorum of correctly functioning replicas are available. The operations are executed by the replicated state machine such that: (1) the replicated state machine does not guarantee operational ordering between the first operation and the second operation; (2) the replicated state machine guarantees ordering between the first operation and the third operation; and (3) the replicated state machine guarantees ordering between the second operation and the third operation.
摘要:
A mechanism that enables a nondeterministic client-server application to be run as a replicated state machine without requiring the application to be modified. A replicated state machine substrate is utilized to coordinate the execution of multiple virtual machine monitors, each of which runs an identical copy of an operating system and server application. The virtual machine monitors each act as deterministic state machines, virtualizing state machine characteristics and behaviors.
摘要:
Disclosed are “black-box leases” that protect information consistency and that allow for information sharing in a distributed file system while hiding from a client information about other clients' use of the file system. This information hiding also allows greater concurrency because changes to the file system are permitted as long as they do not affect the leases as observed by the clients. For each data field protected by a black-box lease, a client has a two-value data structure: SelfValue represents the client's intended use of the data field, and OtherValue is an aggregation of the other clients' intended uses of that data field. Knowing only this aggregate OtherValue, but without knowing any specifics of the other clients' usage patterns, the client knows how it may use the data field without adversely affecting the consistency of data in the distributed file system.
摘要:
A delta pager maintains a database with atomic, isolated transactions. When a transaction seeks to make changes to the database, the delta pager stores the changes in write buffers, and applies the changes when intervening transactions do not literally or substantively change the state of the database relied upon by the transaction. The delta pager applies the changes to commit the transaction by conjoining the write buffers with the current state of the database to form a new data structure representing the state of the database. The delta pager coalesces write buffers to maintain efficiency, subject to snapshots the delta pager respects to preserve selected states of the database. The delta pager makes selected sections of the database durable by moving selected data to a durable store. The delta pager also provides cache objects between the durable store and current transactions to promote efficient access to data.
摘要:
Scalable leases reduce latency and reduce the burden on a server in managing data leases. Instead of processing individual lease requests for clients seeking access to the same data, scalable leases provide for blanket leases that all of the clients can use to access the selected data. Leases may be encrypted or selectively distributed to restrict access to the data. Moreover, distributed data structures may be used to cache leases at identifiable nodes in a network to offload work from the server without all clients in the network having to cache all blanket leases issued. Thresholds for issuing blanket leases may be determined or adjusted by considerations such as demand for the selected data and server workload. Similarly, leases may be renewed on the basis of demand for selected data, conflicting lease requests, and other factors. Scalable leases may be issued for read leases and controlled write leases.
摘要:
Non-mutating tree-structured file identifiers are used to identify files stored in a file system. Each of multiple files in the file system has a corresponding non-mutating file identifier, and these file identifiers are assigned and maintained using a tree structure.
摘要:
The described implementations relate to efficient scheduling of transactions and tasks. A memory location, address, or variable previously accessed by a blocked entity is observed periodically to determine an appropriate time to wake and retry the blocked entity. If the previous accessed memory location, address or variable changes state, a scheduler wakes the blocked entity and the blocked entity retries processing. A doubly-indexed data structure of blocked entities and memory locations associated with the blocked entities may be used to efficiently determine when a retrying execution would be profitable.