Abstract:
Data stored within a cluster may be distributed among nodes, each storing a portion of the data. The data may be replicated, so that different nodes store copies of the same portion of the data. In response to detecting the failure of a node, the cluster may initiate a timeout period. If the node remains failed throughout the timeout period, the cluster may copy the portion of the data stored on the failed node onto one or more other nodes of the cluster. If the node returns to the cluster during the timeout period, the cluster may retain the copy of the data on the previously failed node and refrain from copying that portion of the data onto any other node. By delaying self-healing for the timeout period, the cluster may avoid an unbalanced data distribution in cases where a failed node quickly rejoins.
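To make the timeout behavior concrete, here is a minimal Python sketch under stated assumptions: time is a logical clock passed in by the caller, placement is an in-memory map from node name to the data portions it stores, and a healed portion goes to the least-loaded live node that lacks it. `Cluster`, `node_failed`, `node_rejoined`, and `tick` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of delayed self-healing; names and clock are hypothetical."""
    timeout: float
    # node name -> IDs of the data portions stored on that node
    placement: dict[str, set[str]]
    # failed node -> logical time at which its failure was detected
    failed_at: dict[str, float] = field(default_factory=dict)

    def node_failed(self, node: str, now: float) -> None:
        # Start the timeout period instead of re-replicating right away.
        self.failed_at[node] = now

    def node_rejoined(self, node: str) -> None:
        # Rejoined within the timeout: keep its existing data, copy nothing.
        self.failed_at.pop(node, None)

    def tick(self, now: float) -> list[tuple[str, str, str]]:
        """Heal nodes whose timeout expired; return (portion, source, target) copies."""
        copies = []
        for node, detected in list(self.failed_at.items()):
            if now - detected < self.timeout:
                continue  # still inside the timeout period
            del self.failed_at[node]
            lost = self.placement.pop(node, set())
            live = [n for n in self.placement if n not in self.failed_at]
            for portion in sorted(lost):
                holders = [n for n in live if portion in self.placement[n]]
                targets = [n for n in live if portion not in self.placement[n]]
                if holders and targets:
                    # Prefer the least-loaded node to keep the distribution balanced.
                    target = min(targets, key=lambda n: len(self.placement[n]))
                    self.placement[target].add(portion)
                    copies.append((portion, holders[0], target))
        return copies
```

Calling `tick` periodically drives the timeout. A node that rejoins before its timeout expires keeps its data and triggers no copies, which is the case in which immediate healing would have left the distribution unbalanced.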
Abstract:
A distributed system provides for separate management of dynamic cluster membership and distributed data. Nodes of the distributed system may include a state manager and a topology manager. A state manager handles access to data in the cluster. A topology manager handles changes to the dynamic cluster topology. The topology manager enables operation of the state manager by handling topology changes, such as new nodes joining the cluster and member nodes leaving it. A topology manager may follow a static topology description when handling cluster topology changes. Data replication and recovery functions may be implemented, for example, to provide high availability.
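One way to picture the split between the two managers is the Python sketch below. It assumes the static topology description is simply the set of node names allowed to join, and that the topology manager informs the state manager of membership through a callback. `StateManager`, `TopologyManager`, and their methods are hypothetical names.

```python
from typing import Callable, FrozenSet


class StateManager:
    """Handles access to cluster data; learns membership only via callbacks."""

    def __init__(self) -> None:
        self.members: FrozenSet[str] = frozenset()
        self.data: dict[str, str] = {}

    def on_topology_change(self, members: FrozenSet[str]) -> None:
        self.members = members

    def put(self, key: str, value: str) -> None:
        if not self.members:
            raise RuntimeError("cluster has no members")
        self.data[key] = value

    def get(self, key: str):
        return self.data.get(key)


class TopologyManager:
    """Handles joins and departures, following a static topology description."""

    def __init__(self, static_topology: set[str]) -> None:
        self.allowed = frozenset(static_topology)  # assumed: names permitted to join
        self.members: set[str] = set()
        self.listeners: list[Callable[[FrozenSet[str]], None]] = []

    def subscribe(self, callback: Callable[[FrozenSet[str]], None]) -> None:
        self.listeners.append(callback)

    def join(self, node: str) -> bool:
        if node not in self.allowed:
            return False  # not in the static description: reject
        self.members.add(node)
        self._notify()
        return True

    def leave(self, node: str) -> None:
        self.members.discard(node)
        self._notify()

    def _notify(self) -> None:
        snapshot = frozenset(self.members)
        for callback in self.listeners:
            callback(snapshot)
```

Keeping membership logic in `TopologyManager` means `StateManager` never inspects joins or departures directly; it only reacts to the membership snapshots it is handed.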
Abstract:
A cluster topology self-healing process replicates a data set that was stored on a failed node, copying it from a first node that stores another copy of the data set to a second, non-failed node. The self-healing process is performed by: locking one of several domains included in the data set, where locking that domain does not lock any of the other domains in the data set; storing, in that domain, data sent from the first node to the second node; and releasing the domain. This process of locking, storing, and releasing is repeated for each of the other domains in the data set. Each domain may be locked for significantly less time than it takes to copy the entire data set. Accordingly, client access requests targeting a locked domain are delayed for less time than they would be if the entire data set were locked during the self-healing process.
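The lock-store-release loop can be sketched in Python with one `threading.Lock` per domain, assuming a single process stands in for the receiving node and each domain is a plain dict. `DataSet` and `self_heal` are hypothetical names; `self_heal` also records which domains were locked at each step so the one-domain-at-a-time property can be checked.

```python
import threading


class DataSet:
    """A replica of a data set split into independently lockable domains."""

    def __init__(self, domains: dict[str, dict]) -> None:
        self.domains = {name: dict(items) for name, items in domains.items()}
        self.locks = {name: threading.Lock() for name in domains}

    def read(self, domain: str, key: str):
        # A client request waits only if *this* domain is currently locked.
        with self.locks[domain]:
            return self.domains[domain].get(key)


def self_heal(source: DataSet, target: DataSet) -> list[tuple[str, list[str]]]:
    """Copy source into target one domain at a time.

    Returns (domain, domains locked while it was copied) for each step,
    showing that only one domain is ever held at once.
    """
    trace = []
    for name, items in source.domains.items():
        target.domains.setdefault(name, {})
        lock = target.locks.setdefault(name, threading.Lock())
        with lock:  # lock this domain only; the others stay available
            target.domains[name].update(items)  # store data sent from the first node
            held = [d for d, lk in target.locks.items() if lk.locked()]
            trace.append((name, held))
        # Leaving the `with` block releases the domain before the next one.
    return trace
```

A client `read` on any domain other than the one currently being copied proceeds without waiting, which is the reduced delay the abstract describes.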
Abstract:
Entities within a cluster are uniquely identified by a node ID and an engine ID. The node ID uniquely identifies a node within a cluster of nodes, and the engine ID uniquely identifies one of several engines included in that node. Entities may be further identified by a cluster ID, an engine type ID, and/or a virtual server ID. At least some of these IDs may be included in communications received from clients and used to route those communications to the cluster entity they identify.
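A hypothetical Python sketch of ID-based routing, assuming messages are dicts carrying `node_id`, `engine_id`, an optional `cluster_id`, and a `payload`, and that the (node ID, engine ID) pair is the routing key. The message layout and the `EntityId` and `Router` names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EntityId:
    node_id: int                            # unique among the cluster's nodes
    engine_id: int                          # unique among the node's engines
    cluster_id: Optional[str] = None        # optional further qualifiers
    engine_type_id: Optional[str] = None
    virtual_server_id: Optional[str] = None


class Router:
    """Routes client messages to the entity named by the IDs they carry."""

    def __init__(self, cluster_id: str) -> None:
        self.cluster_id = cluster_id
        self.handlers: dict[tuple[int, int], Callable[[str], str]] = {}

    def register(self, eid: EntityId, handler: Callable[[str], str]) -> None:
        self.handlers[(eid.node_id, eid.engine_id)] = handler

    def route(self, message: dict) -> str:
        # A message naming another cluster is rejected outright.
        if message.get("cluster_id", self.cluster_id) != self.cluster_id:
            raise LookupError("message addressed to another cluster")
        key = (message["node_id"], message["engine_id"])
        if key not in self.handlers:
            raise LookupError(f"no entity with (node_id, engine_id) = {key}")
        return self.handlers[key](message["payload"])
```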
Abstract:
A system may include a client and a distributed data manager coupled to the client. The distributed data manager may include a data store storing a data object that includes several sub-elements. The client is configured to update a portion of the data object by sending a message to the distributed data manager. The message specifies which sub-element of the data object is to be updated and includes a new value for that sub-element, but not a new value for the entire data object. The distributed data manager is configured to perform updates to the data object in the data store depending on which sub-elements of the data object the client specifies.
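A minimal Python sketch of a sub-element update, assuming sub-elements are addressed by a dotted path such as `items.apple` (an illustrative convention the abstract does not specify) and that the data store is an in-memory dict. `make_update` and `DistributedDataManager` are hypothetical names.

```python
from typing import Any


def make_update(object_id: str, sub_element: str, value: Any) -> dict:
    """Client side: the message carries only the changed sub-element."""
    return {"object_id": object_id, "sub_element": sub_element, "value": value}


class DistributedDataManager:
    """Applies sub-element updates to objects held in its data store."""

    def __init__(self) -> None:
        self.store: dict[str, dict] = {}

    def handle(self, message: dict) -> None:
        obj = self.store.setdefault(message["object_id"], {})
        # Assumed addressing scheme: dotted path, e.g. "items.apple".
        *parents, leaf = message["sub_element"].split(".")
        node = obj
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = message["value"]  # siblings of the sub-element are untouched
```

Only the sub-element path and its new value are sent, so the message size does not grow with the size of the whole object.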
Abstract:
A system and method for controlling access to data in a distributed computer system. The Distributed Token Manager (DTM) is a system-level service that coordinates read/write access to data objects (tokens) in a multi-process, multi-threaded environment. The DTM ensures that at any given time either: (1) one or more client processes or threads have read access rights to the data object, and no client process or thread has write access rights to it; or (2) one client process or thread has write access rights to the data object, and no other client process or thread has read or write access rights to it. The DTM also ensures that this coordination continues to function even in the case of process, machine, or network failure.
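The two invariants above are those of a readers-writer lock. The following Python sketch enforces them for a single token within one process using `threading.Condition`; it does not model distribution or the failure handling the DTM provides, and `Token` and its method names are hypothetical.

```python
import threading
from typing import Optional


class Token:
    """Read/write access rights to one data object (single-process sketch)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # number of holders of read access
        self._writer = False   # whether someone holds write access

    def acquire_read(self, timeout: Optional[float] = None) -> bool:
        with self._cond:
            # Readers may share access, but never alongside a writer.
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, timeout: Optional[float] = None) -> bool:
        with self._cond:
            # A writer needs exclusive access: no readers and no other writer.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

This simple version lets a steady stream of readers postpone a writer indefinitely; a production service would typically add a fairness policy.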
Abstract:
A system and method for enabling failover in an application server cluster. A “primary” application server computer in the cluster may provide a service or data that the other application server computers in the cluster need in order to operate. In addition to the primary application server computer, one or more of the other application server computers may be designated as “backup” application server computers. Each backup application server may back up the processing information managed by the primary application server. When the primary application server itself becomes unavailable (e.g., due to a failure of the computer system or network), one or more of the backup application servers may be promoted to the role of primary application server.
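A hypothetical Python sketch of primary/backup promotion, assuming every backup receives each update to the primary's processing information synchronously and that the first backup in a fixed list is promoted. Both assumptions, and the `AppServerCluster` name, are illustrative rather than taken from the abstract.

```python
class AppServerCluster:
    """Primary/backup failover; the promotion order is an assumed policy."""

    def __init__(self, primary: str, backups: list[str]) -> None:
        self.primary = primary
        self.backups = list(backups)         # promotion order: first listed wins
        self.state: dict[str, object] = {}   # processing info managed by the primary
        self.replicas: dict[str, dict[str, object]] = {b: {} for b in self.backups}

    def primary_update(self, key: str, value: object) -> None:
        self.state[key] = value
        for b in self.backups:
            self.replicas[b][key] = value    # each backup mirrors the primary

    def on_unavailable(self, server: str) -> None:
        if server in self.backups:           # a backup failed: just drop it
            self.backups.remove(server)
            del self.replicas[server]
        elif server == self.primary:
            if not self.backups:
                raise RuntimeError("no backup available to promote")
            promoted = self.backups.pop(0)
            self.primary = promoted
            self.state = self.replicas.pop(promoted)  # resume from its copy
```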
Abstract:
A system and method for controlling access to data in a distributed computer system. The Distributed Token Manager (DTM) is a system-level service that coordinates read/write access to data objects (tokens) in a multi-process, multi-threaded environment. The DTM may support a transactional model in which write operations performed on a data object by a client process or thread can be either committed or rolled back.
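One common way to provide commit and rollback, assumed here rather than taken from the abstract, is to buffer a writer's changes in a private working copy. This Python sketch assumes the caller already holds write access to the object and that the object is a flat dict; `TransactionalObject` is a hypothetical name.

```python
from typing import Any, Optional


class TransactionalObject:
    """Writes go to a private copy until commit; rollback discards them."""

    def __init__(self, value: dict) -> None:
        self.value = dict(value)                # committed state visible to readers
        self._pending: Optional[dict] = None    # working copy of the open transaction

    def begin(self) -> None:
        if self._pending is not None:
            raise RuntimeError("transaction already open")
        self._pending = dict(self.value)        # shallow copy of committed state

    def write(self, key: str, val: Any) -> None:
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._pending[key] = val

    def commit(self) -> None:
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self.value, self._pending = self._pending, None

    def rollback(self) -> None:
        self._pending = None                    # committed state is unchanged
```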