Abstract:
The disclosure is directed towards fault-tolerant methods, systems, and architectures for data distribution. One method includes generating fault distribution tables. Each table entry corresponds to a copy of a data record. The entry and copy are associated with a fault status, a node, and a group that are based on a position of the entry within the distribution table. The method also includes storing the copy of the data record that corresponds to the entry in a database that is included in a plurality of databases. In response to determining that a node included in the plurality of nodes is unavailable, the method determines a fault status, a node, and a group. The method provides an available node with sequential access to data records that are stored, in a tree structure, in a particular database that is stored locally on the available node.
Abstract:
Systems, methods, and computer program products for managing a consensus group in a distributed computing cluster, by determining that an instance of an authority module executing on a first node, of a consensus group of nodes in the distributed computing cluster, has failed; and adding, by an instance of the authority module on a second node of the consensus group, a new node to the consensus group to replace the first node. The new node is a node in the computing cluster that was not a member of the consensus group at the time the instance of the authority module executing on the first node is determined to have failed.
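The replacement step can be illustrated with a minimal Python sketch. The function name, the list-based representation of the group, and the rule of picking the lowest-sorted outside node are assumptions made for illustration; the abstract only requires that the new node was not a group member when the failure was detected.

```python
def replace_failed_member(group, cluster_nodes, failed_node):
    """Return a new consensus group with failed_node swapped for an outside node.

    group         -- node ids in the consensus group when the failure was detected
    cluster_nodes -- node ids of all nodes in the computing cluster
    failed_node   -- node whose authority module instance was found to have failed
    """
    if failed_node not in group:
        return list(group)
    # Only nodes that were NOT group members at failure time are eligible.
    outsiders = sorted(n for n in cluster_nodes if n not in group)
    if not outsiders:
        raise RuntimeError("no replacement node available")
    replacement = outsiders[0]  # assumption: deterministic choice of the lowest id
    return [replacement if n == failed_node else n for n in group]


new_group = replace_failed_member(["a", "b", "c"], ["a", "b", "c", "d", "e"], "a")
```

In this sketch the surviving member that runs the function plays the role of the second node's authority module instance.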
Abstract:
One embodiment of the invention includes a system for performing intelligent disaster recovery. The system includes a processor and a memory. The memory stores a first monitor application that, when executed on the processor, performs an operation. The operation includes communicating with a second monitor application hosted at a secondary data center to determine an availability of one or more computer servers at a primary data center. The operation also includes, upon reaching a consensus with the second monitor application that one or more computer servers at the primary data center are unavailable to process client requests relative to both the first monitor application and the second monitor application, initiating a failover operation. Embodiments of the invention also include a method and a computer-readable medium for performing intelligent disaster recovery.
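A minimal Python sketch of the consensus check described above follows. The function names and the dictionary-based availability views are hypothetical; the abstract does not specify how the two monitor applications exchange or represent their observations.

```python
def servers_agreed_down(local_view, remote_view):
    """Return the servers that BOTH monitors observe as unavailable.

    Each view maps a server name to True (reachable) or False (unreachable).
    A server missing from the remote view is not counted as agreed down.
    """
    return {
        server
        for server, reachable in local_view.items()
        if not reachable and remote_view.get(server) is False
    }


def should_failover(local_view, remote_view):
    """Initiate failover only when the monitors agree on at least one outage."""
    return len(servers_agreed_down(local_view, remote_view)) > 0


local = {"web1": False, "web2": True}
remote = {"web1": False, "web2": False}
agreed = servers_agreed_down(local, remote)
```

Requiring agreement from both monitors means that a network problem seen by only one of them does not by itself trigger a failover.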
Abstract:
Techniques are disclosed for managing a cluster of computing nodes following a division of the cluster into at least a first and second partition, where the cluster aggregates local storage resources of the nodes to provide an object store, and objects stored in the object store are divided into data components stored across the nodes. In accordance with one method, it is determined that a majority of data components comprising a first object are stored within nodes in the first partition. It is determined that a majority of data components comprising a second object are stored within nodes in the second partition. Configuration operations are permitted to be performed on the first object in the first partition while denying access to the first object from the second partition, and on the second object in the second partition while denying access to the second object from the first partition.
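The per-object majority rule can be sketched in Python as follows. The function names, the partition names, and the use of node id lists to describe where components live are assumptions for illustration only.

```python
from collections import Counter


def owning_partition(component_nodes, partitions):
    """Return the partition holding a strict majority of an object's components.

    component_nodes -- one node id per data component of the object
    partitions      -- mapping of partition name to the set of node ids in it
    Returns None when no partition holds a strict majority.
    """
    node_to_partition = {
        node: name for name, nodes in partitions.items() for node in nodes
    }
    counts = Counter(
        node_to_partition[n] for n in component_nodes if n in node_to_partition
    )
    for name, count in counts.items():
        if 2 * count > len(component_nodes):
            return name
    return None


def may_access(component_nodes, partitions, requesting_partition):
    """Permit access only from the partition that owns the object's majority."""
    return owning_partition(component_nodes, partitions) == requesting_partition


partitions = {"P1": {"n1", "n2", "n3"}, "P2": {"n4", "n5"}}
first_object = ["n1", "n2", "n4"]   # two of three components in P1
second_object = ["n3", "n4", "n5"]  # two of three components in P2
```

Because ownership is decided per object, both partitions can keep serving different objects at the same time, and an object whose components are split evenly is accessible from neither.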
Abstract:
Techniques for mobile clusters for collecting telemetry data and processing analytic tasks are disclosed herein. The mobile cluster includes a processor, a plurality of data nodes and an analysis module. The data nodes receive and store a snapshot of at least a portion of data stored in a main Hadoop storage cluster and real-time acquired data received from a data capturing device. The analysis module is operatively coupled to the processor to process analytic tasks based on the snapshot and the real-time acquired data when the storage cluster is not connected to the main storage cluster.
Abstract:
The invention relates to a method for operating a control network. It should be possible to perform the method reliably with relatively little complexity. According to the invention, a method for operating a control network (1) is suitable for this purpose, said control network having a single physical connection between a first control computer (ST1) and a second redundant control computer (ST2) by means of a data line network (2), to which several functionally important data processing devices (A, C, D, F, H, K, L) are connected. The data connection between the control computers (ST1, ST2) and the functionally important devices (A, C, D, F, H, K, L) is achieved by means of a redundant and diverse heartbeat, wherein the communication connection between the two control computers (ST1, ST2) is checked in order to start the operation of the control network (1). If the result of the check is positive, a master function is assigned to a control computer (ST1), or if the result of the check is negative, both control computers (ST1, ST2) connect the functionally important devices (A, C, D, F, H, K, L) to themselves according to a defined sequence. If a specified quantity of the functionally important devices (A, C, D, F, H, K, L) is connected to one of the two control computers (ST1), said control computer assumes the master function and the other control computer (ST2) assumes the standby function, or if the number of functionally important devices (A, C, D, F, H, K, L) connected to each of the two control computers (ST1, ST2) lies below the specified quantity, a signal is generated that signals a faulty state of the control network (1). The invention further relates to a control network.
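The start-up decision described above can be sketched in Python. The function name, the tuple return value, and two details are assumptions not stated in the abstract: that ST2 takes the standby function when the link check is positive, and that ST1 is checked first if both computers were to reach the specified quantity.

```python
def assign_roles(link_ok, devices_on_st1, devices_on_st2, required):
    """Decide the roles of control computers ST1 and ST2 at start-up.

    link_ok        -- result of checking the ST1 <-> ST2 communication connection
    devices_on_st1 -- number of functionally important devices connected to ST1
    devices_on_st2 -- number of functionally important devices connected to ST2
    required       -- the specified quantity of devices needed to become master
    Returns (st1_role, st2_role, fault_signal).
    """
    if link_ok:
        # Positive check: the master function is assigned to ST1.
        return ("master", "standby", False)  # standby for ST2 is an assumption
    if devices_on_st1 >= required:
        return ("master", "standby", False)
    if devices_on_st2 >= required:
        return ("standby", "master", False)
    # Both computers are below the specified quantity: signal a faulty state.
    return (None, None, True)


roles = assign_roles(link_ok=False, devices_on_st1=2, devices_on_st2=5, required=4)
```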
Abstract:
A method of managing a distributed storage space. The method comprises mapping a plurality of replica sets to a plurality of storage managing modules installed in a plurality of computing units, where each of the plurality of storage managing modules manages access of at least one storage consumer application to replica data of at least one replica of a replica set from the plurality of replica sets, and the replica data is stored in at least one drive of a respective computing unit; allocating at least one time-based credit to at least one of each storage managing module and the replica data; and iteratively renewing the time-based credit as long as a failure of at least one of the storage managing module, the at least one drive, and the replica data is not detected.
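A time-based credit behaves much like a lease. The following Python sketch shows one way the renewal loop might look; the class name, the explicit `now` arguments (used instead of a real clock so the example is deterministic), and the rule that an expired credit cannot be renewed are all assumptions for illustration.

```python
class TimeCredit:
    """A time-based credit that stays valid only while it keeps being renewed."""

    def __init__(self, duration, now):
        self.duration = duration
        self.expires_at = now + duration

    def is_valid(self, now):
        return now < self.expires_at

    def renew(self, now, failure_detected):
        """Extend the credit unless a failure was detected or it already expired."""
        if failure_detected or not self.is_valid(now):
            return False
        self.expires_at = now + self.duration
        return True


credit = TimeCredit(duration=5.0, now=0.0)
renewed = credit.renew(now=4.0, failure_detected=False)    # extends to t = 9.0
refused = credit.renew(now=8.0, failure_detected=True)     # failure: not extended
```

Once renewal stops, the credit runs out on its own, so a storage managing module that has failed loses its right to serve the replica data without any further coordination.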
Abstract:
An "operate with missing region" feature of this disclosure allows the cluster to continue servicing reads for available regions even when some regions are missing. In particular, upon a given node failure condition, the cluster is placed in an effective read-only mode for all regions. The node failure condition typically is one where there has been a failure of an authoritative region copy and no backup copy is available. As used herein, "read-only" means that no client write or update requests will succeed while the cluster is in this state. Preferably, such requests are then re-tried. In this mode, all regions are only allowed to perform read operations. During the read-only state, the cluster continues to operate with missing regions, and missing regions are entered on the region map. The cluster then automatically recovers returning missing region(s), after which it leaves the read-only state.
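The read-only behavior can be sketched in Python as follows. The class and method names are hypothetical, and a plain set stands in for the missing-region entries of the region map.

```python
class RegionCluster:
    """Toy cluster that turns read-only for all regions while any region is missing."""

    def __init__(self, region_ids):
        self.data = {region: {} for region in region_ids}
        self.missing = set()  # stand-in for "missing" entries on the region map

    @property
    def read_only(self):
        return bool(self.missing)

    def mark_missing(self, region):
        self.missing.add(region)

    def recover(self, region):
        self.missing.discard(region)

    def read(self, region, key):
        if region in self.missing:
            raise LookupError(f"region {region} is missing")
        return self.data[region].get(key)

    def write(self, region, key, value):
        if self.read_only:
            return False  # rejected; the client is expected to retry later
        self.data[region][key] = value
        return True


cluster = RegionCluster([0, 1])
cluster.write(0, "k", "v")
cluster.mark_missing(1)
rejected = cluster.write(0, "k", "v2")  # region 0 is healthy, but writes still fail
```

Note that the write to region 0 is rejected even though only region 1 is missing, matching the statement that the read-only mode applies to all regions.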
Abstract:
Disclosed is a distributed computing system that enables autonomous leader selection without relying on a particular server. The distributed computing system (S) is provided with: a leader candidate selection device (63) that, when communication is established with a majority of the initial total number of information processing devices, selects the aforementioned information processing device with the oldest accession time as a leader candidate information processing device, and transmits identification information thereof; and a leader recognition device (64) that investigates identification information of the aforementioned leader candidate information processing device and the transmitted identification information of the aforementioned leader candidate information processing device of the leader recognition device itself, and that, in the case that the information processing device that is the same as that recognized as the leader candidate information processing device is present among the aforementioned majority of the initial total number of information processing devices, recognizes said information processing device as a new leader.
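The two steps above can be sketched in Python. The function names, the tie-break on device id, and the reading of the recognition step as "a majority of the initial total reported the same candidate" are assumptions; the abstract does not spell out the comparison in this detail.

```python
from collections import Counter


def select_leader_candidate(reachable, initial_total):
    """Pick the reachable device with the oldest accession time.

    reachable     -- mapping of device id to accession time, including this device
    initial_total -- initial total number of information processing devices
    Returns None unless a majority of the initial total is reachable.
    """
    if 2 * len(reachable) <= initial_total:
        return None
    # Assumption: ties on accession time are broken by device id.
    return min(reachable, key=lambda device: (reachable[device], device))


def recognize_leader(reported_candidates, initial_total):
    """Recognize a leader when a majority of the initial total names the same device.

    reported_candidates -- candidate ids from this device and from its peers
    """
    if not reported_candidates:
        return None
    candidate, votes = Counter(reported_candidates).most_common(1)[0]
    return candidate if 2 * votes > initial_total else None


candidate = select_leader_candidate({"a": 30, "b": 10, "c": 20}, initial_total=5)
leader = recognize_leader(["b", "b", "b"], initial_total=5)
```

Measuring the majority against the initial total, rather than against the devices currently reachable, keeps a minority partition from electing its own leader.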