Abstract:
A method and systems for recovering from a failure in a virtual machine are provided. In accordance with one embodiment of the present disclosure, the method may include, in a first physical host having a host operating system and a virtual machine running on the host operating system, monitoring one or more parameters associated with an application running on the virtual machine, each parameter having a predetermined acceptable range. The method may further include determining whether the one or more parameters are within their respective predetermined acceptable ranges. In response to determining that the one or more parameters are not within their respective predetermined acceptable ranges, a management module may cause the application running on the virtual machine to be restarted.
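The monitor-and-restart behavior described in this abstract can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Range` class, the parameter names, and the `restart` callback are all hypothetical, and the abstract does not say how parameters are sampled or how the restart is carried out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Range:
    """Hypothetical container for a parameter's predetermined acceptable range."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def out_of_range(samples: Dict[str, float], limits: Dict[str, Range]) -> List[str]:
    """Return the sorted names of monitored parameters outside their range."""
    return sorted(
        name
        for name, value in samples.items()
        if name in limits and not limits[name].contains(value)
    )


def check_and_recover(
    samples: Dict[str, float],
    limits: Dict[str, Range],
    restart: Callable[[], None],
) -> bool:
    """Call restart() if any parameter is out of range; report whether it ran."""
    if out_of_range(samples, limits):
        restart()  # stands in for the management module restarting the application
        return True
    return False
```

In this sketch the management module's role is reduced to a callback, so the range check can be exercised without a real virtual machine.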
Abstract:
A method, system and software for allocating information handling system resources in response to cluster fail-over events are disclosed. In operation, the method provides for the calculation of a performance ratio between a failing node and a fail-over node and the transformation of an application calendar schedule from the failing node into a new application calendar schedule for the fail-over node. Before implementing the new application calendar schedule for the failing-over application on the fail-over node, the method verifies that the fail-over node includes sufficient resources to process its existing calendar schedule as well as the new application calendar schedule for the failing-over application. A resource negotiation algorithm may be applied to one or more of the calendar schedules to prevent application starvation in the event the fail-over node does not include sufficient resources to process the failing-over application calendar schedule as well as its existing application calendar schedules.
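One possible reading of the performance ratio, schedule transformation, and resource check is sketched below. The abstract does not define how the ratio is computed or what a calendar schedule contains, so everything here is an assumption: a schedule is modeled as a mapping from hour of day to the percentage of node CPU reserved, the ratio is the failing node's benchmark score divided by the fail-over node's, and all function names are hypothetical.

```python
from typing import Dict

# Assumed representation: hour of day -> percent of the node's CPU reserved.
Schedule = Dict[int, float]


def performance_ratio(failing_score: float, failover_score: float) -> float:
    """Ratio of the failing node's performance to the fail-over node's."""
    if failover_score <= 0:
        raise ValueError("failover_score must be positive")
    return failing_score / failover_score


def transform_schedule(schedule: Schedule, ratio: float) -> Schedule:
    """Rescale each reservation so it represents the same work on the new node."""
    return {slot: share * ratio for slot, share in schedule.items()}


def fits(existing: Schedule, incoming: Schedule, capacity: float = 100.0) -> bool:
    """Check that every time slot can hold both schedules within capacity."""
    slots = set(existing) | set(incoming)
    return all(
        existing.get(slot, 0.0) + incoming.get(slot, 0.0) <= capacity
        for slot in slots
    )
```

When `fits` returns `False`, the abstract's resource negotiation algorithm would apply; that step is left out because the abstract gives no detail about it.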
Abstract:
A system and method are disclosed for managing the serving of read and write commands in a computer cluster system having redundant storage. A plurality of database servers is included in the computer cluster system to serve read and write commands from the system's database clients. One of the database servers is configured to handle both read commands and write commands. The remaining database servers are configured to handle only read commands. The database of the computer system includes a redundant storage subsystem that uses mirrored disks associated with each of the database servers.
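The division of work between the one read/write server and the read-only servers can be illustrated with a small router. This is a sketch under assumptions: the `CommandRouter` class and the server names are hypothetical, and the round-robin read distribution is an illustrative choice that the abstract does not specify.

```python
from itertools import cycle
from typing import Iterable


class CommandRouter:
    """Send writes to the single read/write server; spread reads over all servers."""

    def __init__(self, read_write_server: str, read_only_servers: Iterable[str]):
        self.read_write_server = read_write_server
        # The read/write server also serves reads, so it joins the rotation.
        self._readers = cycle([read_write_server, *read_only_servers])

    def route(self, command_kind: str) -> str:
        if command_kind == "write":
            return self.read_write_server
        if command_kind == "read":
            return next(self._readers)
        raise ValueError(f"unknown command kind: {command_kind!r}")
```

For example, with `CommandRouter("db0", ["db1", "db2"])`, every write goes to `db0`, while successive reads rotate through `db0`, `db1`, and `db2`.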
Abstract:
According to various illustrative embodiments of the present invention, a method for adaptive cluster input/output control includes starting a nonessential input/output operation using a first controller in a first node of a cluster, informing at least a second controller in a second node of the cluster that the nonessential input/output operation has started, and increasing the nonessential input/output operation by a predetermined utilization percentage. The method also includes waiting for a predetermined amount of time, determining whether the nonessential input/output operation has been completed, and determining whether performance of at least the second controller in the second node has substantially decreased. If the utilization percentage of the nonessential input/output operation is greater than the predetermined utilization percentage and performance of at least the second controller has substantially decreased, the method decreases the nonessential input/output operation by the predetermined utilization percentage. If the nonessential input/output operation has been completed, the method informs at least the second controller in the second node of the cluster of the completion.
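The steps in this abstract can be sketched as a control loop. This is a rough illustration under stated assumptions: the abstract lists the steps but does not say that they repeat, so the loop that raises utilization again on each cycle, the cap at `max_pct`, and all callback names (`is_complete`, `peer_degraded`, `notify`, `wait`) are hypothetical.

```python
from typing import Callable, List


def adjust_utilization(current_pct: float, step_pct: float, peer_degraded: bool) -> float:
    """Back off by one step only if the peer is degraded and current > step."""
    if peer_degraded and current_pct > step_pct:
        return current_pct - step_pct
    return current_pct


def run_nonessential_io(
    step_pct: float,
    is_complete: Callable[[], bool],
    peer_degraded: Callable[[], bool],
    notify: Callable[[str], None],
    wait: Callable[[], None],
    max_pct: float = 100.0,
) -> List[float]:
    """Drive the operation to completion; return utilization after each cycle."""
    notify("started")  # inform the second controller that the operation began
    utilization = 0.0
    history: List[float] = []
    while True:
        utilization = min(max_pct, utilization + step_pct)  # increase by one step
        wait()  # stands in for the predetermined amount of time
        if is_complete():
            notify("completed")  # inform the second controller of completion
            return history
        utilization = adjust_utilization(utilization, step_pct, peer_degraded())
        history.append(utilization)
```

Passing the clock and the peer's health as callbacks keeps the sketch testable without real controllers; a real system would replace them with timers and inter-node messages.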
Abstract:
A SAN-based cluster backup system and method are provided. The system and method are automated, do not use a LAN for backup data, and are made aware of application failover events. The system and method are composed of two main components: a backup service, and a primary coordinator. The backup service performs the backup of the applications that are hosted on a particular node. The backup service periodically checkpoints the state of the backup job and communicates the status to the primary coordinator. The primary coordinator controls all backup operations in the cluster. The user submits backup jobs for the applications through the primary coordinator. If a node fails during a backup operation, the primary coordinator can ensure that the failed backup job can be resumed from the last checkpoint on the failed-over node. In this way, repetitive backups can be avoided, thereby increasing efficiency.
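The checkpoint-and-resume interaction between the backup service and the primary coordinator can be sketched as follows. This is a minimal model under assumptions: the `PrimaryCoordinator` class, the `run_backup` function, the item-count checkpoint format, and the `fail_after` parameter used to simulate a node failure are all hypothetical, and SAN data movement is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PrimaryCoordinator:
    """Tracks the last reported checkpoint for each backup job."""
    checkpoints: Dict[str, int] = field(default_factory=dict)  # job id -> items done

    def report_checkpoint(self, job_id: str, items_done: int) -> None:
        self.checkpoints[job_id] = items_done

    def resume_point(self, job_id: str) -> int:
        return self.checkpoints.get(job_id, 0)


def run_backup(
    job_id: str,
    items: List[str],
    coordinator: PrimaryCoordinator,
    checkpoint_every: int = 2,
    fail_after: Optional[int] = None,
) -> List[str]:
    """Back up items from the last checkpoint; return the items copied this run."""
    copied: List[str] = []
    for index in range(coordinator.resume_point(job_id), len(items)):
        if fail_after is not None and len(copied) == fail_after:
            return copied  # simulated node failure before the next item
        copied.append(items[index])
        done = index + 1
        # Checkpoint periodically and once more when the job finishes.
        if done % checkpoint_every == 0 or done == len(items):
            coordinator.report_checkpoint(job_id, done)
    return copied
```

If a run fails after copying three of five items with a checkpoint every two items, the next run on the failed-over node restarts from item three, so only work done since the last checkpoint is repeated.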