Abstract:
Cluster management software comprises a plurality of cluster agents, with each cluster agent associated with an HPC node including an integrated fabric and the cluster agent operable to determine a status of the associated HPC node. The software further includes a cluster management engine communicably coupled with the plurality of the HPC nodes and operable to execute an HPC job using a dynamically allocated subset of the plurality of HPC nodes based on the determined status of the plurality of HPC nodes.
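As a rough illustration of status-based dynamic allocation, the sketch below picks a subset of nodes for a job from the statuses reported by the agents. The function name `allocate_nodes`, the status strings, and the "first idle nodes in sorted order" policy are assumptions made for this example, not details from the abstract.

```python
def allocate_nodes(statuses, needed):
    """Pick `needed` nodes for a job from agent-reported statuses.

    statuses: dict mapping node id -> status string (hypothetical values
    such as "idle", "busy", "down"), as reported by each node's agent.
    Returns a list of node ids, or None if too few nodes are idle.
    """
    # Assumed policy: only idle nodes are eligible, chosen in sorted order.
    available = sorted(n for n, s in statuses.items() if s == "idle")
    if len(available) < needed:
        return None
    return available[:needed]


statuses = {"n1": "idle", "n2": "down", "n3": "idle", "n4": "busy"}
job_nodes = allocate_nodes(statuses, 2)
```

Because the subset is recomputed from current statuses on each call, a node that goes down is simply not chosen for the next job.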
Abstract:
Provided is a technique for allocating resources. Reserved resources are allocated to one or more depth levels, wherein the reserved resources form one or more reserved pools. Upon receiving a request for allocation of resources, a depth level from which to allocate resources is determined. A reserved pool is allocated from the determined depth level.
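The allocation flow can be sketched as follows. The class `DepthAllocator` and the fallback rule (try the requested depth first, then the nearest shallower level that still holds a pool) are hypothetical; the abstract does not say how the depth level is determined.

```python
from collections import defaultdict


class DepthAllocator:
    """Reserved pools grouped by depth level (illustrative structure)."""

    def __init__(self):
        self.pools = defaultdict(list)  # depth level -> reserved pools

    def reserve(self, depth, pool):
        self.pools[depth].append(pool)

    def choose_depth(self, requested_depth):
        # Assumed policy: prefer the requested depth, otherwise fall back
        # to the nearest shallower level that still has a reserved pool.
        for depth in range(requested_depth, -1, -1):
            if self.pools.get(depth):
                return depth
        return None

    def allocate(self, requested_depth):
        depth = self.choose_depth(requested_depth)
        if depth is None:
            raise RuntimeError("no reserved pool available")
        return depth, self.pools[depth].pop(0)


allocator = DepthAllocator()
allocator.reserve(0, "pool-A")
allocator.reserve(2, "pool-B")
first = allocator.allocate(2)   # served from depth 2
second = allocator.allocate(2)  # depth 2 is empty, falls back to depth 0
```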
Abstract:
A system and method are described for managing a plurality of application servers. In one embodiment, the application servers are organized into groups referred to as “instances.” Each instance includes a group of redundant application servers and a dispatcher for distributing service requests to each of the application servers. A group of instances may be organized as a “cluster.” Each server includes a configuration manager to facilitate changes to configuration information within the cluster. The configuration manager may include a configuration cache and monitor its consistency with respect to other instances within the cluster.
Abstract:
Program product for an application programming interface that unifies a plurality of mechanisms into a single framework. The interface includes a mechanism for communicating between members of a process group of related processes, and a mechanism for synchronizing the related processes of the process group. Additionally, the application programming interface may include mechanisms for managing membership of the process group or a processor group of processors, and/or a mechanism for controlling a group state value for the process group.
Abstract:
The present invention clones configuration information onto a device joining a cluster. A Configuration Acquisition System (CAS) component, using a list of attributes to be cloned, connects to a cluster member, interacts with the cluster member to retrieve all the attributes, reconciles the values of the attributes from the cluster member with the values of the attributes in its own configuration, and applies the reconciled configuration to its Configuration Subsystem.
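A minimal sketch of the retrieve, reconcile, and apply sequence is shown here. The function `clone_config` and the rule that the cluster member's value wins for every listed attribute are assumptions; the abstract does not specify the reconciliation rule.

```python
def clone_config(local, member, attributes):
    """Return a reconciled copy of `local` for a device joining a cluster.

    local:      the joining device's own configuration (dict)
    member:     configuration retrieved from an existing cluster member
    attributes: names of the attributes to be cloned
    """
    reconciled = dict(local)  # leave the caller's configuration untouched
    for name in attributes:
        # Assumed reconciliation rule: the cluster member's value wins.
        if name in member:
            reconciled[name] = member[name]
    return reconciled


local = {"hostname": "new-node", "ntp": "10.0.0.9", "vlan": 1}
member = {"hostname": "node-1", "ntp": "10.0.0.1", "vlan": 20}
applied = clone_config(local, member, ["ntp", "vlan"])
```

Attributes outside the list, such as `hostname` here, keep the joining device's own value.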
Abstract:
The present invention is a method and system of load balancing in a group of one or more servers connected to one or more subnetworks. Two or more independent servers are bound into a group, with one of the servers elected to serve as a leader. The leader acts as a load balancer for the group while the remaining servers act as slaves. This functionality eliminates the need for one or more dedicated load balancing devices and lowers the hardware requirements necessary for performing such load balancing.
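The leader and slave arrangement might look like the sketch below. The election rule (lowest identifier wins) and the round-robin dispatch are assumptions chosen for brevity; the abstract names neither.

```python
from itertools import cycle


def elect_leader(servers):
    # Assumed election rule: the lowest identifier wins.
    return min(servers)


class Group:
    """A group of servers in which the elected leader balances load."""

    def __init__(self, servers):
        self.leader = elect_leader(servers)
        # The remaining servers act as slaves; a single-server group
        # falls back to the leader handling its own requests.
        workers = sorted(s for s in servers if s != self.leader) or [self.leader]
        self._next_worker = cycle(workers)

    def dispatch(self, request):
        # Round-robin is an illustrative balancing policy.
        return next(self._next_worker), request


group = Group(["srv-b", "srv-a", "srv-c"])
targets = [group.dispatch(i)[0] for i in range(3)]
```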
Abstract:
Events of interest are detected in order to manage a high availability framework. In a framework in which a plurality of components are executing, the components are periodically polled to detect occurrence of an event of interest. A monitor is also established for one or more of the components. After a first monitored component causes the event of interest to occur, the monitor communicates the event of interest to the framework without waiting for the framework to poll the first component.
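The difference between polling and a monitor can be sketched as follows. The classes `Framework` and `Component`, the `"failed"` event name, and the callback-style monitor are hypothetical stand-ins for whatever mechanism the framework actually uses.

```python
class Framework:
    """Collects events either by polling or from a monitor's push."""

    def __init__(self):
        self.seen = []  # (component name, event) in detection order

    def report(self, name, event):
        # Ignore duplicates so a pushed event is not recorded again by a poll.
        if (name, event) not in self.seen:
            self.seen.append((name, event))

    def poll(self, components):
        for component in components:
            if component.failed:
                self.report(component.name, "failed")


class Component:
    def __init__(self, name, monitor=None):
        self.name = name
        self.failed = False
        self.monitor = monitor  # optional callback used as the monitor

    def fail(self):
        self.failed = True
        if self.monitor:
            # The monitor pushes the event without waiting for a poll.
            self.monitor(self.name, "failed")


framework = Framework()
a = Component("a", monitor=framework.report)  # monitored
b = Component("b")                            # polled only
a.fail()
b.fail()
before_poll = list(framework.seen)
framework.poll([a, b])
after_poll = list(framework.seen)
```

The monitored component's event reaches the framework immediately, while the unmonitored one is noticed only at the next poll.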
Abstract:
A composite resource is established that includes a plurality of members. Each of the members is capable of providing a comparable service. A coordinator monitors a state of each member of the composite resource. A component requests the service from the coordinator. The coordinator arranges for the service to be provided to the component by a particular member of the composite resource. When the particular member ceases to be active, the service is automatically provided to the component by another member in the composite resource. A state of the composite resource is maintained independently of the state of each member in the composite resource.
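One way to picture the coordinator is the sketch below. The class `Coordinator`, its method names, and the "first active member" selection rule are assumptions; in this simplified version the failover takes effect the next time the component asks for the service.

```python
class Coordinator:
    """Tracks member state and routes service requests to an active member."""

    def __init__(self, members):
        self.active = {m: True for m in members}  # member -> is active
        self.bindings = {}                        # component -> member

    def composite_state(self):
        # The composite's state is kept apart from any single member's
        # state: it is available while at least one member is active.
        return "available" if any(self.active.values()) else "unavailable"

    def mark_down(self, member):
        self.active[member] = False

    def request(self, component):
        member = self.bindings.get(component)
        if member is None or not self.active[member]:
            # Assumed selection rule: the first active member.
            member = next((m for m, up in self.active.items() if up), None)
            if member is None:
                raise RuntimeError("no active member in composite resource")
            self.bindings[component] = member
        return member


coordinator = Coordinator(["db-1", "db-2"])
first = coordinator.request("app")
coordinator.mark_down("db-1")
second = coordinator.request("app")  # rebinds to another member
```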
Abstract:
A technique is described for organizing a plurality of computers such that message broadcast, content searching, and computer identification across the entire collection, or a subset of it, may be performed quickly without the use of a controlling computer. The technique covers the creation, operation, and maintenance of a connection scheme by which each computer in the collection appears to be the top level of a hierarchical array. Maintaining this hierarchical connection scheme allows one-to-many communications throughout the collection of computers to scale geometrically rather than linearly.
Abstract:
An apparatus, clustered computer system, program product and method utilize a unique prepare operation in connection with a resource action to effectively “lock out” missing or inactive cluster entities, such as nodes and cluster objects, from rejoining a clustered computer system subsequent to the resource action. The prepare operation includes the modification of one or more cluster configuration parameters associated with a plurality of entities in a clustered computer system, such that any such cluster entity that is active during the prepare operation accepts the modifications, while any such cluster entity that is inactive during the prepare operation does not accept the modifications. Because the cluster configuration parameters are modified only for active cluster entities, attempts by previously inactive cluster entities to activate or rejoin clustering subsequent to resource actions will generally fail due to incorrect or stale cluster configuration parameters for such entities, and as a result, such entities will be effectively blocked from being accepted into the clustered computer system.
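The lock-out effect of the prepare operation can be sketched with a single configuration parameter. The classes `Node` and `Cluster` and the integer `config_id` are hypothetical; the abstract says only that one or more cluster configuration parameters are modified.

```python
class Node:
    def __init__(self, name, active=True):
        self.name = name
        self.active = active
        self.config_id = 0  # hypothetical cluster configuration parameter


class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.config_id = 0

    def prepare(self):
        """Modify the parameter before a resource action.

        Only entities that are active right now accept the new value.
        """
        self.config_id += 1
        for node in self.nodes.values():
            if node.active:
                node.config_id = self.config_id

    def try_join(self, node):
        # A stale parameter means the node missed a prepare operation,
        # so it is refused.
        if node.config_id != self.config_id:
            return False
        node.active = True
        return True


a = Node("a")
b = Node("b", active=False)  # inactive during the prepare operation
cluster = Cluster([a, b])
cluster.prepare()
a_ok = cluster.try_join(a)
b_ok = cluster.try_join(b)
```

Node `b` never saw the modified parameter, so its later attempt to rejoin fails without the cluster having to track which entities were missing.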