Abstract:
Techniques are provided for managing a resource in a High Availability (HA) system. The techniques involve incrementing a count when a particular type of remedial action is performed on a resource, so that the count reflects how often that type of remedial action has been performed for the resource. When it is determined that the resource has been in stable operation, the count is automatically reduced. After a failure, the count is used to determine whether to attempt the particular type of remedial action on the resource. Examples of remedial actions include restarting the resource and relocating the resource to another node of a cluster. By using the count, the system ensures that a faulty resource does not get constantly "bounced". By reducing the count once a resource has become stable, the system makes it less likely that a failure of an otherwise stable resource will require manual intervention.
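The counting scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `ResourceState`, `RestartPolicy`, `max_restarts`, and `stable_interval` are hypothetical, and the choice to reduce the count by one per stable interval is an assumption, since the abstract only says the count is "automatically reduced".

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Per-resource bookkeeping (hypothetical field names)."""
    restart_count: int = 0      # how often the remedial action was performed
    stable_since: float = 0.0   # start of the current failure-free window


class RestartPolicy:
    """Decides whether a remedial action (here: a restart) may be attempted."""

    def __init__(self, max_restarts: int = 3, stable_interval: float = 600.0):
        self.max_restarts = max_restarts        # assumed threshold
        self.stable_interval = stable_interval  # assumed stability window, seconds

    def on_failure(self, state: ResourceState, now: float) -> bool:
        """Return True if a restart should be attempted after this failure."""
        state.stable_since = now  # a failure resets the stability window
        if state.restart_count >= self.max_restarts:
            return False          # stop bouncing; leave it to an operator
        state.restart_count += 1
        return True

    def on_stable_check(self, state: ResourceState, now: float) -> None:
        """Reduce the count once the resource has run failure-free long enough."""
        stable_for = now - state.stable_since
        if state.restart_count > 0 and stable_for >= self.stable_interval:
            state.restart_count -= 1
            state.stable_since = now  # begin a new stability window
```

With `max_restarts=2`, two consecutive failures are each answered with a restart and a third is refused; after one full stable interval the count drops by one, so a later failure can again be handled automatically.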
Abstract:
A clusterware manager on a cluster of nodes interprets a resource profile. The resource profile defines resource profile attributes. The attributes include at least one attribute that defines a cluster dependency based on resource type. The attribute does not identify any particular resource of that resource type. Dependencies between resources are managed based on the attribute that specifies the cluster dependency.
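A type-based dependency of this kind might be sketched as follows. The abstract does not give a profile syntax, so `ResourceProfile`, `requires_types`, and `resolve_dependencies` are all hypothetical names; the sketch only shows that a profile can name a resource type and leave it to the manager to find the concrete resources of that type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceProfile:
    """A simplified resource profile (hypothetical attribute names)."""
    name: str
    resource_type: str
    # Dependency attribute: names resource *types*, never specific resources.
    requires_types: List[str] = field(default_factory=list)


def resolve_dependencies(profiles: List[ResourceProfile]) -> Dict[str, List[str]]:
    """Map each resource to the concrete resources its type-based
    dependencies currently resolve to."""
    by_type: Dict[str, List[str]] = {}
    for profile in profiles:
        by_type.setdefault(profile.resource_type, []).append(profile.name)

    resolved: Dict[str, List[str]] = {}
    for profile in profiles:
        deps = [
            name
            for wanted_type in profile.requires_types
            for name in by_type.get(wanted_type, [])
            if name != profile.name  # a resource does not depend on itself
        ]
        resolved[profile.name] = sorted(deps)
    return resolved
```

Because the profile for an application lists only the type `"database"`, adding or removing a database resource changes the resolved dependencies without any edit to the application's profile.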
Abstract:
A method for a self-testing clusterware agent is provided. A clusterware agent that includes clusterware-side components and application-side components is configured to interface between a cluster manager and an application. The application-side components are invoked by the clusterware-side components via an application programming interface (API) that includes API functions invocable by a cluster manager. One or more of the API functions are invoked without any cluster manager invoking the clusterware agent.
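As a rough illustration of the self-testing idea, the sketch below gives an agent a small API and a `self_test` method that exercises that API directly, with no cluster manager involved. The function names `start`, `stop`, and `check`, and the `DemoApplication` class, are assumptions made for the example; the abstract does not name the API functions.

```python
from typing import Dict


class DemoApplication:
    """Stand-in for the managed application (hypothetical)."""

    def __init__(self) -> None:
        self.running = False


class ClusterwareAgent:
    """Agent exposing API functions a cluster manager would normally call."""

    def __init__(self, app: DemoApplication) -> None:
        self.app = app

    # --- API functions (assumed names) ---
    def start(self) -> bool:
        self.app.running = True
        return True

    def stop(self) -> bool:
        self.app.running = False
        return True

    def check(self) -> bool:
        return self.app.running

    # --- self-test: drives the same API without a cluster manager ---
    def self_test(self) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        results["start"] = self.start() and self.check()
        results["stop"] = self.stop() and not self.check()
        return results
```

Calling `ClusterwareAgent(DemoApplication()).self_test()` runs the start and stop paths in isolation and reports a pass or fail flag for each, which is the kind of check that would otherwise require a running cluster manager.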
Abstract:
A cluster manager is configured to manage a plurality of copies of a mid-tier database as a mid-tier database cluster. The cluster manager may concurrently manage a backend database system. The cluster manager is configured to monitor for and react to failures of mid-tier database nodes. The cluster manager may react to a mid-tier database failure by, for example, assigning a new active node, creating a new standby node, creating new copies of the mid-tier databases, implementing new replication or backup schemes, reassigning the node's virtual address to another node, or relocating applications that were directly linked to the mid-tier database to another host. Each node or an associated agent may configure the cluster manager to behave in this fashion during initialization, based on common cluster configuration information. Each copy of the mid-tier database may be, for example, a memory resident database. Thus, a node must reload the entire database into memory to recover a copy of the database.
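Two of the reactions listed above, assigning a new active node and creating a new standby node, can be sketched in a few lines. This is only an illustrative model: the `roles` dictionary, the `spares` list, and the function `react_to_failure` are hypothetical, and real recovery would also involve reloading the memory-resident database on the new node.

```python
from typing import Dict, List, Optional, Tuple

Roles = Dict[str, Optional[str]]


def react_to_failure(
    roles: Roles, failed: str, spares: List[str]
) -> Tuple[Roles, List[str]]:
    """Return updated active/standby roles and the remaining spare nodes.

    roles  -- {"active": node, "standby": node} (hypothetical layout)
    failed -- name of the node that failed
    spares -- nodes available to become the new standby
    """
    new_roles: Roles = dict(roles)
    remaining = list(spares)  # do not mutate the caller's list

    def next_spare() -> Optional[str]:
        return remaining.pop(0) if remaining else None

    if failed == roles["active"]:
        # Promote the standby, then create a replacement standby.
        new_roles["active"] = roles["standby"]
        new_roles["standby"] = next_spare()
    elif failed == roles["standby"]:
        new_roles["standby"] = next_spare()
    return new_roles, remaining
```

If the active node fails and no spare is available, the standby is still promoted and the standby slot is left empty (`None`) until another node can be added.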
Abstract:
A method and mechanism for failing over applications in a clustered computing system are provided. In an embodiment, the methodology is implemented by a high-availability failover mechanism. Upon detecting a failure of an application that is currently designated to be executing on a particular node of the system, the mechanism may attempt to fail over the application onto a different node. The mechanism keeps track of the number of nodes on which a failover of the application has been attempted. Then, based on one or more factors including that number, the mechanism may cease attempting to fail over the application onto any node of the system.
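The bounded failover loop described here might look like the following sketch. It assumes a single factor, a cap on the number of nodes tried, whereas the abstract allows for "one or more factors"; the names `fail_over`, `try_start`, and `max_attempts` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def fail_over(
    app: str,
    nodes: List[str],
    failed_node: str,
    try_start: Callable[[str, str], bool],
    max_attempts: int,
) -> Tuple[Optional[str], List[str]]:
    """Try to start `app` on other nodes, giving up after `max_attempts` nodes.

    Returns (node the app now runs on, or None) and the list of nodes tried.
    `try_start(app, node)` is a caller-supplied hook that returns True on success.
    """
    attempted: List[str] = []
    for node in nodes:
        if node == failed_node:
            continue  # never retry on the node that just failed
        if len(attempted) >= max_attempts:
            break     # cease attempting failover
        attempted.append(node)
        if try_start(app, node):
            return node, attempted
    return None, attempted
```

The returned list of attempted nodes is the tracked quantity: once its length reaches the cap, the loop stops even if untried nodes remain, which keeps a persistently failing application from cycling through the whole cluster.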