Abstract:
A method for a self-testing clusterware agent is provided. A clusterware agent that includes clusterware-side components and application-side components is configured to interface between a cluster manager and an application. The application-side components are invoked by the clusterware-side components via an application programming interface (API) whose functions are invocable by a cluster manager. To test the agent, one or more of the API functions are invoked without any cluster manager invoking the clusterware agent.
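The self-test idea can be illustrated with a minimal sketch. The entry-point names `start`, `stop`, and `check`, the return-code convention (0 for success), and the `DemoAgent` class are all assumptions made for illustration; the abstract does not name the API functions.

```python
class DemoAgent:
    """Hypothetical application-side component exposing three entry points."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True
        return 0  # assumed convention: 0 means success

    def stop(self):
        self.running = False
        return 0

    def check(self):
        # 0 when the application is running, non-zero otherwise
        return 0 if self.running else 1


def self_test(agent):
    """Invoke the agent's entry points directly, with no cluster manager involved."""
    results = {}
    results["check_before_start"] = agent.check() != 0
    results["start"] = agent.start() == 0
    results["check_after_start"] = agent.check() == 0
    results["stop"] = agent.stop() == 0
    results["check_after_stop"] = agent.check() != 0
    return results
```

Calling `self_test(DemoAgent())` returns a dictionary of pass/fail flags, one per step, so a failing entry point can be identified before the agent is registered with a cluster manager.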
Abstract:
A clusterware manager configures a resource according to resource attribute values specified by a resource profile. The resource profile conforms to a resource profile syntax that the clusterware manager is configured to interpret pursuant to clusterware manager software. The resource profile syntax prescribes a start dependency syntax defining a dependency between a first resource and a second resource in which the second resource must be in an online state before the first resource is started. The resource profile syntax further prescribes a stop dependency syntax defining a dependency between a first resource and a second resource in which the first resource is brought into an offline state when the second resource leaves an online state.
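The two dependency kinds can be sketched in a few lines. This is an illustrative model only: the `Resource` and `Manager` classes and the `start_deps` and `stop_deps` field names are hypothetical and do not reproduce the resource profile syntax itself.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    online: bool = False
    # Names of resources that must be online before this one may start.
    start_deps: list = field(default_factory=list)
    # Names of resources whose leaving the online state takes this one offline.
    stop_deps: list = field(default_factory=list)


class Manager:
    def __init__(self):
        self.resources = {}

    def add(self, resource):
        self.resources[resource.name] = resource

    def start(self, name):
        """Start a resource only if every start dependency is online."""
        res = self.resources[name]
        blockers = [d for d in res.start_deps if not self.resources[d].online]
        if blockers:
            return False
        res.online = True
        return True

    def stop(self, name):
        """Take a resource offline, then every online resource that stop-depends on it."""
        self.resources[name].online = False
        for res in self.resources.values():
            if res.online and name in res.stop_deps:
                self.stop(res.name)
```

With a resource `app` that lists `db` in both `start_deps` and `stop_deps`, `start("app")` is refused until `db` is online, and `stop("db")` also takes `app` offline.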
Abstract:
A method and computer-readable storage are provided for representing resources in a cluster by a plurality of attribute-value pairs that together form a “resource profile,” in which each attribute-value pair defines all, or a portion of, a management policy that applies to the resource. A clusterware manager configures a resource according to the resource profile and follows a resource profile syntax that specifies a runtime value for the resource, for which an actual value is substituted at runtime.
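Runtime substitution in a profile of attribute-value pairs might look like the sketch below. The `%NAME%` placeholder form, the attribute names, and the file path are assumptions chosen for illustration; the abstract does not specify the actual syntax.

```python
import re

# Hypothetical placeholder form: an upper-case name wrapped in percent signs.
PLACEHOLDER = re.compile(r"%([A-Z_]+)%")


def resolve_profile(profile, runtime_values):
    """Return a copy of the profile with placeholders replaced by runtime values.

    Placeholders with no known runtime value are left unchanged.
    """
    def substitute(match):
        key = match.group(1)
        return str(runtime_values.get(key, match.group(0)))

    return {attr: PLACEHOLDER.sub(substitute, value) for attr, value in profile.items()}


# Illustrative profile; every attribute name and value here is made up.
profile = {
    "ACTION_SCRIPT": "/opt/app/%NODE_NAME%/start.sh",
    "RESTART_ATTEMPTS": "3",
}
resolved = resolve_profile(profile, {"NODE_NAME": "node1"})
```

Leaving unknown placeholders untouched, rather than raising an error, is a design choice of this sketch so that a partially resolved profile remains inspectable.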
Abstract:
Techniques are provided for managing a resource in a High Availability (HA) system. The techniques involve incrementing a count when a particular type of remedial action is performed on a resource, so that the count reflects how often that type of remedial action has been performed for the resource. When it is determined that the resource has been in stable operation, the count is automatically reduced. After a failure, the count is used to determine whether to attempt the particular type of remedial action on the resource. Examples of remedial actions include restarting the resource and relocating the resource to another node of a cluster. By using the count, the system ensures that a faulty resource does not get constantly “bounced”. By reducing the count when a resource has become stable, there is less likelihood that failure of otherwise stable resources will require manual intervention.
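The counting scheme can be sketched as follows. The class name, the `max_attempts` and `stable_seconds` parameters, and the rule of reducing the count by one per full stable interval are assumptions for illustration; the abstract says only that the count is reduced after stable operation.

```python
class RemedialCounter:
    """Tracks how often one type of remedial action (e.g. restart) was performed."""

    def __init__(self, max_attempts, stable_seconds):
        self.max_attempts = max_attempts
        self.stable_seconds = stable_seconds
        self.count = 0
        self.last_event_at = None  # time of the last action or last reduction

    def _reduce_if_stable(self, now):
        # Assumed rule: subtract one from the count per full stable interval elapsed.
        if self.last_event_at is None:
            return
        intervals = int((now - self.last_event_at) // self.stable_seconds)
        if intervals > 0:
            self.count = max(0, self.count - intervals)
            self.last_event_at += intervals * self.stable_seconds

    def on_failure(self, now):
        """Return True if the remedial action should be attempted at time `now`."""
        self._reduce_if_stable(now)
        if self.count < self.max_attempts:
            self.count += 1
            self.last_event_at = now
            return True
        return False  # limit reached: leave the resource for manual intervention
```

With `max_attempts=2` and `stable_seconds=100`, two failures in quick succession are each remedied, a third is refused, and a failure that arrives after more than 100 seconds of stable operation is remedied again because the count has been reduced.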