Abstract:
Disclosed are systems and methods of performing power cap processing in a compute environment. The method includes determining whether one of committed resources and dedicated resources in a compute environment exceeds a threshold value for a job. If the threshold value is exceeded, the method includes preempting processing of the job in the compute environment by performing one of migrating the job to new compute resources and performing a power reduction action associated with the job, such as slowing down a processor associated with the job or cancelling the job. When such a power reduction action is taken, reservations associated with other jobs may also be adjusted.
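The decision described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the names `Job`, `Action` and `power_cap_action`, the use of watts as the unit, and the rule of preferring migration whenever spare capacity exists are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    MIGRATE = "migrate"
    REDUCE_POWER = "reduce_power"  # e.g. slow a processor or cancel the job


@dataclass
class Job:
    name: str
    committed_watts: float  # resources committed to the job (hypothetical unit)
    dedicated_watts: float  # resources dedicated to the job (hypothetical unit)


def power_cap_action(job: Job, threshold_watts: float,
                     spare_capacity_watts: float) -> Action:
    """Pick a preemption action for a job under a power cap."""
    # The cap is exceeded if either committed or dedicated resources pass it.
    exceeded = (job.committed_watts > threshold_watts
                or job.dedicated_watts > threshold_watts)
    if not exceeded:
        return Action.NONE
    # Assumption: migrate when other resources can absorb the job's demand,
    # otherwise fall back to a power reduction action.
    demand = max(job.committed_watts, job.dedicated_watts)
    if spare_capacity_watts >= demand:
        return Action.MIGRATE
    return Action.REDUCE_POWER
```

A scheduler built along these lines would also revisit reservations held by other jobs after a power reduction action; that step is omitted here.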
Abstract:
The present invention provides a system, method and computer-readable media for generating virtual private clusters out of a group of compute resources. Typically, the group of compute resources involves a group of independently administered clusters. The method provides for aggregating the group of compute resources, partitioning the aggregated group of compute resources and presenting to each user in an organization a partition representing the organization's virtual private cluster. The users transparently view their cluster and have control over its operation. The partitions may be static or dynamic.
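The aggregate-then-partition flow can be sketched as below. This is an illustrative assumption rather than the patented design: the function name `build_virtual_private_clusters`, the dictionary shapes, and the static node-to-organization assignment are all hypothetical.

```python
from collections import defaultdict


def build_virtual_private_clusters(clusters, assignment):
    """Aggregate nodes from several clusters and partition them by organization.

    clusters:   {cluster_name: [node_name, ...]}
    assignment: {node_name: organization}  (hypothetical static policy)
    Returns {organization: sorted list of node names}.
    """
    # Step 1: aggregate the independently administered clusters into one pool.
    pool = [node for nodes in clusters.values() for node in nodes]

    # Step 2: partition the pool; nodes with no assignment stay unallocated.
    partitions = defaultdict(list)
    for node in pool:
        org = assignment.get(node)
        if org is not None:
            partitions[org].append(node)

    # Step 3: each organization is shown only its own partition.
    return {org: sorted(nodes) for org, nodes in partitions.items()}
```

A dynamic partition would recompute `assignment` over time; the static mapping here is the simplest case.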
Abstract:
The invention comprises systems, methods and computer-readable media for providing multiple-resource management of a cluster environment. The method embodiment of the invention is illustrative and comprises, at a cluster scheduler, defining a resource management interface, identifying a location of a plurality of services within the cluster environment, determining a set of services available from each of the plurality of resource managers, selecting a group of services available from the plurality of resource managers, contacting the group of services to obtain full information associated with the compute environment and integrating the obtained full information into a single cohesive world-view of compute resources and workload requests.
Abstract:
The invention relates to systems, methods and computer-readable media for using system jobs for performing actions outside the constraints of batch compute jobs submitted to a compute environment such as a cluster or a grid. The method for modifying a compute environment from a system job comprises associating a system job with a queuable object, triggering the system job based on an event and performing arbitrary actions on resources outside of compute nodes in the compute environment. The queuable objects include objects such as batch compute jobs or job reservations. The events that trigger the system job may be time driven, such as ten minutes prior to completion of the batch compute job, or dependent on other actions associated with other system jobs. The system jobs may also be used to perform rolling maintenance on a node-by-node basis.
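The time-driven trigger mentioned in this abstract (ten minutes before a batch job completes) can be illustrated with a small sketch. The function name `trigger_time` and the assumption that the batch job's expected end time is known in advance are hypothetical.

```python
from datetime import datetime, timedelta


def trigger_time(batch_job_end: datetime,
                 lead: timedelta = timedelta(minutes=10)) -> datetime:
    """Return when a system job tied to a batch job should fire.

    Assumption: the scheduler knows the batch job's expected end time.
    """
    return batch_job_end - lead


def should_fire(now: datetime, batch_job_end: datetime) -> bool:
    """True once the current time has reached the trigger time."""
    return now >= trigger_time(batch_job_end)
```

Event-driven triggers that depend on other system jobs would need a dependency check in place of the time comparison.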
Abstract:
The invention relates to systems, methods and computer-readable media for controlling access to compute resources in a compute environment such as a cluster or a grid. The method of providing conditional access to a compute environment comprises associating a required service level threshold with a compute environment, associating a service level with a requestor, receiving a request for access to the compute environment from the requestor and, if the service level of the requestor meets the specified service level threshold, allowing access to the compute resources. The threshold-based access may be enforced by reservations, policies or some other method.
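A minimal sketch of the threshold check follows. It assumes service levels are integers where a higher number is a better level, and that an unknown requestor has level 0; the table contents and the name `request_access` are hypothetical.

```python
# Hypothetical tables: required threshold per environment, level per requestor.
REQUIRED_LEVEL = {"cluster-a": 3, "grid-b": 1}
REQUESTOR_LEVEL = {"alice": 4, "bob": 2}


def request_access(requestor: str, environment: str) -> bool:
    """Allow access only if the requestor's level meets the threshold."""
    threshold = REQUIRED_LEVEL[environment]
    level = REQUESTOR_LEVEL.get(requestor, 0)  # assumption: unknown -> level 0
    return level >= threshold
```

In a scheduler, the same comparison could be enforced through a reservation or a policy rather than an explicit function call.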
Abstract:
An on-demand compute environment comprises a plurality of nodes within an on-demand compute environment available for provisioning and a slave management module operating on a dedicated node within the on-demand compute environment, wherein upon instructions from a master management module at a local compute environment, the slave management module modifies at least one node of the plurality of nodes.
Abstract:
A system, method and computer-readable media for managing a compute environment are disclosed. The method includes importing identity information from an identity manager into a module that performs workload management and scheduling for a compute environment and, unless a conflict exists, modifying the behavior of the workload management and scheduling module to incorporate the imported identity information such that access to and use of the compute environment occurs according to the imported identity information. The compute environment may be a cluster or a grid wherein multiple compute environments communicate with multiple identity managers.
Abstract:
A system and method for reserving resources within a compute environment such as a cluster or grid are disclosed. The method aspect of the disclosure includes receiving a request for resource availability in a compute environment from a requestor, associating a transaction identification with the request and resources within the compute environment that can meet the request and presenting the transaction identification to the requestor. The transaction ID can also be associated with a time frame in which resources are available and can also be associated with modifications to the resources and supersets of resources that could be drawn upon to meet the request. The transaction ID can also be associated with metrics that identify how well the resources fit the request and modifications that can make the resources better match the workload that would be submitted under the request.
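The transaction-ID idea can be sketched as a small availability service. This is an assumption-laden illustration: the class names, the use of `uuid4` for IDs, the integer time frame, and the "first N free nodes" matching rule are hypothetical and not taken from the disclosure.

```python
import uuid
from dataclasses import dataclass


@dataclass
class AvailabilityOffer:
    transaction_id: str
    nodes: list   # resources that can meet the request
    start: int    # start of the time frame (hypothetical epoch seconds)
    end: int      # end of the time frame


class AvailabilityService:
    def __init__(self, free_nodes):
        self._free_nodes = list(free_nodes)
        self._offers = {}

    def query(self, node_count, start, end):
        """Return a transaction ID for matching resources, or None."""
        if node_count > len(self._free_nodes):
            return None
        offer = AvailabilityOffer(
            transaction_id=uuid.uuid4().hex,
            nodes=self._free_nodes[:node_count],  # assumption: first fit
            start=start,
            end=end,
        )
        self._offers[offer.transaction_id] = offer
        return offer.transaction_id

    def lookup(self, transaction_id):
        """Resolve a transaction ID back to the offered resources."""
        return self._offers.get(transaction_id)
```

The requestor only holds the ID; a later reservation call could present it so the service can recall which resources and time frame were offered.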
Abstract:
Disclosed are a system and method of integrating an on-demand compute environment into a local compute environment. The method includes receiving a request from an administrator to integrate an on-demand compute environment into a local compute environment and, in response to the request, automatically integrating local compute environment information with on-demand compute environment information to make available resources from the on-demand compute environment to requestors of resources in the local compute environment.