Abstract:
Management of a virtual machine is enhanced by establishing an initial availability policy for the machine. Once the virtual machine is invoked, the real environment for the virtual machine is monitored for the occurrence of predetermined events. If a real environment event is detected that could affect the availability of the virtual machine, the availability policy of the virtual machine is automatically adjusted to reflect the new or predicted state of the real environment.
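The event-driven policy adjustment described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the policy fields (`restart_priority`, `replicas`), the event names, and the adjustment rules are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityPolicy:
    restart_priority: int  # hypothetical field: higher means restarted sooner
    replicas: int          # hypothetical field: standby copies to keep


# Hypothetical mapping from real-environment events to policy adjustments.
ADJUSTMENTS = {
    "fan_failure": lambda p: AvailabilityPolicy(p.restart_priority + 1, p.replicas + 1),
    "power_supply_degraded": lambda p: AvailabilityPolicy(p.restart_priority + 2, p.replicas + 1),
    "fan_restored": lambda p: AvailabilityPolicy(max(0, p.restart_priority - 1),
                                                 max(0, p.replicas - 1)),
}


def adjust_policy(policy, events):
    """Apply the adjustment for each predetermined event; ignore unknown events."""
    for event in events:
        adjust = ADJUSTMENTS.get(event)
        if adjust is not None:
            policy = adjust(policy)
    return policy
```

Here the initial policy is set before the virtual machine is invoked, and `adjust_policy` would be called each time the monitor reports events from the real environment.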
Abstract:
Systems and methods for domain management in a virtualized computing environment are provided. In one embodiment, the method comprises collating advice received from one or more domain advisors connected in the virtualized computing environment; resolving any conflicts among the advice received from said one or more domain advisors; utilizing the collated advice to generate a placement plan comprising a plurality of operations for virtual machines in said virtualized computing environment; and executing the one or more operations in the placement plan, wherein one or more domain handlers may be called to update the virtualized computing environment before, during or after execution of one or more operations from among said plurality of operations in the plan.
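The collate, resolve, plan, and execute steps above might look like the following sketch. The `Advice` record, the "highest priority wins" conflict rule, and the handler signature are assumptions made for illustration; the abstract does not specify them.

```python
from collections import namedtuple

# Hypothetical advice record: which advisor wants which VM on which host.
Advice = namedtuple("Advice", "advisor priority vm target_host")


def collate(advice_lists):
    """Flatten the advice received from all domain advisors."""
    return [a for advice in advice_lists for a in advice]


def resolve_conflicts(advice):
    """Assumed rule: keep the highest-priority advice for each VM."""
    best = {}
    for a in advice:
        if a.vm not in best or a.priority > best[a.vm].priority:
            best[a.vm] = a
    return best


def make_plan(resolved, placement):
    """Emit one migrate operation per VM whose advised host differs."""
    return [("migrate", vm, placement[vm], a.target_host)
            for vm, a in sorted(resolved.items())
            if placement[vm] != a.target_host]


def execute(plan, placement, handlers=()):
    """Run each operation, calling domain handlers before and after it."""
    for op in plan:
        for handler in handlers:
            handler("before", op)
        _, vm, _, dst = op
        placement[vm] = dst
        for handler in handlers:
            handler("after", op)
    return placement
```

A handler in this sketch is any callable taking a phase string and the operation, for example one that updates network or storage settings around a migration.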
Abstract:
Thermal diagnostic systems and methods are provided for improved detection of airflow anomalies in a computer system. In particular, processor load is selectively increased to amplify the effects caused by any airflow anomaly that may be present in the computer system. Workload migration may be used to shift processor load from another node to a target node. Artificial load may also be generated on the target node. The processor load is increased to a level sufficient that an airflow anomaly would cause a detectable temperature difference at the selected node. The processor load may be increased by an amount computed to generate this detectable temperature difference. Alternatively, the processor load may be increased by a predetermined amount of between 40% and 100% of full processor utilization. While at the increased processor load, actual temperatures sensed by temperature sensors may be compared to temperatures predicted from a model to detect the presence or absence of an airflow anomaly.
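As a rough illustration of computing the load needed for a detectable temperature difference, the sketch below assumes a simple linear model that is not taken from the abstract: temperature rise is proportional to load, and an airflow anomaly inflates that rise by a fixed fraction. All parameter names are hypothetical.

```python
def required_load(detectable_delta_c, rise_per_full_load_c, anomaly_fraction):
    """Smallest load (0..1) at which an anomaly shifts temperature by the delta.

    Assumed model: rise = load * rise_per_full_load_c, and an anomaly adds
    anomaly_fraction * rise on top of it.
    """
    load = detectable_delta_c / (rise_per_full_load_c * anomaly_fraction)
    return min(1.0, load)  # cannot exceed full processor utilization


def airflow_anomaly(sensed_c, predicted_c, detectable_delta_c):
    """Flag an anomaly when the sensed temperature exceeds the prediction."""
    return (sensed_c - predicted_c) >= detectable_delta_c
```

Under these assumptions, a 3 °C detectable difference with a 30 °C full-load rise and a 20% anomaly effect calls for about 50% load, which falls inside the 40% to 100% range the abstract mentions.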
Abstract:
A system and method of detecting recirculation within a rack server system. A heat transfer model is constructed for a rack server system. A recirculation zone is specified, and hypothetical recirculation temperatures are input at the recirculation zone. The heat transfer model predicts temperatures elsewhere in the rack server system, and a predicted temperature profile is computed. Actual temperatures in the rack server system are sensed, and an actual temperature profile is also generated. The actual temperature profile is compared with the predicted temperature profile to detect potential recirculation.
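The profile comparison above can be sketched as follows. The heat transfer model here is a deliberately simple stand-in (each sensor is linearly coupled to the recirculation zone through a hypothetical `coupling` coefficient), and the RMS error measure is an assumption; the abstract does not specify either.

```python
import math


def predict_profile(inlet_c, recirc_c, coupling):
    """Assumed linear model: each sensor sees a share of the recirculation excess."""
    return [inlet_c + c * (recirc_c - inlet_c) for c in coupling]


def rms_error(actual, predicted):
    """Root-mean-square difference between two temperature profiles."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def best_matching_recirc(actual, inlet_c, coupling, candidates_c):
    """Return the hypothetical recirculation temperature that best fits the sensors."""
    return min(candidates_c,
               key=lambda t: rms_error(actual, predict_profile(inlet_c, t, coupling)))
```

If the best-fitting hypothetical temperature is above the inlet temperature, the actual profile is consistent with recirculation at the specified zone.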
Abstract:
A method (and system) of reducing a time for a computer system to recover from a degradation of performance in a hardware or a software in at least one first node of the computer system, includes monitoring a state of the at least one first node, and based on the monitoring, transferring a state of the at least one first node to a second node prior to the degradation in performance of the hardware or the software of the at least one first node.
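A minimal sketch of the proactive transfer follows, assuming the monitored state is a series of error counts and that degradation is predicted by linear extrapolation from the last two samples. The predictor, the threshold, and the dictionary layout of a node are illustrative assumptions only.

```python
import copy


def predict_next(samples):
    """Assumed predictor: extrapolate linearly from the last two samples."""
    if not samples:
        return 0
    if len(samples) < 2:
        return samples[-1]
    return samples[-1] + (samples[-1] - samples[-2])


def maybe_transfer(first_node, second_node, threshold):
    """Copy state to the second node before the predicted degradation occurs."""
    if predict_next(first_node["error_samples"]) >= threshold:
        second_node["state"] = copy.deepcopy(first_node["state"])
        return True
    return False
```

Because the state is already on the second node when the first node degrades, recovery would only need to switch over rather than rebuild state.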
Abstract:
A method and system of resource allocation for execution of a job are provided. The method includes receiving feedback (134) regarding the execution of previously submitted jobs on one or more resource nodes (101-104), and estimating the resources required for execution of a submitted job based on the feedback (134) and the parameters of the job. The job is allocated to one resource node, or to a plurality of resource nodes in parallel, having the estimated resources. The feedback may be implicit feedback indicating the success or failure of the execution of a job. The one or more resource nodes (101-104) allocated for execution of a job may have less than a user requested resource allocation for the job.
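One way to picture estimation from implicit success/failure feedback is the bisection sketch below. It assumes memory is the only resource, that feedback arrives as `(allocated_mb, succeeded)` pairs for similar earlier jobs, and that a best-fit node choice is acceptable; none of these details come from the abstract.

```python
def estimate_memory(requested_mb, feedback):
    """Bisect between the largest failed and smallest successful allocation."""
    upper = min([mb for mb, ok in feedback if ok] + [requested_mb])
    lower = max([mb for mb, ok in feedback if not ok] + [0])
    if lower >= upper:
        return requested_mb  # feedback gives no usable bracket: use the request
    return max(lower + 1, (lower + upper) // 2)


def allocate(nodes_free_mb, needed_mb):
    """Best fit: the node with the least free memory that still suffices."""
    fitting = [(free, name) for name, free in nodes_free_mb.items() if free >= needed_mb]
    return min(fitting)[1] if fitting else None
```

With this rule the estimate can fall below the user's request, and each failure or success narrows the bracket for the next similar job.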
Abstract:
Provided are a method, system, and article of manufacture for recovery of application faults in a mirrored application environment. Application events are recorded at a primary system executing an instruction for an application. The recorded events are transferred to a buffer. The recorded events are transferred from the buffer to a secondary system, wherein the secondary system implements processes indicated in the recorded events to execute the instructions indicated in the events. An error is detected at the primary system. A determination is made of a primary order in which the events are executed by processes in the primary system. A determination is made of a modified order of the execution of the events comprising a different order of executing the events than the primary order in response to detecting the error. The secondary system processes execute the instructions indicated in the recorded events according to the modified order.
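The idea of replaying recorded events in a modified order can be illustrated with the sketch below. It assumes each recorded event is a `(process, instruction)` pair and that a valid modified order keeps each process's own sequence while changing how processes interleave; the abstract does not say how the modified order is chosen, so this rule is hypothetical.

```python
def modified_order(primary_order):
    """Return a different interleaving that keeps each process's own event order."""
    processes = []
    for process, _ in primary_order:
        if process not in processes:
            processes.append(process)
    # Try grouping events by process, first in order of appearance, then reversed.
    for sequence in (processes, processes[::-1]):
        rank = {p: i for i, p in enumerate(sequence)}
        # sorted() is stable, so events of one process keep their relative order.
        candidate = sorted(primary_order, key=lambda event: rank[event[0]])
        if candidate != list(primary_order):
            return candidate
    return list(primary_order)  # a single process has no alternative interleaving
```

For example, if the primary interleaved two processes acquiring the same locks in opposite order, the grouped order lets one process finish before the other starts, which may avoid the interleaving that triggered the error.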
Abstract:
A method and system for ordering and aggregating log streams. Log streams for events from different sources are received. If different sources have different recording cycles, or time epochs, that lead to different temporal granularities, then all of the log streams are combined into a single time epoch that is equal to the longest time epoch. Log streams from sources having shorter time epochs continue to retain information about their original time epochs, in order to retain information about the order of the events in those log streams. The log streams are re-ordered, both before and after being integrated into the aggregate log, by acquiring additional data from the different sources, thus permitting the likely cause/effect relationship between events to be determined.
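Combining streams into the longest epoch while retaining the finer-grained ordering might be sketched as follows. The input layout (each source maps to its epoch length in seconds and a list of `(slot, message)` pairs, where `slot` counts that source's own epochs) is an assumption for illustration, and the re-ordering from additional source data is not modelled.

```python
def aggregate(streams):
    """Merge log streams into buckets of the longest epoch, keeping original slots."""
    longest = max(epoch for epoch, _ in streams.values())
    merged = []
    for source, (epoch, events) in streams.items():
        for slot, message in events:
            start = slot * epoch  # start time of the event's original epoch
            merged.append({
                "bucket": start // longest,  # position in the common, longest epoch
                "source": source,
                "orig_epoch": epoch,         # retained original granularity
                "orig_slot": slot,
                "start": start,
                "message": message,
            })
    # Order by common bucket, then by the retained finer-grained start time.
    merged.sort(key=lambda e: (e["bucket"], e["start"]))
    return merged
```

Events from a coarse source can only be placed at bucket level, while events from finer sources keep their relative order inside each bucket.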