Abstract:
A method, apparatus, and program product manage scheduling of a plurality of jobs in a parallel computing system of the type that includes a plurality of computing nodes and is disposed in a data center. The plurality of jobs are scheduled for execution on a group of computing nodes from the plurality of computing nodes based on the physical locations of the plurality of computing nodes in the data center. The group of computing nodes is further selected so as to distribute at least one of a heat load and an energy load within the data center. The plurality of jobs may be additionally scheduled based upon an estimated processing requirement for each job of the plurality of jobs.
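The location-aware scheduling described above could be sketched roughly as follows. This is a minimal illustration, not the claimed method: it assumes each node's physical location is reduced to a hypothetical `rack` label, that a job's heat contribution is proportional to its estimated processing requirement, and that a greedy "coolest rack first" rule is an acceptable way to spread the load. The names `Node` and `schedule` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str  # hypothetical label for the node's physical location


def schedule(jobs, nodes):
    """Assign each job to a free node, spreading estimated load across racks.

    jobs:  dict mapping job name -> estimated processing requirement
    nodes: list of Node
    Returns (assignment, rack_load), where assignment maps job -> node name
    and rack_load maps rack -> total estimated load placed on it.
    """
    rack_load = {n.rack: 0.0 for n in nodes}
    free = {}
    for n in nodes:
        free.setdefault(n.rack, []).append(n.name)

    assignment = {}
    # Place the largest jobs first so small jobs can even out the racks later.
    for job, load in sorted(jobs.items(), key=lambda kv: -kv[1]):
        candidates = [rack for rack, names in free.items() if names]
        if not candidates:
            raise ValueError("not enough free nodes for all jobs")
        # Assumption: the rack with the least accumulated load is the coolest.
        rack = min(candidates, key=lambda r: (rack_load[r], r))
        assignment[job] = free[rack].pop(0)
        rack_load[rack] += load
    return assignment, rack_load
```

A real scheduler would likely use measured temperatures or power draw and finer-grained coordinates than a rack label; the greedy rule here only illustrates the idea of choosing nodes by location to balance heat or energy load.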
Abstract:
A method and apparatus perform peer-to-peer file transfers on a High Performance Computing (HPC) cluster such as a Beowulf cluster. A peer-to-peer file tracker (PPFT) allows operating system, application, and data files to be moved from a pre-loaded node to another node of the HPC cluster. A peer-to-peer (PTP) client is loaded into the nodes to facilitate PTP file transfers. These transfers reduce the load on networks, network switches, and file servers, which shortens the time needed to load the nodes with these files and increases the overall efficiency of the multi-node computing system. The selection of the nodes participating in file transfers can be based on network topology, network utilization, job status, and predicted network/computer utilization. This selection can be dynamic, changing during the file transfers as resource conditions change. The policies used to choose resources can be configured by an administrator.
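One way to picture the policy-driven peer selection is a weighted score over the listed criteria. The sketch below is an assumption-laden illustration rather than the tracker's actual logic: the `Peer` fields, the `DEFAULT_POLICY` weights, and the functions `score` and `choose_source` are all hypothetical, and predicted utilization is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    hops: int            # network distance to the requesting node (topology)
    utilization: float   # current link utilization, 0.0 to 1.0
    running_job: bool    # job status: True if the node is busy with a job
    has_file: bool       # True if the node already holds the requested file


# Hypothetical administrator-configurable weights; lower total score is better.
DEFAULT_POLICY = {"hops": 1.0, "utilization": 10.0, "running_job": 5.0}


def score(peer, policy):
    """Weighted cost of using this peer as a transfer source."""
    return (
        policy["hops"] * peer.hops
        + policy["utilization"] * peer.utilization
        + policy["running_job"] * (1 if peer.running_job else 0)
    )


def choose_source(peers, policy=DEFAULT_POLICY):
    """Return the cheapest peer that holds the file, or None if none does."""
    candidates = [p for p in peers if p.has_file]
    if not candidates:
        return None
    # Ties are broken by name so the choice is deterministic.
    return min(candidates, key=lambda p: (score(p, policy), p.name))
```

Because `choose_source` reads the peers' current state on every call, calling it again after utilization or job status changes gives the dynamic reselection the abstract mentions; an administrator would tune behavior by supplying a different policy dictionary.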
Abstract:
A method for optimizing efficiency and power consumption in a hybrid computer system is disclosed. The hybrid computer system may comprise one or more front-end nodes connected to a multi-node computer system. Portions of an application may be offloaded from the front-end nodes to the multi-node computer system. By building historical profiles of the applications running on the multi-node computer system, the system can analyze the trade-offs between power consumption and performance. For example, if running the application on the multi-node computer system cuts the run time by 5% but increases power consumption by 20%, it may be more advantageous to simply run the entire application on the front-end nodes.
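The trade-off in the example above can be expressed as a simple comparison of relative changes. The following is a minimal sketch under stated assumptions: a historical profile is taken to be a list of `(run_time_seconds, power_watts)` samples, the decision rule (offload only if the fractional run-time saving exceeds the weighted fractional power increase) is an illustrative choice, and `should_offload` and `power_weight` are hypothetical names.

```python
from statistics import mean


def should_offload(front_end_runs, multi_node_runs, power_weight=1.0):
    """Decide whether offloading to the multi-node system looks worthwhile.

    Each argument is a non-empty list of (run_time_seconds, power_watts)
    samples taken from past runs of the same application.
    power_weight scales how heavily extra power counts against offloading.
    """
    fe_time = mean([t for t, _ in front_end_runs])
    fe_power = mean([p for _, p in front_end_runs])
    mn_time = mean([t for t, _ in multi_node_runs])
    mn_power = mean([p for _, p in multi_node_runs])

    # Fractional changes relative to running entirely on the front-end nodes.
    time_saving = (fe_time - mn_time) / fe_time
    power_increase = (mn_power - fe_power) / fe_power

    # Assumed rule: offload only when the speedup outweighs the extra power.
    return time_saving > power_weight * power_increase
```

With a profile matching the abstract's figures (run time down 5%, power up 20%), the function returns `False`, meaning the application stays on the front-end nodes. A lower `power_weight` would favor performance over power savings.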