Abstract:
A system and method for forecasting power consumption at a facility having a system of compute units for executing computing jobs. The power forecast includes forecasting the sequence of job execution on the system of compute units over time, estimating power for the jobs, and developing a system-level power forecast.
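The three steps named in this abstract (job sequence, per-job power, system-level forecast) can be illustrated with a minimal sketch. The `Job` fields, the per-node wattages, and the idle power figure are all hypothetical values chosen for illustration; the abstract does not say how the schedule or the per-job estimates are obtained.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    start: int             # first time step at which the job is forecast to run
    duration: int          # number of time steps the job is forecast to run
    nodes: int             # number of compute units the job occupies
    watts_per_node: float  # estimated per-node power (hypothetical figure)


def forecast_system_power(jobs, horizon, idle_watts=0.0):
    """Return a per-time-step power forecast for the whole system."""
    forecast = [idle_watts] * horizon
    for job in jobs:
        job_power = job.nodes * job.watts_per_node
        end = min(job.start + job.duration, horizon)
        for t in range(job.start, end):
            forecast[t] += job_power
    return forecast


# Hypothetical forecast job sequence.
schedule = [
    Job("a", start=0, duration=2, nodes=4, watts_per_node=100.0),
    Job("b", start=1, duration=3, nodes=2, watts_per_node=150.0),
]
profile = forecast_system_power(schedule, horizon=5, idle_watts=50.0)
```

Summing per-job estimates over a fixed time grid is only one possible way to aggregate; a real forecaster would presumably also model uncertainty in start times and power draw.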
Abstract:
A system with improved power performance for tasks executed in parallel. The system includes a plurality of processing cores, each to execute tasks, and an inter-core messaging unit to convey messages between the cores. A power management agent transitions a first core into a lower power state responsive to the first core waiting for a second core to complete a second task. In some embodiments, long messages are subdivided to allow a receiving core to resume useful work sooner.
Abstract:
A system with granular power/performance management. The system includes a plurality of platforms each to execute tasks, each platform having a plurality of settings that affect a ratio of performance to power usage. Each platform executes an optimization agent to collectively cause the platforms to execute a workload based on a plurality of permutations of the settings. A candidate list creator exists as part of the optimization agent to aggregate a list of performance metrics associated with the plurality of permutations.
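As a rough illustration of the candidate list described here, the sketch below enumerates every permutation of a small set of settings, runs a workload for each, and aggregates the resulting metrics. The setting names (`freq_ghz`, `threads`), the `toy_workload` model, and the performance-per-watt ranking are assumptions made for the example, not details taken from the abstract.

```python
from itertools import product


def build_candidate_list(settings, run_workload):
    """Run the workload for every permutation of settings and collect metrics."""
    names = list(settings)
    candidates = []
    for values in product(*(settings[name] for name in names)):
        config = dict(zip(names, values))
        performance, power = run_workload(config)
        candidates.append({
            "config": config,
            "performance": performance,
            "power": power,
            "perf_per_watt": performance / power,
        })
    # Best ratio of performance to power first (ranking rule is an assumption).
    candidates.sort(key=lambda c: c["perf_per_watt"], reverse=True)
    return candidates


def toy_workload(config):
    """Invented stand-in for a real measurement: returns (performance, watts)."""
    performance = config["freq_ghz"] * config["threads"]
    power = 10.0 + config["freq_ghz"] ** 2 * config["threads"]
    return performance, power


settings = {"freq_ghz": [1.0, 2.0], "threads": [1, 2]}
candidates = build_candidate_list(settings, toy_workload)
```

In a distributed setup, each platform's optimization agent would presumably evaluate only a share of the permutations; the sketch evaluates them all in one place for brevity.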
Abstract:
A method and apparatus for node power regulation among nodes that share a power supply are described. In one embodiment, the apparatus comprises a power supply unit to provide input power and a plurality of nodes coupled to receive the input power, where each node of the plurality of nodes is operable to run power management logic, and wherein two or more nodes of the plurality of nodes alternate between performing power management and providing power regulation control information to other nodes of the plurality of nodes to regulate power consumption by the plurality of nodes, with, at any one time, only one node of the plurality of nodes generating the power regulation control information to regulate power for the plurality of nodes.
Abstract:
A non-transitory computer readable storage medium is shown, having stored thereon instructions executable by one or more processors to perform operations including: receiving a plurality of input parameters including (i) a workload type, (ii) a list of selected nodes belonging to a distributed computer system, and (iii) a list of frequencies; responsive to receiving the plurality of input parameters, retrieving calibration data from a calibration database; generating a power estimate based on the plurality of input parameters and the calibration data; and providing the power estimate to a resource manager. Alternatively, the input parameters may include (i) a workload type, (ii) a list of selected nodes belonging to a distributed computer system, and (iii) an amount of available power, wherein the estimator may provide an estimate of the frequency at which the nodes should operate to utilize as much of the available power as possible without exceeding it.
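Both estimator modes described here can be sketched with a small lookup table. The `CALIBRATION` dictionary, its keys, and its wattages are hypothetical stand-ins for the calibration database, and the sketch assumes every selected node has been calibrated at every listed frequency.

```python
# Hypothetical calibration data: watts per node, keyed by
# (workload type, node name, frequency in GHz).
CALIBRATION = {
    ("stencil", "n1", 1.2): 90.0,
    ("stencil", "n1", 2.0): 140.0,
    ("stencil", "n2", 1.2): 95.0,
    ("stencil", "n2", 2.0): 150.0,
}


def estimate_power(workload, nodes, frequencies):
    """Mode 1: map each frequency to the total estimated power of the nodes."""
    return {
        f: sum(CALIBRATION[(workload, node, f)] for node in nodes)
        for f in frequencies
    }


def estimate_frequency(workload, nodes, available_power):
    """Mode 2: pick the calibrated frequency that uses the most power
    without exceeding available_power; None if no frequency fits."""
    calibrated = sorted(
        {f for (w, node, f) in CALIBRATION if w == workload and node in nodes}
    )
    estimates = estimate_power(workload, nodes, calibrated)
    fitting = [f for f in calibrated if estimates[f] <= available_power]
    return max(fitting, key=lambda f: estimates[f]) if fitting else None


power_by_freq = estimate_power("stencil", ["n1", "n2"], [1.2, 2.0])
chosen = estimate_frequency("stencil", ["n1", "n2"], available_power=200.0)
```

A resource manager would consume `power_by_freq` or `chosen`; how the real estimator interpolates between calibrated points, if at all, is not stated in the abstract.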
Abstract:
A method of assessing energy efficiency of a high-performance computing (HPC) system is shown, including: selecting a plurality of HPC workloads to run on a system under test (SUT) with one or more power constraints, wherein the SUT includes a plurality of HPC nodes in the HPC system; executing the plurality of HPC workloads on the SUT; and generating a benchmark metric for the SUT based on a baseline configuration for each selected HPC workload and a plurality of measured performance-per-power values for each executed workload at each selected power constraint.
Abstract:
A non-transitory computer readable storage medium is shown, storing instructions executable by one or more processors of a distributed computer system to perform operations including: determining whether a power consumed by the distributed computer system is greater than a power allocated to the distributed computer system; responsive to determining the power consumed is greater than the power allocated, determining whether all jobs being processed by the distributed computer system are processing at a lowest power state for each job, wherein a job includes one or more calculations performed by the one or more processors of the distributed computer system; and responsive to determining all jobs are processing at a lowest power state for each job, suspending a job having a lowest priority among all jobs being processed by the distributed computer system.
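The decision sequence in this abstract can be sketched as a single function. The `Job` fields are hypothetical; in particular, the sketch assumes that power state 0 is each job's lowest power state and that a smaller `priority` value means lower priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # smaller value = lower priority (assumption)
    power_state: int   # 0 = lowest power state for the job (assumption)
    suspended: bool = False


def enforce_allocation(jobs, consumed_watts, allocated_watts):
    """Suspend and return the lowest-priority job when the system is over
    its allocation and every running job is already at its lowest power
    state; otherwise return None."""
    if consumed_watts <= allocated_watts:
        return None  # within allocation, nothing to do
    running = [job for job in jobs if not job.suspended]
    if not running:
        return None
    if all(job.power_state == 0 for job in running):
        victim = min(running, key=lambda job: job.priority)
        victim.suspended = True
        return victim
    # Some job can still be moved to a lower power state. The abstract does
    # not describe that branch, so this sketch leaves it to the caller.
    return None


jobs = [Job("render", priority=3, power_state=0),
        Job("backup", priority=1, power_state=0)]
suspended = enforce_allocation(jobs, consumed_watts=1200.0, allocated_watts=1000.0)
```

Suspension is the last resort here: it is reached only after the over-allocation check and the lowest-power-state check both succeed.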
Abstract:
A zombie server can be detected. Detecting a zombie server can include receiving network traffic at a server and calculating, every first time interval, the percentage of the network traffic that uses productivity software layer 7 protocols. Detecting a zombie server can also include marking the server as a zombie server, every second time interval, based on the percentage, and processing the network traffic at the server to perform a number of actions by the productivity software.
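A minimal sketch of the two-interval scheme follows. The protocol set, the 5% threshold, and the rule that every sample in the second interval must fall below the threshold are assumptions for illustration; the abstract only says the marking is based on the percentage.

```python
# Hypothetical set of layer 7 protocols treated as productivity traffic.
PRODUCTIVITY_PROTOCOLS = {"http", "smtp", "imap"}


def productive_percentage(packet_protocols):
    """Percentage of packets in one first-interval sample that use a
    productivity protocol. Input is a list of protocol names."""
    if not packet_protocols:
        return 0.0
    productive = sum(1 for p in packet_protocols if p in PRODUCTIVITY_PROTOCOLS)
    return 100.0 * productive / len(packet_protocols)


def is_zombie(percentages, threshold=5.0):
    """Decide once per second interval, from the percentages collected in
    each first interval. Threshold and 'all below' rule are assumptions."""
    return bool(percentages) and all(p < threshold for p in percentages)


sample = productive_percentage(["http", "ssh", "ssh", "ssh"])
```

With these assumptions, a server whose samples are `[0.0, 2.0]` would be marked as a zombie, while one with `[0.0, 25.0]` would not.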
Abstract:
A method and apparatus for coordinating and authenticating requests for data. In one embodiment, the apparatus comprises: a baseboard management controller (BMC); and a request coordinator coupled to the BMC to intercept BMC requests and to provide intercepted requests to the BMC, where the coordination interface comprises a request parser to parse parameters for each of the BMC requests, one or more queues to store the requests while the BMC is servicing another BMC request, and a command submitter to send individual BMC requests to the BMC, wherein the BMC is operable to generate the responses to the BMC requests received from the coordination interface and to send the responses to the coordination interface.
Abstract:
A system and method for computing at a facility having systems of multiple compute nodes to execute jobs of computing. Power consumption of the facility is managed to within a power band. The power consumption may be adjusted by implementing (e.g., by a power balloon) activities having little or no computational output.
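One way to picture the power balloon is as filler load that raises consumption to the lower edge of the band when real jobs draw too little. The function below is a sketch under that assumption; the name `balloon_watts` and the choice to inflate only up to the lower bound are illustrative, not taken from the abstract.

```python
def balloon_watts(job_watts, band_low, band_high):
    """Return (balloon, in_band): the watts of low-output filler load needed
    to reach band_low, and whether the resulting total stays within the band."""
    if band_low > band_high:
        raise ValueError("band_low must not exceed band_high")
    balloon = max(0.0, band_low - job_watts)
    total = job_watts + balloon
    # If jobs alone exceed band_high, a balloon cannot help; the caller
    # would need some other mechanism to reduce consumption.
    return balloon, total <= band_high


balloon, in_band = balloon_watts(700.0, band_low=800.0, band_high=1000.0)
```

Here the balloon inflates by 100 W to bring a 700 W job load up to an 800 W lower bound, and deflates to zero once real jobs fill the band.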