JOB AGENT-BASED POWER-CAPPING OF SYSTEMS

    Publication number: US20220291734A1

    Publication date: 2022-09-15

    Application number: US17195864

    Filing date: 2021-03-09

    Abstract: A technique includes an agent executing on a plurality of nodes while a job is being concurrently executed by the plurality of nodes. The plurality of nodes is power-capped by an existing node power consumption budget. The technique includes managing power consumption of the plurality of nodes. The managing includes the agent determining a performance footprint that is associated with execution of the job; and the managing includes the agent determining a second node power consumption budget based on the performance footprint. The second node power consumption budget is different than the existing node power consumption budget. The managing includes the agent providing a power consumption request to a global power dispatcher to set a new node power consumption budget for the plurality of nodes.
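The control loop in this abstract can be illustrated with a minimal sketch. Nothing below comes from the patent itself: the `Footprint` fields, the `GlobalPowerDispatcher` class, the `propose_budget` thresholds, and the equal-share ceiling are all hypothetical choices made only to show the flow (measure a footprint, derive a budget that differs from the existing one, ask a dispatcher to apply it).

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Footprint:
    """Hypothetical performance footprint of a running job."""
    mean_power_watts: float   # average per-node draw while the job runs
    mean_throughput: float    # average per-node work rate (arbitrary units)


class GlobalPowerDispatcher:
    """Hypothetical dispatcher that owns the per-node budget for a job's nodes."""

    def __init__(self, system_limit_watts: float, node_count: int,
                 node_budget_watts: float) -> None:
        self.system_limit_watts = system_limit_watts
        self.node_count = node_count
        self.node_budget_watts = node_budget_watts

    def request_budget(self, requested_watts: float) -> float:
        # Assumption: no node may exceed an equal share of the system limit.
        ceiling = self.system_limit_watts / self.node_count
        self.node_budget_watts = min(requested_watts, ceiling)
        return self.node_budget_watts


def determine_footprint(samples: List[Tuple[float, float]]) -> Footprint:
    """Summarize per-node (power_watts, throughput) samples."""
    return Footprint(mean(p for p, _ in samples), mean(t for _, t in samples))


def propose_budget(fp: Footprint, existing_budget: float,
                   headroom: float = 0.10, saturated: float = 0.95,
                   slack: float = 0.80) -> Optional[float]:
    """Return a second budget that differs from the existing one, or None."""
    utilization = fp.mean_power_watts / existing_budget
    if utilization >= saturated:
        # The job is pressing against the cap: ask for more power.
        return existing_budget * (1 + headroom)
    if utilization < slack:
        # The job leaves power unused: ask for a tighter cap.
        return fp.mean_power_watts * (1 + headroom)
    return None  # The existing budget already fits; send no request.


def agent_step(dispatcher: GlobalPowerDispatcher,
               samples: List[Tuple[float, float]]) -> float:
    """One management pass of the agent; returns the budget now in force."""
    fp = determine_footprint(samples)
    proposal = propose_budget(fp, dispatcher.node_budget_watts)
    if proposal is None:
        return dispatcher.node_budget_watts
    return dispatcher.request_budget(proposal)
```

In this sketch the agent only proposes a budget and the dispatcher keeps the final say, which mirrors the abstract's split between a job-level agent and a global power dispatcher.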

    Job agent-based power-capping of systems

    Publication number: US12001256B2

    Publication date: 2024-06-04

    Application number: US17195864

    Filing date: 2021-03-09

    Abstract: A technique includes an agent executing on a plurality of nodes while a job is being concurrently executed by the plurality of nodes. The plurality of nodes is power-capped by an existing node power consumption budget. The technique includes managing power consumption of the plurality of nodes. The managing includes the agent determining a performance footprint that is associated with execution of the job; and the managing includes the agent determining a second node power consumption budget based on the performance footprint. The second node power consumption budget is different than the existing node power consumption budget. The managing includes the agent providing a power consumption request to a global power dispatcher to set a new node power consumption budget for the plurality of nodes.

    OPTIMIZING OPERATION OF HIGH-PERFORMANCE COMPUTING SYSTEMS

    Publication number: US20240095081A1

    Publication date: 2024-03-21

    Application number: US17948159

    Filing date: 2022-09-19

    Abstract: A method for optimizing operations of high-performance computing (HPC) systems includes collecting data associated with a plurality of workload performance profiling counters associated with a workload during runtime of the workload in an HPC system. Based on the collected data, the method includes using a machine-learning technique to classify the workload by determining a workload-specific fingerprint for the workload. The method includes identifying an optimization metric to optimize during running of the workload in the HPC system. The method includes determining an optimal setting for a plurality of tunable hardware execution parameters as measured against the optimization metric by varying at least a portion of the plurality of tunable hardware execution parameters. The method includes storing the workload-specific fingerprint, the optimization metric, and the optimal setting for the plurality of tunable hardware execution parameters as measured against the optimization metric in an architecture-specific knowledge database.
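The steps in this abstract (profile, fingerprint, sweep the tunables, store the result) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: a nearest-centroid lookup stands in for the unspecified machine-learning technique, and the centroid values, counter names, tunable names, and `toy_energy` model are all hypothetical.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical fingerprint centroids: (instructions per cycle, cache-miss rate).
CENTROIDS: Dict[str, Tuple[float, float]] = {
    "compute-bound": (2.5, 0.01),
    "memory-bound": (0.6, 0.20),
}

# Hypothetical knowledge database keyed by (architecture, fingerprint, metric).
knowledge_db: Dict[Tuple[str, str, str], dict] = {}


def fingerprint(counter_samples: Sequence[Tuple[float, float]]) -> str:
    """Classify a workload by the centroid nearest to its averaged counters."""
    n = len(counter_samples)
    avg = tuple(sum(column) / n for column in zip(*counter_samples))
    return min(CENTROIDS, key=lambda label: math.dist(avg, CENTROIDS[label]))


def sweep(tunables: Dict[str, List], measure: Callable[[dict], float]) -> dict:
    """Try every combination of tunable values; keep the lowest-scoring one."""
    names = list(tunables)
    best_setting, best_score = None, math.inf
    for values in itertools.product(*(tunables[name] for name in names)):
        setting = dict(zip(names, values))
        score = measure(setting)
        if score < best_score:
            best_setting, best_score = setting, score
    return best_setting


def record(architecture: str, fp: str, metric: str, setting: dict) -> None:
    """Store the best setting found for this fingerprint and metric."""
    knowledge_db[(architecture, fp, metric)] = setting


def toy_energy(setting: dict) -> float:
    """Hypothetical energy model for a memory-bound job (power times runtime)."""
    power = 50 + 40 * setting["cpu_ghz"]
    runtime = 10 + 2 / setting["cpu_ghz"]
    if setting["prefetch"]:
        runtime *= 0.9  # assumed benefit of hardware prefetching
    return power * runtime
```

A real system would replace `toy_energy` with measurements taken while the workload runs, and an exhaustive sweep is practical only for a small grid of tunables; a later run with the same fingerprint could read the stored setting from `knowledge_db` instead of repeating the sweep.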
