OPTIMIZING CONCURRENT EXECUTION USING NETWORKED PROCESSING UNITS

    Publication No.: US20230136612A1

    Publication Date: 2023-05-04

    Application No.: US18090749

    Filing Date: 2022-12-29

    IPC Classification: G06F9/48 G06F9/54

    Abstract: Various approaches for managing distributed compute operations for workload execution of concurrent tasks, including with the use of infrastructure processing units (IPUs) and similar networked processing units, are disclosed. An example method may include: identifying multiple tasks of a computing workload, for a workload that provides processing dependencies among the tasks, and that uses concurrent execution with one or more of the tasks; monitoring an execution time for each of the tasks, relative to an execution time threshold for each of the tasks; identifying the execution time of a particular task as exceeding an execution time threshold for the particular task; determining a remediation based on the particular task and the identified execution time, with the remediation including use of other compute resources in the distributed computing environment for the workload; and applying the remediation to increase speed of execution of the workload.
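
The monitoring and remediation steps in this abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the `Task` fields, the `find_overruns` and `choose_remediation` helpers, the action names, and the 2x overrun cutoff are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    threshold_s: float        # execution time threshold for this task
    elapsed_s: float          # observed execution time so far
    depends_on: tuple = ()    # names of tasks this task depends on


def find_overruns(tasks):
    """Return the tasks whose observed execution time exceeds their threshold."""
    return [t for t in tasks if t.elapsed_s > t.threshold_s]


def choose_remediation(task, spare_nodes):
    """Pick a remediation that uses other compute resources, if any exist.

    Assumption: a task running at 2x its threshold or worse is moved to a
    spare node; a milder overrun only borrows extra resources from it.
    """
    if not spare_nodes:
        return ("none", None)
    overrun_ratio = task.elapsed_s / task.threshold_s
    action = "migrate" if overrun_ratio >= 2.0 else "add_resources"
    return (action, spare_nodes[0])
```

For example, a task with a 2.0 s threshold that has run for 5.0 s is flagged by `find_overruns`, and `choose_remediation` would propose migrating it to the first spare node in the list.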

MANAGEMENT OF WORKLOAD PROCESSING USING DISTRIBUTED NETWORKED PROCESSING UNITS

    Publication No.: US20230135645A1

    Publication Date: 2023-05-04

    Application No.: US18090764

    Filing Date: 2022-12-29

    IPC Classification: G06F9/50

    Abstract: Various approaches for deploying and controlling distributed compute operations with the use of infrastructure processing units (IPUs) and similar networked processing units are disclosed. A system that includes a networked processing unit may perform workload processing with operations that: receive, from another networked processing unit, workload information for a workload having respective tasks to be processed among distributed computing entities; perform an analysis of network conditions for a predicted execution of the workload, based on the workload information, to analyze network availability among the distributed computing entities; perform an analysis of compute conditions for the predicted execution of the workload, based on the workload information, to analyze processing availability among the distributed computing entities; and identify locations of the distributed computing entities to deploy the workload, based on the analysis of network conditions and the analysis of compute conditions.
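
The placement decision in this abstract combines a network analysis and a compute analysis. The following Python sketch shows one way such a decision could look. The `Node` fields, the helper names, the specific metrics (bandwidth, latency, free CPU fraction), and the ranking rule are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float         # fraction of compute currently available, 0..1
    bandwidth_mbps: float   # available network bandwidth to the node
    latency_ms: float       # network latency to the node


def network_ok(node, need_mbps, max_latency_ms):
    """Network analysis: enough bandwidth and low enough latency."""
    return node.bandwidth_mbps >= need_mbps and node.latency_ms <= max_latency_ms


def compute_ok(node, need_cpu):
    """Compute analysis: enough free processing capacity."""
    return node.free_cpu >= need_cpu


def select_locations(nodes, need_cpu, need_mbps, max_latency_ms, count=1):
    """Return up to `count` node names that pass both analyses.

    Assumption: candidates are ranked by most free CPU, then lowest latency.
    """
    candidates = [
        n for n in nodes
        if network_ok(n, need_mbps, max_latency_ms) and compute_ok(n, need_cpu)
    ]
    candidates.sort(key=lambda n: (-n.free_cpu, n.latency_ms))
    return [n.name for n in candidates[:count]]
```

Keeping the two checks as separate functions mirrors the abstract's separation of network conditions from compute conditions, so either analysis could be replaced by a predictive model without changing the selection step.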