FEDERATED DISTRIBUTION OF COMPUTATION AND OPERATIONS USING NETWORKED PROCESSING UNITS

    Publication No.: US20230136048A1

    Publication Date: 2023-05-04

    Application No.: US18090686

    Application Date: 2022-12-29

    IPC Classification: G06F9/48 G06F9/30

    Abstract: Various approaches for deploying and controlling distributed compute operations with the use of infrastructure processing units (IPUs) and similar network-addressable processing units are disclosed. A device for orchestrating functions in a network compute mesh is configured to: receive, at a network-addressable processing unit of a network-addressable processing unit mesh, a computation request from a requestor device to execute a workflow with a set of objectives; query at least one other network-addressable processing unit of the mesh using the set of objectives, to determine which available resources and data in the mesh to apply to the workflow; transmit to the requestor device a list of recommended resources available to execute the workflow, the list being ranked based on at least one dimension of the resources; obtain a compute chain from the requestor device, the compute chain describing resource control transitions and data flow provided from the recommended resources and data in the mesh; and schedule the execution of the workflow at one or more network-addressable processing units in the mesh in accordance with the compute chain.
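
The sequence in this abstract (query peers, rank resources, obtain a compute chain, schedule) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Resource` and `ProcessingUnit` classes, the `latency_ms` ranking dimension, the shape of the `objectives` dictionary, and the `recommend` and `schedule` helpers. A real mesh would exchange these messages over the network; here the peers are plain in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource advertised by one processing unit (hypothetical fields)."""
    unit_id: str
    kind: str          # e.g. "gpu" or "storage"
    latency_ms: float  # assumed ranking dimension


@dataclass
class ProcessingUnit:
    """In-memory stand-in for a network-addressable processing unit."""
    unit_id: str
    resources: list = field(default_factory=list)

    def query(self, objectives):
        # Report only resources whose kind the objectives ask for.
        return [r for r in self.resources if r.kind in objectives["kinds"]]


def recommend(peers, objectives):
    """Query every peer and rank the answers on one dimension (latency)."""
    found = [r for peer in peers for r in peer.query(objectives)]
    return sorted(found, key=lambda r: r.latency_ms)


def schedule(compute_chain):
    """Map the requestor's compute chain to ordered (step, unit_id) pairs."""
    return [(step, res.unit_id) for step, res in enumerate(compute_chain)]


peers = [
    ProcessingUnit("ipu-a", [Resource("ipu-a", "gpu", 4.0)]),
    ProcessingUnit("ipu-b", [Resource("ipu-b", "gpu", 1.5)]),
    ProcessingUnit("ipu-c", [Resource("ipu-c", "storage", 9.0)]),
]
objectives = {"kinds": {"gpu", "storage"}}

ranked = recommend(peers, objectives)

# The requestor builds the chain; here it picks the best GPU, then storage.
chain = [
    next(r for r in ranked if r.kind == "gpu"),
    next(r for r in ranked if r.kind == "storage"),
]
plan = schedule(chain)
print(plan)
```

In this sketch the compute chain is just an ordered list of chosen resources. The abstract describes it as also carrying resource control transitions and data flow, which a fuller model would need to represent explicitly.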

    STABLE TRANSFORMATIONS OF NETWORKED SYSTEMS WITH AUTOMATION

    Publication No.: US20220012149A1

    Publication Date: 2022-01-13

    Application No.: US17484253

    Application Date: 2021-09-24

    Abstract: Various methods, systems, and use cases for a stable and automated transformation of a networked computing system are provided, to enable a transformation of the configuration of the computing system (e.g., software or firmware upgrade, hardware change, etc.). In an example, automated operations include: identifying a transformation to apply to a configuration of the computing system, where the transformation affects a network service provided by the computing system; identifying operational conditions used to evaluate results of the transformation; attempting to apply the transformation, using a series of stages that have rollback positions when the identified operational conditions are not satisfied; and determining a successful or unsuccessful result of the attempt to apply the transformation. For an unsuccessful result, remediation may be performed on the configuration, with use of one or more rollback positions; for a successful result, a new restore state is established from the completion state.
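
The staged apply-and-rollback pattern described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patented method: the configuration is modeled as a plain dictionary, each stage is a function that mutates it, `conditions_ok` stands in for the operational conditions, and the remediation policy (reverting to the rollback position taken just before the failing stage) is only one of the options the abstract allows. All names are hypothetical.

```python
import copy


def apply_transformation(restore_state, stages, conditions_ok):
    """Apply stages in order, checking operational conditions after each.

    restore_state -- dict holding the last known-good configuration
    stages        -- list of functions that mutate a config dict in place
    conditions_ok -- predicate standing in for the operational conditions
    Returns (success, config). On success, config is the completion state,
    which the caller can adopt as the new restore state. On failure, config
    is the rollback position recorded before the failing stage.
    """
    working = copy.deepcopy(restore_state)  # never mutate the restore state
    for stage in stages:
        rollback_position = copy.deepcopy(working)  # checkpoint per stage
        stage(working)
        if not conditions_ok(working):
            return False, rollback_position
    return True, working


def upgrade_firmware(config):
    config["firmware"] = "2.0"


def faulty_change(config):
    config["healthy"] = False  # simulates a stage that breaks the service


def is_healthy(config):
    return config["healthy"]


config = {"firmware": "1.0", "healthy": True}

ok, new_state = apply_transformation(config, [upgrade_firmware], is_healthy)
failed_ok, reverted = apply_transformation(
    config, [upgrade_firmware, faulty_change], is_healthy
)
print(ok, new_state)
print(failed_ok, reverted)
```

Deep-copying before each stage keeps every rollback position independent of later mutations, at the cost of memory proportional to the configuration size. A system with large state would more likely record deltas or snapshots by reference.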