Multi-level caching for dynamic deep learning models

    Publication No.: US12288141B2

    Publication Date: 2025-04-29

    Application No.: US17358654

    Filing Date: 2021-06-25

    Abstract: Systems, apparatuses and methods provide technology for model generation with intermediate stage caching and re-use, including generating, via a model pipeline, a multi-level set of intermediate stages for a model, caching each of the set of intermediate stages, and responsive to a change in the model pipeline, regenerating an executable for the model using a first one of the cached intermediate stages to bypass regeneration of at least one of the intermediate stages. The multi-level set of intermediate stages can correspond to a hierarchy of processing stages in the model pipeline, where using the first one of the cached intermediate stages results in bypassing regeneration of a corresponding intermediate stage and of all intermediate stages preceding the corresponding intermediate stage in the hierarchy. Further, regenerating an executable for the model can include regenerating one or more intermediate stages following the corresponding intermediate stage in the hierarchy.
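The caching-and-bypass flow described in the abstract can be sketched as follows. This is a minimal illustration only; the `CachingPipeline` class, the `(name, version, fn)` stage tuples, and the hashing scheme are assumptions, not the patented implementation:

```python
import hashlib

def _key(name, version, payload):
    # Hypothetical cache key: stage identity, stage version, and a hash of its input.
    digest = hashlib.sha256(repr(payload).encode()).hexdigest()
    return (name, version, digest)

class CachingPipeline:
    """Each stage's output is cached; changing one stage (a new version)
    invalidates only that stage and the stages after it in the hierarchy."""
    def __init__(self, stages):
        self.stages = stages  # ordered list of (name, version, fn)
        self.cache = {}       # key -> cached intermediate stage output

    def build(self, model_source):
        payload = model_source
        for name, version, fn in self.stages:
            key = _key(name, version, payload)
            if key not in self.cache:          # miss: regenerate this stage
                self.cache[key] = fn(payload)
            payload = self.cache[key]          # hit: bypass regeneration
        return payload                         # final executable artifact
```

Because each key depends on the stage's input (itself the cached output of the previous stage), a change late in the pipeline leaves all earlier cached stages valid, matching the bypass behavior in the abstract.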

    COMPUTE-BASED SUBGRAPH PARTITIONING OF DEEP LEARNING MODELS FOR FRAMEWORK INTEGRATION

    Publication No.: US20210319298A1

    Publication Date: 2021-10-14

    Application No.: US17357340

    Filing Date: 2021-06-24

    Abstract: Systems, apparatuses and methods provide technology for efficient subgraph partitioning, including generating a first set of subgraphs based on supported nodes of a model graph, wherein the supported nodes have operators that are supported by a hardware backend device, evaluating a compute efficiency of each subgraph of the first set of subgraphs with respect to the hardware backend device and to a default CPU associated with a default runtime, and selecting, from the first set of subgraphs, a second set of subgraphs to be run on the hardware backend device based on the evaluated compute efficiency. The technology can include calculating a backend performance factor for each subgraph for the hardware backend device, calculating a default performance factor for each subgraph for the default CPU, and comparing, for each respective subgraph of the first set of subgraphs, the backend performance factor and the default performance factor.
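The per-subgraph comparison of backend and default performance factors can be sketched as below. The function name and the use of estimated latency as the performance factor are assumptions for illustration:

```python
def select_subgraphs(subgraphs, backend_factor, cpu_factor):
    """For each candidate subgraph, compare an estimated cost on the
    accelerator backend against the default CPU runtime, and keep the
    subgraph on the backend only when the backend wins."""
    on_backend, on_cpu = [], []
    for sg in subgraphs:
        b = backend_factor(sg)   # e.g. estimated latency on the backend
        c = cpu_factor(sg)       # e.g. estimated latency on the default CPU
        (on_backend if b < c else on_cpu).append(sg)
    return on_backend, on_cpu
```

Subgraphs whose backend factor does not beat the CPU factor fall back to the default runtime, which is the selection criterion the abstract describes.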

    Platform health engine in infrastructure processing unit

    Publication No.: US12182616B2

    Publication Date: 2024-12-31

    Application No.: US17484099

    Filing Date: 2021-09-24

    Abstract: A platform health engine for autonomous self-healing in platforms served by an Infrastructure Processing Unit (IPU), including: an analysis processor configured to apply analytics to telemetry data received from a telemetry agent of a monitored platform managed by the IPU, and to generate relevant platform health data; a prediction processor configured to predict, based on the relevant platform health data, a future health status of the monitored platform; and a dispatch processor configured to dispatch a workload of the monitored platform to another platform managed by the IPU if the predicted future health status of the monitored platform is failure.
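The analyze/predict/dispatch chain can be sketched as a simple loop. The telemetry field, the temperature threshold, and the platform names are all illustrative assumptions, not details from the patent:

```python
from statistics import mean

class PlatformHealthEngine:
    """Sketch of the three processors in the abstract: analysis,
    prediction, and dispatch. Thresholds and fields are hypothetical."""
    FAIL_THRESHOLD = 90.0  # assumed: sustained temperature above this => failure

    def analyze(self, telemetry):
        # Analysis processor: reduce raw telemetry samples to health data.
        return {"avg_temp": mean(s["temp"] for s in telemetry)}

    def predict(self, health):
        # Prediction processor: predict future status from the health data.
        return "failure" if health["avg_temp"] > self.FAIL_THRESHOLD else "healthy"

    def dispatch(self, telemetry, platforms):
        # Dispatch processor: move the workload if failure is predicted.
        status = self.predict(self.analyze(telemetry))
        return platforms["standby"] if status == "failure" else platforms["monitored"]
```

A real engine would run this continuously against live telemetry; the sketch only shows the control flow between the three stages.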

    GRAPH PARTITIONING TO EXPLOIT BATCH-LEVEL PARALLELISM

    Publication No.: US20210318908A1

    Publication Date: 2021-10-14

    Application No.: US17358751

    Filing Date: 2021-06-25

    Abstract: Systems, apparatuses and methods provide technology for batch-level parallelism, including partitioning a graph into a plurality of clusters comprising batched clusters that support batched data and non-batched clusters that fail to support batched data, establishing an execution queue for execution of the plurality of clusters based on cluster dependencies, and scheduling inference execution of the plurality of clusters in the execution queue based on batch size. The technology can include identifying nodes of the graph as batched or non-batched, generating a batched cluster comprising a plurality of batched nodes based on a relationship between two or more of the batched nodes, and generating a non-batched cluster comprising a plurality of non-batched nodes based on a relationship between two or more of the non-batched nodes. The technology can also include generating a set of cluster dependencies, where the cluster dependencies are used to determine an execution order for the clusters.
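The clustering step, grouping adjacent batched nodes together and adjacent non-batched nodes together, then deriving cluster dependencies from the remaining cross-cluster edges, can be sketched with union-find. The function and data-structure choices are assumptions for illustration:

```python
from collections import defaultdict

def partition_clusters(nodes, edges, is_batched):
    """Merge adjacent nodes of the same kind (batched vs. non-batched)
    into clusters, then derive cluster dependencies from cross-cluster
    edges; those dependencies determine the execution order."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    # Same-kind neighbors collapse into one cluster.
    for a, b in edges:
        if is_batched(a) == is_batched(b):
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for n in nodes:
        clusters[find(n)].append(n)

    # Edges that cross clusters become cluster dependencies.
    deps = {(find(a), find(b)) for a, b in edges if find(a) != find(b)}
    return dict(clusters), deps
```

A topological sort over `deps` would then yield the execution queue the abstract describes, with batched clusters scheduled according to batch size.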

    Graph partitioning to exploit batch-level parallelism

    Publication No.: US11941437B2

    Publication Date: 2024-03-26

    Application No.: US17358751

    Filing Date: 2021-06-25

    CPC classification number: G06F9/4881 G06F9/5038 G06F16/9024 G06N3/04 G06N3/08

    Abstract: Systems, apparatuses and methods provide technology for batch-level parallelism, including partitioning a graph into a plurality of clusters comprising batched clusters that support batched data and non-batched clusters that fail to support batched data, establishing an execution queue for execution of the plurality of clusters based on cluster dependencies, and scheduling inference execution of the plurality of clusters in the execution queue based on batch size. The technology can include identifying nodes of the graph as batched or non-batched, generating a batched cluster comprising a plurality of batched nodes based on a relationship between two or more of the batched nodes, and generating a non-batched cluster comprising a plurality of non-batched nodes based on a relationship between two or more of the non-batched nodes. The technology can also include generating a set of cluster dependencies, where the cluster dependencies are used to determine an execution order for the clusters.
