MULTI-LEVEL CACHING FOR DYNAMIC DEEP LEARNING MODELS

    Publication Number: US20210319369A1

    Publication Date: 2021-10-14

    Application Number: US17358654

    Application Date: 2021-06-25

    Abstract: Systems, apparatuses and methods provide technology for model generation with intermediate stage caching and re-use, including generating, via a model pipeline, a multi-level set of intermediate stages for a model, caching each of the set of intermediate stages, and responsive to a change in the model pipeline, regenerating an executable for the model using a first one of the cached intermediate stages to bypass regeneration of at least one of the intermediate stages. The multi-level set of intermediate stages can correspond to a hierarchy of processing stages in the model pipeline, where using the first one of the cached intermediate stages results in bypassing regeneration of a corresponding intermediate stage and of all intermediate stages preceding the corresponding intermediate stage in the hierarchy. Further, regenerating an executable for the model can include regenerating one or more intermediate stages following the corresponding intermediate stage in the hierarchy.
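    The abstract above describes caching the output of each level of a model pipeline so that a change late in the pipeline only forces regeneration of the later stages. The Python sketch below illustrates that idea under stated assumptions: the (name, fn, config) stage representation, the (artifact, config) stage-function signature, the hash-based keying, and the in-memory dict cache are illustrative choices, not details prescribed by the patent.

        import hashlib

        def _digest(obj):
            # Content hash for illustration only; a real system would hash the
            # serialized artifacts and stage configurations.
            return hashlib.sha256(repr(obj).encode()).hexdigest()

        class StagedPipeline:
            """Caches every intermediate stage and resumes from the deepest
            still-valid cached stage after a pipeline change."""

            def __init__(self, stages):
                # stages: ordered (name, fn, config) tuples, one per pipeline level;
                # each fn takes (input_artifact, config) and returns the next artifact.
                self.stages = stages
                self.cache = {}

            def build(self, model_source):
                # The key for level i covers the source plus the configs of stages
                # 0..i, so changing stage i invalidates level i and every later level
                # while earlier cached levels remain reusable.
                keys, acc = [], _digest(model_source)
                for name, _fn, config in self.stages:
                    acc = _digest((acc, name, _digest(config)))
                    keys.append(acc)

                # Reuse the deepest cached level; this bypasses regeneration of that
                # stage and of all stages preceding it in the hierarchy.
                start, artifact = 0, model_source
                for i in range(len(self.stages) - 1, -1, -1):
                    if keys[i] in self.cache:
                        start, artifact = i + 1, self.cache[keys[i]]
                        break

                # Regenerate only the stages that follow the reused level.
                for i in range(start, len(self.stages)):
                    _name, fn, config = self.stages[i]
                    artifact = fn(artifact, config)
                    self.cache[keys[i]] = artifact
                return artifact  # the (re)generated executable

    Because each level's key covers the original source and the configurations of all earlier levels, editing one stage invalidates that level and everything after it while leaving earlier cached levels intact, mirroring the bypass-and-regenerate behavior the abstract describes.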

    GRAPH PARTITIONING TO EXPLOIT BATCH-LEVEL PARALLELISM

    Publication Number: US20210318908A1

    Publication Date: 2021-10-14

    Application Number: US17358751

    Application Date: 2021-06-25

    Abstract: Systems, apparatuses and methods provide technology for batch-level parallelism, including partitioning a graph into a plurality of clusters comprising batched clusters that support batched data and non-batched clusters that fail to support batched data, establishing an execution queue for execution of the plurality of clusters based on cluster dependencies, and scheduling inference execution of the plurality of clusters in the execution queue based on batch size. The technology can include identifying nodes of the graph as batched or non-batched, generating a batched cluster comprising a plurality of batched nodes based on a relationship between two or more of the batched nodes, and generating a non-batched cluster comprising a plurality of non-batched nodes based on a relationship between two or more of the non-batched nodes. The technology can also include generating a set of cluster dependencies, where the cluster dependencies are used to determine an execution order for the clusters.
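    As a rough illustration of the partitioning and scheduling described above, the sketch below groups nodes into batched and non-batched clusters, derives cluster dependencies, and orders the clusters into an execution queue tagged with a batch size. The Node representation, the helper names, and the simplified merge rule (which ignores the cycle-avoidance a real partitioner would need) are assumptions for illustration, not the patented method.

        from collections import defaultdict, deque

        class Node:
            def __init__(self, name, batched, inputs=()):
                self.name, self.batched, self.inputs = name, batched, tuple(inputs)

        def partition(nodes):
            # Group nodes into clusters that either support batched data or do not.
            # Assumes nodes arrive in topological order; the merge rule is simplified
            # and does not guard against creating cyclic cluster dependencies.
            cluster_of, clusters = {}, []
            for node in nodes:
                parents = [cluster_of[i] for i in node.inputs]
                same = [c for c in parents if clusters[c]["batched"] == node.batched]
                if same:
                    cid = same[0]  # merge into a parent cluster of the same kind
                else:
                    cid = len(clusters)
                    clusters.append({"batched": node.batched, "nodes": []})
                clusters[cid]["nodes"].append(node.name)
                cluster_of[node.name] = cid

            # Cluster dependencies: an edge whenever an input lives in another cluster.
            deps = defaultdict(set)
            for node in nodes:
                for i in node.inputs:
                    if cluster_of[i] != cluster_of[node.name]:
                        deps[cluster_of[node.name]].add(cluster_of[i])
            return clusters, deps

        def execution_queue(clusters, deps, batch_size):
            # Topological order over cluster dependencies; batched clusters run at the
            # requested batch size, non-batched clusters run one sample at a time.
            indeg = {c: len(deps[c]) for c in range(len(clusters))}
            ready = deque(c for c, d in indeg.items() if d == 0)
            order = []
            while ready:
                c = ready.popleft()
                order.append((c, batch_size if clusters[c]["batched"] else 1))
                for other in range(len(clusters)):
                    if c in deps[other]:
                        indeg[other] -= 1
                        if indeg[other] == 0:
                            ready.append(other)
            return order

        # Example: a batched run (a, b), a non-batched node (c), then a batched node (d).
        nodes = [
            Node("a", batched=True),
            Node("b", batched=True, inputs=["a"]),
            Node("c", batched=False, inputs=["b"]),
            Node("d", batched=True, inputs=["c"]),
        ]
        clusters, deps = partition(nodes)
        print(execution_queue(clusters, deps, batch_size=8))  # [(0, 8), (1, 1), (2, 8)]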

    Graph partitioning to exploit batch-level parallelism

    Publication Number: US11941437B2

    Publication Date: 2024-03-26

    Application Number: US17358751

    Application Date: 2021-06-25

    CPC classification number: G06F 9/4881; G06F 9/5038; G06F 16/9024; G06N 3/04; G06N 3/08

    Abstract: Systems, apparatuses and methods provide technology for batch-level parallelism, including partitioning a graph into a plurality of clusters comprising batched clusters that support batched data and non-batched clusters that fail to support batched data, establishing an execution queue for execution of the plurality of clusters based on cluster dependencies, and scheduling inference execution of the plurality of clusters in the execution queue based on batch size. The technology can include identifying nodes of the graph as batched or non-batched, generating a batched cluster comprising a plurality of batched nodes based on a relationship between two or more of the batched nodes, and generating a non-batched cluster comprising a plurality of non-batched nodes based on a relationship between two or more of the non-batched nodes. The technology can also include generating a set of cluster dependencies, where the cluster dependencies are used to determine an execution order for the clusters.

    Multi-level caching for dynamic deep learning models

    Publication Number: US12288141B2

    Publication Date: 2025-04-29

    Application Number: US17358654

    Application Date: 2021-06-25

    Abstract: Systems, apparatuses and methods provide technology for model generation with intermediate stage caching and re-use, including generating, via a model pipeline, a multi-level set of intermediate stages for a model, caching each of the set of intermediate stages, and responsive to a change in the model pipeline, regenerating an executable for the model using a first one of the cached intermediate stages to bypass regeneration of at least one of the intermediate stages. The multi-level set of intermediate stages can correspond to a hierarchy of processing stages in the model pipeline, where using the first one of the cached intermediate stages results in bypassing regeneration of a corresponding intermediate stage and of all intermediate stages preceding the corresponding intermediate stage in the hierarchy. Further, regenerating an executable for the model can include regenerating one or more intermediate stages following the corresponding intermediate stage in the hierarchy.
