-
Publication No.: US20240412096A1
Publication Date: 2024-12-12
Application No.: US18208173
Filing Date: 2023-06-09
Applicant: Microsoft Technology Licensing, LLC
Inventor: Anand PADMANABHA IYER, Ganesh ANANTHANARAYANAN, Yiwen ZHANG
IPC: G06N20/00
Abstract: Optimizing ML pipeline deployment using an ML pipeline management system. A method includes receiving an indication of an input data source and of the input data type from that source. An indication of a plurality of filters to be included in the pipeline, an ML model, and predetermined performance criteria is also received. The method includes determining a physical topology of the ML pipeline and a configuration of the filters or the ML model. The determined physical topology includes the placement of the filters and the model, along with their configuration, and satisfies the performance criteria. The filters and ML model are placed across an infrastructure comprising a plurality of tiers according to the determined physical topology.
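The topology-determination step described above can be sketched as a search over tier assignments. The following is a minimal illustration, not the patented method: it assumes a linear pipeline (two filters feeding an ML model), three tiers, and invented per-stage compute and transfer costs, then exhaustively picks the lowest-latency placement that satisfies a latency criterion.

```python
from itertools import product

# Illustrative tiers and costs; names and numbers are assumptions, not from the patent.
TIERS = ["camera", "edge", "cloud"]
STAGE_COST_MS = {  # stage -> {tier: estimated compute latency in ms}
    "filter_a": {"camera": 5, "edge": 2, "cloud": 1},
    "filter_b": {"camera": 8, "edge": 3, "cloud": 1},
    "ml_model": {"camera": 50, "edge": 12, "cloud": 4},
}
TRANSFER_MS = {  # one-way transfer cost between tiers
    ("camera", "edge"): 10, ("edge", "cloud"): 25, ("camera", "cloud"): 35,
}

def transfer(a, b):
    """Transfer cost between two tiers (zero when the stages share a tier)."""
    if a == b:
        return 0
    pair = tuple(sorted((a, b), key=TIERS.index))
    return TRANSFER_MS[pair]

def pipeline_latency(stages, placement):
    """End-to-end latency: data originates at the camera tier, then flows
    through each stage, paying a transfer cost on every tier change."""
    total, prev = 0, "camera"
    for stage, tier in zip(stages, placement):
        total += transfer(prev, tier) + STAGE_COST_MS[stage][tier]
        prev = tier
    return total

def choose_topology(stages, max_latency_ms):
    """Exhaustively search tier assignments and return the lowest-latency
    placement that satisfies the performance criterion, or None."""
    best = None
    for placement in product(TIERS, repeat=len(stages)):
        lat = pipeline_latency(stages, placement)
        if lat <= max_latency_ms and (best is None or lat < best[1]):
            best = (placement, lat)
    return best
```

With these made-up costs, the search places all three stages on the edge tier; a real system would also weigh filter/model configuration and accuracy, as the abstract notes.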
-
Publication No.: US20230342278A1
Publication Date: 2023-10-26
Application No.: US17725825
Filing Date: 2022-04-21
Applicant: Microsoft Technology Licensing, LLC
Inventor: Anand PADMANABHA IYER, Swapnil Sunilkumar GANDHI
CPC classification number: G06F11/3442, G06F9/505, G06N5/043, G06N20/00
Abstract: The present disclosure relates to methods and systems for providing inferences using machine learning systems. The methods and systems receive a load forecast for requests to be processed by a machine learning model and split the machine learning model into a plurality of machine learning model portions based on the load forecast. The methods and systems determine a batch size for the requests to the machine learning model portions, then use one or more available resources to execute the plurality of machine learning model portions to process the requests and generate inferences for them.
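The splitting and batching steps can be illustrated with a toy sketch. This is a stand-in for the patented logic, under invented assumptions: a model is a list of per-layer latencies, splitting greedily balances consecutive layers into equal-cost portions, and the batch size is the largest candidate whose batch-fill delay plus compute time fits a latency target.

```python
def split_model(layer_costs, num_portions):
    """Greedily partition consecutive layers into portions of roughly equal
    total cost (a simple stand-in for load-forecast-driven splitting)."""
    target = sum(layer_costs) / num_portions
    portions, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        if acc >= target and len(portions) < num_portions - 1:
            portions.append(current)
            current, acc = [], 0.0
    portions.append(current)
    return portions

def pick_batch_size(load_rps, slo_ms, per_request_ms, candidates=(1, 2, 4, 8, 16)):
    """Choose the largest batch size whose batch-fill delay plus compute time
    fits the latency target, assuming batch latency grows sublinearly
    (an invented 0.7 factor per extra request)."""
    best = 1
    for b in candidates:
        queue_delay = (b - 1) / load_rps * 1000  # ms waiting to fill the batch
        compute = per_request_ms * (1 + 0.7 * (b - 1))
        if queue_delay + compute <= slo_ms:
            best = b
    return best

# Six layers with made-up costs, split into three portions:
portions = split_model([4, 6, 3, 7, 5, 5], num_portions=3)
batch = pick_batch_size(load_rps=100, slo_ms=50, per_request_ms=10)
```

At higher forecast load the batch-fill delay shrinks, so larger batches become feasible; this captures the abstract's link between the load forecast and the chosen batch size.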
-
Publication No.: US20240370781A1
Publication Date: 2024-11-07
Application No.: US18332589
Filing Date: 2023-06-09
Applicant: Microsoft Technology Licensing, LLC
Inventor: Anand PADMANABHA IYER, Jayashree MOHAN, Ranjita BHAGWAN, Nagarajan NATARAJAN, Venkata N. PADMANABHAN, Rohit MALLIKARJUNA PUSHPA, Divyam ANSHUMAAN
IPC: G06N20/20
Abstract: Computer-assisted configuration of compute resources to perform tasks of a given inference task type. For each of multiple model combinations, the computing system estimates 1) the compute level needed to perform tasks of the given inference task type using the model combination, and 2) the accuracy of the model combination in performing tasks of that type. The computing system then selects a model combination for the given inference task type based on its estimated compute level and estimated accuracy. In response to the selection, an inference component is configured to respond to task requests of the given inference task type by using the selected model combination. Scheduling by batch size and input size may further improve the accuracy and efficiency of the model combination.
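The selection step reduces to a small optimization once the per-combination estimates exist. The sketch below is illustrative only: the candidate combinations, their compute levels, and their accuracies are invented, and the policy (most accurate combination within a compute budget) is one plausible reading of "based on the estimated compute level and the estimated accuracy."

```python
CANDIDATES = [
    # (name, estimated compute units, estimated accuracy) -- all invented
    ("small-only", 10, 0.81),
    ("small+verifier", 18, 0.88),
    ("medium-only", 30, 0.90),
    ("medium+large-fallback", 45, 0.94),
    ("large-only", 80, 0.96),
]

def select_combination(budget):
    """Return the most accurate model combination whose estimated compute
    level fits within the budget, or None if none fits."""
    feasible = [c for c in CANDIDATES if c[1] <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[2])

chosen = select_combination(budget=50)  # -> ("medium+large-fallback", 45, 0.94)
```

A scheduler could then tune batch size and input size per the abstract's final sentence, but that layer is omitted here.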
-
Publication No.: US20220383188A1
Publication Date: 2022-12-01
Application No.: US17471816
Filing Date: 2021-09-10
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ganesh ANANTHANARAYANAN, Anand PADMANABHA IYER, Yuanchao SHU, Nikolaos KARIANAKIS, Arthi Hema PADMANABHAN
Abstract: Systems and methods are provided for merging models for use in an edge server in a multi-access edge computing environment. In particular, a model merger selects a layer of a model based on a level of memory consumption in the edge server and determines sharable layers based on common properties of the selected layer. The model merger generates a merged model by generating a single instantiation of a layer that corresponds to the sharable layers. A model trainer trains the merged model on the training data of the respective models to attain a level of data-analytics accuracy above a predetermined threshold. The disclosed technology further refreshes the merged model upon observing a level of data drift that exceeds a predetermined threshold; the refresh includes detaching and/or splitting consolidated sharable layers of sub-models in the merged model. By merging models, the disclosed technology reduces the memory footprint of models used in the edge server, rectifying its memory scarcity issues.
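The single-instantiation idea can be shown with a toy deduplication pass. This is a sketch under assumptions, not the patented merger: here a "layer" is just a (type, shape) tuple, and layers from different models that match exactly are stored once and referenced by id, shrinking the combined footprint.

```python
def merge_models(models):
    """models: dict of model name -> list of (layer_type, shape) tuples.
    Returns (shared_layers, per_model_layer_ids), where identical layers
    are instantiated once and each model references them by id."""
    shared, ids = {}, {}
    for name, layers in models.items():
        ids[name] = []
        for layer in layers:
            if layer not in shared:          # first occurrence: instantiate
                shared[layer] = len(shared)
            ids[name].append(shared[layer])  # later occurrences: reuse
    return shared, ids

# Two invented models sharing their early convolutional layers:
models = {
    "detector":   [("conv", (3, 64)), ("conv", (64, 128)), ("fc", (128, 10))],
    "classifier": [("conv", (3, 64)), ("conv", (64, 128)), ("fc", (128, 5))],
}
shared, ids = merge_models(models)  # 4 unique layers instead of 6
```

The abstract's refresh step would be the inverse operation under data drift: detaching a shared layer back into per-model copies so the affected sub-model can be retrained independently.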