CACHING IN A MACHINE LEARNING MODEL HOSTING SERVICE

    Publication No.: US20250110784A1

    Publication Date: 2025-04-03

    Application No.: US18478185

    Filing Date: 2023-09-29

Abstract: Techniques for caching in a machine learning (ML) model hosting service are described. ML model usage data is aggregated from host usage data provided by each host of a first set of hosts, the ML model usage data including, for a particular ML model, a number of inference requests to the particular ML model. A priority order of hosts in a second set of hosts to service an inference request for the particular ML model is calculated. Based on the ML model usage data and the priority order, a set of ML models to load to a particular host in the second set of hosts is determined. The particular host is caused to load the set of ML models. A router is updated to direct ML model inference requests amongst the second set of hosts.
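The abstract describes three steps: aggregating per-host usage reports into per-model request counts, selecting which models each host should cache, and updating a routing table. The sketch below illustrates that flow under assumptions not in the patent text: the function names (`aggregate_usage`, `plan_cache`), a fixed per-host cache capacity, and a least-loaded placement heuristic are all hypothetical choices for illustration, not the claimed method.

```python
from collections import Counter

def aggregate_usage(host_usage_reports):
    """Aggregate per-host usage reports (model -> request count)
    into total inference-request counts per ML model."""
    totals = Counter()
    for report in host_usage_reports:
        totals.update(report)
    return totals

def plan_cache(usage_totals, hosts, capacity):
    """Assign the most-requested models to hosts and build a routing table.

    Hypothetical heuristic: walk models in descending request count and
    place each on the host with the most free cache slots; the patent's
    actual priority-order computation is not specified in the abstract.
    """
    plan = {h: [] for h in hosts}          # host -> models to load
    routing = {}                           # model -> host serving it
    for model, _count in usage_totals.most_common():
        host = min(hosts, key=lambda h: len(plan[h]))
        if len(plan[host]) < capacity:
            plan[host].append(model)
            routing[model] = host
    return plan, routing

# Usage: two hosts report their observed traffic; the service plans
# the cache for a (hypothetical) two-host serving fleet.
reports = [{"model-A": 10, "model-B": 3}, {"model-A": 5, "model-C": 7}]
totals = aggregate_usage(reports)
plan, routing = plan_cache(totals, ["host-1", "host-2"], capacity=2)
```

With these inputs, model-A (15 total requests) is placed first, and the resulting `routing` map is what the patent's router would be updated with to direct inference requests amongst the second set of hosts.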
