Publication No.: US20240419991A1
Publication Date: 2024-12-19
Application No.: US18747725
Filing Date: 2024-06-19
Inventor: Zhenfang CHU, Zhengyu QIAN, En SHI, Mingren HU, Zhengxiong YUAN, Jinqi LI, Yue HUANG, Yang LUO, Guobin WANG, Yang QIAN, Kuan WANG
IPC: G06N5/04
Abstract: A method is provided that includes: creating a plurality of first model instances of a first service model to be deployed; allocating, from a plurality of inference services, an inference service for each of the plurality of first model instances; calling, for each first model instance, a loading interface of the inference service allocated for the first model instance to mount a weight file; determining, in response to a user request for a target service model, a target model instance from a plurality of model instances of the target service model to respond to the user request; and calling a target inference service allocated for the target model instance to use computing resources configured for the target inference service to run, in the target model instance, a base model mounted with a target weight file, and to obtain a request result of the user request.
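
The steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class and method names (InferenceService, ModelInstance, Deployer, load, run, deploy, handle), the resource labels, and the weight file name are all hypothetical, and the round-robin policy used both for allocating inference services and for picking the target model instance is an assumption, since the abstract does not specify either policy.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class InferenceService:
    """Hypothetical inference service with its own computing resources."""
    name: str
    resources: str  # illustrative label such as "gpu:0"
    mounted: dict = field(default_factory=dict)  # instance id -> weight file

    def load(self, instance_id: str, weight_file: str) -> None:
        # Stand-in for the "loading interface": mount a weight file for an instance.
        self.mounted[instance_id] = weight_file

    def run(self, instance_id: str, request: str) -> str:
        # Stand-in for running the base model with the mounted weight file.
        weight_file = self.mounted[instance_id]
        return f"{self.name}[{self.resources}] ran base model + {weight_file} on {request!r}"


@dataclass
class ModelInstance:
    instance_id: str
    service_model: str
    inference_service: InferenceService


class Deployer:
    def __init__(self, services):
        # Round-robin allocation over the available services (an assumption).
        self._services = cycle(services)
        self._pickers = {}  # service model name -> round-robin iterator of instances

    def deploy(self, service_model: str, weight_file: str, replicas: int):
        instances = []
        for i in range(replicas):
            service = next(self._services)  # allocate an inference service
            instance = ModelInstance(f"{service_model}-{i}", service_model, service)
            service.load(instance.instance_id, weight_file)  # mount the weight file
            instances.append(instance)
        self._pickers[service_model] = cycle(instances)
        return instances

    def handle(self, service_model: str, request: str) -> str:
        # Determine the target instance, then call its allocated inference service.
        target = next(self._pickers[service_model])
        return target.inference_service.run(target.instance_id, request)


if __name__ == "__main__":
    services = [InferenceService("svc-a", "gpu:0"), InferenceService("svc-b", "gpu:1")]
    deployer = Deployer(services)
    deployer.deploy("chat-model", "chat-model.weights", replicas=2)
    print(deployer.handle("chat-model", "hello"))
```

In this sketch each inference service keeps a mapping from model instance to mounted weight file, so one service can serve instances of several service models while reusing the same computing resources. That sharing is one plausible reading of the abstract, not something it states explicitly.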