-
Publication No.: US20240419991A1
Publication Date: 2024-12-19
Application No.: US18747725
Application Date: 2024-06-19
Inventor: Zhenfang CHU , Zhengyu QIAN , En SHI , Mingren HU , Zhengxiong YUAN , Jinqi LI , Yue HUANG , Yang LUO , Guobin WANG , Yang QIAN , Kuan WANG
IPC: G06N5/04
Abstract: A method is provided that includes: creating a plurality of first model instances of a first service model to be deployed; allocating an inference service for each of the plurality of first model instances from a plurality of inference services; calling, for each first model instance, a loading interface of the inference service allocated for the first model instance to mount a weight file; determining, in response to a user request for a target service model, a target model instance from a plurality of model instances of the target service model to respond to the user request; and calling a target inference service allocated for the target model instance to use computing resources configured for the target inference service to run, in the target model instance, a base model mounted with a target weight file, and to obtain a request result of the user request.
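The abstract outlines a workflow of instance creation, service allocation, weight mounting, and request routing. The following is a minimal Python sketch of that flow; all names (InferenceService, ModelInstance, deploy, handle_request) and the round-robin/first-candidate policies are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stand-ins for the entities described in the abstract.
@dataclass
class InferenceService:
    name: str
    compute: str                      # computing resources configured for this service
    mounted_weights: Dict[str, str] = field(default_factory=dict)

    def load(self, instance_id: str, weight_file: str) -> None:
        # "Loading interface": mount a weight file for a given model instance.
        self.mounted_weights[instance_id] = weight_file

    def infer(self, instance_id: str, base_model: str, request: str) -> str:
        # Run the base model mounted with the instance's weight file.
        weights = self.mounted_weights[instance_id]
        return f"[{self.name}/{self.compute}] {base_model}+{weights} -> result for '{request}'"

@dataclass
class ModelInstance:
    instance_id: str
    service_model: str
    weight_file: str
    inference_service: Optional[InferenceService] = None

def deploy(service_model: str, weight_files: List[str],
           services: List[InferenceService]) -> List[ModelInstance]:
    """Create model instances, allocate an inference service to each,
    and call the service's loading interface to mount the weight file."""
    instances = []
    for i, weights in enumerate(weight_files):
        svc = services[i % len(services)]          # simple round-robin allocation
        inst = ModelInstance(f"{service_model}-{i}", service_model, weights, svc)
        svc.load(inst.instance_id, weights)
        instances.append(inst)
    return instances

def handle_request(target_service_model: str, request: str,
                   instances: List[ModelInstance], base_model: str) -> str:
    """Determine a target instance of the target service model and call its
    allocated inference service to obtain the request result."""
    candidates = [i for i in instances if i.service_model == target_service_model]
    target = candidates[0]                          # trivial selection policy
    return target.inference_service.infer(target.instance_id, base_model, request)

if __name__ == "__main__":
    services = [InferenceService("svc-a", "gpu-0"), InferenceService("svc-b", "gpu-1")]
    instances = deploy("chat-model", ["w0.bin", "w1.bin"], services)
    print(handle_request("chat-model", "hello", instances, base_model="base-llm"))
```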
-
Publication No.: US20230376726A1
Publication Date: 2023-11-23
Application No.: US17980204
Application Date: 2022-11-03
Inventor: Zhengxiong YUAN , Zhenfang CHU , Jinqi LI , Mingren HU , Guobin WANG , Yang LUO , Yue HUANG , Zhengyu QIAN , En SHI
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: Provided are an inference service deployment method, a device, and a storage medium, relating to the field of artificial intelligence technology, and in particular to machine learning and inference service technology. The inference service deployment method includes: obtaining performance information of a runtime environment of a deployment end; selecting a target version of an inference service from a plurality of candidate versions of the inference service of a model according to the performance information of the runtime environment of the deployment end; and deploying the target version of the inference service to the deployment end.
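The selection step amounts to filtering candidate versions against the deployment end's reported capabilities. Below is a minimal Python sketch under assumed names (CandidateVersion, get_runtime_performance, select_target_version, deploy_version) and an assumed capability model (GPU availability and memory); none of these are taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical structure for a candidate version of the inference service.
@dataclass
class CandidateVersion:
    name: str
    requires_gpu: bool
    min_memory_gb: int

def get_runtime_performance(deployment_end: str) -> Dict[str, object]:
    """Stand-in for probing the deployment end's runtime environment."""
    # In practice this would query the target host or edge device.
    return {"has_gpu": False, "memory_gb": 8}

def select_target_version(perf: Dict[str, object],
                          candidates: List[CandidateVersion]) -> Optional[CandidateVersion]:
    """Pick the candidate version that fits the reported runtime
    environment (simple capability filter)."""
    feasible = [c for c in candidates
                if (not c.requires_gpu or perf["has_gpu"])
                and perf["memory_gb"] >= c.min_memory_gb]
    # Prefer the most demanding version that still fits.
    return max(feasible, key=lambda c: c.min_memory_gb, default=None)

def deploy_version(version: CandidateVersion, deployment_end: str) -> None:
    """Placeholder for pushing the selected version to the deployment end."""
    print(f"Deploying {version.name} to {deployment_end}")

if __name__ == "__main__":
    candidates = [
        CandidateVersion("gpu-tensorrt", requires_gpu=True, min_memory_gb=16),
        CandidateVersion("cpu-onnx", requires_gpu=False, min_memory_gb=4),
    ]
    perf = get_runtime_performance("edge-node-1")
    target = select_target_version(perf, candidates)
    if target is not None:
        deploy_version(target, "edge-node-1")
```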
-