Patent search ap:("Beijing Baidu Netcom Science Technology Co., Ltd.") AND inv:"Yang LUO"

1.

Invention application
METHOD OF PROVIDING MODEL SERVICES (granted)

Publication number: US20240419991A1

Publication date: 2024-12-19

Application number: US18747725

Filing date: 2024-06-19

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Zhenfang CHU, Zhengyu QIAN, En SHI, Mingren HU, Zhengxiong YUAN, Jinqi LI, Yue HUANG, Yang LUO, Guobin WANG, Yang QIAN, Kuan WANG

IPC: G06N5/04

Abstract: A method is provided that includes: creating a plurality of first model instances of a first service model to be deployed; allocating an inference service for each of a plurality of first model instances from the plurality of inference services; calling, for each first model instance, a loading interface of the inference service allocated for the first model instance to mount a weight file; determining, in response to a user request for a target service model, a target model instance from a plurality of model instances of the target service model to respond to the user request; and calling a target inference service allocated for the target model instance to use computing resources configured for the target inference service to run, in the target model instance, a base model mounted with a target weight file, and obtain a request result of the user request.
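The flow in this abstract (create instances, allocate an inference service to each, mount a weight file through a loading interface, then route a user request to one instance) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `InferenceService`, `ModelInstance`, and `Router` classes, the `load` and `run` methods, and the round-robin policy used both for allocation and for picking the target instance. The abstract does not say how either choice is made.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class InferenceService:
    """Hypothetical inference service that hosts a shared base model."""
    name: str
    mounted: dict = field(default_factory=dict)  # instance id -> weight file

    def load(self, instance_id: str, weight_file: str) -> None:
        # Stand-in for the "loading interface": record the mounted weight file.
        self.mounted[instance_id] = weight_file

    def run(self, instance_id: str, prompt: str) -> str:
        # Stand-in for running the base model with the mounted weights.
        weight_file = self.mounted[instance_id]
        return f"{self.name}:{weight_file}:{prompt}"


@dataclass
class ModelInstance:
    instance_id: str
    service: InferenceService


def deploy_model(model_name, weight_file, replicas, services):
    """Create instances, allocate a service to each, and mount the weights."""
    pool = itertools.cycle(services)  # assumption: round-robin allocation
    instances = []
    for i in range(replicas):
        service = next(pool)
        instance = ModelInstance(f"{model_name}-{i}", service)
        service.load(instance.instance_id, weight_file)
        instances.append(instance)
    return instances


class Router:
    """Picks a target instance per request (assumption: round-robin)."""

    def __init__(self):
        self._cursors = {}

    def register(self, model_name, instances):
        self._cursors[model_name] = itertools.cycle(instances)

    def handle(self, model_name, prompt):
        instance = next(self._cursors[model_name])
        return instance.service.run(instance.instance_id, prompt)


services = [InferenceService("svc-a"), InferenceService("svc-b")]
router = Router()
router.register("chat", deploy_model("chat", "chat.weights", 3, services))
first_result = router.handle("chat", "hi")
second_result = router.handle("chat", "hi")
```

In this sketch, three instances share two services, so one service holds two mounted weight entries; a real system would presumably weigh the computing resources configured for each service rather than cycling blindly.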

2.

Invention publication
INFERENCE SERVICE DEPLOYMENT METHOD, DEVICE, AND STORAGE MEDIUM (under examination, published)

Publication number: US20230376726A1

Publication date: 2023-11-23

Application number: US17980204

Filing date: 2022-11-03

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Zhengxiong YUAN, Zhenfang CHU, Jinqi LI, Mingren HU, Guobin WANG, Yang LUO, Yue HUANG, Zhengyu QIAN, En SHI

IPC: G06N3/04

CPC classification number: G06N3/04

Abstract: Provided are an inference service deployment method, a device and a storage medium, relating to the field of artificial intelligence technology, and in particular to the field of machine learning and inference service technology. The inference service deployment method includes: obtaining performance information of a runtime environment of a deployment end; selecting a target version of an inference service from a plurality of candidate versions of the inference service of a model according to the performance information of the runtime environment of the deployment end; and deploying the target version of the inference service to the deployment end.
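The three steps in this abstract (obtain runtime performance information, select a target version from the candidates, deploy it) might be sketched as follows. This is an illustration under stated assumptions, not the patented method: the `RuntimeInfo` and `CandidateVersion` fields, the feasibility rule, the `throughput_score` ranking, and the `deploy` stub are all hypothetical, since the abstract does not specify what the performance information contains or how versions are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeInfo:
    """Hypothetical performance information of the deployment end."""
    memory_gb: float
    has_gpu: bool


@dataclass(frozen=True)
class CandidateVersion:
    """Hypothetical candidate version of one model's inference service."""
    name: str
    min_memory_gb: float
    needs_gpu: bool
    throughput_score: float  # assumption: higher is better


def select_target_version(candidates, runtime):
    """Return the best-scoring candidate that fits the runtime environment."""
    feasible = [
        c for c in candidates
        if c.min_memory_gb <= runtime.memory_gb
        and (runtime.has_gpu or not c.needs_gpu)
    ]
    if not feasible:
        raise ValueError("no candidate version fits the runtime environment")
    return max(feasible, key=lambda c: c.throughput_score)


def deploy(version, endpoint):
    # Stand-in for the actual deployment step; only reports what would happen.
    return f"deployed {version.name} to {endpoint}"


candidates = [
    CandidateVersion("gpu-fp16", min_memory_gb=16, needs_gpu=True, throughput_score=10),
    CandidateVersion("cpu-fp32", min_memory_gb=8, needs_gpu=False, throughput_score=5),
    CandidateVersion("cpu-int8", min_memory_gb=4, needs_gpu=False, throughput_score=3),
]
edge_runtime = RuntimeInfo(memory_gb=8, has_gpu=False)
target = select_target_version(candidates, edge_runtime)
message = deploy(target, "edge-node-1")
```

Filtering before ranking keeps the selection from ever returning a version the deployment end cannot run; the scoring itself is a placeholder for whatever comparison the full specification describes.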
