METHOD OF PROVIDING MODEL SERVICES

    Publication No.: US20240419991A1

    Publication Date: 2024-12-19

    Application No.: US18747725

    Filing Date: 2024-06-19

    Abstract: A method is provided that includes: creating a plurality of first model instances of a first service model to be deployed; allocating, from a plurality of inference services, an inference service for each of the plurality of first model instances; calling, for each first model instance, a loading interface of the inference service allocated for the first model instance to mount a weight file; determining, in response to a user request for a target service model, a target model instance from a plurality of model instances of the target service model to respond to the user request; and calling a target inference service allocated for the target model instance to use computing resources configured for the target inference service to run, in the target model instance, a base model mounted with a target weight file, and to obtain a request result of the user request.
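
The steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names (`InferenceService`, `ModelInstance`, `ModelServiceRouter`), the weight file name, and the round-robin choice of both inference service and target instance are all assumptions made for illustration. The abstract does not say how an inference service is allocated or how the target model instance is selected, and the base model is reduced here to a string-formatting stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InferenceService:
    """Hypothetical inference service that hosts a shared base model."""
    name: str
    # instance id -> weight file mounted for that instance
    mounted: Dict[str, str] = field(default_factory=dict)

    def load(self, instance_id: str, weight_file: str) -> None:
        # Stand-in for the "loading interface": mount a weight file.
        self.mounted[instance_id] = weight_file

    def run(self, instance_id: str, prompt: str) -> str:
        # Stand-in for running the base model with the mounted weights.
        weight_file = self.mounted[instance_id]
        return f"{self.name}:{weight_file}:{prompt}"


@dataclass
class ModelInstance:
    instance_id: str
    service_model: str
    service: InferenceService  # inference service allocated to this instance


class ModelServiceRouter:
    """Deploys service models and routes user requests to their instances."""

    def __init__(self, services: List[InferenceService]) -> None:
        self.services = list(services)
        self.instances: Dict[str, List[ModelInstance]] = {}
        self._next: Dict[str, int] = {}

    def deploy(self, service_model: str, weight_file: str,
               replicas: int) -> List[ModelInstance]:
        created = []
        for i in range(replicas):
            # Assumption: allocate inference services round-robin.
            service = self.services[i % len(self.services)]
            inst = ModelInstance(f"{service_model}-{i}", service_model, service)
            # Call the allocated service's loading interface.
            service.load(inst.instance_id, weight_file)
            created.append(inst)
        self.instances[service_model] = created
        self._next[service_model] = 0
        return created

    def handle(self, service_model: str, prompt: str) -> str:
        pool = self.instances[service_model]
        # Assumption: pick the target instance round-robin.
        idx = self._next[service_model]
        self._next[service_model] = (idx + 1) % len(pool)
        target = pool[idx]
        # Call the target inference service to produce the request result.
        return target.service.run(target.instance_id, prompt)


if __name__ == "__main__":
    router = ModelServiceRouter(
        [InferenceService("svc-a"), InferenceService("svc-b")]
    )
    router.deploy("chat", "chat.weights", replicas=2)
    print(router.handle("chat", "hello"))
    print(router.handle("chat", "hello"))
```

In this sketch each instance holds only a reference to its allocated service and a mounted weight file, so several service models could share the same inference services. A real system would replace the round-robin choices with whatever allocation and selection policy the full specification describes.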
