-
1.
公开(公告)号:US20210357256A1
公开(公告)日:2021-11-18
申请号:US16874479
申请日:2020-05-14
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: LIANJIE CAO , FARAZ AHMED , PUNEET SHARMA
Abstract: Systems and methods are provided for optimally allocating resources used to perform multiple tasks/jobs, e.g., machine learning training jobs. The possible resource configurations or candidates that can be used to perform such jobs are generated. A first batch of training jobs can be randomly selected and run using one of the possible resource configuration candidates. Subsequent batches of training jobs may be performed using other resource configuration candidates that have been selected using an optimization process, e.g., Bayesian optimization. Upon reaching a stopping criterion, the resource configuration resulting in a desired optimization metric, e.g., fastest job completion time can be selected and used to execute the remaining training jobs.
-
2.
公开(公告)号:US20220292303A1
公开(公告)日:2022-09-15
申请号:US17199294
申请日:2021-03-11
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: LIANJIE CAO , FARAZ AHMED , PUNEET SHARMA , ALI TARIQ
Abstract: Systems and methods can be configured to determine a plurality of computing resource configurations used to perform machine learning model training jobs. A computing resource configuration can comprise: a first tuple including numbers of worker nodes and parameter server nodes, and a second tuple including resource allocations for the worker nodes and parameter server nodes. At least one machine learning training job can be executed using a first computing resource configuration having a first set of values associated with the first tuple. During the executing the machine learning training job: resource usage of the worker nodes and parameter server nodes caused by a second set of values associated with the second tuple can be monitored, and whether to adjust the second set of values can be determined. Whether a stopping criterion is satisfied can be determined. One of the plurality of computing resource configurations can be selected.
-