Publication No.: US20210390405A1
Publication Date: 2021-12-16
Application No.: US17340639
Application Date: 2021-06-07
Inventors: Young Ri CHOI, Yeon Hyeok JEONG, Seung Min LEE, Seong Hyeon JUE
Abstract: Disclosed is a training system that trains a plurality of neural network models in parallel. The training system includes: a first job proxy that receives a training request for a first neural network model and partitions a first training job corresponding to the first neural network model into first microservices; a second job proxy that receives a training request for a second neural network model and partitions a second training job corresponding to the second neural network model into second microservices; a scheduler that dynamically schedules the first microservices and the second microservices so as to be executed by heterogeneous processing units; a cluster that includes the heterogeneous processing units, sequentially executes the first microservices, and sequentially executes the second microservices; and a distributed in-memory database that stores parameters generated in response to the execution of the first microservices and the second microservices.
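
The arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and function names (`JobProxy`, `ParameterStore`, `schedule`), the unit labels, and the round-robin policy are all assumptions made for illustration, and the abstract does not specify how the scheduler chooses a processing unit. A plain dictionary stands in for the distributed in-memory database.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Microservice:
    job: str   # training job this microservice belongs to
    step: int  # position within the job; steps of one job run in order


class JobProxy:
    """Hypothetical job proxy: partitions one training job into ordered microservices."""

    def __init__(self, job: str, num_steps: int):
        self.pending = deque(Microservice(job, i) for i in range(num_steps))


class ParameterStore:
    """Stand-in for the distributed in-memory database (a plain dict here)."""

    def __init__(self):
        self.params = {}

    def save(self, job, step, value):
        self.params[(job, step)] = value


def schedule(proxies, units, store):
    """Interleave jobs round-robin and assign processing units in rotation.

    Round-robin is a placeholder policy (an assumption), not the dynamic
    scheduling of the patent. Returns a log of (unit, job, step) tuples.
    """
    log = []
    unit_idx = 0
    while any(p.pending for p in proxies):
        for proxy in proxies:
            if not proxy.pending:
                continue
            # Taking from the front keeps each job's microservices sequential.
            ms = proxy.pending.popleft()
            unit = units[unit_idx % len(units)]
            unit_idx += 1
            # Record a placeholder for the parameters this step would produce.
            store.save(ms.job, ms.step, f"params from {unit}")
            log.append((unit, ms.job, ms.step))
    return log


# Two jobs share two heterogeneous units (labels are illustrative only).
proxies = [JobProxy("model_a", 3), JobProxy("model_b", 2)]
store = ParameterStore()
log = schedule(proxies, ["gpu0", "cpu0"], store)
for entry in log:
    print(entry)
```

In this sketch the microservices of the two jobs interleave across the units, while the steps within each job still run in their original order, which mirrors the "sequentially executes" wording of the abstract.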