DYNAMIC COMPUTATION IN DECENTRALIZED DISTRIBUTED DEEP LEARNING TRAINING

    Publication Number: US20220012584A1

    Publication Date: 2022-01-13

    Application Number: US16925178

    Filing Date: 2020-07-09

    Abstract: Embodiments of a method are disclosed. The method includes performing decentralized distributed deep learning training on a batch of training data. Additionally, the method includes determining a training time in which a learner performs the decentralized distributed deep learning training on the batch of training data. Further, the method includes generating a table having the training time and other processing times for corresponding other learners performing the decentralized distributed deep learning training on corresponding other batches of training data. The method also includes determining that the learner is a straggler based on the table and a threshold for the training time. Additionally, the method includes, in response to determining that the learner is the straggler, modifying a processing aspect of the straggler to reduce a future training time of the straggler for performing the decentralized distributed deep learning training on a new batch of training data.
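The straggler-detection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the table layout, the mean-based threshold policy, and the batch-size remediation are all assumptions made for the example.

```python
def detect_stragglers(training_times, threshold_factor=1.5):
    """Flag learners whose per-batch training time exceeds a threshold
    derived from the mean across all learners (hypothetical policy)."""
    mean_time = sum(training_times.values()) / len(training_times)
    threshold = threshold_factor * mean_time
    return [lid for lid, t in training_times.items() if t > threshold]

# Table of per-learner training times for one batch (illustrative values).
table = {"learner_0": 1.9, "learner_1": 2.1, "learner_2": 6.4}

for lid in detect_stragglers(table):
    # Hypothetical remediation: modify a processing aspect of the
    # straggler, e.g. shrink its next batch, so its future training
    # time moves toward the group's.
    print(f"{lid}: reducing batch size for the next training round")
```

In a decentralized setting, each learner would hold its own copy of the table (populated via gossip or an all-gather of timings) and apply the check to itself.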

    DYNAMIC DISTRIBUTED TRAINING OF MACHINE LEARNING MODELS

    Publication Number: US20220327374A1

    Publication Date: 2022-10-13

    Application Number: US17226399

    Filing Date: 2021-04-09

    Abstract: Computer hardware and/or software that performs the following operations: (i) updating a machine learning model by synchronously applying, to the machine learning model, a first set of training results received from a set of trainers having respective training datasets; (ii) receiving, from one or more trainers of the set of trainers, a first set of metrics pertaining to at least some of the training results of the first set of training results; and (iii) based, at least in part, on the first set of metrics, determining to subsequently update the machine learning model via asynchronous application of subsequent training results received from respective trainers of the set of trainers.
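The decision logic in operation (iii) — using trainer-reported metrics to switch from synchronous to asynchronous model updates — might look like the sketch below. The metric names (`staleness`, `loss_variance`) and thresholds are invented for illustration; the patent does not specify them.

```python
def choose_update_mode(metrics, staleness_limit=3, variance_limit=0.25):
    """Decide whether to keep applying training results synchronously or
    switch to asynchronous application, based on per-trainer metrics
    (hypothetical metric names and thresholds)."""
    max_staleness = max(m["staleness"] for m in metrics)
    max_variance = max(m["loss_variance"] for m in metrics)
    if max_staleness <= staleness_limit and max_variance <= variance_limit:
        return "synchronous"
    return "asynchronous"

# Metrics reported by trainers after a synchronous round (illustrative).
round_metrics = [
    {"staleness": 1, "loss_variance": 0.10},
    {"staleness": 5, "loss_variance": 0.12},  # slow trainer lags behind
]
mode = choose_update_mode(round_metrics)  # -> "asynchronous"
```

Under this policy, one trainer falling far behind the synchronous barrier is enough to move subsequent rounds to asynchronous updates, so fast trainers no longer wait on it.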

    DYNAMIC COMPUTATION RATES FOR DISTRIBUTED DEEP LEARNING

    Publication Number: US20220012629A1

    Publication Date: 2022-01-13

    Application Number: US16925161

    Filing Date: 2020-07-09

    Abstract: Embodiments of a method are disclosed. The method includes performing distributed deep learning training on multiple batches of training data using corresponding learners. Additionally, the method includes determining training times in which the learners perform the distributed deep learning training on the batches of training data. The method also includes, in response to a centralized control identifying a straggler among the learners, modifying a processing aspect of the straggler to reduce a future training time of the straggler for performing the distributed deep learning training on a new batch of training data.
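One way a centralized control could adjust computation rates is to rebalance per-learner batch sizes so expected training times equalize. The proportional policy below is a sketch under assumed semantics; the function name and the choice of the mean time as the target are not from the patent.

```python
def rebalance_batches(training_times, batch_sizes):
    """Centralized control: scale each learner's next batch size so that
    expected training times equalize (hypothetical proportional policy)."""
    # Per-sample processing rate observed for each learner.
    rates = {lid: batch_sizes[lid] / t for lid, t in training_times.items()}
    # Target time: the mean of the observed training times.
    target = sum(training_times.values()) / len(training_times)
    # New batch size = rate * target time, at least one sample.
    return {lid: max(1, round(rates[lid] * target)) for lid in rates}

times = {"learner_0": 2.0, "learner_1": 2.0, "learner_2": 8.0}
sizes = {"learner_0": 128, "learner_1": 128, "learner_2": 128}
new_sizes = rebalance_batches(times, sizes)
# learner_2 processes 16 samples/s; target time is 4.0 s -> batch of 64
```

Unlike the decentralized variant above, here a single coordinator sees every learner's timing and pushes the new batch sizes out, so no learner needs the full table.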
