METHOD, DEVICE AND STORAGE MEDIUM FOR TRAINING A DEEP LEARNING FRAMEWORK

    Publication Number: US20220036241A1

    Publication Date: 2022-02-03

    Application Number: US17501003

    Filing Date: 2021-10-14

    Abstract: The present disclosure provides a method, an apparatus and a storage medium for training a deep learning framework, and relates to the field of artificial intelligence, such as deep learning and big data processing. The specific implementation solution is: acquiring, when a target task meets a training start condition, at least one task node in a current task node cluster that meets a preset opening condition; judging whether a number of nodes of the at least one task node is greater than or equal to a preset number; synchronously training the deep learning framework of the target task by the at least one task node according to sample data if the number of nodes is greater than or equal to the preset number; and acquiring a synchronously trained target deep learning framework when the target task meets a training completion condition.
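The gating logic described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `TaskNode`, the `ready` flag standing in for the "preset opening condition", and the boolean return value are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaskNode:
    ready: bool          # stand-in for "meets the preset opening condition"
    trained: bool = False

def train_if_quorum(nodes: List[TaskNode], preset_number: int) -> bool:
    """Run one synchronous training pass on all ready nodes, but only
    if their count is greater than or equal to the preset number.
    Returns True when training was actually started."""
    ready_nodes = [n for n in nodes if n.ready]
    if len(ready_nodes) < preset_number:
        return False  # not enough nodes: do not start training
    for node in ready_nodes:
        node.trained = True  # stand-in for one synchronous training step
    return True
```

A usage sketch: with three nodes of which two are ready and a preset number of two, training starts on exactly those two ready nodes; with a single ready node it does not start at all.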

    DISTRIBUTED TRAINING METHOD BASED ON END-TO-END ADAPTION, AND DEVICE

    Publication Number: US20230169351A1

    Publication Date: 2023-06-01

    Application Number: US18060705

    Filing Date: 2022-12-01

    CPC classification number: G06N3/098

    Abstract: Provided are a distributed training method based on end-to-end adaption, a device, and a storage medium. The method includes: obtaining slicing results by slicing a model to be trained; obtaining an attribute of computing resources allocated to the model for training by parsing the computing resources, in which the computing resources are determined based on a computing resource requirement of the model, computing resources occupied by another model being trained, and idle computing resources, and the attribute of the computing resources is configured to represent at least one of a topology relation and a task processing capability of the computing resources; determining a distribution strategy of each of the slicing results in the computing resources based on the attribute of the computing resources; and performing distributed training on the model using the computing resources based on the distribution strategy.
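The step of determining a distribution strategy from a resource attribute can be illustrated with a simple greedy placement. This is only a sketch under assumed simplifications: one slice per layer, a single numeric "capacity" standing in for the task processing capability attribute, and hypothetical names (`plan_distribution`, `layers`, `devices`) not taken from the patent.

```python
from typing import Dict, List, Tuple

def plan_distribution(layers: List[Tuple[str, int]],
                      devices: Dict[str, int]) -> Dict[str, str]:
    """Greedily assign each model slice (here: a (name, cost) pair)
    to the device with the most remaining capacity, as a stand-in
    for an attribute-based distribution strategy."""
    remaining = dict(devices)  # device name -> remaining capacity
    strategy: Dict[str, str] = {}
    for layer, cost in layers:
        # Pick the device whose remaining capacity is currently largest.
        dev = max(remaining, key=remaining.get)
        strategy[layer] = dev
        remaining[dev] -= cost
    return strategy
```

For example, three slices of costs 2, 2 and 1 placed onto two devices of capacities 4 and 3 alternate between the devices as their remaining capacities change.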
