Distributed computing system, and data transmission method and apparatus in distributed computing system

    Publication No.: US11010681B2

    Publication date: 2021-05-18

    Application No.: US16805007

    Filing date: 2020-02-28

    Abstract: A distributed computing system is provided. Both a first computing node and a second computing node in the distributed computing system store information about a name, a size, and a communication peer side identifier of a first data flow graph parameter in a data flow graph. The first computing node also stores the first data flow graph parameter itself. The first computing node and the second computing node each generate a triplet based on the same interface parameter generation algorithm and on the information about the first data flow graph parameter stored in the respective node. The triplet is used as an interface parameter of a message passing interface (MPI) primitive that transmits the first data flow graph parameter between the first computing node and the second computing node.
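
The triplet generation described in this abstract can be illustrated with a minimal Python sketch. It is not the patented algorithm: the abstract does not state what the triplet contains, so the sketch assumes it is (message tag, message size, peer rank), and it assumes the tag is derived from the parameter name with a CRC32 checksum. The names `ParamInfo`, `MpiTriplet`, `make_triplet`, and `RANKS`, and the node identifiers, are hypothetical.

```python
import zlib
from typing import Dict, NamedTuple


class ParamInfo(NamedTuple):
    """Per-node record for one data flow graph parameter (hypothetical layout)."""
    name: str        # parameter name, identical on both nodes
    size_bytes: int  # parameter size, identical on both nodes
    peer_id: str     # identifier of the communication peer, differs per node


class MpiTriplet(NamedTuple):
    """Assumed triplet contents: tag, size, and rank of the peer process."""
    tag: int
    size: int
    peer_rank: int


# The MPI standard guarantees that tags up to at least 32767 are valid.
MAX_TAG = 32767


def make_triplet(info: ParamInfo, rank_of: Dict[str, int]) -> MpiTriplet:
    """Derive the triplet deterministically from locally stored information.

    zlib.crc32 is used instead of the built-in hash() because hash() of a
    string is randomized per process, so two nodes would not agree on it.
    """
    tag = zlib.crc32(info.name.encode("utf-8")) % (MAX_TAG + 1)
    return MpiTriplet(tag=tag, size=info.size_bytes, peer_rank=rank_of[info.peer_id])


# Hypothetical mapping from node identifiers to MPI ranks.
RANKS = {"node-a": 0, "node-b": 1}

# node-a holds the parameter and names node-b as its peer; node-b does the reverse.
sender = make_triplet(ParamInfo("layer1/weights", 4096, "node-b"), RANKS)
receiver = make_triplet(ParamInfo("layer1/weights", 4096, "node-a"), RANKS)
```

Because both nodes run the same function over the same name and size, `sender.tag` and `receiver.tag` match without any negotiation, and each side could pass its triplet to a matching send or receive primitive. A real system would also need to handle tag collisions between different parameter names, which this sketch ignores.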

    Deep Learning Job Scheduling Method and System and Related Device

    Publication No.: US20210011762A1

    Publication date: 2021-01-14

    Application No.: US17038720

    Filing date: 2020-09-30

    Abstract: A deep learning job scheduling method includes obtaining a job request of a deep learning job, determining a target job description file template from a plurality of pre-stored job description file templates based on the job request, determining an identifier of a target job basic image from identifiers of a plurality of pre-stored job basic images based on the job request, generating a target job description file based on the target job description file template and the identifier of the target job basic image, sending the target job description file to a container scheduler, selecting the target job basic image from the pre-stored job basic images based on the target job description file, and creating at least one container for executing the job request.
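
The first steps of this method (choosing a template, choosing a basic image identifier, and generating the job description file) can be sketched in Python as follows. This is an illustration under assumptions, not the method as claimed: the request fields, template contents, image identifiers, and the names `TEMPLATES`, `BASE_IMAGES`, and `build_job_description` are all hypothetical, and sending the file to a container scheduler is left out.

```python
from string import Template
from typing import Dict, Tuple

# Hypothetical pre-stored job description file templates, keyed by job type.
TEMPLATES: Dict[str, Template] = {
    "training": Template(
        "kind: Job\nname: $job_name\nimage: $image_id\ncommand: $command\n"
    ),
    "inference": Template(
        "kind: Service\nname: $job_name\nimage: $image_id\ncommand: $command\n"
    ),
}

# Hypothetical identifiers of pre-stored job basic images,
# keyed by (framework, device).
BASE_IMAGES: Dict[Tuple[str, str], str] = {
    ("tensorflow", "gpu"): "dl-base/tensorflow-gpu",
    ("tensorflow", "cpu"): "dl-base/tensorflow-cpu",
    ("pytorch", "gpu"): "dl-base/pytorch-gpu",
}


def build_job_description(request: Dict[str, str]) -> str:
    """Generate a job description file from a job request.

    Raises KeyError if no stored template or image matches the request.
    """
    # Determine the target template from the pre-stored templates.
    template = TEMPLATES[request["job_type"]]
    # Determine the identifier of the target basic image.
    image_id = BASE_IMAGES[(request["framework"], request["device"])]
    # Fill the template with the image identifier and request details.
    return template.substitute(
        job_name=request["name"],
        image_id=image_id,
        command=request["command"],
    )


description = build_job_description(
    {
        "name": "resnet-train",
        "job_type": "training",
        "framework": "tensorflow",
        "device": "gpu",
        "command": "python train.py",
    }
)
```

The resulting `description` string is what would be handed to a container scheduler, which would then look up the image named in it and create the containers. Keeping templates and image identifiers in lookup tables means a user only supplies the job request and never writes the description file by hand.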
