NGraph-based GPU backend distributed training method and system

    Publication number: US12001960B2

    Publication date: 2024-06-04

    Application number: US18034566

    Application date: 2021-07-29

    CPC classification number: G06N3/098 G06F9/546 G06F13/00

    Abstract: An nGraph-based graphics processing unit (GPU) backend distributed training method and system, a computer-readable storage medium, and an electronic device. The method includes: receiving a training request, and obtaining corresponding training data; obtaining a Nvidia® Collective multi-GPU Communication Library (NCCL) file by means of a system path of the NCCL file linked to an nGraph framework; invoking an NCCL communication interface configuration according to the training request to obtain a training model, wherein the NCCL communication interface is an NCCL file-based communication operation interface located at a GPU backend of the nGraph framework; and performing GPU backend training on the training data using the training model. The present application can satisfy an urgent need of a user for performing neural network distributed training on the basis of an nGraph GPU backend, thus further improving the performance of deep learning network training.

    Many-to-many PCIE switch
    Document type: Invention grant

    Publication number: US11960429B2

    Publication date: 2024-04-16

    Application number: US18082485

    Application date: 2022-12-15

    CPC classification number: G06F13/4022

    Abstract: Methods, apparatus, and computer platforms and architectures employing many-to-many and many-to-one peripheral switches. The methods and apparatus may be implemented on computer platforms having multiple nodes, such as those employing a Non-uniform Memory Access (NUMA) architecture, wherein each node comprises a plurality of components including a processor having at least one level of memory cache and being operatively coupled to system memory and operatively coupled to a many-to-many peripheral switch that includes a plurality of downstream ports to which NICs and/or peripheral expansion slots are operatively coupled, or a many-to-one switch that enables a peripheral device to be shared by multiple nodes. During operation, packets are received at the NICs and DMA memory writes are initiated using memory write transactions identifying a destination memory address. The many-to-many and many-to-one peripheral switches forward the transaction packets internally within the switch based on the destination address such that the packets are forwarded to a node via which the memory address can be accessed. The platform architectures may also be configured to support migration operations in response to failure or replacement of a node.
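The address-based forwarding described in this abstract can be illustrated with a minimal sketch. It assumes each node owns one contiguous window of system memory; the names `NodeWindow`, `ManyToManySwitch`, and `route` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeWindow:
    """Hypothetical: the memory address range reachable through one node."""
    node_id: int
    base: int   # first address owned by the node
    size: int   # number of addresses in the window

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class ManyToManySwitch:
    """Toy model of a switch that forwards DMA writes by destination address."""

    def __init__(self, windows: Iterable[NodeWindow]) -> None:
        self.windows = list(windows)

    def route(self, dest_addr: int) -> Optional[int]:
        # Forward to the node whose window covers the destination address.
        for window in self.windows:
            if window.contains(dest_addr):
                return window.node_id
        return None  # no node can access this address


# Example: two nodes, each owning 1 GiB of address space (assumed layout).
switch = ManyToManySwitch([
    NodeWindow(node_id=0, base=0x0000_0000, size=0x4000_0000),
    NodeWindow(node_id=1, base=0x4000_0000, size=0x4000_0000),
])
print(switch.route(0x4000_1000))
```

A real switch would do this lookup in hardware on transaction packets; the linear scan here is only meant to show the routing decision.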

    Memory system having high data transfer efficiency and host controller

    Publication number: USRE49875E1

    Publication date: 2024-03-19

    Application number: US17396421

    Application date: 2021-08-06

    Inventor: Akihisa Fujimoto

    CPC classification number: G06F3/00 G06F12/1081 G06F13/28 G06F2213/28

    Abstract: According to one embodiment, the host controller includes a register set for issuing commands and a direct memory access (DMA) unit, and accesses a system memory and a device. First, second, third and fourth descriptors are stored in the system memory. The first descriptor includes a set of a plurality of pointers indicating a plurality of second descriptors. Each of the second descriptors comprises the third descriptor and fourth descriptor. The third descriptor includes a command number, etc. The fourth descriptor includes information indicating addresses and sizes of a plurality of data arranged in the system memory. The DMA unit sets, in the register set, the contents of the third descriptor forming the second descriptor, from the head of the first descriptor as a start point, and transfers data between the system memory and the host controller in accordance with the contents of the fourth descriptor.
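The four-descriptor layout in this abstract can be sketched as a small data model. This is an illustration under stated assumptions, not the patented implementation: the class names, the dictionary standing in for descriptor storage in system memory, and the `run_dma` helper are all hypothetical, and only the memory-to-controller direction is modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CommandDescriptor:
    """The "third descriptor": holds a command number (other fields omitted)."""
    command_number: int


@dataclass
class DataDescriptor:
    """The "fourth descriptor": (address, size) pairs for data in system memory."""
    segments: List[Tuple[int, int]]


@dataclass
class TaskDescriptor:
    """The "second descriptor": one third descriptor plus one fourth descriptor."""
    command: CommandDescriptor
    data: DataDescriptor


def run_dma(
    first_descriptor: List[int],
    descriptor_store: Dict[int, TaskDescriptor],
    system_memory: bytearray,
) -> List[Tuple[int, bytes]]:
    """Walk the pointer list from its head and gather each task's data."""
    transfers = []
    for pointer in first_descriptor:
        task = descriptor_store[pointer]
        # Stand-in for loading the third descriptor into the register set.
        registers = {"command_number": task.command.command_number}
        # Gather the scattered segments named by the fourth descriptor.
        payload = b"".join(
            bytes(system_memory[addr:addr + size])
            for addr, size in task.data.segments
        )
        transfers.append((registers["command_number"], payload))
    return transfers
```

The first descriptor is modeled as a plain list of pointers, so processing order follows the list from its head, matching the "start point" wording in the abstract.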
