Distributed Processing System and Distributed Processing Method

    Publication No.: US20210357723A1

    Publication Date: 2021-11-18

    Application No.: US17291229

    Filing Date: 2019-10-23

    IPC Classification: G06N3/04, G06N3/08

    Abstract: A distributed processing system includes a plurality of lower-order aggregation networks and a higher-order aggregation network. Each lower-order aggregation network includes a plurality of distributed processing nodes disposed in a ring. Each distributed processing node generates distributed data for each weight of the neural network of its own node. Each lower-order aggregation network aggregates the distributed data generated by its distributed processing nodes. The higher-order aggregation network generates aggregated data by further aggregating the aggregation results of the lower-order aggregation networks, and distributes it to the lower-order aggregation networks. Each lower-order aggregation network distributes the aggregated data it receives to the distributed processing nodes belonging to that lower-order aggregation network. The distributed processing nodes update the weights of the neural network based on the distributed aggregated data.
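
The two-level aggregation described in this abstract can be pictured with a short simulation. The sketch below is only an illustration under assumed names (hierarchical_allreduce, groups) and uses in-memory lists in place of the ring and higher-order networks the patent describes:

```python
import numpy as np

def hierarchical_allreduce(groups):
    """Simulate two-level aggregation: each inner list is one lower-order
    ring holding the per-node distributed data (gradient vectors)."""
    # Step 1: each lower-order aggregation network sums the distributed
    # data generated by its own ring of distributed processing nodes.
    lower_sums = [np.sum(ring, axis=0) for ring in groups]
    # Step 2: the higher-order aggregation network further aggregates the
    # per-ring results into a single aggregated-data vector.
    aggregated = np.sum(lower_sums, axis=0)
    # Step 3: the aggregated data is distributed back to every node, which
    # would then update its neural-network weights with it.
    return [[aggregated.copy() for _ in ring] for ring in groups]

# Example: 2 rings of 3 nodes, each node holding a 4-element gradient.
rng = np.random.default_rng(0)
groups = [[rng.standard_normal(4) for _ in range(3)] for _ in range(2)]
result = hierarchical_allreduce(groups)
assert all(np.allclose(v, result[0][0]) for ring in result for v in ring)
```

In an actual system the rings would pass data node to node rather than summing in one place; the sketch keeps only the data flow of aggregate-up, then distribute-down.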

    Arithmetic circuit (Granted Patent)

    Publication No.: US11360741B2

    Publication Date: 2022-06-14

    Application No.: US16959968

    Filing Date: 2018-12-18

    Abstract: An arithmetic circuit includes an LUT generation circuit (1) that pairs coefficients c[n] (n=1, . . . , N) two by two and outputs a value calculated for each pair, and distributed arithmetic circuits (2-m) that calculate, in parallel for each of M data sets X[m], the product-sum values y[m] obtained by multiplying the data x[m, n] of the data set X[m] by the coefficients c[n] and summing the products. Each distributed arithmetic circuit (2-m) includes a plurality of binomial distributed arithmetic circuits that calculate the value of a binomial product-sum in parallel for each pair, based on values obtained by pairing the N data x[m, n] corresponding to the circuit two by two, values obtained by pairing the coefficients c[n] two by two, and the values calculated by the LUT generation circuit (1), and a binomial distributed arithmetic result summing circuit that sums the values calculated by the binomial distributed arithmetic circuits and outputs the sum as y[m].
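
The product-sum being computed is y[m] = c[1]x[m, 1] + c[2]x[m, 2] + . . . + c[N]x[m, N]. A minimal Python model of the pairwise (binomial) distributed-arithmetic idea follows; the function names and the restriction to unsigned fixed-point data are illustrative assumptions, not details taken from the patent:

```python
def build_lut(c1, c2):
    """LUT generation: the four partial sums selectable by one bit each
    of the two paired operands: 0, c1, c2, c1 + c2."""
    return [0, c1, c2, c1 + c2]

def binomial_da(x1, x2, lut, bits=8):
    """Binomial distributed arithmetic: computes c1*x1 + c2*x2 by scanning
    the (unsigned) operands bit by bit and accumulating shifted LUT hits."""
    acc = 0
    for b in range(bits):
        idx = ((x1 >> b) & 1) | (((x2 >> b) & 1) << 1)
        acc += lut[idx] << b  # weight the lookup by the bit position
    return acc

def product_sum(x, c, bits=8):
    """y = sum_n c[n]*x[n], evaluated pairwise by binomial DA blocks and
    combined by the result-summing step."""
    assert len(x) == len(c) and len(c) % 2 == 0
    luts = [build_lut(c[n], c[n + 1]) for n in range(0, len(c), 2)]
    return sum(binomial_da(x[n], x[n + 1], luts[n // 2], bits)
               for n in range(0, len(x), 2))

# Check against a direct dot product.
x, c = [3, 7, 1, 5], [2, -4, 6, 9]
assert product_sum(x, c) == sum(xi * ci for xi, ci in zip(x, c))
```

Pairing the coefficients keeps each lookup table at four entries instead of the single 2^N-entry table of classical distributed arithmetic, which is presumably the motivation for the binomial decomposition.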

    Distributed processing system and distributed processing method

    Publication No.: US11240296B2

    Publication Date: 2022-02-01

    Application No.: US17287063

    Filing Date: 2019-10-07

    IPC Classification: G06F15/16, H04L29/08, H04L12/24

    Abstract: A first distributed processing node transmits its distributed data to a second distributed processing node as intermediate consolidated data. A third distributed processing node generates updated intermediate consolidated data from the received intermediate consolidated data and its own distributed data, and transmits it to a fourth distributed processing node. The first distributed processing node transmits the received intermediate consolidated data to a fifth distributed processing node as consolidated data. The third distributed processing node transmits the received consolidated data to a sixth distributed processing node. When the aggregation communication time period required by each distributed processing node to consolidate the distributed data, or the aggregation-and-dispatch communication time period (the total of the aggregation communication time period and the time period required by each distributed processing node to dispatch the consolidated data), exceeds a predetermined time period, the first distributed processing node issues a warning.
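
The warning mechanism can be pictured as a watchdog around the two passes over the ring. The sketch below is a simplified, non-pipelined simulation; the names and the two time budgets are hypothetical:

```python
import time
import warnings

AGG_TIME_LIMIT = 0.5    # hypothetical budget for the aggregation pass (s)
TOTAL_TIME_LIMIT = 1.0  # hypothetical budget for aggregation + dispatch (s)

def ring_allreduce_with_watchdog(node_data, hop):
    """Time the aggregation pass (each node adds its distributed data to
    the intermediate consolidated data and forwards it) and the dispatch
    pass (the consolidated data travels the ring again), warning when
    either measured time period exceeds its predetermined limit."""
    t0 = time.perf_counter()
    consolidated = [0.0] * len(node_data[0])
    for data in node_data:          # aggregation pass around the ring
        hop()
        consolidated = [a + b for a, b in zip(consolidated, data)]
    aggregation_time = time.perf_counter() - t0
    for _ in node_data:             # dispatch pass around the ring
        hop()
    total_time = time.perf_counter() - t0
    if aggregation_time > AGG_TIME_LIMIT or total_time > TOTAL_TIME_LIMIT:
        warnings.warn("ring communication exceeded its time budget")
    return consolidated

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(ring_allreduce_with_watchdog(data, hop=lambda: time.sleep(0.01)))
# [9.0, 12.0]
```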

    Distributed Deep Learning System (Patent Application)

    Publication No.: US20210056416A1

    Publication Date: 2021-02-25

    Application No.: US16979066

    Filing Date: 2019-02-25

    IPC Classification: G06N3/08, G06F13/40, G06N3/04

    Abstract: Each of the learning nodes calculates the gradient of a loss function from the output obtained when learning data is input to the neural network being learned, generates a packet carrying a plurality of gradient components, and transmits the packet to the computing interconnect device. The computing interconnect device acquires the values of the gradient components stored in the packets transmitted from the learning nodes, performs, in parallel on each of a plurality of configuration values of each gradient, a calculation process whose inputs are the gradient values corresponding to the same configuration parameter of the neural network, generates a packet carrying the calculation results, and transmits that packet to each of the learning nodes. Each of the learning nodes then updates the configuration parameters of the neural network based on the values stored in the packet.
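
A rough model of the packet-level aggregation performed by the computing interconnect device is sketched below; the packet layout (an offset plus a small chunk of components), the chunk size, and the function names are assumptions made for illustration:

```python
import numpy as np

def make_packets(gradient, chunk=4):
    """Each learning node splits its gradient into packets of a few
    components, tagged with the offset of the first component."""
    return [(i, gradient[i:i + chunk]) for i in range(0, len(gradient), chunk)]

def interconnect_aggregate(per_node_packets):
    """Model of the computing interconnect device: for packets carrying the
    same offset, add the gradient components for the same configuration
    parameter element-wise (vectorised here to stand in for the parallel
    hardware), emitting one result packet per offset."""
    out = []
    for packets in zip(*per_node_packets):   # same offset across all nodes
        offset = packets[0][0]
        summed = np.sum([np.asarray(p[1]) for p in packets], axis=0)
        out.append((offset, summed))
    return out

# Example: 3 learning nodes, 8 gradient components each.
rng = np.random.default_rng(1)
grads = [rng.standard_normal(8) for _ in range(3)]
result = interconnect_aggregate([make_packets(g) for g in grads])
assert np.allclose(np.concatenate([v for _, v in result]),
                   np.sum(grads, axis=0))
```

On receipt, each learning node would apply the summed gradients to its parameters (for example, theta -= lr * summed), keeping all replicas in step.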

    Scheduling apparatus and method (Granted Patent)

    Publication No.: US10485008B2

    Publication Date: 2019-11-19

    Application No.: US15756016

    Filing Date: 2016-08-26

    Abstract: A convergence pattern selection unit (10A) sequentially generates a plurality of different patterns based on designated initial conditions, selects, as a convergence pattern, a pattern whose evaluation value has converged to an extreme value, and repeats the selection of a convergence pattern with changed initial conditions each time a convergence pattern is selected. A transmission pattern determination unit (10B) selects, as the optimum transmission pattern, the convergence pattern with the highest evaluation value among those obtained by the convergence pattern selection unit (10A). This enables the search to find an optimum transmission pattern with a better evaluation value.
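
The described procedure resembles local search with random restarts: climb until the evaluation value converges to a local extremum, restart from changed initial conditions, and keep the best convergence pattern. The sketch below models it that way under that assumption, with a toy evaluation function; all names are hypothetical:

```python
import random

def hill_climb(initial, evaluate, neighbors, max_steps=1000):
    """Convergence-pattern selection: move to better neighbouring patterns
    until the evaluation value converges to a (local) extreme value."""
    pattern, score = initial, evaluate(initial)
    for _ in range(max_steps):
        candidates = neighbors(pattern)
        best = max(candidates, key=evaluate) if candidates else None
        if best is None or evaluate(best) <= score:
            return pattern, score   # converged: a local maximum
        pattern, score = best, evaluate(best)
    return pattern, score

def select_transmission_pattern(restarts, random_initial, evaluate, neighbors):
    """Transmission-pattern determination: repeat the climb from changed
    initial conditions and keep the convergence pattern with the highest
    evaluation value as the optimum transmission pattern."""
    runs = (hill_climb(random_initial(), evaluate, neighbors)
            for _ in range(restarts))
    return max(runs, key=lambda r: r[1])

# Toy example: patterns are bit tuples, scored by matches with a target.
target = (1, 0, 1, 1, 0, 1)
evaluate = lambda p: sum(a == b for a, b in zip(p, target))
neighbors = lambda p: [p[:i] + (1 - p[i],) + p[i + 1:] for i in range(len(p))]
random_initial = lambda: tuple(random.randint(0, 1) for _ in target)
print(select_transmission_pattern(20, random_initial, evaluate, neighbors))
```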