-
Publication No.: US20220027738A1
Publication Date: 2022-01-27
Application No.: US17450055
Filing Date: 2021-10-05
Applicant: Huawei Technologies Co., Ltd.
Inventor: Hui Zang , Xiaolin Cheng
Abstract: A computer-implemented method for distributed synchronous training of a neural network model includes performing, by a worker machine of a plurality of worker machines, a forward computation of a training data set using a plurality of N layers of the neural network model. The forward computation starts at Layer 1 and proceeds through Layer N of the neural network model. The method further includes performing, by the worker machine, a backward computation of the training data set, the backward computation starting at Layer N and proceeding through Layer 1 of the neural network model. The method further includes synchronizing, by the worker machine, a plurality of gradients outputted by the neural network model during the backward computation. The synchronizing of the plurality of gradients is performed with other worker machines of the plurality of worker machines and in parallel with the backward computation.
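The key idea in the abstract is that gradient synchronization overlaps the backward pass: as each layer's gradient is produced (Layer N down to Layer 1), its all-reduce can begin without waiting for the full backward computation. A minimal single-process sketch of that ordering follows; it is illustrative only, not the patented implementation, and it stands in for the cross-machine all-reduce with a simple per-layer mean.

```python
def backward_with_overlapped_sync(worker_grads):
    """Simulate gradient synchronization overlapped with the backward pass.

    worker_grads: one list per worker, each holding per-layer gradients
    indexed 0..N-1 (index 0 = Layer 1). Returns the synchronized
    (averaged) gradient for every layer.
    """
    num_workers = len(worker_grads)
    num_layers = len(worker_grads[0])
    synced = [None] * num_layers
    # Backward computation proceeds Layer N -> Layer 1. In a real system
    # each layer's all-reduce would be launched asynchronously as soon as
    # that layer's gradient is ready, overlapping communication with the
    # computation of earlier layers; here the mean stands in for it.
    for layer in reversed(range(num_layers)):
        synced[layer] = sum(w[layer] for w in worker_grads) / num_workers
    return synced
```

For example, with two workers producing per-layer gradients [1.0, 2.0, 3.0] and [3.0, 4.0, 5.0], the synchronized result is [2.0, 3.0, 4.0].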
-
Publication No.: US20210200521A1
Publication Date: 2021-07-01
Application No.: US17182753
Filing Date: 2021-02-23
Applicant: Huawei Technologies Co., Ltd.
Inventor: Hui Zang , Huaqing Zhang , Xiaolin Cheng
Abstract: A system and method are provided for optimizing general matrix multiplication (GEMM) on target hardware by splitting the matrices to be multiplied into tiles and formulating a tiling configuration search problem that explores a configuration search space to identify an optimal tiling configuration, i.e., one that minimizes the running time for multiplying matrices A (m×k) and B (k×n) on the target hardware, with configuration states expressed as a function of the matrix parameters m, k, and n and the number of nested loops for each of the dimensions m, k, and n. The optimal tiling configuration for the target hardware is obtained by implementing a Greedy Best-First-Search (GBFS) algorithm or a Neighborhood Actor Advantage Critic (N-A2C) algorithm that optimizes the running time for multiplication of the matrices on the target hardware, and the target hardware is configured and computations are run accordingly.
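The GBFS variant can be pictured as a best-first walk over candidate tile sizes for the m, k, and n dimensions. The sketch below is an assumption-laden illustration: the cost function is a toy stand-in (the patent evaluates actual running time on the target hardware), and the halve/double neighborhood is a plausible choice, not one the abstract specifies.

```python
import heapq

def toy_cost(tile, m, k, n):
    """Hypothetical cost model standing in for measured running time.

    Penalizes tiles that divide the matrix dimensions unevenly and tiles
    whose combined working set strays from an assumed on-chip budget.
    """
    tm, tk, tn = tile
    waste = (m % tm) + (k % tk) + (n % tn)
    footprint = tm * tk + tk * tn + tm * tn
    return waste * 100 + abs(footprint - 4096)

def gbfs_tiling(m, k, n, start=(32, 32, 32)):
    """Greedy best-first search over tiling configurations.

    Always expands the lowest-cost configuration seen so far; neighbors
    halve or double one tile dimension, bounded to [1, 256]. Returns the
    best (cost, tile) pair found.
    """
    frontier = [(toy_cost(start, m, k, n), start)]
    seen = {start}
    best = frontier[0]
    while frontier:
        cost, tile = heapq.heappop(frontier)
        if cost < best[0]:
            best = (cost, tile)
        for dim in range(3):
            for factor in (0.5, 2):
                t = list(tile)
                t[dim] = int(t[dim] * factor)
                t = tuple(t)
                if t in seen or any(x < 1 or x > 256 for x in t):
                    continue
                seen.add(t)
                heapq.heappush(frontier, (toy_cost(t, m, k, n), t))
    return best
```

Under this toy cost model, `gbfs_tiling(512, 512, 512)` returns a tile no worse than the starting configuration; swapping in a measured-runtime cost function would recover the hardware-driven search the abstract describes.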
-
Publication No.: US20210374544A1
Publication Date: 2021-12-02
Application No.: US17445139
Filing Date: 2021-08-16
Applicant: Huawei Technologies Co., Ltd.
Inventor: Hui Zang , Xiaolin Cheng
IPC: G06N3/08
Abstract: A computer-implemented method for distributed synchronous training of a neural network model includes detecting gradient sets from a plurality of worker machines, each worker machine generating a gradient set in a current iteration of a training data set, and each gradient set of the gradient sets comprising a plurality of gradients. A lagging gradient set from a lagging worker machine is detected. The lagging gradient set is generated by the lagging worker machine in a prior iteration of the training data set. Aggregated gradients are generated based on the gradient sets and the lagging gradient set. The neural network model is updated based on the aggregated gradients.
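The abstract only states that aggregation is "based on" the current gradient sets and the lagging set; one natural reading, sketched below, averages the current-iteration gradients together with the stale gradients while down-weighting the latter. The `stale_weight` parameter is an assumption introduced for illustration, not something the abstract specifies.

```python
def aggregate_with_lagging(current_sets, lagging_set, stale_weight=0.5):
    """Aggregate current-iteration gradients with one stale gradient set.

    current_sets: per-layer gradient lists from workers that finished the
    current iteration. lagging_set: the per-layer gradients the lagging
    worker produced in a prior iteration. The stale set contributes with
    a reduced weight (a hypothetical choice); the result is a weighted
    average used to update the model.
    """
    total_weight = len(current_sets) + stale_weight
    aggregated = []
    for i in range(len(lagging_set)):
        s = sum(g[i] for g in current_sets) + stale_weight * lagging_set[i]
        aggregated.append(s / total_weight)
    return aggregated
```

For instance, with current gradients 2.0 and 4.0 for a layer, a stale gradient of 1.0, and `stale_weight=0.5`, the aggregate is (2 + 4 + 0.5·1) / 2.5 = 2.6.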
-
Publication No.: US11842178B2
Publication Date: 2023-12-12
Application No.: US17182753
Filing Date: 2021-02-23
Applicant: Huawei Technologies Co., Ltd.
Inventor: Hui Zang , Huaqing Zhang , Xiaolin Cheng
Abstract: A system and method are provided for optimizing general matrix multiplication (GEMM) on target hardware by splitting the matrices to be multiplied into tiles and formulating a tiling configuration search problem that explores a configuration search space to identify an optimal tiling configuration, i.e., one that minimizes the running time for multiplying matrices A (m×k) and B (k×n) on the target hardware, with configuration states expressed as a function of the matrix parameters m, k, and n and the number of nested loops for each of the dimensions m, k, and n. The optimal tiling configuration for the target hardware is obtained by implementing a Greedy Best-First-Search (GBFS) algorithm or a Neighborhood Actor Advantage Critic (N-A2C) algorithm that optimizes the running time for multiplication of the matrices on the target hardware, and the target hardware is configured and computations are run accordingly.