-
1.
Publication No.: US20230206080A1
Publication Date: 2023-06-29
Application No.: US18118339
Filing Date: 2023-03-07
Inventor: Shuohuan WANG , Weibao GONG , Zhihua WU , Yu SUN , Siyu DING , Yaqian HAN , Yanbin ZHAO , Yuang LIU , Dianhai YU
Abstract: A model training system includes at least one first cluster and a second cluster communicating with the at least one first cluster. The at least one first cluster is configured to acquire a sample data set, generate training data from the sample data set, and send the training data to the second cluster; the second cluster is configured to train a pre-trained model on the training data sent by the at least one first cluster.
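The producer/consumer split described in this abstract (first clusters generate training data, a second cluster consumes it to train the model) can be illustrated with a minimal sketch. The sketch below uses Python multiprocessing queues to stand in for the inter-cluster link; the process layout, toy linear model, and all names are illustrative assumptions, not the patent's implementation.

```python
import multiprocessing as mp
import random

def first_cluster(link: mp.Queue, num_batches: int) -> None:
    """Acquire a sample data set, generate training data, send it onward."""
    for _ in range(num_batches):
        batch = []
        for _ in range(8):                      # "generate training data"
            x = random.uniform(-1.0, 1.0)
            batch.append((x, 2.0 * x + 0.1))    # toy relation y = 2x + 0.1
        link.put(batch)
    link.put(None)                              # tell the trainer we are done

def second_cluster(link: mp.Queue, num_producers: int) -> None:
    """Train a (toy) pre-trained model on data received from first clusters."""
    w, b, lr = 1.5, 0.0, 0.05                   # pretend pre-trained parameters
    finished = 0
    while finished < num_producers:
        batch = link.get()
        if batch is None:
            finished += 1
            continue
        # One SGD step on mean squared error for the toy linear model.
        gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        w, b = w - lr * gw, b - lr * gb
    print(f"trained parameters: w={w:.3f}, b={b:.3f}")

if __name__ == "__main__":
    link = mp.Queue()                           # stands in for the network link
    producers = [mp.Process(target=first_cluster, args=(link, 50))
                 for _ in range(2)]             # two "first clusters"
    for p in producers:
        p.start()
    second_cluster(link, num_producers=len(producers))
    for p in producers:
        p.join()
```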
-
2.
Publication No.: US20250036920A1
Publication Date: 2025-01-30
Application No.: US18026140
Filing Date: 2022-09-20
Inventor: Liang SHEN , Haifeng WANG , Huachao WU , Weibao GONG , Zhihua WU , Dianhai YU
IPC: G06N3/045 , G06N3/0495
Abstract: The present disclosure provides a mixture-of-experts (MoE) model implementation method and system, an electronic device, and a storage medium, and relates to fields of artificial intelligence (AI) such as deep learning and distributed storage. The method includes: constructing a communication group, the communication group including a tensor-parallelism communication group of at least two computing devices, with tensor-parallelism segmentation applied to the sparse parameters of each computing device in the same tensor-parallelism communication group; and training an MoE model based on the communication group. The solutions of the present disclosure ensure that model training proceeds normally.
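The core idea here, slicing each expert's (sparse) parameters across the devices of one tensor-parallelism communication group, can be sketched in-process. The snippet below simulates the group with NumPy and a column-parallel split; the shapes, group layout, and all-gather-as-concatenate shortcut are assumptions for illustration, not the patent's actual distributed implementation.

```python
import numpy as np

HIDDEN, FFN, NUM_EXPERTS, TP_DEGREE = 8, 16, 4, 2
rng = np.random.default_rng(0)

# Full expert weights (what one device would hold without segmentation).
full_experts = [rng.standard_normal((HIDDEN, FFN)) for _ in range(NUM_EXPERTS)]

# Tensor-parallelism communication group: each rank keeps a column slice of
# every expert, so per-device expert memory shrinks by a factor of TP_DEGREE.
tp_group = [
    {e: np.split(w, TP_DEGREE, axis=1)[rank] for e, w in enumerate(full_experts)}
    for rank in range(TP_DEGREE)
]

def moe_forward(x: np.ndarray, expert_id: int) -> np.ndarray:
    """Run one token through one expert across the tensor-parallel group."""
    # Each rank computes a partial activation with its column shard ...
    partials = [x @ tp_group[rank][expert_id] for rank in range(TP_DEGREE)]
    # ... and an all-gather (here: concatenation) restores the full output.
    return np.concatenate(partials, axis=-1)

token = rng.standard_normal((1, HIDDEN))
out_sharded = moe_forward(token, expert_id=1)
out_dense = token @ full_experts[1]
assert np.allclose(out_sharded, out_dense)
print("tensor-parallel expert output matches the dense computation")
```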
-
3.
Publication No.: US20220374713A1
Publication Date: 2022-11-24
Application No.: US17880070
Filing Date: 2022-08-03
Inventor: Zhihua WU , Dianhai YU , Yulong AO , Weibao GONG
IPC: G06N3/08
Abstract: The present disclosure provides a method and apparatus for performing distributed training on a deep learning model. The method may include: generating a distributed computation view based on data information of a to-be-trained deep learning model; generating a cluster resource view based on property information of a cluster hardware resource corresponding to the to-be-trained deep learning model; determining a target segmentation strategy of a distributed training task based on the distributed computation view and the cluster resource view; and performing distributed training on the to-be-trained deep learning model based on the target segmentation strategy.
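The selection step in this abstract (combining a computation view of the model with a resource view of the cluster to pick a segmentation strategy) can be illustrated with a toy search. The cost model, strategy space, and all names below are assumptions for illustration only, not the method claimed in the patent.

```python
from dataclasses import dataclass

@dataclass
class ComputationView:          # distilled from the to-be-trained model
    num_layers: int
    params_gb: float            # total parameter size in GB

@dataclass
class ClusterResourceView:      # distilled from the cluster hardware
    num_devices: int
    mem_per_device_gb: float
    interconnect_gbps: float

def candidate_strategies(cluster: ClusterResourceView):
    """Enumerate (data-parallel degree, model-parallel degree) splits."""
    n = cluster.num_devices
    return [(dp, n // dp) for dp in range(1, n + 1) if n % dp == 0]

def cost(model: ComputationView, cluster: ClusterResourceView,
         dp: int, mp: int) -> float:
    per_device = model.params_gb / mp          # model shard per device
    if per_device > cluster.mem_per_device_gb:
        return float("inf")                    # infeasible: out of memory
    # Toy cost: data-parallel gradient sync + model-parallel activation traffic.
    comm = model.params_gb * (dp - 1) / cluster.interconnect_gbps
    comm += model.num_layers * (mp - 1) * 0.01
    return comm

def target_strategy(model: ComputationView, cluster: ClusterResourceView):
    """Pick the cheapest feasible strategy from the two views."""
    return min(candidate_strategies(cluster),
               key=lambda s: cost(model, cluster, *s))

model = ComputationView(num_layers=48, params_gb=20.0)
cluster = ClusterResourceView(num_devices=8, mem_per_device_gb=16.0,
                              interconnect_gbps=100.0)
dp, mp = target_strategy(model, cluster)
print(f"target segmentation strategy: data-parallel={dp}, model-parallel={mp}")
```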
-