-
Publication number: US20250061305A1
Publication date: 2025-02-20
Application number: US18936686
Filing date: 2024-11-04
Inventor: Shuohuan WANG , Junyuan SHANG , Yinqi YANG , Guoxia WANG , Linhao ZHANG , Yu SUN , Hua WU , Haifeng WANG
IPC: G06N3/043 , G06N3/045 , G06N3/0985
Abstract: A training method, an inference method, a device, an apparatus, and a medium for a deep learning model are provided. A first model includes a plurality of first parameters, and a second model includes a plurality of second parameters, which are initialized to the parameter values of a plurality of target parameters selected from the plurality of first parameters. The training method includes: determining a target loss for both the first model and the second model; and adjusting parameter values, including: in response to determining that the target loss indicates that the parameter values of at least some of the target parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding second parameters; and in response to determining that the target loss indicates that the parameter values of at least some of the second parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding target parameters.
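The synchronized-update scheme can be pictured with a toy example. Below is a minimal sketch, assuming two simple linear models and a shared squared-error loss; the index set, data, and update rule are illustrative assumptions, not the patent's actual design.

```python
# Minimal sketch: two linear models sharing selected parameters (all values illustrative).
import numpy as np

rng = np.random.default_rng(0)
first = rng.normal(size=8)            # first parameters of the first model
target_idx = np.array([0, 2, 5])      # target parameters selected from the first model
second = first[target_idx].copy()     # second parameters, initialized from the targets

x1, x2 = rng.normal(size=8), rng.normal(size=3)
y, lr = 1.0, 0.05

for _ in range(200):
    err1, err2 = first @ x1 - y, second @ x2 - y   # target loss = err1**2 + err2**2
    g_first, g_second = 2 * err1 * x1, 2 * err2 * x2
    first -= lr * g_first
    second -= lr * g_second
    # synchronization: each shared slot also receives the other copy's update,
    # so the target parameters and the second parameters stay identical
    first[target_idx] -= lr * g_second
    second -= lr * g_first[target_idx]

print(np.allclose(first[target_idx], second))      # True: the copies stayed in sync
```

Because every shared slot receives both models' gradient contributions, the selected target parameters and the second parameters remain identical throughout training.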
-
Publication number: US20250094802A1
Publication date: 2025-03-20
Application number: US18965684
Filing date: 2024-12-02
Inventor: Junyuan SHANG , Guoxia WANG , Yinqi YANG , Shuohuan WANG , Yu SUN
IPC: G06N3/08 , G06F40/284
Abstract: Provided are a model training method, a model inference method, an electronic device, and a storage medium, relating to the field of data processing, and in particular to the technical fields of artificial intelligence, big data, deep learning, and large models. The model training method includes: folding an initial token sequence for training a model based on a folding feature value, to obtain at least a first, folded token sequence, wherein the initial token sequence is a token sequence composed of T1 tokens, and the first token sequence has a sequence length less than that of the initial token sequence; and inputting at least the first token sequence into a preset model to train the preset model, so as to obtain a target model.
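As a rough illustration of the folding step, the sketch below assumes that "folding" groups every k consecutive token embeddings into one position, with k as the folding feature value; that grouping rule is a guess at the abstract's meaning, not the patent's specification.

```python
# Minimal sketch: fold a length-T1 sequence by grouping k tokens per position.
import torch

def fold(tokens: torch.Tensor, k: int) -> torch.Tensor:
    T1, d = tokens.shape                       # (T1, d) token embeddings
    pad = (-T1) % k                            # pad so T1 is divisible by k
    if pad:
        tokens = torch.cat([tokens, tokens.new_zeros(pad, d)])
    # (T1/k, k, d) -> merge each group of k embeddings into one position
    return tokens.view(-1, k, d).mean(dim=1)

tokens = torch.randn(10, 16)                   # initial token sequence, T1 = 10
folded = fold(tokens, k=2)                     # first token sequence, length 5 < 10
print(tokens.shape, "->", folded.shape)
```

Whatever the exact merge rule, the point of the step is the same: the model trains on a sequence shorter than T1.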
-
Publication number: US20230252354A1
Publication date: 2023-08-10
Application number: US18179627
Filing date: 2023-03-07
Inventor: Junyuan SHANG , Shuohuan WANG , Siyu DING , Yanbin ZHAO , Chao PANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06N20/00 , G06F40/40 , G06F40/279
CPC classification number: G06N20/00 , G06F40/40 , G06F40/279
Abstract: A method for pre-training a language model includes: constructing a pre-training language data set that includes unsupervised language data and supervised language data; generating a hierarchical multi-template and multi-task language data set based on the pre-training language data set; and pre-training the language model based on the hierarchical multi-template and multi-task language data set.
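The data-set construction can be pictured as template expansion over mixed data. The sketch below is a minimal illustration: the task names, template strings, and two-level hierarchy (task family, then alternative phrasings) are invented for the example, not taken from the patent.

```python
# Minimal sketch: expand mixed data into a multi-template, multi-task set.
unsupervised = ["the cat sat on the mat"]                       # unsupervised language data
supervised = [{"task": "sentiment", "text": "great movie", "label": "positive"}]

templates = {                                                   # hypothetical templates
    "sentiment": [                                              # one task, many phrasings
        "Review: {text}\nSentiment: {label}",
        "Is the following positive or negative? {text} -> {label}",
    ],
}

def build_dataset():
    data = [{"task": "lm", "input": t} for t in unsupervised]   # plain LM task
    for ex in supervised:
        for tpl in templates[ex["task"]]:                       # one sample per template
            data.append({"task": ex["task"], "input": tpl.format(**ex)})
    return data

for sample in build_dataset():
    print(sample)
```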
-
Publication number: US20250094713A1
Publication date: 2025-03-20
Application number: US18967529
Filing date: 2024-12-03
Inventor: Shuohuan WANG , Yekun CHAI , Siyu DING , Junyuan SHANG , Zhenyu ZHANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06F40/284 , G06F16/3329
Abstract: A multimodal data generation method is provided. The method includes: inputting a query data sequence into a multimodal model, to obtain a plurality of tokens in a response data sequence, where a current token is generated through the following operations: inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model generates the current token based on the query data sequence and the current response data sequence, in response to determining that the current token belongs to a first data modality; or inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model denoises an initial token sequence based on the query data sequence and the current response data sequence, to generate a result token sequence, in response to determining that the current token belongs to a second data modality.
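The modality-switched decoding loop can be sketched as pure control flow. Every model call below is a stub (hypothetical functions standing in for the multimodal model); only the branching the abstract describes is kept: token-by-token generation for the first modality, block denoising for the second.

```python
# Minimal sketch: modality-switched decoding; every model call is a stub.
import random

def predict_modality(query, response):        # stub: which modality is next?
    return random.choice(["text", "image", "stop"])

def next_text_token(query, response):         # stub autoregressive step
    return f"tok{len(response)}"

def denoise(query, response, noisy):          # stub denoising of an initial token sequence
    return ["img_tok"] * len(noisy)

def generate(query, max_len=10):
    response = []                             # current response data sequence
    while len(response) < max_len:
        modality = predict_modality(query, response)
        if modality == "text":                # first data modality: one token at a time
            response.append(next_text_token(query, response))
        elif modality == "image":             # second data modality: denoise a block
            response.extend(denoise(query, response, noisy=[None] * 4))
        else:
            break
    return response

print(generate("a cat on a mat"))
```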
-
Publication number: US20230080904A1
Publication date: 2023-03-16
Application number: US18054608
Filing date: 2022-11-11
Inventor: Yaqian HAN , Shuohuan WANG , Yu SUN
Abstract: A method for generating a cross-lingual textual semantic model includes: acquiring a set of training data that includes pieces of monolingual non-parallel text and pieces of bilingual parallel text; determining a semantic vector of each piece of text in the set of training data by inputting each piece of text into an initial textual semantic model; determining a distance between semantic vectors of each two pieces of text in the set of training data based on the semantic vector of each piece of text in the set of training data; determining a gradient modification based on a parallel relationship between each two pieces of text in the set of training data and the distance between the semantic vectors of each two pieces of text in the set of training data; and acquiring a modified textual semantic model by modifying the initial textual semantic model based on the gradient modification.
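One plausible reading of the distance-based objective is a contrastive loss: parallel pairs are pulled together and non-parallel pairs are pushed apart up to a margin. The sketch below implements that reading; the margin value and the exact loss form are assumptions.

```python
# Minimal sketch: contrastive reading of the distance-based objective.
import torch

embs = torch.randn(4, 32, requires_grad=True)     # semantic vectors of 4 texts
parallel = {(0, 1)}                               # texts 0 and 1 are bilingual parallel

loss = embs.new_zeros(())
for i in range(len(embs)):
    for j in range(i + 1, len(embs)):
        dist = (embs[i] - embs[j]).norm()
        if (i, j) in parallel:
            loss = loss + dist                    # parallel: pull together
        else:
            loss = loss + torch.relu(1.0 - dist)  # non-parallel: push apart to margin 1
loss.backward()
print(embs.grad.shape)                            # the "gradient modification" per text
```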
-
Publication number: US20220129753A1
Publication date: 2022-04-28
Application number: US17572921
Filing date: 2022-01-11
Inventor: Yuxiang LU , Jiaxiang LIU , Xuyi CHEN , Shikun FENG , Shuohuan WANG , Yu SUN , Shiwei HUANG , Jingzhou HE
Abstract: A pre-training method for a neural network model, an electronic device, and a medium are provided. Pre-training data is input into the initial neural network model, and the initial neural network model is pre-trained in a first training mode, in which the plurality of hidden layers share one set of hidden layer parameters, and a loss value of the initial neural network model is obtained. If the loss value of the initial neural network model is less than a preset threshold, the initial neural network model continues to be pre-trained in a second training mode, in which each of the plurality of hidden layers has its own hidden layer parameters.
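The two training modes map naturally onto parameter sharing followed by untying. The sketch below uses a toy MLP: mode one reuses a single shared layer for every hidden layer, and once the loss falls below the threshold, mode two copies it into independent per-layer parameters; the toy task and hyperparameters are arbitrary choices, not the patent's.

```python
# Minimal sketch: mode one shares one layer; mode two unties per-layer copies.
import copy
import torch
from torch import nn

d, n_layers, threshold = 16, 4, 0.1
shared = nn.Linear(d, d)                        # the single shared hidden layer parameter
x, y = torch.randn(8, d), torch.randn(8, d)

def forward(layers):
    h = x
    for layer in layers:
        h = torch.relu(layer(h))
    return h

opt = torch.optim.SGD(shared.parameters(), lr=0.05)
for _ in range(200):                            # first training mode
    loss = nn.functional.mse_loss(forward([shared] * n_layers), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if loss.item() < threshold:                 # preset threshold reached
        break

# second training mode: every hidden layer gets its own copy of the parameter
layers = nn.ModuleList(copy.deepcopy(shared) for _ in range(n_layers))
opt = torch.optim.SGD(layers.parameters(), lr=0.05)
loss = nn.functional.mse_loss(forward(list(layers)), y)
opt.zero_grad()
loss.backward()
opt.step()                                      # pre-training continues untied
```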
-
Publication number: US20250094534A1
Publication date: 2025-03-20
Application number: US18968798
Filing date: 2024-12-04
Inventor: Linhao ZHANG , Yilong CHEN , Junyuan SHANG , Yinqi YANG , Shuohuan WANG , Yu SUN
IPC: G06F17/16
Abstract: A task execution method for a large model relates to the fields of artificial intelligence, deep learning, and large model technologies, and includes: executing attention tasks in a task group to be fused using a target computing unit, to obtain attention features, where each attention task corresponds to a weighted matrix to be fused, and the weighted matrix to be fused is obtained by weighting a matrix to be fused using a weight; obtaining a processing result according to the attention features; determining loss information according to the processing result; and, if the loss information converges, weighting and fusing the matrices to be fused using the target computing unit according to the weights for the task group to be fused, to obtain a fusion matrix for a target task group, where a target task in the target task group is executed by the target computing unit according to the fusion matrix.
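Assuming that "fusing" means taking a weighted sum of the per-task matrices once the loss has stopped changing, the idea can be sketched as follows; the convergence test, the stand-in loss, and all shapes are illustrative.

```python
# Minimal sketch: weight per-task matrices, then fuse once the loss settles.
import torch

d = 8
matrices = [torch.randn(d, d) for _ in range(3)]    # matrices to be fused
weights = torch.tensor([0.5, 0.3, 0.2])             # one weight per attention task
x = torch.randn(4, d)

def run_task_group():
    feats = [x @ (w * m) for w, m in zip(weights, matrices)]  # one weighted matrix per task
    return sum(f.square().mean() for f in feats)              # stand-in processing result

prev = float("inf")
for _ in range(10):
    loss = run_task_group().item()
    if abs(prev - loss) < 1e-6:                     # loss information has converged
        break
    prev = loss

fusion = sum(w * m for w, m in zip(weights, matrices))        # single fusion matrix
target_out = x @ fusion                             # target task runs on the fused matrix
print(torch.allclose(target_out, sum(x @ (w * m) for w, m in zip(weights, matrices))))
```

By linearity, one multiplication by the fusion matrix reproduces the sum of the per-task weighted multiplications, which is what lets the target computing unit execute the whole task group with a single matrix.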
-
Publication number: US20230206080A1
Publication date: 2023-06-29
Application number: US18118339
Filing date: 2023-03-07
Inventor: Shuohuan WANG , Weibao GONG , Zhihua WU , Yu SUN , Siyu DING , Yaqian HAN , Yanbin ZHAO , Yuang LIU , Dianhai YU
Abstract: A model training system includes at least one first cluster and a second cluster communicating with the at least one first cluster. The at least one first cluster is configured to acquire a sample data set, generate training data according to the sample data set, and send the training data to the second cluster; and the second cluster is configured to train a pre-trained model according to the training data sent by the at least one first cluster.
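The two-cluster split can be mimicked in-process, with a queue standing in for the network link; the clusters below are plain functions and the "model" is a counter, an obvious simplification.

```python
# Minimal sketch: a queue stands in for the link between the two clusters.
import queue

link = queue.Queue()

def first_cluster(sample_data_set):
    for raw in sample_data_set:              # generate training data from the samples
        link.put({"text": raw.lower(), "length": len(raw)})
    link.put(None)                           # end-of-stream marker

def second_cluster(model):
    while (batch := link.get()) is not None:
        model["steps"] += 1                  # stand-in for one pre-trained-model update
    return model

first_cluster(["Sample A", "Sample B"])      # the first cluster prepares and sends data
print(second_cluster({"steps": 0}))          # the second cluster trains on what arrives
```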
-
Publication number: US20220327290A1
Publication date: 2022-10-13
Application number: US17852413
Filing date: 2022-06-29
Inventor: Junyuan SHANG , Shuohuan WANG , Siyu DING
Abstract: There is provided a method of training a feature determination model, which relates to the fields of deep learning and natural language processing. The method includes: determining, by a plurality of feature determination layers arranged in stages, a feature vector for each segment in a pre-training text; and pre-training the feature determination model according to the feature vectors. A current stage feature vector is determined by the feature determination layer of the current stage according to a preceding segment feature vector determined for a preceding segment and a preceding stage feature vector determined by the feature determination layer of a preceding stage. A method of training a feature determination model for a target task, a method of performing semantic analysis for a target task, an electronic device, and a computer storage medium are also provided.
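The two-way recurrence (preceding segment at the current stage, same segment at the preceding stage) can be written as a small double loop. The combiner below is a stand-in linear layer, not the patent's feature determination layer.

```python
# Minimal sketch: feature (stage s, segment t) from (s, t-1) and (s-1, t).
import torch

n_stages, n_segments, d = 3, 4, 8
segments = torch.randn(n_segments, d)        # embedded segments of the pre-training text
combine = torch.nn.Linear(2 * d, d)          # stand-in feature determination layer

feats = [list(segments)]                     # stage 0: raw segment features
for s in range(1, n_stages + 1):
    row, prev_seg = [], torch.zeros(d)       # zero feature before the first segment
    for t in range(n_segments):
        prev_stage = feats[s - 1][t]         # same segment, preceding stage
        cur = torch.tanh(combine(torch.cat([prev_seg, prev_stage])))
        row.append(cur)
        prev_seg = cur                       # becomes the preceding segment feature
    feats.append(row)

print(len(feats) - 1, feats[-1][-1].shape)   # final stage, feature of the last segment
```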
-
Publication number: US20210312139A1
Publication date: 2021-10-07
Application number: US17353884
Filing date: 2021-06-22
Inventor: Shuohuan WANG , Siyu DING , Junyuan SHANG , Yu SUN
Abstract: A method and apparatus of generating a semantic feature, a method and apparatus of training a model, an electronic device, and a storage medium are provided. The method of generating the semantic feature includes: segmenting a target document to obtain a segment sequence of the target document; generating a semantic feature of each document segment in the segment sequence of the target document by using a pre-trained bidirectional semantic encoding model; and acquiring the semantic feature of the target document based on the semantic feature of each document segment in the segment sequence of the target document. The present disclosure further provides a method of training a bidirectional semantic encoding model.
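The segment-then-encode-then-pool flow is easy to sketch. The encoder below is a deterministic stub standing in for the pre-trained bidirectional semantic encoding model, and mean pooling is an assumed aggregation rule.

```python
# Minimal sketch: segment, encode each segment, pool into a document feature.
import torch

def segment(document: str, seg_len: int = 5):
    words = document.split()
    return [" ".join(words[i:i + seg_len]) for i in range(0, len(words), seg_len)]

def encode(text: str) -> torch.Tensor:            # stub bidirectional semantic encoder
    torch.manual_seed(hash(text) % 2**31)         # deterministic fake embedding
    return torch.randn(32)

doc = "a long target document that is split into fixed size segments for encoding"
seg_feats = torch.stack([encode(s) for s in segment(doc)])  # per-segment semantic features
doc_feat = seg_feats.mean(dim=0)                  # semantic feature of the target document
print(seg_feats.shape, doc_feat.shape)
```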
-