-
Publication No.: US20230252354A1
Publication Date: 2023-08-10
Application No.: US18179627
Filing Date: 2023-03-07
Inventor: Junyuan SHANG , Shuohuan WANG , Siyu DING , Yanbin ZHAO , Chao PANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06N20/00 , G06F40/40 , G06F40/279
CPC classification number: G06N20/00 , G06F40/40 , G06F40/279
Abstract: A method for pre-training a language model includes: constructing a pre-training language data set, in which the pre-training language data set comprises unsupervised language data and supervised language data; generating a hierarchical multi-template and multi-task language data set based on the pre-training language data set; and pre-training the language model based on the hierarchical multi-template and multi-task language data set.
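The sketch below is a minimal illustration of how a hierarchical multi-template, multi-task corpus of this kind could be assembled, assuming hypothetical task names and template strings; it is not the construction defined in the application.

# Illustrative sketch only: one plausible way to mix unsupervised text with
# supervised examples rendered through several task templates. Task names and
# templates are hypothetical, not from the patent.
import random

TEMPLATES = {
    "sentiment": [
        "Review: {text} Sentiment: {label}",
        "Is the following review positive or negative? {text} Answer: {label}",
    ],
    "nli": [
        "Premise: {premise} Hypothesis: {hypothesis} Relation: {label}",
    ],
}

def build_pretraining_set(unsupervised_texts, supervised_examples, seed=0):
    """Mix raw text with supervised examples verbalized through task templates."""
    rng = random.Random(seed)
    dataset = []
    # Unsupervised language data is used as-is (e.g. for a plain LM objective).
    for text in unsupervised_texts:
        dataset.append({"task": "lm", "text": text})
    # Each supervised example is rendered with every template of its task,
    # giving the multi-template dimension on top of the multi-task one.
    for ex in supervised_examples:
        for template in TEMPLATES[ex["task"]]:
            dataset.append({"task": ex["task"], "text": template.format(**ex["fields"])})
    rng.shuffle(dataset)
    return dataset

if __name__ == "__main__":
    unsup = ["The cat sat on the mat."]
    sup = [{"task": "sentiment", "fields": {"text": "Great movie!", "label": "positive"}}]
    for row in build_pretraining_set(unsup, sup):
        print(row)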
-
Publication No.: US20230206080A1
Publication Date: 2023-06-29
Application No.: US18118339
Filing Date: 2023-03-07
Inventor: Shuohuan WANG , Weibao GONG , Zhihua WU , Yu SUN , Siyu DING , Yaqian HAN , Yanbin ZHAO , Yuang LIU , Dianhai YU
Abstract: A model training system includes at least one first cluster and a second cluster communicating with the at least one first cluster. The at least one first cluster is configured to acquire a sample data set, generate training data according to the sample data set, and send the training data to the second cluster; and the second cluster is configured to train a pre-trained model according to the training data sent by the at least one first cluster.
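As a rough single-machine analogue of this two-cluster arrangement, the sketch below uses two processes and a queue: one process stands in for the data-generating first cluster, the other for the training second cluster. The process names, batch sizes, and queue transport are assumptions for illustration, not the patented system.

# Minimal sketch: a "data cluster" process produces training batches and a
# "training cluster" process consumes them; a queue stands in for the
# communication link between clusters.
from multiprocessing import Process, Queue

def data_cluster(queue, num_batches=5):
    """First cluster: turn a raw sample set into training batches."""
    samples = [f"sample-{i}" for i in range(num_batches * 4)]
    for start in range(0, len(samples), 4):
        queue.put(samples[start:start + 4])  # send one batch of training data
    queue.put(None)                          # signal end of data

def training_cluster(queue):
    """Second cluster: update the (mock) pre-trained model on received batches."""
    step = 0
    while True:
        batch = queue.get()
        if batch is None:
            break
        step += 1
        print(f"step {step}: trained on {batch}")

if __name__ == "__main__":
    q = Queue()
    producer = Process(target=data_cluster, args=(q,))
    consumer = Process(target=training_cluster, args=(q,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()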
-
Publication No.: US20240412002A1
Publication Date: 2024-12-12
Application No.: US18747641
Filing Date: 2024-06-19
Inventor: Yanbin ZHAO , Siyu DING , Shuohuan WANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06F40/35
Abstract: A method is provided. The method includes: obtaining a first sample dataset; inputting at least one first question text corresponding to at least one piece of first sample data into a dialog model separately to obtain at least one first answer prediction result; inputting each second question text into the dialog model to obtain a second answer prediction result output by the dialog model; inputting the second answer prediction result into a reward model to obtain a score of the second answer prediction result output by the reward model; determining a comprehensive loss based on the at least one first answer prediction result, a first answer text of each of the at least one piece of first sample data, and a score corresponding to each of at least one piece of second sample data; and adjusting at least one parameter of the dialog model based on the comprehensive loss.
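A toy numeric sketch of such a combined objective follows: a negative log-likelihood term over the first answer predictions plus a reward-based term over the scored second answer predictions. The weighting factor alpha and the negative-mean-score form of the reward term are assumptions, not the comprehensive loss defined in the application.

# Hedged sketch of a "comprehensive loss" combining a supervised term and a
# reward-model term; the exact weighting and formulation are illustrative.
import math

def supervised_loss(predicted_probs, target_indices):
    """Mean negative log-likelihood of the reference (first) answer texts."""
    return -sum(math.log(p[t]) for p, t in zip(predicted_probs, target_indices)) / len(target_indices)

def reward_loss(reward_scores):
    """Encourage high reward-model scores on the second answer predictions."""
    return -sum(reward_scores) / len(reward_scores)

def comprehensive_loss(predicted_probs, target_indices, reward_scores, alpha=0.5):
    """Weighted combination of the supervised and reward terms (alpha is assumed)."""
    return supervised_loss(predicted_probs, target_indices) + alpha * reward_loss(reward_scores)

if __name__ == "__main__":
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # toy answer-token distributions
    targets = [0, 1]                             # indices of the reference first answers
    scores = [0.9, 0.4]                          # reward-model scores for second answers
    print(comprehensive_loss(probs, targets, scores))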
-
Publication No.: US20230040095A1
Publication Date: 2023-02-09
Application No.: US17889218
Filing Date: 2022-08-16
Inventor: Junyuan SHANG , Shuohuan WANG , Siyu DING , Yanbin ZHAO , Chao PANG , Yu SUN
IPC: G06F40/40 , G06F40/289
Abstract: A method and apparatus for pre-training a model, a device, a storage medium, and a program product are provided. An embodiment of the method includes: acquiring a sample natural language text; generating N types of prompt words based on the sample natural language text, where N is a positive integer; generating sample input data based on the sample natural language text and the N types of prompt words; and training an initial language model based on the sample input data, to obtain a pre-trained language model.
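The fragment below sketches, under stated assumptions, what generating N types of prompt words might look like for N = 3: keyword, length, and domain prompts derived from the sample text and prepended to it. These prompt types and the [PROMPT]/[TEXT] markers are illustrative and not taken from the application.

# Illustrative data-construction step: derive three hypothetical prompt-word
# types from a sample text and combine them with the text into one model input.
def keyword_prompts(text, top_k=2):
    """Type 1: the longest distinct words in the text serve as keyword prompts."""
    words = sorted(set(text.lower().split()), key=len, reverse=True)
    return words[:top_k]

def length_prompt(text):
    """Type 2: a coarse length bucket for the text."""
    return ["short" if len(text.split()) < 10 else "long"]

def domain_prompt(text, domain="news"):
    """Type 3: a (here fixed) domain tag for the text."""
    return [domain]

def build_sample_input(text):
    """Combine the N prompt-word types with the original text into one input string."""
    prompts = keyword_prompts(text) + length_prompt(text) + domain_prompt(text)
    return "[PROMPT] " + " ".join(prompts) + " [TEXT] " + text

if __name__ == "__main__":
    print(build_sample_input("The central bank raised interest rates again this quarter."))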
-
Publication No.: US20220293092A1
Publication Date: 2022-09-15
Application No.: US17828773
Filing Date: 2022-05-31
Inventor: Siyu DING , Chao PANG , Shuohuan WANG , Yanbin ZHAO , Junyuan SHANG , Yu SUN , Shikun FENG , Hao TIAN , Hua WU , Haifeng WANG
Abstract: The present application provides a method of training a natural language processing model, which relates to the field of artificial intelligence, and in particular to the field of natural language processing. A specific implementation scheme includes: performing semantic learning for multi-tasks on an input text, so as to obtain a semantic feature for the multi-tasks, wherein the multi-tasks include a plurality of branch tasks; performing feature learning for each branch task based on the semantic feature, so as to obtain a first output result for each branch task; calculating a loss for each branch task according to the first output result for the branch task; and adjusting a parameter of the natural language processing model according to the loss for each branch task. The present application further provides a method of processing a natural language, an electronic device, and a storage medium.
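A compact PyTorch sketch of the shared-feature, per-branch-task pattern described above follows: one shared encoder produces the semantic feature, each branch head yields its first output result, a loss is computed per branch, and the combined loss adjusts the parameters. The layer sizes, task names, and plain sum of branch losses are assumptions for illustration, not the patented scheme.

# Shared encoder + one head per branch task; per-task losses are combined and
# used to update all parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, hidden)   # shared semantic encoder
        self.heads = nn.ModuleDict({                        # one head per branch task
            "sentiment": nn.Linear(hidden, 2),
            "topic": nn.Linear(hidden, 4),
        })

    def forward(self, token_ids):
        feature = self.embed(token_ids)                      # semantic feature shared by all tasks
        return {task: head(feature) for task, head in self.heads.items()}

if __name__ == "__main__":
    model = MultiTaskModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    tokens = torch.randint(0, 1000, (8, 16))                 # toy batch of token ids
    labels = {"sentiment": torch.randint(0, 2, (8,)),
              "topic": torch.randint(0, 4, (8,))}
    outputs = model(tokens)
    # One loss per branch task; the combined loss updates the shared parameters.
    loss = sum(F.cross_entropy(outputs[task], labels[task]) for task in outputs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(float(loss))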