-
Publication No.: US20250061305A1
Publication Date: 2025-02-20
Application No.: US18936686
Filing Date: 2024-11-04
Inventor: Shuohuan WANG , Junyuan SHANG , Yinqi YANG , Guoxia WANG , Linhao ZHANG , Yu SUN , Hua WU , Haifeng WANG
IPC: G06N3/043 , G06N3/045 , G06N3/0985
Abstract: A training method, an inference method, a device, an apparatus, and a medium for a deep learning model are provided. A first model includes a plurality of first parameters, and a second model includes a plurality of second parameters, which are initialized to the parameter values of a plurality of target parameters selected from the plurality of first parameters. The training method includes: determining a target loss for both the first model and the second model; and adjusting parameter values, including: in response to determining that the target loss indicates that the parameter values of at least part of the target parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding second parameters; and in response to determining that the target loss indicates that the parameter values of at least part of the second parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding target parameters.
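The tied-parameter update described in this abstract can be sketched as follows. This is an illustrative reading, not the patented implementation: the concrete parameter values, the selected indices, the learning rate, and the gradient figures are all invented for the example.

```python
# Illustrative sketch (assumptions, not patent details): the second, smaller
# model's parameters are initialized from a selected subset ("target
# parameters") of the first model's parameters, and the two copies are kept
# synchronized whenever the shared target loss adjusts either side.

first_params = [0.5, -1.2, 0.3, 0.8, -0.4, 1.1, 0.0, 0.7]  # first model
target_idx = [1, 3, 5]                                      # selected target parameters
second_params = [first_params[i] for i in target_idx]       # second model init

def sync_update(first_params, second_params, target_idx,
                grad_target, grad_second, lr=0.1):
    """Jointly apply gradients from the shared target loss; because the
    parameters are tied, an adjustment on either side is mirrored."""
    for k, i in enumerate(target_idx):
        step = lr * (grad_target[k] + grad_second[k])  # loss covers both models
        first_params[i] -= step             # adjust the target parameter...
        second_params[k] = first_params[i]  # ...and synchronously the tied copy
    return first_params, second_params

sync_update(first_params, second_params, target_idx,
            grad_target=[0.5, 0.5, 0.5], grad_second=[0.5, 0.5, 0.5])
```

After the update, the second model's parameters still equal the selected target parameters, which is the invariant the synchronous adjustment maintains.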
-
Publication No.: US20220198153A1
Publication Date: 2022-06-23
Application No.: US17694034
Filing Date: 2022-03-14
Inventor: Jian GONG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG , Qiaoqiao SHE
IPC: G06F40/40 , G06F40/284 , G06F40/205
Abstract: A model training method, a model training platform, an electronic device and a storage medium are provided, which can be used in the field of artificial intelligence, particularly the fields of natural language processing and deep learning. The model training method includes: receiving an input; determining, based on the input, a user-oriented prefabricated function; determining, based on the input, a model training function; determining, based on the input, a pre-trained model; determining, based on the input, a network structure associated with the pre-trained model so as to support use of the pre-trained model; training, based on the input, the model by using the prefabricated function, the model training function, and the pre-trained model; and providing an output associated with a trained model.
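The platform flow in this abstract (select a prefabricated function, a training function, and a pre-trained model based on the input, then train) can be sketched as a small dispatcher. All registry names here ("text_cls", "finetune", "ernie_base") are illustrative placeholders, not identifiers from the patent.

```python
# Hypothetical sketch of the platform flow: each component is looked up from
# the user's input, then combined to train and produce an output.

PREFAB = {"text_cls": lambda data: [(x, len(x) % 2) for x in data]}
TRAINERS = {"finetune": lambda model, data: {**model, "steps": len(data)}}
PRETRAINED = {"ernie_base": {"name": "ernie_base", "layers": 12}}

def train_from_input(user_input):
    prefab = PREFAB[user_input["prefab"]]          # user-oriented prefabricated function
    trainer = TRAINERS[user_input["trainer"]]      # model training function
    model = dict(PRETRAINED[user_input["model"]])  # pre-trained model and its structure
    data = prefab(user_input["data"])              # prepare data via the prefab function
    return trainer(model, data)                    # output associated with the trained model

result = train_from_input({"prefab": "text_cls", "trainer": "finetune",
                           "model": "ernie_base", "data": ["a", "bb", "ccc"]})
```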
-
Publication No.: US20250094802A1
Publication Date: 2025-03-20
Application No.: US18965684
Filing Date: 2024-12-02
Inventor: Junyuan SHANG , Guoxia WANG , Yinqi YANG , Shuohuan WANG , Yu SUN
IPC: G06N3/08 , G06F40/284
Abstract: Provided are a model training method, a model inference method, an electronic device, and a storage medium, relating to the field of data processing, and especially to the technical fields of artificial intelligence, big data, deep learning, and large models. The model training method includes: folding an initial token sequence for training a model based on a folding feature value to obtain at least a first token sequence, wherein the initial token sequence is a token sequence composed of T1 tokens, and the first token sequence has a sequence length less than that of the initial token sequence; and inputting at least the first token sequence into a preset model to train the preset model so as to obtain a target model.
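One plausible reading of the folding operation is grouping every k consecutive tokens, where k is the folding feature value, so a T1-token sequence becomes a first token sequence of length ceil(T1 / k). The grouping rule below is an assumption for illustration; the patent does not specify it here.

```python
# Illustrative sketch: fold an initial token sequence of T1 tokens with
# folding feature value `fold`, yielding a shorter first token sequence.

def fold_tokens(tokens, fold):
    # group consecutive `fold` tokens; the last group may be shorter
    return [tokens[i:i + fold] for i in range(0, len(tokens), fold)]

initial = list(range(8))              # T1 = 8 tokens
first_seq = fold_tokens(initial, fold=2)  # sequence length 4 < 8
```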
-
Publication No.: US20230252354A1
Publication Date: 2023-08-10
Application No.: US18179627
Filing Date: 2023-03-07
Inventor: Junyuan SHANG , Shuohuan WANG , Siyu DING , Yanbin ZHAO , Chao PANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06N20/00 , G06F40/40 , G06F40/279
CPC classification number: G06N20/00 , G06F40/40 , G06F40/279
Abstract: A method for pre-training a language model includes: constructing a pre-training language data set, in which the pre-training language data set comprises unsupervised language data and supervised language data; generating a hierarchical multi-template and multi-task language data set based on the pre-training language data set; and pre-training the language model based on the hierarchical multi-template and multi-task language data set.
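The "multi-template and multi-task" expansion can be sketched as pairing each example with several task-specific prompt templates. The tasks and template wordings below are invented for illustration; the patent does not disclose its actual templates.

```python
# Hypothetical sketch: expand language data into a multi-template,
# multi-task pre-training set by applying every template of each task.

TEMPLATES = {
    "sentiment": ["Review: {x} Sentiment:", "Is this positive? {x}"],
    "nli":       ["Premise: {x} Does the hypothesis follow?"],
}

def build_multitask_dataset(examples):
    dataset = []
    for task, text, label in examples:
        for template in TEMPLATES[task]:      # multi-template per task
            dataset.append({"task": task,
                            "input": template.format(x=text),
                            "target": label})
    return dataset

data = build_multitask_dataset([("sentiment", "great movie", "positive"),
                                ("nli", "it rains", "yes")])
```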
-
Publication No.: US20230147798A1
Publication Date: 2023-05-11
Application No.: US18052143
Filing Date: 2022-11-02
Inventor: Haifeng WANG , Hao TIAN , Jing LIU , Hua WU , Tian WU , Yu SUN , Qiaoqiao SHE
CPC classification number: G06F16/3347 , G06F40/30
Abstract: A method is provided. The method includes converting a search request of a user into a first request semantic vector. The method further includes searching a search resource database for at least one first data semantic vector matched with the first request semantic vector, wherein the search resource database is constructed as a semantic vector space in which different types of data are converted into corresponding data semantic vectors, and the different types of data include at least texts, pictures and videos. The method further includes generating, based on the at least one first data semantic vector, a search result.
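The cross-modal semantic vector space described above can be sketched with a toy nearest-neighbor search: texts, pictures, and videos are all pre-converted to vectors, and a request vector is matched by cosine similarity. The 2-D embeddings and resource names are illustrative assumptions.

```python
# Minimal sketch of semantic-vector search over a mixed-modality database.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

search_db = [  # (resource, modality, data semantic vector)
    ("cat article", "text",  [0.9, 0.1]),
    ("cat photo",   "image", [0.8, 0.2]),
    ("car video",   "video", [0.1, 0.9]),
]

def search(query_vec, db, top_k=2):
    # rank every data semantic vector against the request semantic vector
    ranked = sorted(db, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [item[0] for item in ranked[:top_k]]

results = search([1.0, 0.0], search_db)  # request vector for a "cat" query
```

Because all modalities share one vector space, a single similarity ranking retrieves text and image resources together, which is the point of the construction.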
-
Publication No.: US20220129768A1
Publication Date: 2022-04-28
Application No.: US17646851
Filing Date: 2022-01-03
Inventor: Dongling XIAO , Yukun LI , Han ZHANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06N5/02
Abstract: The present disclosure provides a method and apparatus for training a model. The method can include: acquiring at least one paragraph text, each paragraph text comprising a plurality of fine-grained samples; processing the fine-grained samples in each paragraph text to obtain a coarse-grained sample; annotating the coarse-grained sample in each paragraph text and obscuring one coarse-grained sample using a mask of one fine-grained sample to obtain a training sample set, wherein the training sample set comprises a plurality of annotated texts, and each annotated text comprises at least one of a fine-grained sample or an annotated coarse-grained sample; and training a fine-grained model using the training sample set to obtain a trained fine-grained model, the fine-grained model being used to learn preceding fine-grained content and predict adjacent coarse-grained content.
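A concrete instance of the fine/coarse granularity idea is characters (fine-grained) versus whole words (coarse-grained), with one coarse-grained sample hidden behind a single fine-grained mask token. The tokenization and the mask symbol are assumptions for illustration.

```python
# Illustrative sketch: annotate words with their characters, then obscure
# one whole word (coarse-grained sample) with one mask token.

MASK = "[M]"

def make_training_text(paragraph, mask_word_index):
    words = paragraph.split()                  # coarse-grained samples
    annotated = [(w, list(w)) for w in words]  # word + its fine-grained chars
    masked = words.copy()
    masked[mask_word_index] = MASK             # whole word hidden by one mask
    return annotated, " ".join(masked)

annotated, masked_text = make_training_text("model training works", 1)
```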
-
Publication No.: US20240412002A1
Publication Date: 2024-12-12
Application No.: US18747641
Filing Date: 2024-06-19
Inventor: Yanbin ZHAO , Siyu DING , Shuohuan WANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06F40/35
Abstract: A method is provided. The method includes: obtaining a first sample dataset; inputting at least one first question text corresponding to at least one piece of first sample data into a dialog model separately to obtain at least one first answer prediction result; inputting each second question text into the dialog model to obtain a second answer prediction result output by the dialog model; inputting the second answer prediction result into a reward model to obtain a score of the second answer prediction result output by the reward model; determining a comprehensive loss based on the at least one first answer prediction result, a first answer text of each of the at least one piece of first sample data, and a score corresponding to each of at least one piece of second sample data; and adjusting at least one parameter of the dialog model based on the comprehensive loss.
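The comprehensive loss combines a supervised term (first answer predictions versus reference first answer texts) with a reward term (reward-model scores on the second answer predictions). The 0/1 mismatch loss, the score-to-loss conversion, and the weighting below are assumptions, not details from the patent.

```python
# Hypothetical sketch of a comprehensive loss for dialog-model training.

def supervised_loss(predictions, references):
    # per-example 0/1 mismatch loss, averaged over the first sample data
    return sum(p != r for p, r in zip(predictions, references)) / len(references)

def comprehensive_loss(predictions, references, reward_scores, alpha=0.5):
    sup = supervised_loss(predictions, references)
    reward = sum(reward_scores) / len(reward_scores)   # higher score = better
    return alpha * sup + (1 - alpha) * (1.0 - reward)  # turn reward into a loss

loss = comprehensive_loss(predictions=["paris", "4"],
                          references=["paris", "5"],
                          reward_scores=[0.8, 0.6])
```

The dialog model's parameters would then be adjusted to reduce this combined quantity, trading off imitation of reference answers against reward-model approval.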
-
Publication No.: US20220391426A1
Publication Date: 2022-12-08
Application No.: US17820285
Filing Date: 2022-08-17
Inventor: Xinwei FENG , Meng TIAN , Feifei LI , Hongjian SHI , Wenbin JIANG , Xueqian WU , Chenyang GUO , Yu WANG , Yu SUN , Shuaiyu CHEN
IPC: G06F16/332 , G06F16/2455 , G06F40/30
Abstract: The present disclosure provides a multi-system-based intelligent question answering method, apparatus, and device, relating to the field of artificial intelligence, in particular to the field of knowledge graphs. The specific implementation solution is: determining a question category of question information in response to a question answering instruction of a user, wherein the question answering instruction is used to indicate the question information; determining a query engine corresponding to the question category, and invoking multiple question analysis systems corresponding to the query engine; and, while the question information is processed by sequentially using the multiple question analysis systems according to their system priorities, feeding back answer information to the user once the answer information corresponding to the question information is determined by the current question analysis system.
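The priority-ordered dispatch can be sketched as a loop that tries each analysis system in turn and returns the first answer found. The engine name, the two toy systems, and their behaviors are placeholders invented for the example.

```python
# Sketch of priority-ordered multi-system question answering.

def kb_system(question):
    # high-priority system: answers only questions it can resolve
    return "42" if "answer" in question else None

def retrieval_system(question):
    # low-priority fallback: always produces something
    return "see docs"

ENGINES = {
    "factoid": [kb_system, retrieval_system],  # ordered by system priority
}

def answer(question, category):
    for system in ENGINES[category]:           # sequentially, by priority
        result = system(question)
        if result is not None:                 # feed back the first answer found
            return result
    return None

first = answer("what is the answer", "factoid")
second = answer("unrelated question", "factoid")
```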
-
Publication No.: US20220293092A1
Publication Date: 2022-09-15
Application No.: US17828773
Filing Date: 2022-05-31
Inventor: Siyu DING , Chao PANG , Shuohuan WANG , Yanbin ZHAO , Junyuan SHANG , Yu SUN , Shikun FENG , Hao TIAN , Hua WU , Haifeng WANG
Abstract: The present application provides a method of training a natural language processing model, which relates to the field of artificial intelligence, and in particular to the field of natural language processing. A specific implementation scheme includes: performing semantic learning for multi-tasks on an input text, so as to obtain a semantic feature for the multi-tasks, wherein the multi-tasks include a plurality of branch tasks; performing feature learning for each branch task based on the semantic feature, so as to obtain a first output result for each branch task; calculating a loss for each branch task according to the first output result for the branch task; and adjusting a parameter of the natural language processing model according to the loss for each branch task. The present application further provides a method of processing a natural language, an electronic device, and a storage medium.
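The shared-feature, per-branch-loss scheme can be sketched as follows: one semantic feature is computed per input, each branch task applies its own head, and the per-branch losses jointly drive the update. The feature extractor, heads, and 0/1 losses are toy stand-ins, not the patented networks.

```python
# Illustrative sketch of multi-task training with a shared semantic feature.

def semantic_feature(text):
    return [len(text), text.count(" ") + 1]   # shared feature for all branches

BRANCH_HEADS = {
    "length_cls": lambda f: f[0] > 10,        # branch 1: is the text "long"?
    "word_count": lambda f: f[1],             # branch 2: number of words
}

def branch_losses(text, targets):
    feature = semantic_feature(text)          # semantic learning, done once
    losses = {}
    for task, head in BRANCH_HEADS.items():
        output = head(feature)                # feature learning per branch
        losses[task] = 0.0 if output == targets[task] else 1.0
    return losses

losses = branch_losses("a short sentence", {"length_cls": True, "word_count": 3})
total = sum(losses.values())                  # drives the parameter adjustment
```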
-
Publication No.: US20250094713A1
Publication Date: 2025-03-20
Application No.: US18967529
Filing Date: 2024-12-03
Inventor: Shuohuan WANG , Yekun CHAI , Siyu DING , Junyuan SHANG , Zhenyu ZHANG , Yu SUN , Hao TIAN , Hua WU , Haifeng WANG
IPC: G06F40/284 , G06F16/3329
Abstract: A multimodal data generation method is provided. The method includes: inputting a query data sequence into a multimodal model, to obtain a plurality of tokens in a response data sequence, where a current token is generated through the following operations: inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model generates the current token based on the query data sequence and the current response data sequence, in response to determining that the current token belongs to a first data modality; or inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model denoises an initial token sequence based on the query data sequence and the current response data sequence, to generate a result token sequence, in response to determining that the current token belongs to a second data modality.
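The modality-branching decode loop can be sketched as: generate text-modality tokens one at a time, but when the next position belongs to the image modality, expand it by iteratively denoising an initial token sequence. Every function body below is a trivial placeholder; only the control flow reflects the abstract.

```python
# Hypothetical sketch of modality-dependent generation in one decode loop.

def next_modality(response):
    # placeholder modality predictor: an "<img>" marker switches modality
    return "image" if response and response[-1] == "<img>" else "text"

def generate_text_token(query, response):
    # placeholder first-modality (text) generator
    return "<img>" if len(response) == 1 else "tok"

def denoise(query, response, steps=3):
    seq = ["noise"] * 2                       # initial token sequence
    for _ in range(steps):                    # iterative denoising
        seq = ["img_tok" for _ in seq]        # placeholder denoising step
    return seq                                # result token sequence

def generate(query, length=3):
    response = []
    for _ in range(length):
        if next_modality(response) == "text":     # first data modality
            response.append(generate_text_token(query, response))
        else:                                     # second data modality
            response.extend(denoise(query, response))
    return response

out = generate("describe a cat")
```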