-
Publication No.: EP4462313A1
Publication Date: 2024-11-13
Application No.: EP23746144.7
Filing Date: 2023-01-17
Inventors: ZHOU, Pingyi , REN, Xiaozhe , WANG, Yasheng , HE, Bin , MENG, Xinfan , JIANG, Xin
IPC Classification: G06N3/08
Abstract: This application relates to the artificial intelligence field and discloses a data processing method. The method includes: processing target data through a target neural network to obtain a data processing result, where a target attention head of the target neural network processes, through a first transformation matrix, a first vector corresponding to first subdata, and processes, through a second transformation matrix, a second vector corresponding to the first subdata; the first vector corresponds to the position information of the first subdata in the target data, and the second vector corresponds to its semantic information. In this application, the transformation matrix applied to the position vector is made smaller than the one applied to the semantic vector; that is, the first transformation matrix is smaller than the second. This shrinks the matrix used to compute correlations between position information, reducing the model's computing resource overhead during inference and training.
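The abstract's core idea is to project position vectors through a smaller transformation matrix than semantic vectors when computing attention scores. Below is a minimal PyTorch sketch of one such head; the module name, dimensions, and the additive combination of the two score terms are illustrative assumptions, not the patented design.

```python
import torch
import torch.nn as nn


class SmallPositionAttentionHead(nn.Module):
    """One attention head whose position projections (first matrices)
    are smaller than its semantic projections (second matrices)."""

    def __init__(self, d_model: int, d_sem: int = 64, d_pos: int = 16):
        super().__init__()
        assert d_pos < d_sem            # position matrices are the small ones
        self.q_pos = nn.Linear(d_model, d_pos)   # first transformation matrix
        self.k_pos = nn.Linear(d_model, d_pos)
        self.q_sem = nn.Linear(d_model, d_sem)   # second transformation matrix
        self.k_sem = nn.Linear(d_model, d_sem)
        self.v = nn.Linear(d_model, d_sem)

    def forward(self, sem: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # sem, pos: (batch, seq, d_model) semantic and position vectors
        scores = (self.q_sem(sem) @ self.k_sem(sem).transpose(-2, -1)
                  + self.q_pos(pos) @ self.k_pos(pos).transpose(-2, -1))
        return scores.softmax(dim=-1) @ self.v(sem)


head = SmallPositionAttentionHead(d_model=128)
print(head(torch.randn(2, 10, 128), torch.randn(2, 10, 128)).shape)
```

The position-position score then costs O(seq² · d_pos) rather than O(seq² · d_sem), which is where the resource saving the abstract mentions would come from.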
-
Publication No.: EP4361843A1
Publication Date: 2024-05-01
Application No.: EP22841352.2
Filing Date: 2022-07-12
Inventors: XU, Hang , REN, Xiaozhe , YIN, Yichun , QIAN, Li , LI, Zhenguo , JIANG, Xin , GAO, Jiahui
IPC Classification: G06F16/332 , G06N3/04
Abstract: This application relates to the artificial intelligence field and discloses a neural network search method and a related apparatus. During model search, attention heads in transformer layers are constructed by sampling from a plurality of candidate operators; the sampled heads are assembled into a plurality of candidate neural networks, whose performance is compared to select a target neural network with the best performance. By combining the transformer model with model search, this application can generate a new attention structure that outperforms the original self-attention mechanism and significantly improves results on a wide range of downstream tasks.
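The search procedure the abstract outlines amounts to sampling operator combinations, building candidates, and keeping the best one. The toy random-search loop below sketches that flow; the operator set, the scoring stub, and the sampling scheme are all invented for illustration, since the abstract does not enumerate them.

```python
import random

# Hypothetical pool of candidate operators an attention head could use.
CANDIDATE_OPS = ["dot_product", "elementwise_mul", "max_pool", "conv1d", "identity"]


def sample_head(num_ops: int = 3) -> list:
    """Sample one candidate attention-head structure."""
    return [random.choice(CANDIDATE_OPS) for _ in range(num_ops)]


def evaluate(head_structure: list) -> float:
    """Stand-in for building a candidate network around this head
    and measuring its validation performance."""
    return random.random()


def search(num_candidates: int = 20) -> list:
    candidates = [sample_head() for _ in range(num_candidates)]
    return max(candidates, key=evaluate)  # compare candidate performance


print("selected head structure:", search())
```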
-
Publication No.: EP4390753A1
Publication Date: 2024-06-26
Application No.: EP22869111.9
Filing Date: 2022-09-08
Inventors: MENG, Xiaojun , WANG, Yasheng , JIANG, Xin , LIU, Qun
IPC Classification: G06F40/30 , G06F40/284 , G06N3/04 , G06N3/08
Abstract: A text data processing method, a neural network training method, and related devices are provided. The methods may be applied to text data processing in the artificial intelligence field. The method includes: obtaining a to-be-processed text that includes a plurality of characters, and processing it with a target model to obtain a prediction result. The prediction result indicates how to split the text into a plurality of target character sets and further includes a plurality of first labels, where each first label indicates the semantics of one target character set, and the first labels together determine the intention of the text. A reduplicated word or a modal auxiliary word can be split into its own target character set, so the intention of the text can still be understood even when such words appear in it. This yields a natural language understanding method with a stronger generalization capability.
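To make the output structure concrete, here is a hypothetical prediction for a short utterance; the character sets, labels, and label vocabulary are invented for illustration, not taken from the patent.

```python
# Hypothetical prediction: the text is split into target character sets,
# and each set gets one "first label" describing its semantics.
text = "播放周杰伦的歌吧"  # "play Jay Chou's songs (please)"

prediction = {
    "target_character_sets": ["播放", "周杰伦", "的", "歌", "吧"],
    "first_labels": ["action:play", "slot:artist", "function_word",
                     "slot:media_type", "modal_word"],
}


def infer_intent(pred: dict) -> str:
    """Combine the first labels into an intention, ignoring sets
    labeled as modal or function words (they carry no intent)."""
    content = [lbl for lbl in pred["first_labels"]
               if lbl not in ("modal_word", "function_word")]
    return " + ".join(content)


print(infer_intent(prediction))  # action:play + slot:artist + slot:media_type
```

Because the modal word "吧" sits in its own character set with its own label, dropping or keeping it does not change the recovered intention, which is the generalization behavior the abstract describes.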
-
Publication No.: EP4336378A1
Publication Date: 2024-03-13
Application No.: EP22815120.5
Filing Date: 2022-05-25
Inventors: HOU, Lu , SHANG, Lifeng , JIANG, Xin , QIAN, Li
IPC Classification: G06F16/33 , G06F16/332 , G06N3/04 , G06N3/08 , G06N5/04
Abstract: This application relates to the field of artificial intelligence and discloses a data processing method. The method includes: obtaining a transformer model that includes a target network layer and a target module, and processing to-be-processed data with the transformer model to obtain a data processing result. The target module performs a target operation on the feature map output by the target network layer to obtain an operation result, and fuses the operation result with that feature map to obtain an updated feature map. Inserting the target module into the transformer model and fusing its operation result with the input increases the information carried in the feature map output by the target network layer. Data processing accuracy improves while the target module's parameter count and the computing power its operation requires remain small; that is, the model's parameters and computing overhead are kept low.
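A module that transforms a layer's feature map with few parameters and fuses the result back is structurally similar to an adapter. The sketch below assumes a bottleneck operation and additive fusion; both are illustrative choices, since the abstract does not specify the target operation or the fusion rule.

```python
import torch
import torch.nn as nn


class TargetModule(nn.Module):
    """Bottleneck sketch: a cheap 'target operation' on the feature map,
    fused back into it by addition."""

    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.op = nn.Sequential(             # few parameters, low compute
            nn.Linear(d_model, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, d_model),
        )

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        operation_result = self.op(feature_map)
        return feature_map + operation_result  # fuse result and input


feature_map = torch.randn(2, 10, 256)  # output of the target network layer
updated = TargetModule(256)(feature_map)
print(updated.shape)  # same shape, enriched feature map
```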
-
Publication No.: EP4024232A1
Publication Date: 2022-07-06
Application No.: EP20862561.6
Filing Date: 2020-07-17
Inventors: YIN, Yichun , SHANG, Lifeng , JIANG, Xin , CHEN, Xiao
IPC Classification: G06F16/35
Abstract: A text processing model training method, and a text processing method and apparatus, in the natural language processing field within the artificial intelligence field, are disclosed. The training method includes: obtaining training text (510); separately inputting the training text into a teacher model and a student model to obtain sample data output by the teacher model and prediction data output by the student model, where the teacher model and the student model each include an input layer, one or more intermediate layers, and an output layer; the sample data includes a sample semantic feature output by an intermediate layer of the teacher model and a sample label output by the teacher model's output layer; the prediction data includes a prediction semantic feature output by an intermediate layer of the student model and a prediction label output by the student model's output layer; and the teacher model is a pre-trained language model used for text classification (520); and training the model parameters of the student model based on the sample data and the prediction data to obtain a target student model (530). The method enables effective knowledge transfer to the student model, improving the accuracy of its text processing results.
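The abstract pairs two distillation signals: intermediate-layer semantic features and output-layer labels. A common way to combine them is an MSE term on features plus a soft-label term on logits; the sketch below uses that combination, with the loss forms, temperature, and weighting all assumed rather than taken from the patent.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_feat, teacher_feat,
                      student_logits, teacher_logits,
                      temperature: float = 2.0, alpha: float = 0.5):
    # Intermediate layers: prediction semantic feature vs. sample feature.
    feat_loss = F.mse_loss(student_feat, teacher_feat)
    # Output layers: prediction label vs. sample (soft) label.
    label_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * feat_loss + (1 - alpha) * label_loss


loss = distillation_loss(torch.randn(4, 768), torch.randn(4, 768),
                         torch.randn(4, 5), torch.randn(4, 5))
print(loss.item())
```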
-
Publication No.: EP4318322A1
Publication Date: 2024-02-07
Application No.: EP22790962.9
Filing Date: 2022-04-15
Inventors: WEI, Junqiu , LIAO, Yi , JIANG, Xin , LIU, Qun , QIAN, Li
IPC Classification: G06N3/08
Abstract: A data processing method includes: obtaining a first embedding vector that indicates a known data unit and its position, and a second embedding vector that indicates the position of a to-be-predicted data unit; processing the first embedding vector with a target encoder to obtain an output vector; and processing the output vector and the second embedding vector with a target prediction network to obtain the to-be-predicted data unit. With this method, M pieces of additional position information do not need to be fed into the target encoder separately, and the number of latent variables in the encoder's intermediate output stays consistent with the number of input embedding vectors, reducing the encoder's computation and memory consumption.
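The key structural point is that the encoder consumes only the known units' embeddings, while the to-be-predicted unit's position embedding enters later, at the prediction network. A rough PyTorch sketch of that split follows; the encoder depth, the mean pooling, and the concatenation-based prediction head are assumptions made for illustration.

```python
import torch
import torch.nn as nn

d_model, vocab_size = 64, 1000

# Target encoder: sees only the known units (content + position fused
# into the first embedding vectors), so its intermediate output has
# exactly one latent per input embedding.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
# Target prediction network: combines the encoder output with the
# second embedding (position of the to-be-predicted unit).
prediction_net = nn.Sequential(
    nn.Linear(2 * d_model, d_model), nn.GELU(), nn.Linear(d_model, vocab_size),
)

first_embeddings = torch.randn(1, 8, d_model)  # 8 known units
second_embedding = torch.randn(1, d_model)     # masked unit's position

output_vectors = encoder(first_embeddings)     # shape (1, 8, d_model)
context = output_vectors.mean(dim=1)           # summarize the context
logits = prediction_net(torch.cat([context, second_embedding], dim=-1))
print(logits.argmax(dim=-1))                   # predicted data unit id
```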
-
Publication No.: EP4206957A1
Publication Date: 2023-07-05
Application No.: EP21874284.9
Filing Date: 2021-09-18
Inventors: REN, Xiaozhe , YIN, Yichun , JIANG, Xin
IPC Classification: G06F17/16
Abstract: This application relates to the field of artificial intelligence and provides a model training method. The method includes: obtaining a to-be-trained first neural network model, where the first neural network model includes a first operator that performs a product operation on input data and a target weight matrix (301); replacing the first operator with a second operator to obtain a second neural network model, where the second operator performs a product operation on the input data and a plurality of sub-weight matrices obtained by matrix factorization of the target weight matrix (302); and performing model training on the second neural network model to obtain a target neural network model (303). Because the target weight matrix is split into a product of sub-weight matrices, the training device needs less time for the product operation on the input data and the sub-weight matrices, which shortens the model training time.
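A rank-r factorization is the simplest instance of this replacement: one d_in x d_out matrix becomes two sub-weight matrices of shapes d_in x r and r x d_out. The sketch below assumes a plain low-rank split built from nn.Linear; the patent's factorization scheme may differ.

```python
import torch
import torch.nn as nn

d_in, d_out, rank = 512, 512, 32

first_operator = nn.Linear(d_in, d_out, bias=False)   # x @ W        (301)
second_operator = nn.Sequential(                      # x @ A @ B    (302)
    nn.Linear(d_in, rank, bias=False),
    nn.Linear(rank, d_out, bias=False),
)

x = torch.randn(8, d_in)
print(first_operator(x).shape, second_operator(x).shape)  # same output shape

# Multiply-accumulates per input row drop from d_in*d_out to
# rank*(d_in + d_out), which is where the training-time saving comes from:
print(d_in * d_out, "->", rank * (d_in + d_out))  # 262144 -> 32768
```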
-
Publication No.: EP4181020A1
Publication Date: 2023-05-17
Application No.: EP21849491.2
Filing Date: 2021-07-13
Inventors: LIAO, Yi , JIANG, Xin , CHEN, Xiao , QIAN, Li , LIU, Qun
Abstract: A model training method applied to the field of artificial intelligence is disclosed. The method includes: sending a first submodel to a first device (501), where the first submodel is obtained by compressing a to-be-trained model; receiving a first gradient sent by the first device (502), where the first gradient is produced when the first device trains the first submodel; and performing model training on the to-be-trained model based on at least the first gradient to obtain an updated to-be-trained model (503). In this method, the server compresses the to-be-trained model before delivering it to a terminal device, so the terminal device does not need to train a model of the same scale as the server's.
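The three numbered steps map onto a compress-train-aggregate loop. The toy sketch below keeps a subset of layers as the "compression" and copies the device's gradients back onto the matching server layers; both choices are illustrative assumptions, not the patented compression or aggregation scheme.

```python
import copy
import torch
import torch.nn as nn

full_model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])

# (501) Compress: keep only some layers as the first submodel and send it.
kept = [0, 2]
submodel = nn.Sequential(*[copy.deepcopy(full_model[i]) for i in kept])

# Device side: train the first submodel locally, producing the first gradient.
x, y = torch.randn(8, 16), torch.randn(8, 16)
nn.functional.mse_loss(submodel(x), y).backward()

# (502)+(503) Server side: receive the gradient and update the
# corresponding layers of the to-be-trained model.
with torch.no_grad():
    for layer_idx, sub_layer in zip(kept, submodel):
        for p_full, p_sub in zip(full_model[layer_idx].parameters(),
                                 sub_layer.parameters()):
            p_full -= 0.01 * p_sub.grad
print("updated layers:", kept)
```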
-
Publication No.: EP4379603A1
Publication Date: 2024-06-05
Application No.: EP22857922.3
Filing Date: 2022-08-19
Inventors: HOU, Lu , BAI, Haoli , SHANG, Lifeng , JIANG, Xin , QIAN, Li
Abstract: This application relates to the field of artificial intelligence and discloses a model distillation method. A student model is distilled at a first computing node in a computing node cluster by using a partial model of the student model and a partial model of the teacher model, and the gradient back-propagation of the distillation runs entirely inside the first computing node. The node can therefore complete the distillation of the network layers it is responsible for without depending on any other computing node, which achieves higher utilization of computing resources and accelerates the distillation process.
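Per node, the procedure reduces to: hold your slice of teacher and student layers, feed both the same incoming activations, and backpropagate a feature-matching loss locally. The sketch below assumes a two-layer slice and an MSE loss; how slices are cut and how inputs reach each node are not specified by the abstract.

```python
import torch
import torch.nn as nn

# This node's slice: a partial teacher model and the matching partial
# student model (e.g., two consecutive layers of each).
teacher_part = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64)).eval()
student_part = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64))

# Activations entering this slice, available locally, so back-propagation
# never has to wait on another computing node.
block_input = torch.randn(32, 64)

with torch.no_grad():
    target = teacher_part(block_input)          # local teacher features

loss = nn.functional.mse_loss(student_part(block_input), target)
loss.backward()                                 # gradients stay on-node
print("local distillation loss:", loss.item())
```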
-
Publication No.: EP4209965A1
Publication Date: 2023-07-12
Application No.: EP21874288.0
Filing Date: 2021-09-18
Inventors: LI, Zichao , HOU, Lu , JIANG, Xin
Abstract: A data processing method includes: obtaining to-be-processed data and a target neural network model, where the target neural network model includes a first transformer layer; the first transformer layer includes a first residual branch containing a first attention head and a second residual branch containing a target feedforward network (FFN) (601); and performing target-task-related processing on the to-be-processed data with the target neural network model to obtain a data processing result (603). The model performs a target operation on the output of the first attention head and a first weight value to obtain the output of the first residual branch, and/or performs a target operation on the output of the target FFN and a second weight value to obtain the output of the second residual branch. Because the weight values that control each residual branch's output are set per task, the computing resources a terminal device needs to run the target neural network model are reduced.
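Reading the target operation as scalar multiplication gives a transformer layer whose residual branches are gated by per-task weights. The sketch below makes that assumption explicit; the gate values, their granularity, and multiplication as the operation are all illustrative.

```python
import torch
import torch.nn as nn


class WeightedResidualLayer(nn.Module):
    """Transformer layer whose attention and FFN residual branches are
    scaled by a first and a second weight value, set per task."""

    def __init__(self, d_model: int = 64, w1: float = 1.0, w2: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.w1, self.w2 = w1, w2   # first and second weight values

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)
        x = x + self.w1 * attn_out      # first residual branch
        x = x + self.w2 * self.ffn(x)   # second residual branch
        return x


# A task whose second weight is 0 gets no contribution from the FFN
# branch, so an implementation could skip that computation for the task.
layer = WeightedResidualLayer(w2=0.0)
print(layer(torch.randn(2, 10, 64)).shape)
```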