-
Publication No.: US20240249115A1
Publication Date: 2024-07-25
Application No.: US18605951
Application Date: 2024-03-15
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Yunxiao SUN, Yucong ZHOU, Zhao ZHONG
Abstract: An input of an optimized query Query feature transformation module is obtained based on an output feature of at least one previous network layer of the optimized attention layer. An input of an optimized key Key feature transformation module is obtained based on an output feature of at least one previous network layer of the optimized attention layer. An input of an optimized value Value feature transformation module is obtained based on an output feature of at least one previous network layer of the optimized attention layer. An input of at least one feature transformation module in the optimized query Query feature transformation module, the optimized key Key feature transformation module, and the optimized value Value feature transformation module is obtained based on an output feature of at least one non-adjacent previous network layer of the optimized attention layer.
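The routing described in this abstract can be sketched as single-head attention whose Query, Key, and Value projections read from different earlier layers. This is only an illustration under assumptions: the helper names (`matmul`, `softmax`, `attention`), the scaled dot-product form, and the choice to feed the Key projection from a non-adjacent layer are hypothetical and are not taken from the application.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q_in, k_in, v_in, w_q, w_k, w_v):
    """Scaled dot-product attention with separate inputs per projection.

    q_in, k_in, v_in may be outputs of different previous layers
    (an assumption made for this sketch).
    """
    q, k, v = matmul(q_in, w_q), matmul(k_in, w_k), matmul(v_in, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)

# Hypothetical outputs of three earlier layers (2 tokens, 2 features each).
h0 = [[1.0, 0.0], [0.0, 1.0]]    # non-adjacent previous layer
h1 = [[0.2, 0.8], [0.6, 0.4]]    # intermediate layer (unused here)
h2 = [[0.5, 0.5], [1.0, -1.0]]   # adjacent previous layer
identity = [[1.0, 0.0], [0.0, 1.0]]

# Query and Value read the adjacent layer; Key reads the non-adjacent one.
out = attention(h2, h0, h2, identity, identity, identity)
```

Here only the Key input skips a layer; the abstract requires at least one of the three inputs to come from a non-adjacent layer, so other combinations would fit equally well.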
-
Publication No.: US20230186103A1
Publication Date: 2023-06-15
Application No.: US18165083
Application Date: 2023-02-06
Applicant: Huawei Technologies Co., Ltd.
Inventor: Yucong ZHOU, Zhao ZHONG
IPC: G06N3/0985, G06N3/084
CPC classification number: G06N3/0985, G06N3/084
Abstract: This application relates to the field of artificial intelligence technologies, and describes a classification model training method, a hyperparameter search method, and an apparatus. The training method includes obtaining a target hyperparameter of a to-be-trained classification model. The target hyperparameter is used to control a gradient update operation of the to-be-trained classification model. The to-be-trained classification model includes a scaling invariance linear layer. The scaling invariance linear layer enables a predicted classification result output when a weight parameter of the to-be-trained classification model is multiplied by any scaling coefficient to remain unchanged. The method further includes updating the weight parameter of the to-be-trained classification model based on the target hyperparameter and a target training manner, to obtain a trained classification model.
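One way to picture a scaling invariance linear layer is a linear layer that divides each weight row by its L2 norm before use, so that multiplying all weights by a positive coefficient leaves the outputs, and therefore the predicted class, unchanged. The sketch below assumes that construction; the function names and the restriction to positive coefficients are assumptions of this example, not details from the application.

```python
import math

def scale_invariant_linear(weights, x, eps=1e-12):
    """Hypothetical layer: normalize each weight row by its L2 norm, then apply it."""
    outputs = []
    for row in weights:
        norm = max(math.sqrt(sum(w * w for w in row)), eps)
        outputs.append(sum((w / norm) * xi for w, xi in zip(row, x)))
    return outputs

def argmax(values):
    """Index of the largest value, used as the predicted class."""
    return max(range(len(values)), key=values.__getitem__)

weights = [[1.0, 2.0], [3.0, -1.0]]
x = [0.5, 2.0]

base = scale_invariant_linear(weights, x)
# Multiply every weight by the same positive coefficient (7.0 is arbitrary).
scaled = scale_invariant_linear([[7.0 * w for w in row] for row in weights], x)
```

Because the outputs do not depend on the weight scale, the weight norm becomes a free quantity during training, which is the property the abstract ties to the hyperparameter that controls the gradient update.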
-
Publication No.: US20240078428A1
Publication Date: 2024-03-07
Application No.: US18354744
Application Date: 2023-07-19
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Yucong ZHOU, Zezhou ZHU, Zhao ZHONG
Abstract: A neural network model training method, a data processing method, and an apparatus are disclosed. The neural network model training method includes: training a neural network model based on training data, where an activation function of the neural network model includes at least one piecewise function, and the piecewise function includes a plurality of trainable parameters; and updating the plurality of trainable parameters of the at least one piecewise function in a training process. According to the method, the activation function suitable for the neural network model can be obtained. This can improve performance of the neural network model.
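A trainable piecewise activation of the kind this abstract describes might look like the following minimal sketch: a piecewise linear function over equal-width intervals whose breakpoint values are the trainable parameters. The class name, the equal-width layout, the ReLU-like initialization, and the squared-error update are all assumptions made for illustration.

```python
class PiecewiseLinearActivation:
    """Hypothetical piecewise linear activation with trainable breakpoint values."""

    def __init__(self, left, right, values, left_slope=0.0, right_slope=1.0):
        self.left, self.right = left, right
        self.values = list(values)            # function values at the breakpoints
        self.left_slope, self.right_slope = left_slope, right_slope

    def _locate(self, x):
        """Return the interval index and the position (0..1) of x inside it."""
        n = len(self.values) - 1
        width = (self.right - self.left) / n
        idx = min(int((x - self.left) / width), n - 1)
        return idx, (x - self.left) / width - idx

    def __call__(self, x):
        if x < self.left:
            return self.values[0] + self.left_slope * (x - self.left)
        if x > self.right:
            return self.values[-1] + self.right_slope * (x - self.right)
        idx, t = self._locate(x)
        return (1 - t) * self.values[idx] + t * self.values[idx + 1]

    def sgd_step(self, x, target, lr=0.1):
        """One squared-error gradient step on the two breakpoints around x."""
        if not (self.left <= x <= self.right):
            return
        idx, t = self._locate(x)
        err = self(x) - target
        self.values[idx] -= lr * 2 * err * (1 - t)
        self.values[idx + 1] -= lr * 2 * err * t

# Start from a ReLU-like shape on [-2, 2] with breakpoints at -2, -1, 0, 1, 2.
act = PiecewiseLinearActivation(-2.0, 2.0, [0.0, 0.0, 0.0, 1.0, 2.0])
before = act(0.5)
act.sgd_step(0.5, target=1.0)
after = act(0.5)
```

In a real model the breakpoint values would be updated by the same optimizer as the network weights; the single-sample step here only shows that the activation's shape moves during training.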
-
Publication No.: US20230385642A1
Publication Date: 2023-11-30
Application No.: US18446294
Application Date: 2023-08-08
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Yucong ZHOU, Zhao ZHONG
IPC: G06N3/08, G06N3/0464
CPC classification number: G06N3/08, G06N3/0464
Abstract: This application discloses a model training method, which may be applied to the field of artificial intelligence. The method includes: obtaining a first neural network model; replacing a first convolutional layer in the first neural network model with a linear operation to obtain a plurality of second neural network models; and performing model training on a plurality of second neural network models, to obtain a neural network model with a highest model precision in a plurality of trained second neural network models. In this application, a convolutional layer in a to-be-trained neural network is replaced with a linear operation equivalent to a convolutional layer. A manner with highest precision is selected from a plurality of replacement manners, to improve precision of a trained model.
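The idea of replacing a convolutional layer with an equivalent linear operation can be illustrated with a toy case: two parallel 1-D convolutions whose outputs are summed give the same result as one convolution with the summed kernel. This is one hypothetical replacement manner chosen for the sketch; the application does not state which linear operations it uses, and the function names below are made up.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation of a signal with a kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def two_branch(signal, kernel_a, kernel_b):
    """Hypothetical replacement: two parallel convolutions added together."""
    a = conv1d(signal, kernel_a)
    b = conv1d(signal, kernel_b)
    return [x + y for x, y in zip(a, b)]

def merge_kernels(kernel_a, kernel_b):
    """Fold the two branches back into a single equivalent kernel."""
    return [a + b for a, b in zip(kernel_a, kernel_b)]

signal = [1, 2, 3, 4, 5]
kernel_a = [1, 0, -1]
kernel_b = [2, 1, 0]

branched = two_branch(signal, kernel_a, kernel_b)
merged = conv1d(signal, merge_kernels(kernel_a, kernel_b))
```

Under this reading, each candidate replacement would be trained in its expanded form, the most precise one kept, and its branches merged back into a single convolution afterwards.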
-