Publication No.: US20240330648A1
Publication Date: 2024-10-03
Application No.: US18596994
Filing Date: 2024-03-06
Applicant: Gwangju Institute of Science and Technology
Inventor: Hee Jun JUNG, Kang Il KIM, Do Yeon KIM
Abstract: A method is disclosed for training a student network that includes at least one transformer neural network by using knowledge distillation from a teacher network that also includes at least one transformer neural network. The method includes: pre-training the teacher network with training data and fine-tuning the trained teacher network; copying the weight parameters of a bottom layer of the teacher network to the student network; and performing knowledge distillation to the student network through the fine-tuned teacher network. Performing the knowledge distillation includes: extracting a feature structure from the result value of a layer of the fine-tuned teacher network; extracting a feature structure from the result value of a layer of the student network; and adjusting the extracted feature structure of the student network based on the extracted feature structure of the teacher network.
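The abstract does not define "feature structure" precisely, but one common reading in transformer distillation is a pairwise similarity matrix over a layer's token outputs, with the student trained to match the teacher's matrix. The sketch below illustrates that interpretation in NumPy; the function names (`feature_structure`, `structure_distill_loss`) and the cosine-similarity definition are assumptions for illustration, not the patented method itself.

```python
import numpy as np

def feature_structure(layer_output):
    """Pairwise cosine-similarity matrix over the rows (tokens) of a
    layer's output. One plausible reading of the patent's 'feature
    structure' extracted from a layer's result value (an assumption)."""
    norms = np.linalg.norm(layer_output, axis=1, keepdims=True)
    normalized = layer_output / np.clip(norms, 1e-12, None)
    return normalized @ normalized.T  # shape: (tokens, tokens)

def structure_distill_loss(teacher_out, student_out):
    """Mean-squared gap between teacher and student feature structures;
    minimizing this adjusts the student's structure toward the teacher's."""
    t = feature_structure(teacher_out)
    s = feature_structure(student_out)
    return float(np.mean((t - s) ** 2))

# Toy layer outputs: 4 tokens with 8-dimensional features.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))

# Mimicking the weight-copy step: a student initialized from the teacher
# starts with zero structural gap.
student = teacher.copy()
print(structure_distill_loss(teacher, student))  # → 0.0

# A perturbed student has a positive gap that distillation would shrink.
student = teacher + 0.1 * rng.normal(size=teacher.shape)
print(structure_distill_loss(teacher, student) > 0.0)  # → True
```

In a real training loop this loss would be backpropagated through the student (e.g. with an autograd framework) alongside the task loss, one pair of teacher/student layers at a time.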