APPARATUS AND METHOD WITH NEURAL NETWORK TRAINING BASED ON KNOWLEDGE DISTILLATION
Abstract:
A method includes: generating, based on a student network result of an implemented student network provided with an input, a sample corresponding to a distribution of an energy-based model based on the student network result and a teacher network result of an implemented teacher network provided with the input; training model parameters of the energy-based model to decrease a value of the energy-based model, based on the teacher network result and the student network result; and training the implemented student network to increase the value of the energy-based model, based on the sample and the student network result.
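The three steps of the abstract can be illustrated with a deliberately tiny, one-dimensional sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the quadratic energy `energy(y, c)` with a single parameter `c`, the use of Langevin dynamics started at the student result to draw the sample, the reading of "value" as a difference of energies (`ebm_value` and `student_value`), the linear student `w * x`, and all function names, step sizes, and learning rates.

```python
import math
import random


def energy(y, c):
    """Hypothetical toy energy E_c(y) = 0.5 * (y - c)^2; c is the only EBM parameter."""
    return 0.5 * (y - c) ** 2


def langevin_sample(start, c, rng, steps=20, step_size=0.1):
    """Approximate sample from p(y) proportional to exp(-E_c(y)).

    The chain is initialised at the student result (an assumed reading of
    "generating, based on a student network result, a sample").
    """
    y = start
    for _ in range(steps):
        grad = y - c  # dE/dy for the quadratic energy
        y = y - 0.5 * step_size * grad + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return y


def ebm_value(c, teacher_out, student_out):
    """Assumed EBM objective: energy at the teacher result minus energy at the student result."""
    return energy(teacher_out, c) - energy(student_out, c)


def ebm_step(c, teacher_out, student_out, lr=0.1):
    """One gradient-descent step on c, so that ebm_value decreases."""
    # d/dc [E(t) - E(s)] = -(t - c) + (s - c) = s - t
    grad_c = student_out - teacher_out
    return c - lr * grad_c


def student_value(c, sample, student_out):
    """Assumed student objective: energy at the sample minus energy at the student result."""
    return energy(sample, c) - energy(student_out, c)


def student_step(w, x, c, lr=0.1):
    """One gradient-ascent step on the student weight w, so that student_value increases.

    The sample is treated as a constant, so only the student term contributes.
    """
    s = w * x
    grad_w = -(s - c) * x  # d/dw [E(sample) - E(w * x)]
    return w + lr * grad_w


def train_round(w, c, x, teacher_out, rng):
    """One round: sample from the EBM, update the EBM, then update the student."""
    s = w * x
    sample = langevin_sample(s, c, rng)
    c = ebm_step(c, teacher_out, s)
    w = student_step(w, x, c)
    return w, c, sample
```

In this toy setup the sample does not affect the student gradient, because it is held constant during the student step; in a realistic implementation the energy-based model and both networks would be neural networks, and the sample would typically enter through a learned, input-conditioned energy.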