Soft label generation for knowledge distillation
Abstract:
A technique for generating soft labels for training is disclosed. A teacher model having a teacher side class set is prepared. A collection of class pairs for respective data units is obtained. Each class pair includes classes labelled to the corresponding data unit from the teacher side class set and from a student side class set that is different from the teacher side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher side class set. A set of soft labels for the student side class set is calculated from the set of outputs. For each member of the student side class set, the calculation uses at least an output obtained for a class within a subset of the teacher side class set that has relevance to that member, where the relevance is based at least in part on observations in the collection of class pairs.
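The calculation described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation. It assumes that relevance is decided by a co-occurrence count threshold over the class pairs, that the teacher outputs within each relevant subset are summed, and that the result is renormalized over the student side class set. The function names (`build_relevance`, `soft_labels`), the `min_count` parameter, and the toy class names and values are all hypothetical.

```python
from collections import Counter, defaultdict


def build_relevance(class_pairs, min_count=1):
    """Map each student side class to the teacher side classes it co-occurs with.

    class_pairs: iterable of (teacher_class, student_class), one per data unit.
    min_count: assumed relevance rule; a pair must be observed this many times.
    """
    counts = Counter(class_pairs)  # (teacher_class, student_class) -> frequency
    relevant = defaultdict(set)
    for (teacher_cls, student_cls), n in counts.items():
        if n >= min_count:
            relevant[student_cls].add(teacher_cls)
    return relevant


def soft_labels(teacher_outputs, relevant, student_classes):
    """Aggregate teacher outputs over each student class's relevant subset.

    Summing and renormalizing are assumptions of this sketch; other
    aggregations (for example, taking the maximum) would fit the same outline.
    """
    raw = {
        s: sum(teacher_outputs.get(t, 0.0) for t in relevant.get(s, ()))
        for s in student_classes
    }
    total = sum(raw.values())
    if total == 0.0:
        # No relevant teacher class received any mass.
        return {s: 0.0 for s in student_classes}
    return {s: v / total for s, v in raw.items()}


# Hypothetical toy data: teacher classes a1, a2, b1; student classes A, B.
class_pairs = [("a1", "A"), ("a2", "A"), ("a1", "A"), ("b1", "B"), ("a2", "B")]
teacher_outputs = {"a1": 0.5, "a2": 0.25, "b1": 0.25}  # teacher posteriors

relevant = build_relevance(class_pairs)
labels = soft_labels(teacher_outputs, relevant, ["A", "B"])
print(labels)
```

In this toy setup the teacher class `a2` is relevant to both student classes, so its output contributes to both soft labels before renormalization. Raising `min_count` narrows each relevant subset to the more frequently observed pairs.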