Invention Grant
- Patent Title: Compression method and platform of pre-training language model based on knowledge distillation
- Application No.: US17483805
- Application Date: 2021-09-24
- Publication No.: US11341326B2
- Publication Date: 2022-05-24
- Inventors: Hongsheng Wang, Haijun Shan, Fei Yang
- Applicant: ZHEJIANG LAB
- Applicant Address: Hangzhou, CN
- Assignee: ZHEJIANG LAB
- Current Assignee: ZHEJIANG LAB
- Current Assignee Address: Hangzhou, CN
- Agency: W&G Law Group
- Main IPC: G06F17/00
- IPC: G06F17/00; G06F40/20; G06N20/20; G06N5/02; G06N7/00

Abstract:
Provided is a method and a platform for compressing a pre-training language model based on knowledge distillation. The method first designs a universal knowledge distillation strategy of feature migration: during knowledge distillation from the teacher model to the student model, the feature map of each layer of the student model is driven to approximate the teacher's features, focusing on the ability of small samples to express features in the intermediate layers of the teacher model and using these features to guide the student model. Next, a knowledge distillation method based on self-attention cross is constructed. Finally, a linear transfer strategy based on the Bernoulli probability distribution is designed to gradually complete the transfer of feature maps and self-attention distributions from the teacher to the student.
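The abstract names three components: per-layer feature-map transfer, self-attention distribution transfer, and a Bernoulli-based linear transfer schedule. The following is a minimal PyTorch sketch of how these pieces could fit together; it is an illustrative reading of the abstract only, and the function names, loss choices (MSE for feature maps, KL divergence for attention), and the schedule parameter `p0` are assumptions rather than code from the patent.

```python
# Illustrative sketch of the three distillation ingredients described in the abstract.
# All tensor shapes, layer mappings, and names are assumptions for demonstration.
import torch
import torch.nn.functional as F


def feature_map_loss(student_feat, teacher_feat, proj):
    """Component 1: MSE between the projected student feature map and the teacher's."""
    return F.mse_loss(proj(student_feat), teacher_feat)


def attention_loss(student_attn, teacher_attn):
    """Component 2: KL divergence between student and teacher self-attention distributions."""
    return F.kl_div(torch.log(student_attn + 1e-9), teacher_attn, reduction="batchmean")


def transfer_probability(step, total_steps, p0=0.3):
    """Component 3: Bernoulli probability that rises linearly from p0 to 1.0 over training."""
    return min(1.0, p0 + (1.0 - p0) * step / float(total_steps))


def distillation_step(student_out, teacher_out, proj, step, total_steps):
    """Per layer, apply teacher guidance with probability p(step), so transfer completes gradually."""
    p = transfer_probability(step, total_steps)
    loss = torch.zeros(())
    layers = zip(student_out["feats"], teacher_out["feats"],
                 student_out["attn"], teacher_out["attn"])
    for s_feat, t_feat, s_attn, t_attn in layers:
        gate = torch.bernoulli(torch.tensor(p))  # 1 -> transfer this layer on this step
        loss = loss + gate * (feature_map_loss(s_feat, t_feat, proj)
                              + attention_loss(s_attn, t_attn))
    return loss


if __name__ == "__main__":
    # Toy demo: 4 student layers (width 256) distilled against 4 teacher layers (width 768).
    torch.manual_seed(0)
    proj = torch.nn.Linear(256, 768)  # maps student hidden size to teacher hidden size
    student = {"feats": [torch.randn(2, 16, 256) for _ in range(4)],
               "attn":  [torch.softmax(torch.randn(2, 8, 16, 16), dim=-1) for _ in range(4)]}
    teacher = {"feats": [torch.randn(2, 16, 768) for _ in range(4)],
               "attn":  [torch.softmax(torch.randn(2, 8, 16, 16), dim=-1) for _ in range(4)]}
    print(distillation_step(student, teacher, proj, step=100, total_steps=1000))
```

In this sketch the gating probability starts below one and grows linearly, so early in training only some layers receive teacher supervision on a given step, while by the end every layer's feature map and attention distribution is transferred, matching the "gradual" transfer the abstract describes.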
Public/Granted literature
- US20220067274A1: Compression Method and Platform of Pre-Training Language Model Based on Knowledge Distillation (published 2022-03-03)