Compression method and platform of pre-training language model based on knowledge distillation
Abstract:
Provided are a method and a platform for compressing a pre-trained language model based on knowledge distillation. In the method, a universal knowledge distillation strategy based on feature transfer is first designed: during knowledge distillation from the teacher model to the student model, the feature map of each layer of the student model is made to approximate the corresponding features of the teacher model, with emphasis on the ability of the teacher model's intermediate layers to express features on small samples, and these features are used to guide the student model. Then, a self-attention cross knowledge distillation method is constructed. Finally, a linear transfer strategy based on the Bernoulli probability distribution is designed to gradually complete the transfer of feature-map and self-attention-distribution knowledge from the teacher to the student.
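The abstract combines three ingredients: a feature-map loss between aligned teacher and student layers, a loss on self-attention distributions, and a Bernoulli-gated linear schedule that gradually enables knowledge transfer. The following is a minimal PyTorch-style sketch of how such ingredients might be combined; the function names, the layer mapping, the linear schedule, and the use of MSE losses are illustrative assumptions for exposition, not the patented implementation.

```python
import torch
import torch.nn.functional as F

def feature_map_loss(student_hidden, teacher_hidden, proj):
    """MSE between student hidden states (projected to the teacher's
    hidden width) and teacher hidden states for one aligned layer pair.
    (Assumed loss form, not taken from the patent text.)"""
    return F.mse_loss(proj(student_hidden), teacher_hidden)

def attention_loss(student_attn, teacher_attn):
    """MSE between student and teacher self-attention distributions,
    shaped (batch, heads, seq, seq), for one aligned layer pair."""
    return F.mse_loss(student_attn, teacher_attn)

def bernoulli_transfer_prob(step, total_steps):
    """Hypothetical linear schedule: the probability of transferring a
    layer's knowledge grows linearly over training."""
    return min(1.0, step / total_steps)

def distillation_loss(student_outputs, teacher_outputs, projections,
                      layer_map, step, total_steps):
    """Sum feature-map and self-attention losses over mapped layer pairs,
    each pair gated by an independent Bernoulli draw."""
    p = bernoulli_transfer_prob(step, total_steps)
    loss = student_outputs["hidden_states"][0].new_zeros(())
    for s_idx, t_idx in layer_map:
        gate = torch.bernoulli(torch.tensor(p))  # 1 -> transfer this layer
        if gate.item() == 0:
            continue
        loss = loss + feature_map_loss(
            student_outputs["hidden_states"][s_idx],
            teacher_outputs["hidden_states"][t_idx],
            projections[s_idx],
        )
        loss = loss + attention_loss(
            student_outputs["attentions"][s_idx],
            teacher_outputs["attentions"][t_idx],
        )
    return loss
```

In this sketch, `student_outputs` and `teacher_outputs` are assumed to be dictionaries of per-layer hidden states and attention maps (as returned, for example, by Transformer encoders run with attentions and hidden states exposed), and `projections` maps each student layer to a linear layer that matches the teacher's hidden width.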