Extreme language model compression with optimal sub-words and shared projections
Abstract:
Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or lower hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results demonstrate higher compression efficiency and accuracy than other state-of-the-art compression techniques, including compressing the BERTBASE model by more than 60× with only a minor drop in downstream task metrics, yielding a language model with a footprint of under 7 MB.
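For illustration, the shared-projection idea described above can be sketched as a layer-wise loss that aligns teacher and student hidden states through a single pair of trainable projection matrices shared across all layers. The following is a minimal PyTorch sketch, not the disclosed implementation; the module name SharedProjectionDistillLoss, the dimensions (768 for the teacher, 192 for the student), the up/down projection pair, and the 4-layer mapping in the usage stub are assumptions made for this example.

```python
# Hedged sketch of a shared-projection, layer-wise distillation loss.
# All names and dimensions are illustrative assumptions, not taken from the disclosure.
import torch
import torch.nn as nn


class SharedProjectionDistillLoss(nn.Module):
    """Layer-wise MSE between teacher and student hidden states, using one
    pair of projection matrices shared across all transformer layers."""

    def __init__(self, d_teacher: int = 768, d_student: int = 192):
        super().__init__()
        # `up` maps student hidden states into the teacher's space;
        # `down` maps teacher hidden states into the student's space.
        # Because both are shared across layers, they add very few parameters.
        self.up = nn.Linear(d_student, d_teacher, bias=False)
        self.down = nn.Linear(d_teacher, d_student, bias=False)
        self.mse = nn.MSELoss()

    def forward(self, teacher_hidden, student_hidden):
        # teacher_hidden / student_hidden: lists of per-layer tensors shaped
        # (batch, seq_len, d_teacher) and (batch, seq_len, d_student).
        loss = torch.zeros(())
        for h_t, h_s in zip(teacher_hidden, student_hidden):
            # Match in teacher space (project student up) and in student space
            # (project teacher down); teacher activations are treated as targets.
            loss = loss + self.mse(self.up(h_s), h_t.detach())
            loss = loss + self.mse(h_s, self.down(h_t.detach()))
        return loss / len(student_hidden)


if __name__ == "__main__":
    # Toy usage with random hidden states, assuming four student layers each
    # paired with a corresponding teacher layer.
    batch, seq = 2, 8
    teacher_layers = [torch.randn(batch, seq, 768) for _ in range(4)]
    student_layers = [torch.randn(batch, seq, 192, requires_grad=True) for _ in range(4)]
    criterion = SharedProjectionDistillLoss()
    print(criterion(teacher_layers, student_layers).item())
```

In this sketch the projection matrices are trained jointly with the student, so the student's lower-dimensional hidden states learn to be linearly recoverable from, and mappable to, the teacher's representations at every layer.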