-
Publication Number: US11797862B2
Publication Date: 2023-10-24
Application Number: US16749570
Filing Date: 2020-01-22
Applicant: Google LLC
Inventor: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
IPC: G06N3/088, G06F40/284, G06N3/045
CPC classification number: G06N3/088, G06F40/284, G06N3/045
Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
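The following is a minimal, illustrative sketch of the two ideas the abstract names: jointly updating the teacher and student so the student-vocabulary embeddings are learned alongside the teacher, and a single shared projection matrix that maps teacher hidden states into the student's smaller hidden dimension for layer-wise matching. It is not the patented implementation; the module names, dimensions, loss terms, and hyperparameters below are assumptions for demonstration only.

```python
# Minimal sketch (assumed, not the patent's implementation) of joint
# teacher/student training with a shared layer-wise projection matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_TEACHER, D_STUDENT = 768, 192      # hidden sizes: teacher much wider than student
V_TEACHER, V_STUDENT = 30522, 5000   # vocabulary sizes: student vocab is far smaller
N_LAYERS = 4                         # number of layers whose hidden states are aligned

class TinyEncoder(nn.Module):
    """Stand-in for a transformer encoder: an embedding table plus MLP blocks."""
    def __init__(self, vocab_size, dim, n_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(n_layers)
        )

    def forward(self, token_ids):
        h = self.embed(token_ids)
        hidden_states = []
        for layer in self.layers:
            h = layer(h)
            hidden_states.append(h)
        return hidden_states  # one hidden-state tensor per layer

teacher = TinyEncoder(V_TEACHER, D_TEACHER, N_LAYERS)
student = TinyEncoder(V_STUDENT, D_STUDENT, N_LAYERS)

# A single projection matrix, shared across all layers, maps teacher hidden
# states (768-d) down into the student's hidden dimension (192-d).
shared_proj = nn.Linear(D_TEACHER, D_STUDENT, bias=False)

optimizer = torch.optim.Adam(
    list(teacher.parameters()) + list(student.parameters()) + list(shared_proj.parameters()),
    lr=1e-4,
)

def training_step(teacher_ids, student_ids):
    """One joint update. teacher_ids / student_ids stand for the same text
    tokenized with the teacher and student vocabularies respectively (faked
    here with random ids of matching length)."""
    t_hiddens = teacher(teacher_ids)
    s_hiddens = student(student_ids)

    # Layer-wise alignment: project each teacher layer down and match the
    # corresponding student layer. A full setup would also add the teacher's
    # own task loss and a distillation loss on output logits.
    align_loss = sum(
        F.mse_loss(shared_proj(t_h), s_h)
        for t_h, s_h in zip(t_hiddens, s_hiddens)
    ) / N_LAYERS

    optimizer.zero_grad()
    align_loss.backward()
    optimizer.step()
    return align_loss.item()

# Toy usage: a batch of 8 "sentences" of 16 tokens each.
teacher_batch = torch.randint(0, V_TEACHER, (8, 16))
student_batch = torch.randint(0, V_STUDENT, (8, 16))
print(training_step(teacher_batch, student_batch))
```

In this sketch, reusing one projection matrix across every layer (rather than learning a separate matrix per layer) keeps the number of added parameters negligible, which is what makes layer-wise transfer compatible with the aggressive size reduction described in the abstract.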
-
Publication Number: US20240013059A1
Publication Date: 2024-01-11
Application Number: US18471866
Filing Date: 2023-09-21
Applicant: Google LLC
Inventor: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
IPC: G06N3/0455, G06F40/40, G06N3/08
CPC classification number: G06N3/0455, G06F40/40, G06N3/08, G06F40/284
Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
-
Publication Number: US20210224660A1
Publication Date: 2021-07-22
Application Number: US16749570
Filing Date: 2020-01-22
Applicant: Google LLC
Inventor: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
IPC: G06N3/08, G06N3/04, G06F40/284
Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
-
Publication Number: US12260340B2
Publication Date: 2025-03-25
Application Number: US18471866
Filing Date: 2023-09-21
Applicant: Google LLC
Inventor: Yang Song, Raghav Gupta, Dengyong Zhou, Sanqiang Zhao
IPC: G06N3/088, G06F40/284, G06N3/045
Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
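As a rough back-of-the-envelope check on the figures above (using commonly cited numbers for BERTBASE, not figures taken from the patent itself): BERTBASE has on the order of 110 million parameters, or roughly 440 MB at 32-bit precision, and 440 MB / 60 ≈ 7.3 MB, which is consistent with a compression factor of more than 60× yielding a model footprint of under 7 MB.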