Extreme language model compression with optimal sub-words and shared projections

    Publication number: US11797862B2

    Publication date: 2023-10-24

    Application number: US16749570

    Application date: 2020-01-22

    Applicant: Google LLC

    CPC classification number: G06N3/088 G06F40/284 G06N3/045

    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or lower hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERT-Base model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
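
The shared-projection idea in the abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that a single matrix `U` maps teacher hidden states down to the student's hidden size, that teacher and student layers are aligned one-to-one, and that the per-layer mismatch is measured with mean squared error. The function name, shapes, and loss choice are hypothetical; the patent's claims may define them differently.

```python
import numpy as np

def projection_loss(teacher_states, student_states, U):
    """Sum over layers of the MSE between projected teacher states and student states.

    teacher_states: list of (seq_len, d_teacher) arrays, one per layer
    student_states: list of (seq_len, d_student) arrays, one per layer
    U:              (d_teacher, d_student) matrix shared by every layer (assumption)
    """
    per_layer = [
        np.mean((t @ U - s) ** 2)  # project the teacher state down, then compare
        for t, s in zip(teacher_states, student_states)
    ]
    return float(np.sum(per_layer))

# Toy example with arbitrary sizes: 3 layers, 5 tokens,
# teacher width 8, student width 4.
rng = np.random.default_rng(0)
U = rng.normal(size=(8, 4))
teacher = [rng.normal(size=(5, 8)) for _ in range(3)]
student = [rng.normal(size=(5, 4)) for _ in range(3)]
print(f"projection loss: {projection_loss(teacher, student, U):.4f}")
```

In an actual training loop, `U` and the student parameters would be updated by gradient descent on a loss of this kind alongside the task loss; the sketch only evaluates the loss for fixed arrays.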

    UNSUPERVISED PATTERN DISCOVERY USING DYNAMIC GRAPH EMBEDDINGS

    Publication number: US20230334295A1

    Publication date: 2023-10-19

    Application number: US18133755

    Application date: 2023-04-12

    CPC classification number: G06N3/0455 G06N3/088 G06F18/2321

    Abstract: Discussed herein are devices, systems, and methods for unsupervised pattern discovery using continuous-time dynamic graphs. A method can include: receiving, from a graph neural network (GNN), source node embeddings and destination node embeddings; clustering the destination node embeddings generated by the GNN, resulting in first groups of destination node embeddings; removing, from the destination node embeddings, embeddings from a noise group of the first groups, resulting in signal destination node embeddings; clustering the signal destination node embeddings, resulting in second groups of destination node embeddings; and identifying a pattern in the destination node embeddings and source node embeddings based on the second groups of destination node embeddings, the source node embeddings, and the destination node embeddings.
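
The cluster, remove-noise, re-cluster sequence described in the abstract can be sketched as follows. The abstract does not name a clustering algorithm, so this sketch assumes a small DBSCAN-style density clustering in which the label `-1` plays the role of the noise group. The function names and the `eps_coarse`, `eps_fine`, and `min_samples` parameters are hypothetical, and the final pattern-identification step is left out.

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Tiny DBSCAN-style clustering: one label per point, -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors], dtype=bool)
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels

def two_stage_groups(dst_embeddings, eps_coarse, eps_fine, min_samples):
    """Cluster, drop the noise group, then cluster the remaining 'signal' embeddings."""
    first_groups = dbscan(dst_embeddings, eps_coarse, min_samples)
    signal_idx = np.flatnonzero(first_groups != -1)   # indices outside the noise group
    second_groups = dbscan(dst_embeddings[signal_idx], eps_fine, min_samples)
    return signal_idx, second_groups

# Toy destination-node embeddings: two tight groups plus two isolated points.
group_a = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
group_b = group_a + 10.0
outliers = np.array([[50.0, 50.0], [-40.0, 30.0]])
dst = np.vstack([group_a, group_b, outliers])

signal_idx, second_groups = two_stage_groups(dst, eps_coarse=1.0, eps_fine=0.5, min_samples=3)
print(signal_idx, second_groups)
```

The returned `signal_idx` maps each second-stage label back to a row of the original embeddings, which is what a later step would need in order to relate the groups to the corresponding source node embeddings.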
