METHOD FOR AUTOMATICALLY COMPRESSING MULTITASK-ORIENTED PRE-TRAINED LANGUAGE MODEL AND PLATFORM THEREOF

    Publication Number: US20220188658A1

    Publication Date: 2022-06-16

    Application Number: US17564071

    Filing Date: 2021-12-28

    Applicant: ZHEJIANG LAB

    Abstract: Disclosed are a method for automatically compressing a multi-task-oriented pre-trained language model and a platform thereof. According to the method, a meta-network serving as a structure generator is designed, a knowledge distillation coding vector is constructed by a knowledge distillation method based on Transformer layer sampling, and the distillation structure model corresponding to the currently input coding vector is generated by the structure generator. At the same time, a Bernoulli distribution sampling method is provided for training the structure generator: in each iteration, each encoder unit is sampled via a Bernoulli distribution to form the corresponding coding vector. By varying the coding vector input to the structure generator together with a small batch of training data, the structure generator and the corresponding distillation structure are jointly trained, so that a structure generator capable of generating weights for different distillation structures is obtained.
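
    The snippet below is a minimal sketch of the layer-sampling idea in this abstract, assuming PyTorch; the teacher depth, hidden size, generator architecture, and all names are illustrative assumptions, not the patent's implementation. It shows a Bernoulli-sampled 0/1 coding vector that selects teacher encoder layers and a small meta-network that maps that vector to weights for the corresponding distillation structure.

import torch
import torch.nn as nn

NUM_TEACHER_LAYERS = 12   # assumed teacher depth (e.g. a BERT-base encoder)
HIDDEN = 768              # assumed hidden size

def sample_coding_vector(p: float = 0.5) -> torch.Tensor:
    # Bernoulli-sample, per teacher encoder layer, whether it is kept (1) or skipped (0);
    # the resulting 0/1 vector plays the role of the knowledge-distillation coding vector.
    return torch.bernoulli(torch.full((NUM_TEACHER_LAYERS,), p))

class StructureGenerator(nn.Module):
    # Meta-network: maps a coding vector to weights for the sampled distillation structure.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_TEACHER_LAYERS, 256),
            nn.ReLU(),
            nn.Linear(256, HIDDEN * HIDDEN),  # weights for one hypothetical student projection
        )

    def forward(self, coding_vector: torch.Tensor) -> torch.Tensor:
        return self.net(coding_vector).view(HIDDEN, HIDDEN)

# One iteration: resample the coding vector, generate the corresponding distillation
# structure, then jointly train the generator and that structure on a small batch.
generator = StructureGenerator()
coding_vector = sample_coding_vector()
student_projection = generator(coding_vector)
print(coding_vector.tolist(), student_projection.shape)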

    METHOD AND PLATFORM FOR META-KNOWLEDGE FINE-TUNING BASED ON DOMAIN-INVARIANT FEATURES

    Publication Number: US20220222529A1

    Publication Date: 2022-07-14

    Application Number: US17674859

    Filing Date: 2022-02-18

    Applicant: ZHEJIANG LAB

    Abstract: Disclosed are a method and a platform for meta-knowledge fine-tuning based on domain-invariant features. According to the method, highly transferable common knowledge, i.e., domain-invariant features, is learned from different data sets of the same kind of task, and the common domain features learned in the network set, which correspond to the different domains of those data sets, are fine-tuned so that the model quickly adapts to any new domain. The present application improves the parameter-initialization ability and the generalization ability of the universal language model for the same kind of task, and a common compression framework of the universal language model for the same kind of downstream tasks is finally obtained through fine-tuning. In the meta-knowledge fine-tuning network, a loss function over the domain-invariant features is designed, so that domain-independent universal knowledge is learned.
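
    As a minimal sketch of a fine-tuning step with a domain-invariance penalty, assuming PyTorch; the mean-feature alignment loss, the weighting factor, and all names below are illustrative assumptions rather than the patent's exact loss formulation. It shows two batches from different domains of the same task being optimized jointly against a shared task loss plus a domain-invariant feature term.

import torch
import torch.nn.functional as F

def domain_invariant_loss(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    # Penalize the gap between mean feature statistics of two domains so the
    # fine-tuned representation stays domain-invariant (illustrative choice of penalty).
    return F.mse_loss(feats_a.mean(dim=0), feats_b.mean(dim=0))

def fine_tune_step(model, batch_a, batch_b, optimizer, lam: float = 0.1) -> float:
    # Joint objective: per-domain task loss plus the cross-domain invariance penalty.
    logits_a, feats_a = model(batch_a["input_ids"])  # assumes model returns (logits, features)
    logits_b, feats_b = model(batch_b["input_ids"])
    task_loss = (F.cross_entropy(logits_a, batch_a["labels"])
                 + F.cross_entropy(logits_b, batch_b["labels"]))
    loss = task_loss + lam * domain_invariant_loss(feats_a, feats_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()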

    COMPRESSION METHOD AND PLATFORM OF PRE-TRAINING LANGUAGE MODEL BASED ON KNOWLEDGE DISTILLATION

    Publication Number: US20220067274A1

    Publication Date: 2022-03-03

    Application Number: US17483805

    Filing Date: 2021-09-24

    Applicant: ZHEJIANG LAB

    Abstract: Provided are a method and a platform for compressing a pre-training language model based on knowledge distillation. According to the method, a universal knowledge distillation strategy of feature transfer is first designed: in the process of knowledge distillation from the teacher model to the student model, the feature map of each layer of the student model is driven to approach the teacher's features, focusing on the ability of small samples to express features in the intermediate layers of the teacher model, and these features are used to guide the student model. Then, a knowledge distillation method based on self-attention cross is constructed. Finally, a linear transfer strategy based on the Bernoulli probability distribution is designed to gradually complete the knowledge transfer of feature maps and self-attention distributions from the teacher to the student.
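
    The sketch below illustrates the three ingredients named in this abstract, assuming PyTorch; the specific loss forms, the direction of the linear schedule, and all names are illustrative assumptions, not the patent's implementation. It shows a feature-map matching loss, a self-attention distribution matching loss, and a Bernoulli transfer probability that changes linearly over training.

import torch
import torch.nn.functional as F

def feature_map_loss(student_feat, teacher_feat, proj):
    # Pull each student layer's hidden states toward the teacher's (via a learned projection).
    return F.mse_loss(proj(student_feat), teacher_feat)

def attention_loss(student_attn, teacher_attn):
    # Match the student's self-attention distributions to the teacher's.
    return F.kl_div((student_attn + 1e-12).log(), teacher_attn, reduction="batchmean")

def transfer_probability(step: int, total_steps: int, p_start: float = 1.0, p_end: float = 0.0) -> float:
    # Linear schedule for the Bernoulli transfer probability over training.
    return p_start + (p_end - p_start) * step / total_steps

def maybe_transfer(student_out, teacher_out, step, total_steps):
    # With probability p, feed the teacher's output forward instead of the student's,
    # gradually handing responsibility over to the student as p changes.
    p = transfer_probability(step, total_steps)
    use_teacher = bool(torch.bernoulli(torch.tensor(p)))
    return teacher_out if use_teacher else student_out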
