Method for automatically compressing multitask-oriented pre-trained language model and platform thereof

    Publication Number: US11526774B2

    Publication Date: 2022-12-13

    Application Number: US17564071

    Filing Date: 2021-12-28

    Applicant: ZHEJIANG LAB

    Abstract: Disclosed are a method for automatically compressing a multitask-oriented pre-trained language model and a platform therefor. The method designs a meta-network, called a structure generator, and constructs a knowledge distillation coding vector using a knowledge distillation method based on Transformer-layer sampling; the structure generator then produces the distillation structure model corresponding to the currently input coding vector. A Bernoulli-distribution sampling method is provided for training the structure generator: in each iteration, each encoder unit is sampled via the Bernoulli distribution to form the corresponding coding vector. By varying the coding vector input to the structure generator together with a small batch of training data, the structure generator and the corresponding distillation structure are trained jointly, yielding a structure generator capable of generating weights for different distillation structures.
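For intuition, here is a minimal PyTorch sketch of the two mechanisms the abstract names: Bernoulli sampling over Transformer layers to form a 0/1 coding vector, and a structure-generator meta-network that maps that vector to weights for the sampled distillation structure. All specifics here (layer count, hidden size, the MLP generator, the placeholder loss) are illustrative assumptions, not the patented implementation.

```python
# Hypothetical sketch of the layer-sampling and structure-generator ideas
# from the abstract. Names and shapes are assumptions for illustration.
import torch
import torch.nn as nn

NUM_LAYERS = 12   # assumed number of teacher encoder layers (e.g., BERT-base)
HIDDEN = 768      # assumed hidden size

def sample_coding_vector(p: float = 0.5) -> torch.Tensor:
    """Bernoulli-sample a 0/1 mask over Transformer layers.
    A 1 at position i means encoder layer i is kept in the student."""
    return torch.bernoulli(torch.full((NUM_LAYERS,), p))

class StructureGenerator(nn.Module):
    """Toy meta-network mapping a coding vector to per-layer weight
    tensors for the sampled distillation structure."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_LAYERS, 256),
            nn.ReLU(),
            nn.Linear(256, NUM_LAYERS * HIDDEN),
        )

    def forward(self, coding_vector: torch.Tensor) -> torch.Tensor:
        # One HIDDEN-dim weight slice per layer; zeroed for dropped layers.
        weights = self.net(coding_vector).view(NUM_LAYERS, HIDDEN)
        return weights * coding_vector.unsqueeze(-1)

# Joint-training skeleton: resample a structure each iteration and update
# the generator on a small batch (the real distillation loss is elided).
generator = StructureGenerator()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
for step in range(3):
    v = sample_coding_vector()
    layer_weights = generator(v)        # weights for this distillation structure
    loss = layer_weights.pow(2).mean()  # placeholder for the distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Resampling the coding vector each iteration is what lets a single generator learn to emit weights for many different distillation structures, rather than being tied to one fixed student architecture.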
