-
1.
Publication No.: US11341326B2
Publication Date: 2022-05-24
Application No.: US17483805
Application Date: 2021-09-24
Applicant: ZHEJIANG LAB
Inventor: Hongsheng Wang, Haijun Shan, Fei Yang
Abstract: Provided are a method and a platform for compressing a pre-trained language model based on knowledge distillation. The method first designs a universal knowledge distillation strategy based on feature transfer: during knowledge distillation from the teacher model to the student model, the feature maps of each layer of the student model are driven to approximate the teacher's features, with emphasis on the ability of small samples to express features in the intermediate layers of the teacher model, and these features are used to guide the student model. Then, a knowledge distillation method based on self-attention cross is constructed. Finally, a linear transfer strategy based on the Bernoulli probability distribution is designed to gradually complete the transfer of feature-map and self-attention-distribution knowledge from the teacher to the student.
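The PyTorch sketch below is one possible reading of the abstract, not the patented implementation: a feature-map alignment term, a self-attention distillation term, and a Bernoulli draw with a linearly increasing probability that decides, per layer, which kind of knowledge is transferred at a given training step. All module names, tensor shapes, and the linear schedule are illustrative assumptions.

```python
# Minimal sketch of the distillation losses suggested by the abstract.
# Names, shapes, and the Bernoulli schedule are assumptions for illustration.
import torch
import torch.nn.functional as F

def feature_map_loss(student_hidden, teacher_hidden, proj):
    """MSE between a projected student hidden state and the teacher's hidden state."""
    return F.mse_loss(proj(student_hidden), teacher_hidden)

def attention_loss(student_attn, teacher_attn):
    """KL divergence between student and teacher self-attention distributions."""
    return F.kl_div(torch.log(student_attn + 1e-12), teacher_attn, reduction="batchmean")

def distillation_step(student_feats, teacher_feats,
                      student_attns, teacher_attns,
                      proj, step, total_steps):
    """Per layer, a Bernoulli draw with linearly increasing probability (assumed
    schedule) chooses between attention transfer and feature-map transfer."""
    p = min(1.0, step / total_steps)  # linear schedule from 0 to 1
    loss = torch.zeros(())
    for s_f, t_f, s_a, t_a in zip(student_feats, teacher_feats,
                                  student_attns, teacher_attns):
        if torch.bernoulli(torch.tensor(p)).item() == 1:
            loss = loss + attention_loss(s_a, t_a)
        else:
            loss = loss + feature_map_loss(s_f, t_f, proj)
    return loss

# Toy usage with random tensors standing in for real hidden states / attentions.
if __name__ == "__main__":
    layers, batch, seq, d_s, d_t, heads = 4, 2, 8, 128, 256, 4
    proj = torch.nn.Linear(d_s, d_t)  # maps student width to teacher width
    s_feats = [torch.randn(batch, seq, d_s) for _ in range(layers)]
    t_feats = [torch.randn(batch, seq, d_t) for _ in range(layers)]
    s_attns = [torch.softmax(torch.randn(batch, heads, seq, seq), dim=-1) for _ in range(layers)]
    t_attns = [torch.softmax(torch.randn(batch, heads, seq, seq), dim=-1) for _ in range(layers)]
    print(distillation_step(s_feats, t_feats, s_attns, t_attns, proj, step=100, total_steps=1000))
```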
-
2.
Publication No.: US11941532B2
Publication Date: 2024-03-26
Application No.: US17726563
Application Date: 2022-04-22
Applicant: ZHEJIANG LAB
Inventor: Hongsheng Wang, Wei Hua, Hujun Bao, Fei Yang
Abstract: Disclosed is a method for adapting a deep learning framework to a hardware device based on a unified backend engine, which comprises the following steps: S1, adding the unified backend engine to the deep learning framework; S2, adding the unified backend engine to the hardware device; S3, converting a computational graph, wherein the computational graph compiled and generated by the deep learning framework is converted into an intermediate representation of the unified backend engine; S4, compiling the intermediate representation, wherein the unified backend engine compiles the intermediate representation on the hardware device to generate an executable object; S5, running the executable object, wherein the deep learning framework runs the executable object on the hardware device; S6, managing memory of the unified backend engine.
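The Python sketch below illustrates one possible shape of the S1-S6 flow: the engine converts a framework graph into its own intermediate representation, compiles it into an executable object, runs it, and tracks buffers in an engine-managed pool. All class and method names (UnifiedBackendEngine, convert, compile, run) are assumptions for illustration; the patent does not specify an API.

```python
# Minimal sketch of the S1-S6 flow described in the abstract.
# Class and method names are illustrative assumptions, not the patented API.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class IRNode:
    op: str
    inputs: List[str]
    output: str

@dataclass
class IntermediateRepresentation:
    nodes: List[IRNode] = field(default_factory=list)

class UnifiedBackendEngine:
    """Bridges a deep learning framework and a hardware device (S1/S2)."""

    def __init__(self, device: str):
        self.device = device
        self.memory_pool: Dict[str, Any] = {}  # S6: engine-managed buffers

    def convert(self, framework_graph: List[dict]) -> IntermediateRepresentation:
        """S3: convert the framework's computational graph into the engine IR."""
        ir = IntermediateRepresentation()
        for node in framework_graph:
            ir.nodes.append(IRNode(node["op"], node["inputs"], node["output"]))
        return ir

    def compile(self, ir: IntermediateRepresentation) -> Callable:
        """S4: compile the IR into an executable object for the target device.
        Here the 'executable' is simply a closure that interprets the IR."""
        def executable(feeds: Dict[str, float]) -> Dict[str, float]:
            env = dict(feeds)
            for n in ir.nodes:
                if n.op == "add":
                    env[n.output] = env[n.inputs[0]] + env[n.inputs[1]]
                elif n.op == "mul":
                    env[n.output] = env[n.inputs[0]] * env[n.inputs[1]]
            return env
        return executable

    def run(self, executable: Callable, feeds: Dict[str, float]) -> Dict[str, float]:
        """S5: run the executable; results are tracked in the engine's pool (S6)."""
        results = executable(feeds)
        self.memory_pool.update(results)
        return results

# Toy usage: a two-node graph (c = a + b, d = c * a) on a hypothetical device.
if __name__ == "__main__":
    engine = UnifiedBackendEngine(device="accelerator0")
    graph = [{"op": "add", "inputs": ["a", "b"], "output": "c"},
             {"op": "mul", "inputs": ["c", "a"], "output": "d"}]
    exe = engine.compile(engine.convert(graph))
    print(engine.run(exe, {"a": 2.0, "b": 3.0}))  # {'a': 2.0, 'b': 3.0, 'c': 5.0, 'd': 10.0}
```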
-