- Patent title: LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING
- Application no.: US17664031; Filing date: 2022-05-18
- Publication no.: US20230153532A1; Publication date: 2023-05-18
- Inventors: Pengcheng HE, Jianfeng GAO, Weizhu CHEN
- Applicant: Microsoft Technology Licensing, LLC
- Applicant address: Redmond, WA, US
- Assignee: Microsoft Technology Licensing, LLC
- Current assignee: Microsoft Technology Licensing, LLC
- Current assignee address: Redmond, WA, US
- Primary classification: G06F40/284
- IPC classifications: G06F40/284; G06F40/295; G06N3/08; G06N5/04
Abstract:
A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
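The core of step (d) is that the downstream embedding shares the upstream embedding's values but not its gradient path. A minimal NumPy sketch of this idea (not the patent's actual implementation; the variable names `E_shared`, `E_delta`, and the SGD helper are illustrative assumptions): the downstream embedding is formed as a stop-gradient copy of the shared table plus a residual, so downstream-loss gradients update only the residual while the shared table is left untouched.

```python
import numpy as np

# Hedged sketch of gradient-disentangled embedding sharing (illustrative,
# not the claimed implementation). The downstream data embedding reuses the
# upstream embedding's values through a stop-gradient, plus a learned
# residual "delta", so downstream gradients never flow into the shared table.

rng = np.random.default_rng(0)
vocab, dim = 8, 4

E_shared = rng.normal(size=(vocab, dim))   # upstream embedding (updated by the upstream loss)
E_delta = np.zeros((vocab, dim))           # residual updated only by the downstream loss

def downstream_embedding():
    # stop_gradient(E_shared) + E_delta: same values as the upstream
    # embedding at initialization ("equivalent" embeddings), but gradients
    # of the downstream loss reach only E_delta.
    return E_shared + E_delta

def downstream_sgd_step(tokens, grad_rows, lr=0.1):
    # grad_rows: dLoss/d(embedding row) for each looked-up token, coming
    # from the downstream (pretraining-output) loss. Disentangled update:
    # only the residual moves; E_shared is treated as a constant here.
    for t, g in zip(tokens, grad_rows):
        E_delta[t] -= lr * g

before = E_shared.copy()
downstream_sgd_step([1, 3], rng.normal(size=(2, dim)))
assert np.allclose(E_shared, before)  # shared table unchanged by downstream grads
```

In this sketch the upstream gradient (from the masked-language-modeling task) would still be applied to `E_shared` by a separate update, which is what keeps the two gradients disentangled rather than summed into one shared parameter.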