-
Publication No.: US20240062020A1
Publication Date: 2024-02-22
Application No.: US17970305
Filing Date: 2022-10-20
Inventors: Pengcheng HE, Jianfeng GAO, Nanshan ZENG, Xuedong HUANG, Wei XIONG, Baolin PENG
IPC Classes: G06F40/56, G06F40/284, G06F40/51
CPC Classes: G06F40/56, G06F40/284, G06F40/51
Abstract: Systems and methods are provided for training and using a novel unified language foundation model. An encoder-decoder natural language model is obtained, along with training data for it. The training process integrates a combination of replaced token detection, corrupted span reconstruction, and disentangled attention methodologies to produce a unified encoder-decoder model. The resulting model is trained to perform both natural language understanding (NLU) tasks and natural language generation (NLG) tasks. During processing, attention is applied discretely to segmented chunks of encoded data, improving the efficiency with which the model applies attention.
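The abstract names the two pretraining objectives but gives no formulation. Below is a minimal PyTorch sketch of how a replaced-token-detection loss on the encoder could be combined with a corrupted-span-reconstruction loss on the decoder; the class name, head shapes, and loss weighting are illustrative assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

class UnifiedPretrainingLoss(nn.Module):
    """Sketch of a combined objective: replaced token detection (RTD)
    on the encoder plus corrupted span reconstruction (CSR) on the
    decoder. Shapes and the RTD weight are assumptions."""

    def __init__(self, rtd_weight: float = 50.0):
        super().__init__()
        self.rtd_weight = rtd_weight            # RTD is commonly up-weighted
        self.rtd_loss = nn.BCEWithLogitsLoss()  # per-token real/replaced label
        self.csr_loss = nn.CrossEntropyLoss()   # reconstruct corrupted spans

    def forward(self, rtd_logits, replaced_labels, csr_logits, span_targets):
        # rtd_logits:      (batch, src_len)        one logit per source token
        # replaced_labels: (batch, src_len) float  1.0 where a token was replaced
        # csr_logits:      (batch, tgt_len, vocab) decoder output over span tokens
        # span_targets:    (batch, tgt_len) long   original tokens of the spans
        detection = self.rtd_loss(rtd_logits, replaced_labels)
        reconstruction = self.csr_loss(
            csr_logits.transpose(1, 2),  # CrossEntropyLoss expects (N, C, ...)
            span_targets,
        )
        return self.rtd_weight * detection + reconstruction

# Smoke test with random tensors.
loss_fn = UnifiedPretrainingLoss()
loss = loss_fn(
    torch.randn(2, 16),                    # rtd_logits
    torch.randint(0, 2, (2, 16)).float(),  # replaced_labels
    torch.randn(2, 8, 1000),               # csr_logits
    torch.randint(0, 1000, (2, 8)),        # span_targets
)
print(loss.item())
```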
-
Publication No.: US20230153348A1
Publication Date: 2023-05-18
Application No.: US17526806
Filing Date: 2021-11-15
Inventors: Jinchao LI, Lars H. LIDEN, Baolin PENG, Thomas PARK, Swadheen Kumar SHUKLA, Jianfeng GAO
Abstract: Systems and methods are provided for determining a response to a query in a dialog. An entity extractor extracts rules and conditions associated with the query and determines a particular task. The disclosed technology generates a transformer-based dialog embedding by pre-training a transformer on dialog corpora spanning a plurality of tasks. A task-specific classifier generates a first set of candidate responses based on the rules and conditions associated with the task; the transformer-based dialog embedding generates a second set of candidate responses to the query. The classifier accommodates changes made to a task through an interactive dialog editor as machine teaching. A response generator generates a response from the first and second sets of candidate responses using an optimization function. The disclosed technology thus leverages both a data-driven generative model (a transformer) trained on dialog corpora and a user-driven, task-specific rule-based classifier that accommodates updates to the rules and conditions associated with a particular task.
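The abstract leaves the optimization function that merges the two candidate sets unspecified. The sketch below, with hypothetical names and a simple linear blend of confidences standing in for that function, shows one way the final selection could work:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    text: str
    score: float  # confidence in [0, 1] from the rule classifier or the model

def select_response(rule_candidates: List[Candidate],
                    model_candidates: List[Candidate],
                    rule_weight: float = 0.6) -> str:
    """Pick one response from the two candidate sets. The linear
    weighting is an assumption, not the patent's optimization function."""
    pool = [(rule_weight * c.score, c.text) for c in rule_candidates]
    pool += [((1.0 - rule_weight) * c.score, c.text) for c in model_candidates]
    if not pool:
        raise ValueError("no candidates to choose from")
    return max(pool, key=lambda p: p[0])[1]

# Example: a rule-derived candidate competes with a transformer candidate.
response = select_response(
    [Candidate("Your table for four is booked.", 0.9)],    # rule-based set
    [Candidate("Done! Table for four at 7 pm.", 0.7)],     # transformer set
)
print(response)
```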
-
Publication No.: US20240362418A1
Publication Date: 2024-10-31
Application No.: US18140658
Filing Date: 2023-04-28
Inventors: Baolin PENG, Michel GALLEY, Hao CHENG, Pengcheng HE, Nguyen Hung BACH, Weizhu CHEN, Jianfeng GAO
IPC Classes: G06F40/40, G06F16/332
CPC Classes: G06F40/40, G06F16/3325
Abstract: A technique supplements a language model with knowledge information retrieved from external sources. The technique operates by: receiving a query; receiving knowledge information based on the query; generating original model-input information that includes the query and the knowledge information; and presenting the original model-input information to the language model. The technique further includes: receiving an original response from the language model; generating a usefulness measure that identifies usefulness of the original response; and determining whether the usefulness measure satisfies a prescribed test. Upon determining that the usefulness measure does not satisfy the test, the technique includes: generating revised model-input information that includes feedback information; presenting the revised model-input information to the language model; and receiving a revised response from the language model. According to some implementations, the technique eliminates or reduces artificial hallucination exhibited by the language model.
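The retrieve/respond/score/revise loop in the abstract can be summarized in a few lines. This sketch assumes caller-supplied `retrieve`, `llm`, and `score_usefulness` callables; the prompt wording, usefulness threshold, and revision cap are illustrative assumptions:

```python
def answer_with_feedback(query, retrieve, llm, score_usefulness,
                         threshold=0.7, max_revisions=3):
    """Sketch of the loop described in the abstract: retrieve knowledge,
    prompt the model, score the response, and re-prompt with feedback
    until the usefulness measure passes the prescribed test."""
    knowledge = retrieve(query)
    # Original model-input information: query plus retrieved knowledge.
    prompt = f"Knowledge:\n{knowledge}\n\nQuestion: {query}\nAnswer:"
    response = llm(prompt)
    for _ in range(max_revisions):
        usefulness, feedback = score_usefulness(query, knowledge, response)
        if usefulness >= threshold:  # the prescribed test
            break
        # Revised model-input information includes the feedback.
        prompt = (f"{prompt}\n\nPrevious answer: {response}\n"
                  f"Feedback: {feedback}\nRevised answer:")
        response = llm(prompt)
    return response
```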
-
Publication No.: US20240062018A1
Publication Date: 2024-02-22
Application No.: US17970174
Filing Date: 2022-10-20
Inventors: Pengcheng HE, Jianfeng GAO, Nanshan ZENG, Xuedong HUANG, Wei XIONG, Baolin PENG
IPC Classes: G06F40/40, G06F40/284, G06F40/149
CPC Classes: G06F40/40, G06F40/284, G06F40/149
Abstract: Systems and methods are provided for training and using a novel unified language foundation model. An encoder-decoder natural language model is obtained, along with training data for it. The training process integrates a combination of replaced token detection, corrupted span reconstruction, and disentangled attention methodologies to produce a unified encoder-decoder model. The resulting model is trained to perform both natural language understanding (NLU) tasks and natural language generation (NLG) tasks. During processing, attention is applied discretely to segmented chunks of encoded data, improving the efficiency with which the model applies attention.
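The sibling application above sketches the combined pretraining loss; here is a complementary sketch of the chunk-segmented attention both abstracts describe, assuming a simple fold-chunks-into-batch scheme as one plausible reading of "applied discretely to segmented chunks":

```python
import torch
import torch.nn as nn

def chunked_attention(x: torch.Tensor, attn: nn.MultiheadAttention,
                      chunk_size: int) -> torch.Tensor:
    """Apply self-attention independently within fixed-size chunks of the
    encoded sequence, so cost scales with the chunk size rather than the
    full sequence length. The chunking scheme is an assumption."""
    batch, seq_len, dim = x.shape
    assert seq_len % chunk_size == 0, "pad the sequence to a chunk multiple"
    # Fold each chunk into the batch dimension: attention cannot cross chunks.
    xc = x.reshape(batch * (seq_len // chunk_size), chunk_size, dim)
    out, _ = attn(xc, xc, xc, need_weights=False)
    return out.reshape(batch, seq_len, dim)

# Example: a 128-token sequence attended in four independent 32-token chunks.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
y = chunked_attention(torch.randn(2, 128, 64), attn, chunk_size=32)
print(y.shape)  # torch.Size([2, 128, 64])
```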