-
Publication Number: US20220392434A1
Publication Date: 2022-12-08
Application Number: US17342490
Filing Date: 2021-06-08
Inventors: Abedelkader ASI, Yarin KUPER, Royi RONEN, Song WANG, Olga GOLDENBERG, Shimrit Rada BEMIS, Erez ALTUS, Yi MAO, Weizhu CHEN
Abstract: The disclosure herein describes reducing training bias in outputs generated by a generative language model. A communication segment associated with a communication is obtained by at least one processor of a generative language model. An output value associated with the communication segment is generated by the generative language model. The output value is mapped against a set of training bias values associated with the generative language model, and, based on the mapping of the output value to a training bias value of the set, an alternative output value is generated. The alternative output value is used in a generated segment output for the communication segment. The accuracy of segment outputs generated by the generative language model is improved by reducing or eliminating its training biases.
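As a rough illustration of the mapping step this abstract describes, a minimal Python sketch follows. The bias values, the toy model, and the resample-on-match strategy are all assumptions, since the patent publishes no code.

```python
import random

# Hypothetical bias set: output values the model was observed to over-produce
# due to training bias. The patent leaves the concrete values unspecified.
TRAINING_BIAS_VALUES = {"N/A", "unknown"}

class ToyGenerativeModel:
    """Stand-in for the generative language model in the abstract."""
    def generate(self, segment, temperature=0.0):
        if temperature == 0.0:
            return "N/A"  # biased default the model falls back to
        return random.choice(["action item", "question", "summary"])

def generate_segment_output(model, segment, max_retries=3):
    """Map the model's output against known training-bias values and, on a
    match, generate an alternative output value (here, by resampling)."""
    value = model.generate(segment)
    for _ in range(max_retries):
        if value not in TRAINING_BIAS_VALUES:
            return value
        value = model.generate(segment, temperature=1.0)  # alternative output
    return value

print(generate_segment_output(ToyGenerativeModel(), "Let's sync on Friday."))
```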
-
Publication Number: US20240362418A1
Publication Date: 2024-10-31
Application Number: US18140658
Filing Date: 2023-04-28
Inventors: Baolin PENG, Michel GALLEY, Hao CHENG, Pengcheng HE, Nguyen Hung BACH, Weizhu CHEN, Jianfeng GAO
IPC Classes: G06F40/40, G06F16/332
CPC Classes: G06F40/40, G06F16/3325
Abstract: A technique supplements a language model with knowledge information retrieved from external sources. The technique operates by: receiving a query; receiving knowledge information based on the query; generating original model-input information that includes the query and the knowledge information; and presenting the original model-input information to the language model. The technique further includes: receiving an original response from the language model; generating a usefulness measure that identifies the usefulness of the original response; and determining whether the usefulness measure satisfies a prescribed test. Upon determining that the usefulness measure does not satisfy the test, the technique includes: generating revised model-input information that includes feedback information; presenting the revised model-input information to the language model; and receiving a revised response from the language model. According to some implementations, the technique eliminates or reduces artificial hallucination exhibited by the language model.
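The retrieve/score/re-prompt loop in this abstract can be sketched directly. The stand-in retrieval, model, and usefulness functions below are hypothetical, as the patent leaves them abstract.

```python
def answer_with_knowledge(query, retrieve, model, usefulness,
                          threshold=0.5, max_rounds=3):
    """Retrieve knowledge, prompt the model, score the response, and re-prompt
    with feedback until the usefulness measure satisfies the test."""
    knowledge = retrieve(query)                       # external knowledge source
    model_input = f"Knowledge: {knowledge}\nQuery: {query}"
    response = model(model_input)                     # original response
    for _ in range(max_rounds):
        score = usefulness(query, knowledge, response)
        if score >= threshold:                        # prescribed test satisfied
            return response
        # Revised model-input information: original input plus feedback.
        model_input += (f"\nPrevious answer: {response}"
                        f"\nFeedback: answer scored {score:.2f};"
                        " ground the answer in the knowledge above.")
        response = model(model_input)                 # revised response
    return response

# Toy stand-ins, just to make the loop executable.
retrieve = lambda q: "Paris is the capital of France."
model = lambda prompt: "Paris"
usefulness = lambda q, k, r: 1.0 if r in k else 0.0
print(answer_with_knowledge("What is the capital of France?",
                            retrieve, model, usefulness))
```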
-
Publication Number: US20240046037A1
Publication Date: 2024-02-08
Application Number: US18268699
Filing Date: 2020-12-25
Inventors: Jian JIAO, Yeyun GONG, Nan DUAN, Weizhu CHEN, Kewen TANG, Qiang LOU, Ruofei ZHANG, Yu YAN, Jiusheng CHEN
IPC Classes: G06F40/284, G06F40/40
CPC Classes: G06F40/284, G06F40/40
Abstract: Systems and methods are provided for training a data model based on training data. The training includes pre-training and fine-tuning the data model based on a combination of an autoregressive (AR) model and a non-autoregressive (NAR) model. Training data may be received and encoded into streams of tokens. During decoding, a pre-trainer generates a continuum of data structures for the combined AR/NAR model, including a main stream and a series of predicting streams. Masked tokens in the predicting streams reference, or attend to, one or more preceding tokens in the main stream or in the preceding predicting streams. A fine-tuner selects streams to generate a trained model according to a target data model. The target data model is determined by balancing an accuracy constraint against an efficiency constraint for predicting tokens. The decoder acts as a bridge between the AR and NAR models in generating a trained data model.
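One plausible reading of the main-stream/predicting-stream layout is an attention mask in which each predicting-stream token sees only earlier main-stream tokens, the matching slot in earlier predicting streams, and itself. The sketch below encodes that reading; the actual mask construction in the patent may differ.

```python
import numpy as np

def build_stream_mask(seq_len, num_pred_streams):
    """Boolean mask over [main | stream_1 | ... | stream_k]; mask[i, j] True
    means position i may attend to position j."""
    total = seq_len * (1 + num_pred_streams)
    mask = np.zeros((total, total), dtype=bool)
    # Main stream: ordinary causal (autoregressive) attention.
    mask[:seq_len, :seq_len] = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    for s in range(1, num_pred_streams + 1):
        off = s * seq_len
        for t in range(seq_len):
            mask[off + t, :t] = True               # preceding main-stream tokens
            for p in range(1, s):                  # same slot in earlier streams
                mask[off + t, p * seq_len + t] = True
            mask[off + t, off + t] = True          # its own mask embedding
    return mask

print(build_stream_mask(4, 2).astype(int))
```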
-
Publication Number: US20230153532A1
Publication Date: 2023-05-18
Application Number: US17664031
Filing Date: 2022-05-18
Inventors: Pengcheng HE, Jianfeng GAO, Weizhu CHEN
IPC Classes: G06F40/284, G06F40/295, G06N3/08, G06N5/04
CPC Classes: G06F40/284, G06F40/295, G06N3/08, G06N5/04
Abstract: A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
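A compact way to realize "equivalent embeddings with disentangled gradients" is to share the upstream table through a detached copy plus a learned delta, so the downstream loss never updates the upstream weights. The PyTorch sketch below shows that pattern as one possible reading, not the patent's exact mechanism.

```python
import torch
import torch.nn as nn

class DisentangledEmbeddings(nn.Module):
    """Upstream table plus a downstream delta; detach() blocks the downstream
    gradient from flowing back into the upstream table."""
    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.upstream = nn.Embedding(vocab_size, dim)   # pretraining side
        self.delta = nn.Embedding(vocab_size, dim)      # downstream adaptation
        nn.init.zeros_(self.delta.weight)

    def upstream_embed(self, ids):
        return self.upstream(ids)

    def downstream_embed(self, ids):
        # Equivalent embedding, but its gradient is disentangled from the
        # upstream table: only `delta` is updated by the downstream loss.
        return self.upstream(ids).detach() + self.delta(ids)

emb = DisentangledEmbeddings()
loss = emb.downstream_embed(torch.tensor([1, 2, 3])).sum()
loss.backward()
print(emb.upstream.weight.grad, emb.delta.weight.grad.abs().sum())
```

After the backward pass the upstream table has no gradient (prints None) while the delta does, which is the disentanglement the abstract describes.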
-
Publication Number: US20230119613A1
Publication Date: 2023-04-20
Application Number: US17505531
Filing Date: 2021-10-19
Inventors: Zeqi LIN, Yu HU, Haiyuan CAO, Yi LIU, Jian-Guang LOU, Kuralmani ELANGO, PalaniRaj KALIYAPERUMAL, Weizhu CHEN, Kunal MUKERJEE
IPC Classes: G06F40/35, G06F40/211, G06F40/186, G06N20/00
Abstract: Examples described herein generate training data for machine learning (ML) for natural language (NL) processing (such as semantic parsing for translating NL). A formula tree is generated based on sampling both a formula grammar and NL templates. Using the formula tree, an ML training data instance pair is generated comprising a formula example and an NL example. A context example may also be used during instantiation of the formula tree. An ML model is trained with training data including the ML training data instance pair, and ML output is generated from NL input. The ML output includes, for example, a machine-interpretable formula, a database querying language command, or a general programming language instruction. Some examples support context-free grammar, probabilistic context-free grammar, and/or non-context-free production rules.
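The pairing of a formula grammar with NL templates can be illustrated with a toy recursive sampler in which every production carries both a formula template and an NL template, so one sampled tree instantiates an aligned (formula, NL) pair. The grammar below is invented purely for illustration.

```python
import random

# Invented toy grammar: each production lists child symbols, a formula
# template, and an NL template.
RULES = {
    "EXPR": [
        (["COL"], "SUM({0})", "the total {0}"),
        (["COL"], "AVERAGE({0})", "the average {0}"),
    ],
    "COL": [
        ([], "Sales", "sales"),
        ([], "Cost", "cost"),
    ],
}

def sample(symbol="EXPR"):
    """Sample a formula tree and instantiate an aligned (formula, NL) pair."""
    children, formula_tpl, nl_tpl = random.choice(RULES[symbol])
    parts = [sample(child) for child in children]
    formula = formula_tpl.format(*(f for f, _ in parts))
    nl = nl_tpl.format(*(n for _, n in parts))
    return formula, nl

formula, nl = sample()
print(formula, "<->", nl)     # e.g. SUM(Sales) <-> the total sales
```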
-
Publication Number: US20240346295A1
Publication Date: 2024-10-17
Application Number: US18654691
Filing Date: 2024-05-03
Inventors: Weizhu CHEN, Pengcheng HE, Xiaodong LIU, Jianfeng GAO
Abstract: This document relates to architectures and training procedures for multi-task machine learning models, such as neural networks. One example method involves providing a multi-task machine learning model having one or more shared layers and two or more task-specific layers. The method can also involve performing a pretraining stage on the one or more shared layers using one or more unsupervised prediction tasks. The method can also involve performing a tuning stage on the one or more shared layers and the two or more task-specific layers using respective task-specific objectives.
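The shared-plus-task-specific split reduces to a small amount of code. The sketch below uses placeholder dimensions and task names, and omits the pretraining and tuning objectives themselves; pretraining would update only `shared` on an unsupervised task, while tuning updates everything per task.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """One shared encoder plus one head per task."""
    def __init__(self, dim=32, tasks=("sentiment", "similarity")):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(dim, 2) for t in tasks})

    def forward(self, x, task):
        return self.heads[task](self.shared(x))     # task-specific output

model = MultiTaskModel()
print(model(torch.randn(4, 32), "sentiment").shape)   # torch.Size([4, 2])
```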
-
Publication Number: US20210142181A1
Publication Date: 2021-05-13
Application Number: US16775635
Filing Date: 2020-01-29
Inventors: Xiaodong LIU, Jianfeng GAO, Pengcheng HE, Weizhu CHEN
Abstract: This document relates to training of machine learning models such as neural networks. One example method involves providing a machine learning model having one or more layers and associated parameters and performing a pretraining stage on the parameters of the machine learning model to obtain pretrained parameters. The example method also involves performing a tuning stage on the machine learning model by using labeled training examples to tune the pretrained parameters. The tuning stage can include performing noise adjustment of the labeled training examples to obtain noise-adjusted training examples. The tuning stage can also include adjusting the pretrained parameters based at least on the labeled training examples and the noise-adjusted training examples to obtain adapted parameters. The example method can also include outputting a tuned machine learning model having the adapted parameters.
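One way to read "noise adjustment" of labeled examples is input perturbation plus a consistency term between clean and noisy predictions. The sketch below implements that reading; the noise scale, the consistency weight, and the KL form are assumptions.

```python
import torch
import torch.nn.functional as F

def tuning_step(model, optimizer, x, y, sigma=0.1, lam=1.0):
    """One tuning step on labeled examples plus their noise-adjusted copies:
    fit the labels while keeping clean and noisy predictions consistent."""
    noisy_x = x + sigma * torch.randn_like(x)        # noise-adjusted examples
    clean_logits, noisy_logits = model(x), model(noisy_x)
    task_loss = F.cross_entropy(clean_logits, y)
    # Consistency term: predictions should agree under the perturbation.
    consistency = F.kl_div(F.log_softmax(noisy_logits, dim=-1),
                           F.softmax(clean_logits, dim=-1),
                           reduction="batchmean")
    loss = task_loss + lam * consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = torch.nn.Linear(8, 3)                        # stand-in for a tuned model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
print(tuning_step(model, opt, torch.randn(16, 8), torch.randint(0, 3, (16,))))
```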
-
Publication Number: US20210117448A1
Publication Date: 2021-04-22
Application Number: US16659017
Filing Date: 2019-10-21
Inventors: Shean WANG, Jiayuan HUANG, Weizhu CHEN, Changhong YUAN, Ankit SARAF, Xiaoying GUO, Eslam K. ABDELREHEEM, Yunjing MA, Yuantao WANG, Justin Carl WONG, Nan ZHAO, Chao LI, Tsuyoshi WATANABE, Jaclyn Ruth Elizabeth PHILLIPS
IPC Classes: G06F16/28
Abstract: In some examples, iterative sampling based dataset clustering may include sampling a dataset that includes a plurality of items to identify a specified number of sampled items. The sampled items may be clustered to generate a plurality of clusters. Un-sampled items from the plurality of items may be assigned to the clusters. Remaining un-sampled items that are not assigned to the clusters may be identified. A ratio associated with the remaining un-sampled items and the plurality of items may be compared to a specified threshold. Based on a determination that the ratio is greater than the specified threshold, an indication of completion of clustering of the plurality of items may be generated.
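A one-dimensional toy makes the sample/assign/ratio flow concrete. Real items, the distance function, the clustering step, and the exact ratio definition are left open by the abstract, so everything below is a stand-in.

```python
import random

def cluster_pass(items, sample_size=5, assign_radius=1.0, threshold=0.9):
    """One pass: sample items, treat each sampled value as a toy cluster
    center, assign nearby items, then apply the completion test."""
    centers = random.sample(items, min(sample_size, len(items)))
    remaining = [x for x in items                     # un-assignable items
                 if min(abs(x - c) for c in centers) > assign_radius]
    ratio = len(remaining) / len(items)
    complete = ratio > threshold    # completion test as stated in the abstract
    return centers, remaining, complete

items = [random.gauss(mu, 0.2) for mu in (0.0, 5.0, 10.0) for _ in range(20)]
centers, remaining, complete = cluster_pass(items)
print(len(remaining), complete)
```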
-
Publication Number: US20230222295A1
Publication Date: 2023-07-13
Application Number: US18078530
Filing Date: 2022-12-09
Inventors: Pengcheng HE, Xiaodong LIU, Jianfeng GAO, Weizhu CHEN
Abstract: Systems and methods are provided for facilitating the building and use of natural language understanding models. The systems and methods identify a plurality of tokens and use them to generate one or more pre-trained natural language models using a transformer. The transformer disentangles the content embedding and positional embedding in the computation of its attention matrix. Systems and methods are also provided to facilitate self-training of the pre-trained natural language model by utilizing multi-step decoding to better reconstruct masked tokens and improve pre-training convergence.
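The disentangled attention computation can be sketched as a sum of content-to-content, content-to-position, and position-to-content scores from separate embedding tables, in the spirit of the abstract; projections, relative-position bucketing, and multiple heads are omitted, so this is a simplification rather than the patented formulation.

```python
import numpy as np

def disentangled_attention(content, position):
    """content, position: (n, d) arrays of separate content and position
    embeddings. Returns an (n, n) row-stochastic attention matrix."""
    c2c = content @ content.T            # content-to-content score
    c2p = content @ position.T           # content-to-position score
    p2c = position @ content.T           # position-to-content score
    scores = (c2c + c2p + p2c) / np.sqrt(3 * content.shape[1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
attn = disentangled_attention(rng.standard_normal((4, 8)),
                              rng.standard_normal((4, 8)))
print(attn.sum(axis=-1))                 # each row sums to 1
```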
-
Publication Number: US20210334475A1
Publication Date: 2021-10-28
Application Number: US16910508
Filing Date: 2020-06-24
Inventors: Pengcheng HE, Xiaodong LIU, Jianfeng GAO, Weizhu CHEN
Abstract: Systems and methods are provided for facilitating the building and use of natural language understanding models. The systems and methods identify a plurality of tokens and use them to generate one or more pre-trained natural language models using a transformer. The transformer disentangles the content embedding and positional embedding in the computation of its attention matrix. Systems and methods are also provided to facilitate self-training of the pre-trained natural language model by utilizing multi-step decoding to better reconstruct masked tokens and improve pre-training convergence.
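This abstract is shared with the related publication above; for the multi-step decoding side, the idea can be shown schematically: re-predict every masked position for several steps, feeding each step's output back in so later steps condition on earlier reconstructions. The toy predictor below is purely illustrative.

```python
MASK = "[MASK]"

def multi_step_decode(model, tokens, steps=3):
    """Reconstruct masked tokens over several decoding steps, feeding each
    step's predictions back in so later steps see a fuller sequence."""
    current = list(tokens)
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    for _ in range(steps):
        current = [model(current, i) if i in masked else t
                   for i, t in enumerate(current)]   # re-predict every mask
    return current

def toy_model(tokens, i):
    # Stand-in predictor; a real model would score a vocabulary at position i.
    return "fox" if i > 0 and tokens[i - 1] == "quick" else "brown"

print(multi_step_decode(toy_model, ["the", "quick", MASK, MASK, "jumps"]))
```

In the second step, position 3 conditions on the reconstruction of position 2 from the first step, which is the convergence benefit the abstract points at.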