REDUCING BIASES OF GENERATIVE LANGUAGE MODELS

    Publication No.: US20220392434A1

    Publication Date: 2022-12-08

    Application No.: US17342490

    Filing Date: 2021-06-08

    IPC Classes: G10L15/06 G06N20/00

    Abstract: The disclosure herein describes reducing training bias in outputs generated by a generative language model. A communication segment associated with a communication is obtained by at least one processor of a generative language model. An output value associated with the communication segment is generated by the generative language model. The output value is mapped to a set of training bias values associated with the generative language model, and, based on the mapping of the output value to a training bias value of that set, an alternative output value is generated. The alternative output value is used in a generated segment output for the communication segment. The accuracy of segment outputs generated by the generative language model is improved by reducing or eliminating its training biases.
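
    The abstract describes a map-and-substitute loop: generate an output value, check it against the model's known training bias values, and re-generate when a match is found. The Python sketch below illustrates that flow under the assumption that the bias values can be held in a simple lookup set; the callables generate and resample are hypothetical stand-ins for the model interface, not the claimed implementation.

        from typing import Callable, Set

        def generate_segment_output(
            generate: Callable[[str], str],            # produces an output value for a segment
            resample: Callable[[str, Set[str]], str],  # produces an alternative value, avoiding bias values
            bias_values: Set[str],                     # training bias values known for the model
            segment: str,                              # communication segment for a communication
        ) -> str:
            output_value = generate(segment)
            # Map the output value to the set of training bias values; on a match,
            # substitute an alternative value into the generated segment output.
            if output_value in bias_values:
                output_value = resample(segment, bias_values)
            return output_value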

    Interacting with a Language Model using External Knowledge and Feedback

    Publication No.: US20240362418A1

    Publication Date: 2024-10-31

    Application No.: US18140658

    Filing Date: 2023-04-28

    IPC Classes: G06F40/40 G06F16/332

    CPC Classes: G06F40/40 G06F16/3325

    Abstract: A technique supplements a language model with knowledge information retrieved from external sources. The technique operates by: receiving a query; receiving knowledge information based on the query; generating original model-input information that includes the query and the knowledge information; and presenting the original model-input information to the language model. The technique further includes: receiving an original response from the language model; generating a usefulness measure that identifies the usefulness of the original response; and determining whether the usefulness measure satisfies a prescribed test. Upon determining that the usefulness measure does not satisfy the test, the technique includes: generating revised model-input information that includes feedback information; presenting the revised model-input information to the language model; and receiving a revised response from the language model. According to some implementations, the technique eliminates or reduces artificial hallucination exhibited by the language model.
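
    The abstract outlines a retrieve-generate-verify loop: attach retrieved knowledge to the query, score the model's response for usefulness, and, when the score fails a prescribed test, re-prompt with feedback. The sketch below illustrates one such loop; the prompt layout, the 0.5 threshold, and the fixed round limit are assumptions for illustration, not the patented interface.

        from typing import Callable

        def answer_with_knowledge(
            query: str,
            retrieve: Callable[[str], str],           # fetches knowledge information from external sources
            llm: Callable[[str], str],                # the language model
            usefulness: Callable[[str, str], float],  # scores a response against the query
            threshold: float = 0.5,
            max_rounds: int = 3,
        ) -> str:
            knowledge = retrieve(query)
            model_input = f"Knowledge: {knowledge}\nQuery: {query}"  # original model-input information
            response = llm(model_input)
            for _ in range(max_rounds):
                if usefulness(query, response) >= threshold:         # prescribed test
                    break
                # Revised model-input information carries feedback about the prior response.
                feedback = "The previous answer was not supported by the knowledge; please revise it."
                model_input = (
                    f"Knowledge: {knowledge}\nQuery: {query}\n"
                    f"Previous answer: {response}\nFeedback: {feedback}"
                )
                response = llm(model_input)
            return response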

    GENERATION OF DATA MODELS FOR PREDICTING DATA

    Publication No.: US20240046037A1

    Publication Date: 2024-02-08

    Application No.: US18268699

    Filing Date: 2020-12-25

    IPC Classes: G06F40/284 G06F40/40

    CPC Classes: G06F40/284 G06F40/40

    Abstract: Systems and methods are provided for training a data model based on training data. The training includes pre-training and fine-tuning the data model based on a combination of an autoregressive (AR) model and a non-autoregressive (NAR) model. Training data may be received and encoded into streams of tokens. During decoding, a pre-trainer generates a continuum of data structures for the combined AR and NAR model, including a main stream and a series of predicting streams. Masked tokens in the predicting streams reference, or attend to, one or more preceding tokens in the main stream or the preceding predicting streams. A fine-tuner selects streams to generate a trained model according to a target data model. The target data model is determined by balancing an accuracy constraint against an efficiency constraint for predicting tokens. The decoder acts as a bridge between the AR and NAR models in generating a trained data model.
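
    One way to picture the main stream and predicting streams is as a block attention mask in which each predicting stream attends only to earlier main-stream positions, becoming more non-autoregressive the further ahead it must predict. The NumPy sketch below builds such a mask; the exact stream layout and visibility rule are assumptions for illustration and are not specified by the abstract.

        import numpy as np

        def build_stream_attention_mask(seq_len: int, num_predicting_streams: int) -> np.ndarray:
            """Visibility mask over [main stream | predicting stream 1 | ... | stream K].

            Main-stream position t attends to main-stream positions <= t (AR-style).
            A masked token at position t in predicting stream k attends only to
            main-stream positions < t - k + 1, so larger k behaves more NAR-like.
            """
            total = seq_len * (1 + num_predicting_streams)
            mask = np.zeros((total, total), dtype=bool)
            # Main stream: causal self-attention.
            for t in range(seq_len):
                mask[t, : t + 1] = True
            # Predicting streams: attend to preceding main-stream tokens only.
            for k in range(1, num_predicting_streams + 1):
                for t in range(seq_len):
                    row = k * seq_len + t
                    visible = max(t - k + 1, 0)
                    mask[row, :visible] = True
            return mask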

    LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING

    Publication No.: US20230153532A1

    Publication Date: 2023-05-18

    Application No.: US17664031

    Filing Date: 2022-05-18

    Abstract: A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
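
    One common reading of step (d) is a stop-gradient on a shared embedding table plus a downstream-only residual, so the downstream task can adapt its embedding without pushing gradients back into the upstream one. The PyTorch sketch below is a minimal illustration of that reading; the class name and the residual ("delta") formulation are assumptions, not the claimed method.

        import torch
        import torch.nn as nn

        class GradientDisentangledEmbedding(nn.Module):
            def __init__(self, vocab_size: int, dim: int):
                super().__init__()
                self.upstream = nn.Embedding(vocab_size, dim)  # upstream data embedding
                self.delta = nn.Embedding(vocab_size, dim)     # downstream-only correction
                nn.init.zeros_(self.delta.weight)              # start equivalent to the upstream embedding

            def upstream_embed(self, ids: torch.Tensor) -> torch.Tensor:
                return self.upstream(ids)

            def downstream_embed(self, ids: torch.Tensor) -> torch.Tensor:
                # detach() keeps the downstream gradient out of the shared upstream
                # table, so the two gradients stay disentangled.
                return self.upstream(ids).detach() + self.delta(ids)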

    ADVERSARIAL TRAINING OF MACHINE LEARNING MODELS

    Publication No.: US20210142181A1

    Publication Date: 2021-05-13

    Application No.: US16775635

    Filing Date: 2020-01-29

    IPC Classes: G06N3/08 G06N3/04

    Abstract: This document relates to the training of machine learning models such as neural networks. One example method involves providing a machine learning model having one or more layers and associated parameters and performing a pretraining stage on the parameters of the machine learning model to obtain pretrained parameters. The example method also involves performing a tuning stage on the machine learning model by using labeled training examples to tune the pretrained parameters. The tuning stage can include performing noise adjustment of the labeled training examples to obtain noise-adjusted training examples. The tuning stage can also include adjusting the pretrained parameters based at least on the labeled training examples and the noise-adjusted training examples to obtain adapted parameters. The example method can also include outputting a tuned machine learning model having the adapted parameters.
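
    A rough PyTorch sketch of the tuning stage described above: each labeled training example is noise-adjusted in embedding space, and the pretrained parameters are adjusted using both the original and the noise-adjusted view. The Gaussian perturbation and the KL consistency term are illustrative assumptions, not the claimed procedure.

        import torch
        import torch.nn.functional as F

        def tuning_step(model, embed, inputs, labels, optimizer, epsilon=1e-3):
            embeddings = embed(inputs)              # embed the labeled training examples
            logits = model(embeddings)
            loss = F.cross_entropy(logits, labels)  # supervised loss on the labeled examples

            # Noise adjustment: perturb the examples in embedding space.
            noise = epsilon * torch.randn_like(embeddings)
            noisy_logits = model(embeddings + noise)
            # Keep predictions consistent on the noise-adjusted examples.
            adv_loss = F.kl_div(
                F.log_softmax(noisy_logits, dim=-1),
                F.softmax(logits.detach(), dim=-1),
                reduction="batchmean",
            )

            optimizer.zero_grad()
            (loss + adv_loss).backward()
            optimizer.step()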