REDUCING BIASES OF GENERATIVE LANGUAGE MODELS

    Publication Number: US20220392434A1

    Publication Date: 2022-12-08

    Application Number: US17342490

    Filing Date: 2021-06-08

    Abstract: The disclosure herein describes reducing training bias in outputs generated by a generative language model. A communication segment associated with a communication is obtained by at least one processor of a generative language model. An output value associated with the communication segment is generated by the generative language model. The output value is mapped to a set of training bias values associated with the generative language model, and, based on the mapping of the output value to a training bias value of the set, an alternative output value is generated. The alternative output value is used in a generated segment output for the communication segment. The accuracy of segment outputs generated by the generative language model is improved through reducing or eliminating its training biases.
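
    As an illustrative sketch only (all names and data are hypothetical, not the claimed implementation), the check-and-substitute step described in this abstract could look roughly like the following Python:

        # Hypothetical sketch: map a generated output value against known
        # training-bias values and substitute an alternative when it matches.
        from typing import Callable, Set

        def debias_output(output_value: str,
                          bias_values: Set[str],
                          generate_alternative: Callable[[str], str]) -> str:
            """Return the output value unless it maps to a known training-bias
            value, in which case an alternative value is generated instead."""
            if output_value in bias_values:
                return generate_alternative(output_value)
            return output_value

        # Toy usage with made-up values.
        known_biases = {"stereotyped_term", "placeholder_name"}
        result = debias_output("stereotyped_term",
                               known_biases,
                               generate_alternative=lambda v: "[neutral value]")
        print(result)  # -> "[neutral value]"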

    Interacting with a Language Model using External Knowledge and Feedback

    Publication Number: US20240362418A1

    Publication Date: 2024-10-31

    Application Number: US18140658

    Filing Date: 2023-04-28

    CPC classification number: G06F40/40 G06F16/3325

    Abstract: A technique supplements a language model with knowledge information retrieved from external sources. The technique operates by: receiving a query; receiving knowledge information based on the query; generating original model-input information that includes the query and the knowledge information; and presenting the original model-input information to the language model. The technique further includes: receiving an original response from the language model; generating a usefulness measure that identifies usefulness of the original response; and determining whether the usefulness measure satisfies a prescribed test. Upon determining that the usefulness measure does not satisfy the test, the technique includes: generating revised model-input information that includes feedback information; presenting the revised model-input information to the language model; and receiving a revised response from the language model. According to some implementations, the technique eliminates or reduces artificial hallucination exhibited by the language model.
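
    A minimal sketch of the retrieve-then-verify loop described above, assuming hypothetical stand-ins for the model endpoint, knowledge source, and usefulness measure (none of these names come from the patent):

        # Hypothetical sketch: build model input from retrieved knowledge, check
        # the response's usefulness, and re-prompt with feedback until it passes.
        def answer_with_feedback(query, retrieve_knowledge, call_language_model,
                                 usefulness, threshold=0.5, max_rounds=3):
            knowledge = retrieve_knowledge(query)
            model_input = f"Context:\n{knowledge}\n\nQuestion: {query}"
            response = call_language_model(model_input)
            for _ in range(max_rounds):
                score = usefulness(query, knowledge, response)
                if score >= threshold:  # prescribed test satisfied
                    break
                feedback = (f"The previous answer scored {score:.2f}; revise it "
                            f"so it is grounded in the provided context.")
                revised_input = (f"{model_input}\n\nPrevious answer: {response}\n"
                                 f"Feedback: {feedback}")
                response = call_language_model(revised_input)
            return response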

    GENERATION OF DATA MODELS FOR PREDICTING DATA

    Publication Number: US20240046037A1

    Publication Date: 2024-02-08

    Application Number: US18268699

    Filing Date: 2020-12-25

    CPC classification number: G06F40/284 G06F40/40

    Abstract: Systems and methods are provided for training a data model based on training data. The training includes pre-training and fine-tuning the data model based on a combination of an autoregressive (AR) model and a non-autoregressive (NAR) model. Training data may be received and encoded into streams of tokens. During decoding, a pre-trainer generates a continuum of data structures of the combined AR and NAR model, including a main stream and a series of predicting streams. Masked tokens in predicting streams reference or attend to one or more preceding tokens in the main stream or the preceding predicting streams. A fine-tuner selects streams to generate a trained model according to a target data model. The target data model is determined based on balancing an accuracy constraint and an efficiency constraint for predicting tokens. The decoder acts as a bridge between the AR and NAR models in generating a trained data model.
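
    A rough, purely illustrative sketch of the main-stream/predicting-stream layout the abstract describes (the function and token names are assumptions, not the patented data structures):

        # Hypothetical sketch: build a main stream of observed tokens plus
        # predicting streams whose masked positions target tokens further ahead.
        MASK = "[MASK]"

        def build_streams(tokens, num_predicting_streams=2):
            """Predicting stream k holds a masked placeholder per position whose
            target is the token k steps ahead; during training those masked
            positions attend to the preceding main-stream tokens."""
            main_stream = list(tokens)
            predicting_streams = []
            for k in range(1, num_predicting_streams + 1):
                stream_inputs = [MASK] * len(tokens)
                stream_targets = tokens[k:] + [None] * k  # None: nothing to predict
                predicting_streams.append({"inputs": stream_inputs,
                                           "targets": stream_targets})
            return main_stream, predicting_streams

        main, preds = build_streams(["the", "cat", "sat"], num_predicting_streams=2)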

    LANGUAGE-MODEL PRETRAINING WITH GRADIENT-DISENTANGLED EMBEDDING SHARING

    Publication Number: US20230153532A1

    Publication Date: 2023-05-18

    Application Number: US17664031

    Filing Date: 2022-05-18

    CPC classification number: G06F40/284 G06F40/295 G06N3/08 G06N5/04

    Abstract: A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
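
    A minimal PyTorch sketch of gradient-disentangled embedding sharing as the abstract outlines it: the downstream task reuses the upstream embedding through a stop-gradient plus a residual table, so its gradient never reaches the shared weights. This is an assumed illustration, not the claimed implementation.

        # Hypothetical sketch: share an embedding table between an upstream and a
        # downstream task while keeping their gradients disentangled.
        import torch
        import torch.nn as nn

        class GradientDisentangledEmbedding(nn.Module):
            def __init__(self, vocab_size, dim):
                super().__init__()
                self.shared = nn.Embedding(vocab_size, dim)  # updated by upstream task
                self.delta = nn.Embedding(vocab_size, dim)   # updated by downstream task
                nn.init.zeros_(self.delta.weight)

            def upstream(self, ids):
                # Upstream gradients flow into the shared table.
                return self.shared(ids)

            def downstream(self, ids):
                # detach() blocks downstream gradients from the shared table;
                # they flow only into the residual delta table.
                return self.shared(ids).detach() + self.delta(ids)

        emb = GradientDisentangledEmbedding(vocab_size=100, dim=16)
        ids = torch.tensor([1, 2, 3])
        emb.downstream(ids).sum().backward()
        print(emb.shared.weight.grad is None, emb.delta.weight.grad is not None)  # True True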

    Automatically Labeling Items using a Machine-Trained Language Model

    Publication Number: US20250139380A1

    Publication Date: 2025-05-01

    Application Number: US18385358

    Filing Date: 2023-10-30

    Abstract: A computer-implemented labeling technique generates a task description that describes a labeling task to be given to a language model. The technique then sends a prompt to the language model, which includes the task description and a particular item to be labeled. The technique receives a response provided by the language model in response to the prompt, which specifies a class assigned by the language model to the item. In some implementations, the task description specifies a group of suggested classes to be used in classifying the particular item. The task description also invites the language model to specify another class upon a finding that none of the group of suggested classes applies to the item. The technique also allows a user to stop and restart a labeling run at any point in the labeling run. Other aspects of the technique include consensus processing and weight updating.
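
    For illustration, a prompt of the kind the abstract describes, with suggested classes and an invitation to propose a new one, might be composed as follows (the wording and function name are hypothetical):

        # Hypothetical sketch: compose a labeling prompt with suggested classes
        # and an escape hatch for proposing a new class.
        def build_labeling_prompt(item, suggested_classes):
            classes = ", ".join(suggested_classes)
            return (
                "You are labeling items.\n"
                f"Assign the item below to one of these classes: {classes}.\n"
                "If none of the suggested classes applies, propose a new class instead.\n"
                f"Item: {item}\n"
                "Answer with the class name only."
            )

        prompt = build_labeling_prompt(
            item="Crash when saving a file with a very long name",
            suggested_classes=["bug report", "feature request", "question"],
        )
        # response = call_language_model(prompt)  # returns the assigned class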

    ADVERSARIAL TRAINING OF MACHINE LEARNING MODELS

    Publication Number: US20210142181A1

    Publication Date: 2021-05-13

    Application Number: US16775635

    Filing Date: 2020-01-29

    Abstract: This document relates to training of machine learning models such as neural networks. One example method involves providing a machine learning model having one or more layers and associated parameters and performing a pretraining stage on the parameters of the machine learning model to obtain pretrained parameters. The example method also involves performing a tuning stage on the machine learning model by using labeled training samples to tune the pretrained parameters. The tuning stage can include performing noise adjustment of the labeled training examples to obtain noise-adjusted training samples. The tuning stage can also include adjusting the pretrained parameters based at least on the labeled training examples and the noise-adjusted training examples to obtain adapted parameters. The example method can also include outputting a tuned machine learning model having the adapted parameters.
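
    A simplified PyTorch sketch of the tuning stage described above, training on both the original and noise-adjusted samples; the perturbation here is plain random noise rather than the patented procedure, and all names are assumptions:

        # Hypothetical sketch: tune pretrained parameters on labeled samples and
        # on noise-adjusted copies of those samples.
        import torch

        def adversarial_tuning_step(model, embeddings, labels, loss_fn, optimizer,
                                    noise_scale=1e-3):
            # Loss on the original labeled samples.
            clean_loss = loss_fn(model(embeddings), labels)

            # Noise-adjust the samples; an adversarial variant would instead
            # follow the gradient of the loss to pick the perturbation.
            noise = noise_scale * torch.randn_like(embeddings)
            noisy_loss = loss_fn(model(embeddings + noise), labels)

            loss = clean_loss + noisy_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()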

    ADVERSARIAL TRAINING OF MACHINE LEARNING MODELS

    Publication Number: US20250165792A1

    Publication Date: 2025-05-22

    Application Number: US19034250

    Filing Date: 2025-01-22

    Abstract: This document relates to training of machine learning models such as neural networks. One example method involves providing a machine learning model having one or more layers and associated parameters and performing a pretraining stage on the parameters of the machine learning model to obtain pretrained parameters. The example method also involves performing a tuning stage on the machine learning model by using labeled training samples to tune the pretrained parameters. The tuning stage can include performing noise adjustment of the labeled training examples to obtain noise-adjusted training samples. The tuning stage can also include adjusting the pretrained parameters based at least on the labeled training examples and the noise-adjusted training examples to obtain adapted parameters. The example method can also include outputting a tuned machine learning model having the adapted parameters.
