Instruction Fine-Tuning Machine-Learned Models Using Intermediate Reasoning Steps

    Publication Number: US20240256965A1

    Publication Date: 2024-08-01

    Application Number: US18424624

    Application Date: 2024-01-26

    Applicant: Google LLC

    CPC classification number: G06N20/00

    Abstract: An example method for training a machine-learned sequence processing model includes obtaining a plurality of training examples for training the machine-learned sequence processing model. For each respective training example of the plurality of training examples, the example method includes: obtaining a respective query associated with the respective training example; inputting the respective query to the machine-learned sequence processing model; obtaining, from the machine-learned sequence processing model, a response to the respective query and a trace of intermediate states from the respective query to the response; evaluating the response using a ground truth response associated with the respective training example; evaluating the trace using a ground truth trace associated with the respective training example; and updating one or more parameters of the machine-learned sequence processing model based on the evaluation of the response and based on the evaluation of the trace.
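    As a hedged illustration of the dual evaluation the abstract describes, the sketch below fine-tunes a causal language model with teacher forcing on the ground-truth trace followed by the ground-truth response, combining a trace loss and a response loss into one parameter update. It is a minimal sketch, not the patented implementation; the training_step helper, the loss weighting, the toy model, and all shapes are assumptions introduced for illustration.

        # Minimal sketch, assuming a causal LM fine-tuned with teacher forcing on the
        # ground-truth trace followed by the ground-truth response. Not the patented
        # implementation; model, loss weighting, and shapes are illustrative.
        import torch
        import torch.nn.functional as F

        def training_step(model, optimizer, query_ids, trace_ids, response_ids, trace_weight=0.5):
            # Teacher-forced target: ground-truth trace, then ground-truth response.
            target_ids = torch.cat([trace_ids, response_ids], dim=-1)
            input_ids = torch.cat([query_ids, target_ids], dim=-1)
            logits = model(input_ids)                               # (batch, seq, vocab)
            # Keep only the positions whose logits predict target tokens.
            target_logits = logits[:, query_ids.size(-1) - 1:-1, :]
            n_trace = trace_ids.size(-1)
            vocab = target_logits.size(-1)
            # Evaluate the trace and the response separately, then combine.
            trace_loss = F.cross_entropy(
                target_logits[:, :n_trace, :].reshape(-1, vocab), trace_ids.reshape(-1))
            response_loss = F.cross_entropy(
                target_logits[:, n_trace:, :].reshape(-1, vocab), response_ids.reshape(-1))
            loss = response_loss + trace_weight * trace_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()

        # Toy usage: an embedding + linear stand-in for the sequence model.
        vocab_size, d_model = 100, 32
        toy_model = torch.nn.Sequential(torch.nn.Embedding(vocab_size, d_model),
                                        torch.nn.Linear(d_model, vocab_size))
        opt = torch.optim.AdamW(toy_model.parameters(), lr=1e-3)
        query = torch.randint(0, vocab_size, (1, 5))
        trace = torch.randint(0, vocab_size, (1, 8))
        response = torch.randint(0, vocab_size, (1, 3))
        print(training_step(toy_model, opt, query, trace, response))

    In this sketch the two losses are computed over disjoint slices of the same forward pass, so supervising the intermediate trace adds no extra model evaluations beyond the response supervision.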

    Pretraining Already-Pretrained Models for Diverse Downstream Tasks

    Publication Number: US20240256964A1

    Publication Date: 2024-08-01

    Application Number: US18424031

    Application Date: 2024-01-26

    Applicant: Google LLC

    CPC classification number: G06N20/00 G06F7/483

    Abstract: An example method includes obtaining a pretrained machine-learned model that was initially pretrained using a pretraining dataset and further pretraining the model by generating, using a pretraining objective framework, a plurality of corrupted training examples from one or more training examples obtained from the pretraining dataset. A first set of one or more training examples can be corrupted according to a first set of configuration parameters of the pretraining objective framework. A second set can be corrupted according to a second set of configuration parameters of the pretraining objective framework. The example method includes inputting the plurality of corrupted training examples into the model; obtaining, from the model, a plurality of outputs respectively generated by the model based on the plurality of corrupted training examples; and updating one or more parameters of the model based on an evaluation of the plurality of outputs.
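    The sketch below is one hedged illustration of how two sets of configuration parameters could drive different corruptions of the same pretraining text, in the spirit of the abstract. The span-corruption scheme, the sentinel token, the corruption_rate and mean_span_len parameters, and the corrupt_spans helper are all assumptions for illustration; the disclosure does not specify them.

        # Minimal sketch, assuming a span-corruption objective: the same token
        # sequence is corrupted under two different configuration parameter sets.
        import random

        def corrupt_spans(tokens, corruption_rate, mean_span_len, sentinel="<X>", seed=0):
            """Replace random spans of `tokens` with a sentinel; return the corrupted
            input together with the masked-out spans the model should reconstruct."""
            rng = random.Random(seed)
            budget = max(1, int(len(tokens) * corruption_rate))   # tokens left to mask
            corrupted, targets = [], []
            i = 0
            while i < len(tokens):
                if budget > 0 and rng.random() < corruption_rate:
                    span = min(max(1, round(rng.expovariate(1.0 / mean_span_len))),
                               budget, len(tokens) - i)
                    targets.append(tokens[i:i + span])
                    corrupted.append(sentinel)
                    budget -= span
                    i += span
                else:
                    corrupted.append(tokens[i])
                    i += 1
            return corrupted, targets

        tokens = "the quick brown fox jumps over the lazy dog again".split()
        # First set of configuration parameters: light corruption, short spans.
        print(corrupt_spans(tokens, corruption_rate=0.15, mean_span_len=2))
        # Second set of configuration parameters: heavier corruption, longer spans.
        print(corrupt_spans(tokens, corruption_rate=0.5, mean_span_len=4))

    The corrupted inputs produced under both configurations would then be passed through the already-pretrained model, and its outputs (the reconstructed spans) evaluated against the held-out targets to update the model's parameters.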

    Machine-Learned Attention Models Featuring Echo-Attention Layers

    Publication Number: US20220245432A1

    Publication Date: 2022-08-04

    Application Number: US17592174

    Application Date: 2022-02-03

    Applicant: Google LLC

    Abstract: The present disclosure provides echo-attention layers, a new efficient method for increasing the expressiveness of self-attention layers without incurring significant parameter or training-time costs. One intuition behind the proposed method is to learn to echo, i.e., to attend once and then obtain N echoed attentions for free (or at a relatively cheap cost). Compared to stacking new layers, the proposed echoed attentions aim to provide similar representational power at better cost efficiency.
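    The abstract gives only the intuition, so the following is a hedged sketch of one way "attend once, then echo N times" might be realized: a single softmax attention matrix is computed, and each echo reuses those weights with a cheap learned elementwise value scale. The EchoAttention class, the elementwise-scale mechanism, and the summation over echoes are illustrative assumptions, not the disclosed echo-attention design.

        # Hedged sketch of the "attend once, get N echoes cheaply" intuition. The
        # single-softmax reuse with per-echo learned elementwise scales is an
        # assumption for illustration, not the disclosed echo-attention design.
        import math
        import torch
        import torch.nn as nn

        class EchoAttention(nn.Module):
            def __init__(self, d_model: int, n_echoes: int):
                super().__init__()
                self.q = nn.Linear(d_model, d_model)
                self.k = nn.Linear(d_model, d_model)
                self.v = nn.Linear(d_model, d_model)
                # One learned elementwise scale per echo: far fewer parameters than
                # stacking additional full attention layers.
                self.echo_scales = nn.Parameter(torch.ones(n_echoes, d_model))
                self.out = nn.Linear(d_model, d_model)

            def forward(self, x):                                  # x: (batch, seq, d_model)
                q, k, v = self.q(x), self.k(x), self.v(x)
                # Attend once: a single softmax over the query-key scores.
                attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
                outputs = [attn @ v]                               # base attention output
                # Echoes: reuse the same attention weights, only rescaling the values.
                for scale in self.echo_scales:
                    outputs.append(attn @ (v * scale))
                return self.out(torch.stack(outputs).sum(dim=0))

        layer = EchoAttention(d_model=64, n_echoes=3)
        print(layer(torch.randn(2, 10, 64)).shape)                 # torch.Size([2, 10, 64])

    Under this assumed design, each echo adds only d_model parameters and one reuse of the precomputed attention weights, which is where the claimed cost advantage over stacking full attention layers would come from.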
