-
Publication No.: US20240403636A1
Publication Date: 2024-12-05
Application No.: US18697257
Filing Date: 2022-10-05
Applicant: GOOGLE LLC
Inventor: Valerii Likhosherstov , Mostafa Dehghani , Anurag Arnab , Krzysztof Marcin Choromanski , Mario Lucic , Yi Tay
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing and training a multi-modal, multi-task self-attention neural network.
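The abstract gives little architectural detail, so the following is a purely illustrative sketch rather than the patented design: one common way to realize a multi-modal, multi-task self-attention model is a single shared transformer trunk with per-modality input projections and per-task output heads. All module names, dimensions, and the pooling choice are assumptions made for this example.

```python
# Purely illustrative sketch, not the patented design: a single self-attention
# trunk shared across modalities and tasks, with per-modality input projections
# and per-task output heads. All names, sizes, and the pooling choice are
# assumptions made for this example.
import torch
import torch.nn as nn

class SharedTrunkModel(nn.Module):
    def __init__(self, modality_dims, task_classes, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # One linear projection per modality maps raw features into the shared width.
        self.input_proj = nn.ModuleDict(
            {m: nn.Linear(d, d_model) for m, d in modality_dims.items()})
        # The self-attention trunk is shared by every modality and task.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)
        # One lightweight classification head per task.
        self.heads = nn.ModuleDict(
            {t: nn.Linear(d_model, c) for t, c in task_classes.items()})

    def forward(self, tokens, modality, task):
        x = self.input_proj[modality](tokens)    # (batch, seq, d_model)
        x = self.trunk(x)                        # shared self-attention layers
        return self.heads[task](x.mean(dim=1))   # mean-pool, then task head

model = SharedTrunkModel(
    modality_dims={"image": 768, "audio": 128},
    task_classes={"image_classification": 10, "audio_classification": 5})
logits = model(torch.randn(2, 16, 768), modality="image", task="image_classification")
```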
-
Publication No.: US20240256965A1
Publication Date: 2024-08-01
Application No.: US18424624
Filing Date: 2024-01-26
Applicant: Google LLC
Inventor: Hyung Won Chung , Barret Zoph , Dengyong Zhou , Liam Fedus , Shayne Longpre , Le Hou , Yi Tay , Jason Weng Wei , Siddhartha Brahma , Quoc V. Le
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: An example method for training a machine-learned sequence processing model includes obtaining a plurality of training examples for training the machine-learned sequence processing model. For each respective training example of the plurality of training examples, the example method includes: obtaining a respective query associated with the respective training example; inputting the respective query to the machine-learned sequence processing model; obtaining, from the machine-learned sequence processing model, a response to the respective query and a trace of intermediate states from the respective query to the response; evaluating the response using a ground truth response associated with the respective training example; evaluating the trace using a ground truth trace associated with the respective training example; and updating one or more parameters of the machine-learned sequence processing model based on the evaluation of the response and based on the evaluation of the trace.
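A minimal sketch of one training step as described above, assuming a Hugging Face-style encoder-decoder model (e.g., T5) whose target sequence is the ground-truth trace of intermediate steps followed by the ground-truth response. The field names, loss weights, and trace/response split are assumptions, not details taken from the patent.

```python
# Minimal sketch of the described training step, assuming a Hugging Face-style
# encoder-decoder model (e.g., T5). Field names and loss weights are assumptions.
import torch
import torch.nn.functional as F

def training_step(model, tokenizer, optimizer, example,
                  trace_weight=1.0, response_weight=1.0):
    # Input the query to the model; supervise with the trace followed by the response.
    query_ids = tokenizer(example["query"], return_tensors="pt").input_ids
    trace_ids = tokenizer(example["ground_truth_trace"], return_tensors="pt").input_ids
    response_ids = tokenizer(example["ground_truth_response"], return_tensors="pt").input_ids
    labels = torch.cat([trace_ids, response_ids], dim=1)

    logits = model(input_ids=query_ids, labels=labels).logits        # (1, T, vocab)
    per_token = F.cross_entropy(logits.squeeze(0), labels.squeeze(0),
                                reduction="none")

    n_trace = trace_ids.shape[1]
    trace_loss = per_token[:n_trace].mean()      # evaluation of the trace
    response_loss = per_token[n_trace:].mean()   # evaluation of the response
    loss = trace_weight * trace_loss + response_weight * response_loss

    # Update the model parameters based on both evaluations.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```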
-
Publication No.: US20240256964A1
Publication Date: 2024-08-01
Application No.: US18424031
Filing Date: 2024-01-26
Applicant: Google LLC
Inventor: Yi Tay , Mostafa Dehghani
Abstract: An example method includes obtaining a pretrained machine-learned model that was initially pretrained using a pretraining dataset and further pretraining the model by generating, using a pretraining objective framework, a plurality of corrupted training examples from one or more training examples obtained from the pretraining dataset. A first set of one or more training examples can be corrupted according to a first set of configuration parameters of the pretraining objective framework. A second set can be corrupted according to a second set of configuration parameters of the pretraining objective framework. The example method includes inputting the plurality of corrupted training examples into the model; obtaining, from the model, a plurality of outputs respectively generated by the model based on the plurality of corrupted training examples; and updating one or more parameters of the model based on an evaluation of the plurality of outputs.
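A rough sketch of how two sets of configuration parameters might drive the corruption of pretraining examples, here interpreted as span corruption with different mean span lengths and corruption rates. The parameter names, values, and sentinel format are illustrative assumptions, not the framework defined in the patent.

```python
# Illustrative span-corruption sketch: the same corruption routine is run with
# two different configuration-parameter sets. Names and values are assumptions.
import random

def corrupt(tokens, mean_span_length, corruption_rate, seed=None):
    """Mask contiguous spans of tokens; return (corrupted_input, masked_targets)."""
    rng = random.Random(seed)
    n_to_corrupt = max(1, int(len(tokens) * corruption_rate))
    corrupted, targets = [], []
    i, sentinel = 0, 0
    while i < len(tokens):
        if n_to_corrupt > 0 and rng.random() < corruption_rate:
            # Draw a span length around the configured mean, bounded by what remains.
            span = min(max(1, int(rng.expovariate(1 / mean_span_length))),
                       n_to_corrupt, len(tokens) - i)
            corrupted.append(f"<extra_id_{sentinel}>")
            targets.append((f"<extra_id_{sentinel}>", tokens[i:i + span]))
            sentinel += 1
            n_to_corrupt -= span
            i += span
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
# First set of configuration parameters: short spans, light corruption.
x1, y1 = corrupt(tokens, mean_span_length=3, corruption_rate=0.15, seed=0)
# Second set of configuration parameters: long spans, heavy corruption.
x2, y2 = corrupt(tokens, mean_span_length=8, corruption_rate=0.5, seed=0)
```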
-
Publication No.: US20220245432A1
Publication Date: 2022-08-04
Application No.: US17592174
Filing Date: 2022-02-03
Applicant: Google LLC
Inventor: Yi Tay , Donald Arthur Metzler, JR. , Dara Bahri , Mostafa Dehghani
Abstract: The present disclosure provides echo-attention layers, a new, efficient method for increasing the expressiveness of self-attention layers without incurring significant parameter or training-time costs. One intuition behind the proposed method is to learn to echo, i.e., attend once and then obtain N echoed attentions for free (or at a relatively cheap cost). Compared to stacking new layers, the proposed echoed attentions are targeted at providing similar representational power at better cost efficiency.
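The abstract does not specify how an echo is computed, so the sketch below only illustrates the stated intuition: the attention matrix is computed once and each echo reuses it with a cheap learned transform of the values. The specific per-echo gain vector is an assumption, not the patented mechanism.

```python
# Illustrative "attend once, echo N times" sketch: the attention matrix is
# computed a single time; each echo reuses it with only a cheap learned
# elementwise gain on the values. The echo transform itself is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EchoAttention(nn.Module):
    def __init__(self, d_model, n_echoes=2):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # One cheap per-echo gain vector instead of a full extra attention layer.
        self.echo_gains = nn.Parameter(torch.ones(n_echoes, d_model))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        y = attn @ v                       # attend once
        for gain in self.echo_gains:       # echoes reuse the same attention matrix
            y = y + attn @ (v * gain)
        return self.out(y)

layer = EchoAttention(d_model=64, n_echoes=3)
out = layer(torch.randn(2, 10, 64))        # (batch, seq, d_model)
```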