-
Publication number: US20230315532A1
Publication date: 2023-10-05
Application number: US18127551
Filing date: 2023-03-28
Applicant: DeepMind Technologies Limited
Inventor: Jordan Hoffmann, Sebastian Borgeaud Dit Avocat, Laurent Sifre, Arthur Mensch
IPC: G06F9/50
CPC classification number: G06F9/505, G06F9/5016, G06F9/5044, G06F2209/501, G06F2209/5022, G06F2209/506
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model to perform a machine learning task. In one aspect, a method performed by one or more computers is described. The method includes: obtaining data defining a compute budget that characterizes an amount of computing resources allocated for training a machine learning model to perform a machine learning task; processing the data defining the compute budget using an allocation mapping, in accordance with a set of allocation mapping parameters, to generate an allocation tuple defining: (i) a target model size for the machine learning model, and (ii) a target amount of training data for training the machine learning model; instantiating the machine learning model, where the machine learning model has the target model size; and obtaining the target amount of training data for training the machine learning model.
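The allocation step in this abstract can be illustrated with a minimal sketch. The abstract does not state the functional form of the allocation mapping, so the power-law form below is an assumption, and `AllocationParams`, `allocate`, and all numeric values are hypothetical names and placeholders chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AllocationParams:
    """Hypothetical allocation mapping parameters (assumed power-law form)."""
    size_coeff: float  # scales the target model size
    size_exp: float    # exponent applied to the compute budget for model size
    data_coeff: float  # scales the target amount of training data
    data_exp: float    # exponent applied to the compute budget for data


def allocate(compute_budget: float, params: AllocationParams) -> Tuple[int, int]:
    """Map a compute budget to an allocation tuple (model size, training data).

    Assumption: both targets are power laws of the budget. The abstract only
    says that a parameterized mapping produces the two targets.
    """
    if compute_budget <= 0:
        raise ValueError("compute_budget must be positive")
    model_size = params.size_coeff * compute_budget ** params.size_exp
    training_data = params.data_coeff * compute_budget ** params.data_exp
    return round(model_size), round(training_data)


if __name__ == "__main__":
    # Arbitrary placeholder values, not taken from the patent.
    params = AllocationParams(size_coeff=3.0, size_exp=0.5,
                              data_coeff=5.0, data_exp=0.5)
    target_size, target_data = allocate(4.0, params)
    print(target_size, target_data)
```

In this sketch the returned pair would then drive the remaining two steps of the abstract: instantiating a model with the target size and obtaining the target amount of training data.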
-
Publication number: US20230350936A1
Publication date: 2023-11-02
Application number: US18141337
Filing date: 2023-04-28
Applicant: DeepMind Technologies Limited
Inventor: Jean-Baptiste Alayrac, Jeffrey Donahue, Karel Lenc, Karen Simonyan, Malcolm Kevin Campbell Reynolds, Pauline Luc, Arthur Mensch, Iain Barr, Antoine Miech, Yana Elizabeth Hasson, Katherine Elizabeth Millican, Roman Ring
IPC: G06F16/432, G06F40/284, G06F16/438
CPC classification number: G06F16/432, G06F16/438, G06F40/284
Abstract: A query processing system is described which receives a query input comprising an input token string and also at least one data item having a second, different modality, and generates a corresponding output token string.
-
Publication number: US20230177334A1
Publication date: 2023-06-08
Application number: US18076984
Filing date: 2022-12-07
Applicant: DeepMind Technologies Limited
Inventor: Sebastian Borgeaud Dit Avocat, Laurent Sifre, Arthur Mensch, Jordan Hoffmann
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a final output sequence. In one aspect, a method comprises: receiving a current output sequence comprising one or more current output segments; receiving a set of reference segments and a respective reference segment embedding of each reference segment that has been generated using an embedding neural network; for each current output segment: processing the current output segment using the embedding neural network to generate a current output segment embedding of the current output segment; and selecting k most similar reference segments to the current output segment using the reference segment embeddings and the current output segment embedding; and processing the current output sequence and the k most similar reference segments for each current output segment to generate an additional output segment that follows the current output sequence in the final output sequence.
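The retrieval step in this abstract (embed a current output segment, then select the k most similar reference segments) can be sketched as follows. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and `toy_embed` is a hypothetical bag-of-letters stand-in for the embedding neural network. The final step, which processes the sequence together with the retrieved segments to generate the next segment, is not shown.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_references(segment: str,
                     references: List[str],
                     reference_embeddings: List[Vector],
                     embed: Callable[[str], Vector],
                     k: int) -> List[str]:
    """Return the k reference segments most similar to `segment`."""
    query = embed(segment)
    ranked = sorted(range(len(references)),
                    key=lambda i: cosine(query, reference_embeddings[i]),
                    reverse=True)
    return [references[i] for i in ranked[:k]]


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for the embedding neural network: letter counts."""
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


if __name__ == "__main__":
    refs = ["cats purr", "dogs bark", "cats meow"]
    # Reference embeddings are computed once, ahead of time.
    ref_embs = [toy_embed(r) for r in refs]
    print(top_k_references("cats", refs, ref_embs, toy_embed, k=2))
```

Precomputing `ref_embs` mirrors the abstract's wording that the reference segment embeddings are received already generated, so only the current output segment is embedded at query time.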
-
Publication number: US20240119261A1
Publication date: 2024-04-11
Application number: US18374447
Filing date: 2023-09-28
Applicant: DeepMind Technologies Limited
Inventor: Robin Strudel, Rémi Leblond, Laurent Sifre, Sander Etienne Lea Dieleman, Nikolay Savinov, Will S. Grathwohl, Corentin Tallec, Florent Altché, Iaroslav Ganin, Arthur Mensch, Yilin Du
IPC: G06N3/045
CPC classification number: G06N3/045
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of discrete tokens using a diffusion model. In one aspect, a method includes generating, by using the diffusion model, a final latent representation of the sequence of discrete tokens that includes a determined value for each of a plurality of latent variables; applying a de-embedding matrix to the final latent representation of the output sequence of discrete tokens to generate a de-embedded final latent representation that includes, for each of the plurality of latent variables, a respective numeric score for each discrete token in a vocabulary of multiple discrete tokens; selecting, for each of the plurality of latent variables, a discrete token from among the multiple discrete tokens in the vocabulary that has a highest numeric score; and generating the output sequence of discrete tokens that includes the selected discrete tokens.
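The de-embedding and selection steps in this abstract can be sketched in a few lines. The diffusion model that produces the final latent representation is omitted; the sketch assumes that representation is given as one vector per latent variable and that the de-embedding matrix has one row per latent dimension and one column per vocabulary token. The function name and all values are hypothetical.

```python
from typing import List, Sequence


def de_embed_and_select(final_latent: Sequence[Sequence[float]],
                        de_embedding: Sequence[Sequence[float]],
                        vocab: Sequence[str]) -> List[str]:
    """Score every vocabulary token for each latent variable and take the argmax.

    final_latent: one vector of length d per latent variable.
    de_embedding: d x V matrix (assumed layout), V = len(vocab).
    """
    tokens = []
    for latent in final_latent:
        # Numeric score for each token: latent vector times de-embedding matrix.
        scores = [sum(latent[d] * de_embedding[d][v] for d in range(len(latent)))
                  for v in range(len(vocab))]
        # Select the token with the highest numeric score.
        best = max(range(len(vocab)), key=scores.__getitem__)
        tokens.append(vocab[best])
    return tokens


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    de_embedding = [[1.0, 0.0, -1.0],
                    [0.0, 1.0, 1.0]]
    final_latent = [[2.0, 0.5], [-1.0, 0.2], [0.1, 3.0]]
    print(de_embed_and_select(final_latent, de_embedding, vocab))
```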
-
Publication number: US20230177309A1
Publication date: 2023-06-08
Application number: US18076978
Filing date: 2022-12-07
Applicant: DeepMind Technologies Limited
Inventor: Aidan Clark, Arthur Mensch
IPC: G06N3/04
CPC classification number: G06N3/0427
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network having one or more conditional computation layers, where each conditional computation layer includes a gating sub-layer having multiple gating parameters and an expert sub-layer having multiple expert neural networks. In one aspect, a method comprises: sampling a batch of target output sequences that comprises a respective ground truth output token at each of multiple output positions; for each target output sequence, processing the target output sequence using the neural network to generate a network output that includes respective score distributions over the vocabulary of output tokens for the output positions in the target output sequence; and training each gating sub-layer using respective rewards for the gating sub-layer for the output positions through reinforcement learning to optimize a reinforcement learning objective function that measures an expected reward received by the gating sub-layer.
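Training a gating sub-layer from rewards, as this abstract describes, can be illustrated with a generic policy-gradient (REINFORCE) update. This is a sketch under stated assumptions rather than the claimed method: the abstract does not say which reinforcement learning algorithm is used or how the per-position rewards are computed, so `reward_fn`, the linear softmax gate, and the learning rate are all hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_probs(gate_weights: List[List[float]], features: List[float]) -> List[float]:
    """Routing distribution over experts from a linear gate (assumed form)."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in gate_weights]
    return softmax(logits)


def reinforce_step(gate_weights: List[List[float]],
                   features: List[float],
                   reward_fn: Callable[[int], float],
                   lr: float,
                   rng: random.Random) -> Tuple[int, float]:
    """Sample an expert for one output position and update the gate in place.

    Ascends reward * grad log pi(expert), a single-sample estimate of the
    gradient of the expected reward received by the gate.
    """
    probs = gate_probs(gate_weights, features)
    expert = rng.choices(range(len(probs)), weights=probs)[0]
    reward = reward_fn(expert)  # hypothetical per-position reward
    for j, row in enumerate(gate_weights):
        # d log pi(expert) / d logit_j = 1[j == expert] - probs[j]
        coeff = lr * reward * ((1.0 if j == expert else 0.0) - probs[j])
        for d in range(len(row)):
            row[d] += coeff * features[d]
    return expert, reward


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[0.0], [0.0]]  # two experts, one input feature
    for _ in range(500):
        # Toy reward: routing to expert 1 is rewarded, expert 0 is not.
        reinforce_step(weights, [1.0], lambda e: 1.0 if e == 1 else 0.0, 0.5, rng)
    print(gate_probs(weights, [1.0]))
```

In the toy run the gate gradually concentrates probability on the rewarded expert. In the setting of the abstract, the reward would instead be derived per output position from the network's outputs on the sampled target sequences.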