-
Publication No.: US20230244907A1
Publication date: 2023-08-03
Application No.: US18102985
Application date: 2023-01-30
Applicant: DeepMind Technologies Limited
Inventor: Curtis Glenn-Macway Hawthorne, Andrew Coulter Jaegle, Catalina-Codruta Cangea, Sebastian Borgeaud Dit Avocat, Charlie Thomas Curtis Nash, Mateusz Malinowski, Sander Etienne Lea Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Stuart Simon, Hannah Rachel Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, Joao Carreira, Jesse Engel
IPC: G06N3/044
CPC classification number: G06N3/044
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a sequence of data elements that includes a respective data element at each position in a sequence of positions. In one aspect, a method includes: for each position after a first position in the sequence of positions: obtaining a current sequence of data element embeddings that includes a respective data element embedding of each data element at a position that precedes the current position, obtaining a sequence of latent embeddings, and processing: (i) the current sequence of data element embeddings, and (ii) the sequence of latent embeddings, using a neural network to generate the data element at the current position. The neural network includes a sequence of neural network blocks including: (i) a cross-attention block, (ii) one or more self-attention blocks, and (iii) an output block.
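Illustrative sketch (not taken from the application): the generation loop described in the abstract might look roughly like the following. Everything specific here is an assumption made for illustration: the helper names (`attention`, `generate`), the fixed random latent embeddings, the absence of learned query/key/value projections and causal masking, and the greedy argmax output block. The abstract only states that a cross-attention block, one or more self-attention blocks, and an output block are applied at each position.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x d) matrix by a (d x m) matrix, both as nested sequences."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, context):
    """Single-head dot-product attention with no learned projections (a simplification)."""
    scale = math.sqrt(len(queries[0]))
    scores = matmul(queries, list(zip(*context)))  # queries @ context^T
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, context)


def add(a, b):
    """Element-wise sum, used as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def generate(first_token, length, embed, latents, w_out, n_self_blocks=2):
    """Generate `length` tokens, one position at a time (hypothetical helper)."""
    tokens = [first_token]
    for _position in range(1, length):
        # (i) embeddings of every data element that precedes the current position
        x = [embed[t] for t in tokens]
        # Cross-attention block: the latents (ii) attend to the data element embeddings.
        z = add(latents, attention(latents, x))
        # One or more self-attention blocks operating on the latents only.
        for _block in range(n_self_blocks):
            z = add(z, attention(z, z))
        # Output block (assumption): project the last latent to logits, pick the argmax.
        logits = matmul([z[-1]], w_out)[0]
        tokens.append(max(range(len(logits)), key=lambda i: logits[i]))
    return tokens


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
VOCAB, DIM, N_LATENTS = 5, 4, 3  # toy sizes chosen for the example
embed = rand_matrix(VOCAB, DIM)
latents = rand_matrix(N_LATENTS, DIM)
w_out = rand_matrix(DIM, VOCAB)
sequence = generate(0, 6, embed, latents, w_out)
```

In this sketch the self-attention cost depends on the number of latents rather than on the number of preceding data elements, since only the cross-attention step touches the full sequence of data element embeddings.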
-
Publication No.: US20220398437A1
Publication date: 2022-12-15
Application No.: US17777131
Application date: 2020-11-13
Applicant: DeepMind Technologies Limited
Inventor: Mateusz Malinowski, Viorica Patraucean, Grzegorz Michal Swirszcz, Joao Carreira
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing depth-parallel training of a neural network. One of the methods includes receiving an input sequence; and at each processing time step in a sequence of processing time steps: processing an input item using a first layer block in a stack of layer blocks to generate a first block output; for each subsequent layer block, processing a block output generated by the preceding layer block at the preceding processing time step to generate a current block output; computing i) a current error in an output item generated by the final layer block and ii) a current gradient of the current error; generating a parameter update for the final layer block; for each particular layer block that is not the final layer block, computing a current gradient for the particular layer block and generating a parameter update.
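Illustrative sketch (not taken from the application): the per-time-step schedule in the abstract can be mimicked with a toy stack of scalar linear blocks, each computing `output = weight * input`. The function name `train_depth_parallel`, the squared-error loss, the plain gradient-descent update, and in particular the rule that a non-final block reuses the gradient its successor computed at the preceding time step are all assumptions made for the example; the abstract does not say how the gradient for a non-final block is obtained.

```python
def train_depth_parallel(inputs, targets, weights, lr=0.01):
    """Toy depth-parallel training loop over scalar linear blocks (hypothetical helper)."""
    weights = list(weights)
    n_blocks = len(weights)
    prev_out = [0.0] * n_blocks      # block outputs from the preceding time step
    prev_grad_in = [0.0] * n_blocks  # gradients w.r.t. block inputs, preceding time step
    errors = []
    for x, target in zip(inputs, targets):
        # Forward: the first block reads the new input item; every later block
        # reads what its predecessor produced at the preceding time step.
        block_in = [x] + prev_out[:-1]
        out = [w * v for w, v in zip(weights, block_in)]

        # Current error of the final block's output item, and its gradient.
        diff = out[-1] - target
        errors.append(0.5 * diff * diff)

        # Gradient w.r.t. each block's output. The final block uses the current
        # error gradient; every other block uses the gradient its successor
        # computed at the preceding time step (assumption).
        grad_out = prev_grad_in[1:] + [diff]
        grad_in = [g * w for g, w in zip(grad_out, weights)]

        # Parameter update for every block, all within the same time step.
        weights = [w - lr * g * v for w, g, v in zip(weights, grad_out, block_in)]
        prev_out, prev_grad_in = out, grad_in
    return weights, errors


# Usage: a constant input stream with a constant target, three blocks.
steps = 200
final_weights, error_trace = train_depth_parallel(
    [1.0] * steps, [2.0] * steps, [0.5, 0.5, 0.5]
)
```

Because no block waits for a full forward or backward pass through the stack, all blocks could in principle be evaluated concurrently within a time step; the price in this sketch is that activations and gradients are one step stale per block.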
-