-
Publication Number: US20230244452A1
Publication Date: 2023-08-03
Application Number: US18105211
Filing Date: 2023-02-02
Applicant: DeepMind Technologies Limited
Inventor: Yujia Li , David Hugo Choi , Junyoung Chung , Nathaniel Arthur Kushman , Julian Schrittwieser , Rémi Leblond , Thomas Edward Eccles , James Thomas Keeling , Felix Axel Gimeno Gil , Agustín Matías Dal Lago , Thomas Keisuke Hubert , Peter Choy , Cyprien de Masson d'Autume , Esme Sutherland Robson , Oriol Vinyals
IPC: G06F8/30
CPC classification number: G06F8/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating computer code using neural networks. One of the methods includes receiving description data describing a computer programming task; receiving a first set of inputs for the computer programming task; generating a plurality of candidate computer programs by sampling a plurality of output sequences from a set of one or more generative neural networks; for each candidate computer program in a subset of the candidate computer programs and for each input in the first set: executing the candidate computer program on the input to generate an output; and selecting, from the candidate computer programs, one or more computer programs as synthesized computer programs for performing the computer programming task based at least in part on the outputs generated by executing the candidate computer programs in the subset on the inputs in the first set of inputs.
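The selection step in this abstract can be illustrated with a minimal Python sketch, assuming a behaviour-clustering heuristic: each sampled candidate is executed on the shared input set, candidates are grouped by the outputs they produce, and one representative of each of the largest clusters is returned. The helper names, the use of subprocess for sandboxed execution, and the largest-cluster selection rule are illustrative assumptions rather than the claimed method.

```python
import subprocess
from collections import defaultdict

def run_candidate(source: str, stdin_text: str, timeout: float = 2.0) -> str | None:
    """Run one candidate Python program on one input and return its stdout,
    or None if it crashes or times out."""
    try:
        proc = subprocess.run(
            ["python3", "-c", source],
            input=stdin_text, capture_output=True, text=True, timeout=timeout,
        )
        return proc.stdout if proc.returncode == 0 else None
    except subprocess.TimeoutExpired:
        return None

def select_by_behaviour(candidates: list[str], inputs: list[str], k: int = 3) -> list[str]:
    """Group candidates by the outputs they produce on the shared input set,
    then return one representative from each of the k largest behaviour clusters."""
    clusters: dict[tuple, list[str]] = defaultdict(list)
    for source in candidates:
        outputs = tuple(run_candidate(source, x) for x in inputs)
        if any(o is None for o in outputs):
            continue  # discard candidates that fail on some input
        clusters[outputs].append(source)
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:k]]
```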
-
Publication Number: US20240104391A1
Publication Date: 2024-03-28
Application Number: US18475743
Filing Date: 2023-09-27
Applicant: DeepMind Technologies Limited
Inventor: Irina Higgins , Jonathan Ken Uesato , Nathaniel Arthur Kushman , Ramana Kumar
IPC: G06N3/092
CPC classification number: G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a language model to perform a reasoning task. The system obtains a plurality of training examples. Each training example includes a respective sample query text sequence characterizing a respective sample query and a respective reference response text sequence that includes a reference final answer to the respective sample query. The system trains a reward model on the plurality of training examples. The reward model is configured to receive an input including a query text sequence characterizing a query and one or more reasoning steps that have been generated in response to the query and process the input to compute a reward score indicating how successful the one or more reasoning steps are in yielding a correct final answer to the query. The system trains the language model using the trained reward model.
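The two training stages in this abstract can be illustrated with a minimal PyTorch sketch: a reward model is first fit on (query, reasoning steps) token sequences labelled by whether they yield the reference final answer, and the trained reward model then supplies the reward signal for a policy-gradient update of the language model. The architectures, the binary label construction, and the REINFORCE-style update are illustrative assumptions rather than the claimed method.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a tokenized (query + reasoning-steps-so-far) sequence with a scalar logit."""
    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # each row of token ids is one example
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); returns (batch,) reward logits
        return self.head(self.embed(token_ids)).squeeze(-1)

def train_reward_model(reward_model, batches, epochs: int = 3, lr: float = 1e-4):
    """Stage 1: fit the reward model on (token_ids, labels) batches, where each label
    is 1.0 if the reasoning steps led to the reference final answer and 0.0 otherwise."""
    opt = torch.optim.Adam(reward_model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for token_ids, labels in batches:
            opt.zero_grad()
            loss = loss_fn(reward_model(token_ids), labels)
            loss.backward()
            opt.step()

def reinforce_step(reward_model, token_ids, log_probs, opt):
    """Stage 2: one REINFORCE update of the language model. `log_probs` are the
    language model's log-probabilities of the sampled reasoning tokens (batch, seq),
    and the trained reward model's score of the sampled sequence is the reward."""
    with torch.no_grad():
        reward = torch.sigmoid(reward_model(token_ids))  # (batch,) in [0, 1]
    loss = -(reward * log_probs.sum(dim=-1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```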
-