-
Publication number: US20220374683A1
Publication date: 2022-11-24
Application number: US17668050
Filing date: 2022-02-09
Applicant: DeepMind Technologies Limited
Inventor: Thomas Edward Eccles, Ian Michael Gemp, János Kramár, Marta Garnelo Abellanas, Dan Rosenbaum, Yoram Bachrach, Thore Kurt Hartwig Graepel
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting an optimal feature point in a continuous domain for a group of agents. A computer-implemented system obtains, for each of a plurality of agents, respective training data that comprises a respective utility score for each of a plurality of discrete points in the continuous domain. The system trains, for each of the plurality of agents and on the respective training data for the agent, a respective neural network that is configured to receive an input comprising a point in the continuous domain and to generate as output a predicted utility score for the agent at the point. The system then identifies the optimal point by optimizing an approximation of a shared outcome function that is defined, for any given point in the continuous domain, by a combination of the predicted utility scores generated by the respective neural networks for each of the plurality of agents by processing an input comprising the given point.
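The pipeline in this abstract (fit one utility model per agent on discrete samples, then optimize a combination of the predicted utilities) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the filing: the one-dimensional domain [0, 1], the two quadratic toy utilities, the small NumPy network, the use of a plain sum as the combination, and the dense grid search standing in for the optimizer. The names `train_utility_net` and `select_shared_point` are hypothetical.

```python
import numpy as np

def train_utility_net(x, y, hidden=8, steps=2000, lr=0.02, seed=0):
    """Fit a one-hidden-layer tanh network to (point, utility) pairs
    with full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 1.0, (1, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, 1))
    b2 = np.zeros(1)
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)            # hidden activations, shape (n, hidden)
        pred = h @ w2 + b2                  # predicted utility, shape (n, 1)
        g_pred = 2.0 * (pred - y) / n       # gradient of MSE w.r.t. predictions
        g_h = g_pred @ w2.T                 # backpropagate into the hidden layer
        g_z = g_h * (1.0 - h ** 2)          # derivative of tanh
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        w1 -= lr * (x.T @ g_z)
        b1 -= lr * g_z.sum(axis=0)

    def predict(points):
        p = np.asarray(points, dtype=float).reshape(-1, 1)
        return (np.tanh(p @ w1 + b1) @ w2 + b2).ravel()

    return predict

def select_shared_point(predictors, candidates):
    """Sum the per-agent predicted utilities (assumed combination) and
    return the candidate point with the highest total."""
    total = sum(p(candidates) for p in predictors)
    return float(candidates[np.argmax(total)]), total

# Toy training data: each agent scores 21 discrete points in [0, 1].
train_x = np.linspace(0.0, 1.0, 21)
agent_utilities = [-(train_x - 0.2) ** 2, -(train_x - 0.8) ** 2]
predictors = [train_utility_net(train_x, u, seed=i)
              for i, u in enumerate(agent_utilities)]

# Grid search over the continuous domain (a stand-in for any optimizer).
grid = np.linspace(0.0, 1.0, 1001)
best_point, combined = select_shared_point(predictors, grid)
print(best_point)
```

Because each network is differentiable in its input, a gradient-based optimizer could replace the grid search in higher-dimensional domains; the grid is used here only because it is easy to verify in one dimension.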
-
Publication number: US20230330848A1
Publication date: 2023-10-19
Application number: US18306711
Filing date: 2023-04-25
Applicant: DeepMind Technologies Limited
Inventor: Saran Tunyasuvunakool, Yuke Zhu, Joshua Merel, János Kramár, Ziyu Wang, Nicolas Manfred Otto Heess
CPC classification number: B25J9/163, G06N3/08, B25J9/161, B25J9/1697, G06N3/008, G06N3/084, G06N3/044, G06N3/045
Abstract: A neural network control system for controlling an agent to perform a task in a real-world environment operates based on both image data and proprioceptive data describing the configuration of the agent. The training of the control system includes both imitation learning, using datasets generated from previous performances of the task, and reinforcement learning, based on rewards calculated from control data output by the control system.
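One common way to combine the two training signals named in this abstract is a weighted sum of an imitation loss and a reinforcement learning loss. The sketch below illustrates that idea only and is not the method of the filing. The linear policy, the behavioural-cloning squared error, the REINFORCE-style surrogate, the fixed `bc_weight`, and all function names are assumptions introduced for illustration.

```python
import numpy as np

def policy_mean(image_features, proprio, weights):
    """Hypothetical linear policy: concatenate image features with
    proprioceptive state and map them to a mean action."""
    obs = np.concatenate([image_features, proprio], axis=-1)
    return obs @ weights

def imitation_loss(weights, image_features, proprio, demo_actions):
    """Behavioural cloning: mean squared error against demonstrated actions."""
    pred = policy_mean(image_features, proprio, weights)
    return float(np.mean((pred - demo_actions) ** 2))

def rl_loss(weights, image_features, proprio, actions, returns, sigma=1.0):
    """REINFORCE-style surrogate: negative return-weighted Gaussian
    log-likelihood of the taken actions (additive constant dropped)."""
    mean = policy_mean(image_features, proprio, weights)
    log_prob = -0.5 * np.sum(((actions - mean) / sigma) ** 2, axis=-1)
    return float(-np.mean(returns * log_prob))

def hybrid_loss(weights, image_features, proprio, demo_actions,
                actions, returns, bc_weight=0.5):
    """Weighted sum of the imitation and reinforcement terms (assumed scheme)."""
    il = imitation_loss(weights, image_features, proprio, demo_actions)
    rl = rl_loss(weights, image_features, proprio, actions, returns)
    return bc_weight * il + (1.0 - bc_weight) * rl

# Tiny worked example: 4 timesteps, 3 image features, 2 proprioceptive values.
image_features = np.ones((4, 3))
proprio = np.ones((4, 2))
weights = np.zeros((5, 2))        # zero weights give a zero mean action
demo_actions = np.zeros((4, 2))   # demonstrations match the policy exactly
actions = np.ones((4, 2))         # actions taken during the rollout
returns = np.ones(4)              # return credited to each timestep

loss = hybrid_loss(weights, image_features, proprio,
                   demo_actions, actions, returns)
print(loss)
```

In a real controller the image features would come from a learned vision encoder and the weighting between the two terms would be a tuning choice; both are left as placeholders here.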
-