Patent search ap:("DeepMind Technologies Limited") AND inv:"János Kramár" Page 1

1.

Invention application
SELECTING POINTS IN CONTINUOUS SPACES USING NEURAL NETWORKS (In force)

Publication (announcement) number: US20220374683A1

Publication (announcement) date: 2022-11-24

Application number: US17668050

Application date: 2022-02-09

Applicant: DeepMind Technologies Limited

Inventor: Thomas Edward Eccles, Ian Michael Gemp, János Kramár, Marta Garnelo Abellanas, Dan Rosenbaum, Yoram Bachrach, Thore Kurt Hartwig Graepel

IPC: G06N3/04, G06K9/62

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting an optimal feature point in a continuous domain for a group of agents. A computer-implemented system obtains, for each of a plurality of agents, respective training data that comprises a respective utility score for each of a plurality of discrete points in the continuous domain. The system trains, for each of the plurality of agents and on the respective training data for that agent, a respective neural network that is configured to receive an input comprising a point in the continuous domain and to generate as output a predicted utility score for the agent at the point. The system then identifies the optimal point by optimizing an approximation of a shared outcome function; for any given point in the continuous domain, the approximation is defined by a combination of the predicted utility scores that the agents' respective neural networks generate when processing an input comprising the given point.
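The three steps in this abstract (per-agent training data on discrete points, one utility network per agent, optimization of a combined prediction) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the domain is the interval [0, 1], each agent's utility is a quadratic around a hypothetical preferred point, the "combination" is a plain sum, and the optimizer is a dense grid search. The class `TinyUtilityNet` is also a stand-in: it is a one-hidden-layer tanh network whose hidden weights stay random and whose output layer is fit in closed form by ridge regression, rather than a network trained by gradient descent.

```python
import numpy as np


class TinyUtilityNet:
    """Hypothetical stand-in for one agent's utility network.

    One hidden tanh layer with fixed random weights; only the output
    layer is fit, using ridge regression (an illustrative shortcut).
    """

    def __init__(self, hidden=32, ridge=1e-8, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 2.0, size=hidden)  # fixed hidden weights
        self.b = rng.normal(0.0, 1.0, size=hidden)  # fixed hidden biases
        self.ridge = ridge
        self.out = None  # output-layer weights, set by fit()

    def _features(self, x):
        x = np.asarray(x, dtype=float).reshape(-1, 1)
        h = np.tanh(x * self.w + self.b)             # shape (n, hidden)
        return np.hstack([h, np.ones((len(x), 1))])  # append a bias column

    def fit(self, points, utilities):
        h = self._features(points)
        gram = h.T @ h + self.ridge * np.eye(h.shape[1])
        self.out = np.linalg.solve(gram, h.T @ np.asarray(utilities, dtype=float))
        return self

    def predict(self, x):
        return self._features(x) @ self.out


def select_point(nets, candidates):
    """Return the candidate that maximizes the summed predicted utilities."""
    total = sum(net.predict(candidates) for net in nets)
    return float(candidates[np.argmax(total)])


# Step 1: utility scores at discrete points (synthetic, assumed data).
train_points = np.linspace(0.0, 1.0, 21)
preferred = [0.1, 0.4, 0.7]  # hypothetical per-agent preferred points

# Step 2: one network per agent, fit on that agent's data only.
nets = [
    TinyUtilityNet(seed=i).fit(train_points, -(train_points - c) ** 2)
    for i, c in enumerate(preferred)
]

# Step 3: optimize the combined prediction over the continuous domain,
# here approximated by a dense grid of candidate points.
best = select_point(nets, np.linspace(0.0, 1.0, 1001))
print(best)
```

With these synthetic quadratic utilities the true maximizer of the summed utility is the mean of the preferred points (0.4), so the printed value should land close to it. A gradient-based optimizer over the networks' inputs could replace the grid search in higher-dimensional domains.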

2.

Invention publication
REINFORCEMENT AND IMITATION LEARNING FOR A TASK (Under examination, published)

Publication (announcement) number: US20230330848A1

Publication (announcement) date: 2023-10-19

Application number: US18306711

Application date: 2023-04-25

Applicant: DeepMind Technologies Limited

Inventor: Saran Tunyasuvunakool, Yuke Zhu, Joshua Merel, János Kramár, Ziyu Wang, Nicolas Manfred Otto Heess

IPC: B25J9/16, G06N3/08, G06N3/008, G06N3/084, G06N3/044, G06N3/045

CPC classification number: B25J9/163, G06N3/08, B25J9/161, B25J9/1697, G06N3/008, G06N3/084, G06N3/044, G06N3/045

Abstract: A neural network control system for controlling an agent to perform a task in a real-world environment operates on both image data and proprioceptive data describing the configuration of the agent. Training of the control system includes both imitation learning, using datasets generated from previous performances of the task, and reinforcement learning, based on rewards calculated from control data output by the control system.
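The abstract does not say how the imitation and reinforcement signals are combined. One simple possibility, shown here purely as an assumption, is a convex mix of a task reward and an imitation reward. The function name `hybrid_reward` and the parameter `mix` are hypothetical and do not come from the patent.

```python
def hybrid_reward(task_reward: float, imitation_reward: float, mix: float = 0.5) -> float:
    """Blend a task reward with an imitation reward (assumed scheme).

    mix = 0 uses only the task reward (pure reinforcement learning);
    mix = 1 uses only the imitation reward.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return mix * imitation_reward + (1.0 - mix) * task_reward
```

For example, `hybrid_reward(1.0, 0.0, mix=0.25)` weights the task reward at 75% and the imitation reward at 25%.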
