REINFORCEMENT LEARNING BY SOLUTION OF A CONVEX MARKOV DECISION PROCESS

    Publication Number: US20240249151A1

    Publication Date: 2024-07-25

    Application Number: US18558894

    Filing Date: 2022-05-27

    CPC classification number: G06N3/092 G06N3/045

    Abstract: The actions of an agent in an environment are selected using a policy model neural network which implements a policy model defining, for any observed state of the environment characterized by an observation received by the policy model neural network, a state-action distribution over the set of possible actions the agent can perform. The policy model neural network is jointly trained with a cost model neural network which, upon receiving an observation characterizing the environment, outputs a reward vector. The reward vector comprises a corresponding reward value for every possible action. The training involves a sequence of iterations, in each of which (a) a cost model is derived based on the state-action distribution of a candidate policy model defined in one or more previous iterations, and subsequently (b) a new candidate policy model is obtained based on the reward vector(s) defined by the cost model derived in step (a) of that iteration.
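
    The following is a minimal tabular sketch of the alternating cost-model / policy-model iteration the abstract describes, written as a Frank-Wolfe-style update for a convex objective on state-action occupancies. The toy MDP, the entropy objective f, and helper names such as occupancy and greedy_policy are illustrative assumptions, not the patent's construction (which uses neural networks for both models).

    import numpy as np

    S, A, gamma, H = 4, 2, 0.9, 60               # states, actions, discount, truncation horizon
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(S), size=(S, A))   # transition probabilities P[s, a, s']

    def occupancy(policy):
        # Discounted state-action occupancy of a deterministic policy, truncated at horizon H.
        d = np.zeros((S, A))
        mu = np.full(S, 1.0 / S)                 # uniform start-state distribution
        for t in range(H):
            d += (gamma ** t) * mu[:, None] * policy
            mu = np.einsum("sa,sat->t", mu[:, None] * policy, P)
        return (1 - gamma) * d

    def greedy_policy(reward):
        # Solve the standard MDP for a fixed reward vector via value iteration.
        q = np.zeros((S, A))
        for _ in range(200):
            q = reward + gamma * np.einsum("sat,t->sa", P, q.max(axis=1))
        pi = np.zeros((S, A))
        pi[np.arange(S), q.argmax(axis=1)] = 1.0
        return pi

    def f(d):
        # Example convex objective on occupancies: negative entropy (an exploration objective).
        return np.sum(d * np.log(d + 1e-12))

    d = occupancy(greedy_policy(np.zeros((S, A))))   # occupancy of an initial candidate policy
    for k in range(1, 50):
        # (a) cost model: a reward value for every (state, action), derived from the
        #     state-action distribution of the candidate policy from previous iterations.
        reward = -(np.log(d + 1e-12) + 1.0)          # negative gradient of f at d
        # (b) candidate policy model: best response to the reward vector from step (a).
        pi_k = greedy_policy(reward)
        d = (1 - 2 / (k + 2)) * d + (2 / (k + 2)) * occupancy(pi_k)   # Frank-Wolfe mixing

    print("final objective f(d):", float(f(d)))

    In the patented setting, step (a) would be realized by training the cost model neural network and step (b) by training the policy model neural network against the cost model's reward vectors; the tabular solvers above simply stand in for those learned components.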

    METHODS AND SYSTEMS FOR CONSTRAINED REINFORCEMENT LEARNING

    Publication Number: US20240265263A1

    Publication Date: 2024-08-08

    Application Number: US18424437

    Filing Date: 2024-01-26

    CPC classification number: G06N3/091

    Abstract: A method is described for iteratively training a policy model, such as a neural network, of a computer-implemented action selection system to control an agent interacting with an environment to perform a task subject to one or more constraints. The task has a reward associated with performance of the task. Each constraint limits, to a corresponding threshold, the expected value of the total of a corresponding constraint function when the future actions of the agent are chosen according to the policy model, and each constraint is associated with a corresponding multiplier variable. In each iteration, a mixed reward function is generated based on values for the multiplier variables generated in the preceding iteration, and on estimates of the reward and of the constraint function values obtained if the actions are chosen based on the policy model generated in the preceding iteration.
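
    Below is a minimal tabular sketch of the Lagrangian-style iteration the abstract describes: each iteration forms a mixed reward from the previous multiplier values, improves the policy against it, and adjusts the multipliers using estimated constraint totals. The toy MDP, the single constraint, and helper names such as solve and expected_total are illustrative assumptions rather than the patent's method.

    import numpy as np

    S, A, gamma = 4, 2, 0.9
    rng = np.random.default_rng(1)
    P = rng.dirichlet(np.ones(S), size=(S, A))    # transition probabilities P[s, a, s']
    reward = rng.uniform(size=(S, A))             # task reward
    cost = rng.uniform(size=(S, A))               # one constraint function
    threshold = 3.0                               # threshold on the expected constraint total
    lam, lr_lam = 0.0, 0.05                       # multiplier variable and its step size

    def solve(r):
        # Greedy policy for reward r via value iteration (stands in for policy-model training).
        q = np.zeros((S, A))
        for _ in range(200):
            q = r + gamma * np.einsum("sat,t->sa", P, q.max(axis=1))
        pi = np.zeros((S, A))
        pi[np.arange(S), q.argmax(axis=1)] = 1.0
        return pi

    def expected_total(r, pi):
        # Expected discounted total of r under policy pi from a uniform start-state distribution.
        total, mu = 0.0, np.full(S, 1.0 / S)
        for t in range(300):
            total += (gamma ** t) * np.sum(mu[:, None] * pi * r)
            mu = np.einsum("sa,sat->t", mu[:, None] * pi, P)
        return total

    for it in range(50):
        mixed = reward - lam * cost                   # mixed reward from the preceding multiplier
        pi = solve(mixed)                             # policy model for this iteration
        c_total = expected_total(cost, pi)            # estimated constraint total under pi
        lam = max(0.0, lam + lr_lam * (c_total - threshold))  # raise multiplier if constraint is violated

    print("multiplier:", lam, "constraint total:", expected_total(cost, pi))

    The projected ascent step on the multiplier drives the policy toward satisfying the constraint: whenever the estimated constraint total exceeds its threshold, the multiplier grows and the mixed reward penalizes the constrained behavior more heavily in the next iteration.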
