Patent search: ap:("DEEPMIND TECHNOLOGIES LIMITED") AND inv:"Abbas Abdolmaleki"

1.

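Published invention application
MULTI-OBJECTIVE REINFORCEMENT LEARNING USING WEIGHTED POLICY PROJECTION (Pending examination, published)

Publication (announcement) number: US20240185084A1

Publication (announcement) date: 2024-06-06

Application number: US18286504

Application date: 2022-05-27

Applicant: DeepMind Technologies Limited

Inventor: Abbas Abdolmaleki, Sandy Han Huang, Martin Riedmiller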

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: Computer implemented systems and methods for training an action selection policy neural network to select actions to be performed by an agent to control the agent to perform a task. The techniques are able to optimize multiple objectives one of which may be to stay close to a behavioral policy of a teacher. The behavioral policy of the teacher may be defined by a predetermined dataset of behaviors and the systems and methods may then learn offline. The described techniques provide a mechanism for explicitly defining a trade-off between the multiple objectives.
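The abstract describes trading off several objectives, one of which is staying close to a teacher's behavioral policy. The sketch below illustrates one way such an explicit trade-off could work, under assumptions that are not stated in the abstract: actions are discrete, each objective supplies a target distribution over actions, and "projection" is read as minimizing the weighted sum of KL(q_k || pi). For categorical distributions that minimizer is the weight-normalized mixture of the targets. All names and numbers are hypothetical.

```python
import math

# Assumption: "projection" = the categorical pi minimizing
# sum_k w_k * KL(q_k || pi). Its closed-form minimizer is the
# weight-normalized mixture of the target distributions q_k.

def weighted_projection(targets, weights):
    """Return the categorical pi minimizing sum_k w_k * KL(q_k || pi)."""
    total = sum(weights)
    num_actions = len(targets[0])
    return [sum(w * q[a] for w, q in zip(weights, targets)) / total
            for a in range(num_actions)]

def kl(p, q):
    """KL(p || q) for categorical distributions; terms with p_a = 0 contribute 0."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)

# Hypothetical per-objective targets over three discrete actions.
task_target = [0.7, 0.2, 0.1]   # distribution improved for the task reward
behavior = [0.1, 0.3, 0.6]      # teacher's behavioral policy (e.g. fit to a dataset)
weights = [0.8, 0.2]            # explicit trade-off: task vs. staying close to teacher

pi = weighted_projection([task_target, behavior], weights)
# pi is approximately [0.58, 0.22, 0.20]
```

Raising the weight on the behavior target pulls the projected policy toward the teacher; with weights `[0.0, 1.0]` the result is exactly the behavioral policy.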

2.

Invention application
HIERARCHICAL POLICIES FOR MULTITASK TRANSFER (In force)

Publication (announcement) number: US20220237488A1

Publication (announcement) date: 2022-07-28

Application number: US17613687

Application date: 2020-05-22

Applicant: DeepMind Technologies Limited

Inventor: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Nicolas Manfred Otto Heess, Martin Riedmiller

IPC: G06N7/00, G06N3/04, G06N20/20

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes obtaining an observation characterizing a current state of the environment and data identifying a task currently being performed by the agent; processing the observation and the data identifying the task using a high-level controller to generate a high-level probability distribution that assigns a respective probability to each of a plurality of low-level controllers; processing the observation using each of the plurality of low-level controllers to generate, for each of the plurality of low-level controllers, a respective low-level probability distribution; generating a combined probability distribution; and selecting, using the combined probability distribution, an action from the space of possible actions to be performed by the agent in response to the observation.
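The abstract combines a high-level distribution over low-level controllers with each controller's own action distribution. A minimal sketch, assuming a discrete action space and assuming the "combined probability distribution" is the mixture weighted by the high-level probabilities (the abstract does not state the combination rule), could look like this. The controller outputs are hypothetical numbers standing in for network outputs.

```python
import random

def combined_distribution(high_level, low_level):
    """Mix low-level action distributions using high-level controller weights.

    high_level[k]   : probability the high-level controller assigns to controller k
    low_level[k][a] : controller k's probability of action a
    Assumed combination: p(a) = sum_k high_level[k] * low_level[k][a].
    """
    num_actions = len(low_level[0])
    return [sum(h * dist[a] for h, dist in zip(high_level, low_level))
            for a in range(num_actions)]

def select_action(probs, rng):
    """Sample an action index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

high = [0.25, 0.75]          # hypothetical high-level output for the current task
low = [[0.9, 0.1, 0.0],      # low-level controller 0
       [0.2, 0.4, 0.4]]      # low-level controller 1
combined = combined_distribution(high, low)   # approximately [0.375, 0.325, 0.3]
action = select_action(combined, random.Random(0))
```

Because only the high-level controller sees the task identifier, the low-level controllers in this sketch can be shared across tasks, which is one reading of how such a hierarchy supports multitask transfer.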

3.

Published invention application
PLANNING USING A JUMPY TRAJECTORY DECODER NEURAL NETWORK (Pending examination, published)

Publication (announcement) number: US20240220795A1

Publication (announcement) date: 2024-07-04

Application number: US18401226

Application date: 2023-12-29

Applicant: DeepMind Technologies Limited

Inventor: Jingwei Zhang, Arunkumar Byravan, Jost Tobias Springenberg, Martin Riedmiller, Nicolas Manfred Otto Heess, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao

IPC: G06N3/08

CPC classification number: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using jumpy trajectory decoder neural networks.

4.

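Published invention application
CONSTRAINED REINFORCEMENT LEARNING NEURAL NETWORK SYSTEMS USING PARETO FRONT OPTIMIZATION (Pending examination, published)

Publication (announcement) number: US20230368037A1

Publication (announcement) date: 2023-11-16

Application number: US18029992

Application date: 2021-10-01

Applicant: DeepMind Technologies Limited

Inventor: Sandy Han Huang, Abbas Abdolmaleki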

IPC: G06N3/092

CPC classification number: G06N3/092

Abstract: A system and method that controls an agent to perform a task subject to one or more constraints. The system trains a preference neural network that learns which preferences produce constraint-satisfying action selection policies. The system thus optimizes a hierarchical policy that is a product of a preference policy and a preference-conditioned action selection policy. In this way, the system learns to jointly optimize a set of objectives relating to rewards and costs received during the task whilst also learning preferences, i.e. trade-offs between the rewards and costs, that are most likely to produce policies that satisfy the constraints.
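The hierarchical policy in the abstract factorizes as a preference policy times a preference-conditioned action policy. The sketch below shows only that factorization and how sampling proceeds through it; it assumes a small discrete set of preferences (written as hypothetical (reward weight, cost weight) pairs) and discrete actions, and it does not attempt to show how the preference network learns which preferences satisfy the constraints.

```python
import random

# Hypothetical preferences: (reward weight, cost weight) trade-offs.
pref_policy = {(0.9, 0.1): 0.3, (0.5, 0.5): 0.7}    # pi_pref(pref | s)
action_policy = {(0.9, 0.1): [0.8, 0.2],              # pi(a | s, pref)
                 (0.5, 0.5): [0.3, 0.7]}

def joint_probability(pref, action):
    """pi(pref, action | s) = pi_pref(pref | s) * pi(action | s, pref)."""
    return pref_policy[pref] * action_policy[pref][action]

def sample_hierarchical(rng):
    """Sample a preference first, then an action conditioned on that preference."""
    prefs = list(pref_policy)
    pref = rng.choices(prefs, weights=[pref_policy[p] for p in prefs], k=1)[0]
    dist = action_policy[pref]
    action = rng.choices(range(len(dist)), weights=dist, k=1)[0]
    return pref, action

pref, action = sample_hierarchical(random.Random(0))
```

Summing the joint probability over preferences gives the marginal action distribution; here the probability of action 0 is 0.3 * 0.8 + 0.7 * 0.3 = 0.45.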

5.

Invention application
TRAINING MULTI-OBJECTIVE NEURAL NETWORK REINFORCEMENT LEARNING SYSTEMS (In force)

Publication (announcement) number: US20230082326A1

Publication (announcement) date: 2023-03-16

Application number: US17797203

Application date: 2021-02-08

Applicant: DeepMind Technologies Limited

Inventor: Abbas Abdolmaleki, Sandy Han Huang

IPC: G06N3/08

Abstract: There is provided a method for training a neural network system by reinforcement learning, the neural network system being configured to receive an input observation characterizing a state of an environment interacted with by an agent and to select and output an action in accordance with a policy that aims to satisfy a plurality of objectives. The method comprises obtaining a set of one or more trajectories. Each trajectory comprises a state of an environment, an action applied by the agent to the environment according to a previous policy in response to the state, and a set of rewards for the action, each reward relating to a corresponding objective of the plurality of objectives. The method further comprises determining an action-value function for each of the plurality of objectives based on the set of one or more trajectories. Each action-value function determines an action value representing an estimated return according to the corresponding objective that would result from the agent performing a given action in response to a given state according to the previous policy. The method further comprises determining an updated policy based on a combination of the action-value functions for the plurality of objectives.
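The final step of the abstract determines an updated policy from a combination of per-objective action-value functions. The abstract does not say how they are combined, so the sketch below assumes one simple rule: reweight the previous policy by exp(sum_k Q_k(s, a) / eta_k), where the per-objective temperatures `eta_k` are hypothetical parameters (a smaller temperature gives that objective more influence). It assumes discrete actions and takes the Q-values as already estimated from trajectories.

```python
import math

def updated_policy(old_policy, q_values, temperatures):
    """Reweight the previous policy using several per-objective Q-values.

    old_policy[a]   : probability of action a under the previous policy (must be > 0)
    q_values[k][a]  : Q_k(s, a), estimated return for objective k
    temperatures[k] : hypothetical per-objective scale eta_k
    Assumed rule: pi_new(a) proportional to pi_old(a) * exp(sum_k Q_k(s, a) / eta_k)
    """
    logits = [math.log(p) + sum(q[a] / eta for q, eta in zip(q_values, temperatures))
              for a, p in enumerate(old_policy)]
    m = max(logits)                      # subtract the max for numerical stability
    unnorm = [math.exp(x - m) for x in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]

old = [0.5, 0.5]          # previous policy over two actions
q_task = [1.0, 0.0]       # objective 1 prefers action 0
q_cost = [-1.0, 0.0]      # objective 2 prefers action 1
new = updated_policy(old, [q_task, q_cost], [1.0, 2.0])
```

With equal temperatures the two objectives cancel and the policy stays at [0.5, 0.5]; with the task objective given the smaller temperature, action 0 rises to about 0.62.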

6.

Invention application
ROBUST REINFORCEMENT LEARNING FOR CONTINUOUS CONTROL WITH MODEL MISSPECIFICATION (In force)

Publication (announcement) number: US20220343157A1

Publication (announcement) date: 2022-10-27

Application number: US17620164

Application date: 2020-06-17

Applicant: DeepMind Technologies Limited

Inventor: Daniel J. Mankowitz, Nir Levine, Rae Chan Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Todd Andrew Hester, Timothy Arthur Mann, Martin Riedmiller

IPC: G06N3/08, G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes sampling a mini-batch comprising one or more observation-action-reward tuples generated as a result of interactions of a first agent with a first environment; determining an update to current values of the Q network parameters by minimizing a robust entropy-regularized temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the observation-action-reward tuples; and determining, using the Q-value neural network, an update to the policy network parameters using the sampled mini-batch of observation-action-reward tuples.
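The robust entropy-regularized TD error in the abstract accounts for perturbations of the environment's states. A minimal sketch of the idea, under assumptions not stated in the abstract, takes the worst case of the entropy-regularized (soft) next-state value over a finite, hypothetical set of perturbed next states. For simplicity it uses discrete actions, although the patent targets continuous control, and it covers only the Q-network loss, not the policy update.

```python
import math

def soft_value(q_row, alpha):
    """Entropy-regularized state value: alpha * log(sum_a exp(Q(s, a) / alpha))."""
    m = max(q_row)  # shift by the max for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_row))

def robust_td_error(q_sa, reward, perturbed_next_q, gamma, alpha):
    """TD error whose bootstrap target uses the worst-case perturbed next state.

    perturbed_next_q: one row of next-state Q-values per candidate perturbation
    (a hypothetical finite uncertainty set).
    """
    worst_next_value = min(soft_value(row, alpha) for row in perturbed_next_q)
    target = reward + gamma * worst_next_value
    return target - q_sa

def robust_td_loss(batch, gamma=0.99, alpha=1.0):
    """Mean squared robust TD error over a mini-batch of (Q(s,a), r, next-Q rows)."""
    errors = [robust_td_error(q_sa, r, next_qs, gamma, alpha) for q_sa, r, next_qs in batch]
    return sum(e * e for e in errors) / len(errors)

# One hypothetical transition with two candidate perturbations of the next state.
batch = [(1.0, 0.5, [[1.0, 1.0], [0.0, 0.0]])]
loss = robust_td_loss(batch, gamma=0.9, alpha=1.0)
```

Taking the minimum over perturbed next states makes the bootstrap target pessimistic, so the learned Q-values, and any policy derived from them, hedge against the environment differing from the one that generated the data.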

7.

Granted invention patent
Robot control policy determination through constrained optimization for smooth continuous control (In force)

Publication (announcement) number: US10786900B1

Publication (announcement) date: 2020-09-29

Application number: US16586846

Application date: 2019-09-27

Applicant: DeepMind Technologies Limited

Inventor: Steven Bohez, Abbas Abdolmaleki

IPC: B25J9/16, G06F17/11, G05B13/02, G06N20/00

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a control policy for a vehicle or other robot through the performance of a reinforcement learning simulation of the robot.
