SAMPLE-EFFICIENT REINFORCEMENT LEARNING

    Publication (Announcement) Number: US20210201156A1

    Publication (Announcement) Date: 2021-07-01

    Application Number: US17056640

    Application Date: 2019-05-20

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sample-efficient reinforcement learning. One of the methods includes maintaining an ensemble of Q networks, an ensemble of transition models, and an ensemble of reward models; obtaining a transition; generating, using the ensemble of transition models, M trajectories; for each time step in each of the trajectories: generating, using the ensemble of reward models, N rewards for the time step, generating, using the ensemble of Q networks, L Q values for the time step, and determining, from the rewards, the Q values, and the training reward, L*N candidate target Q values for the trajectory and for the time step; for each of the time steps, combining the candidate target Q values; determining a final target Q value; and training at least one of the Q networks in the ensemble using the final target Q value.
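
    A minimal sketch of the ensemble-based target-Q computation this abstract outlines, with toy linear maps standing in for the Q networks, transition models, and reward models. The horizon H and the rules for combining candidates (mean over the L*N candidates, minimum over time steps, mean over the M trajectories) are illustrative assumptions, not the patent's specified method.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, M, N, L, H, GAMMA = 4, 3, 2, 2, 5, 0.99

# Toy "ensembles": random linear maps stand in for the trained networks.
transition_models = [rng.normal(size=(STATE_DIM, STATE_DIM)) for _ in range(M)]
reward_models = [rng.normal(size=STATE_DIM) for _ in range(N)]
q_networks = [rng.normal(size=STATE_DIM) for _ in range(L)]

def target_q(next_state, train_reward):
    """Final target Q value computed from one observed transition."""
    per_trajectory = []
    for T in transition_models:                   # M imagined trajectories
        s, ret, disc = next_state.copy(), train_reward, GAMMA
        per_step = []
        for _ in range(H):                        # roll the transition model forward
            s = np.tanh(T @ s)
            rewards = [float(r @ s) for r in reward_models]   # N rewards per step
            q_values = [float(q @ s) for q in q_networks]     # L Q values per step
            # L*N candidate targets: return so far plus each (reward, Q) pairing.
            candidates = [ret + disc * (rw + GAMMA * qv)
                          for rw in rewards for qv in q_values]
            per_step.append(np.mean(candidates))  # combine candidates (assumed: mean)
            ret += disc * np.mean(rewards)        # accumulate imagined reward
            disc *= GAMMA
        per_trajectory.append(min(per_step))      # combine over steps (assumed: min)
    return float(np.mean(per_trajectory))         # final target (assumed: mean)

print(target_q(rng.normal(size=STATE_DIM), train_reward=1.0))
```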

    BATCHED REINFORCEMENT LEARNING
    Invention Application

    Publication (Announcement) Number: US20200234117A1

    Publication (Announcement) Date: 2020-07-23

    Application Number: US16617461

    Application Date: 2018-08-24

    Applicant: GOOGLE LLC

    Inventor: Danijar Hafner

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for batched reinforcement learning. For example, the batched reinforcement learning techniques can be used to determine a control policy for a robot in simulation and the control policy can then be used to control the robot in the real world. In one aspect, a method includes obtaining a plurality of current observations, each current observation characterizing a current state of a respective environment replica; processing the current observations in parallel using the action selection neural network in accordance with current values of the network parameters to generate an action batch; obtaining a transition tuple batch comprising a respective transition tuple for each of the environment replicas, the respective transition tuple for each environment replica comprising: (i) a subsequent observation and (ii) a reward; and training the action selection neural network on the batch of transition tuples.
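
    A minimal sketch of the batched loop this abstract outlines: one observation per environment replica is stacked into a batch, the action selection network processes the whole batch in a single forward pass, and the resulting batch of transition tuples drives one training step. The toy replicas, the linear softmax policy, and the REINFORCE-style update are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B, OBS_DIM, N_ACTIONS, LR = 8, 3, 2, 0.01

params = rng.normal(scale=0.1, size=(OBS_DIM, N_ACTIONS))  # network parameters
states = rng.normal(size=(B, OBS_DIM))                     # one state per replica

def select_actions(obs_batch):
    """One parallel forward pass over the whole observation batch."""
    logits = obs_batch @ params
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    actions = np.array([rng.choice(N_ACTIONS, p=p) for p in probs])
    return actions, probs

def step_replicas(obs_batch, actions):
    """Toy environment replicas: subsequent observation and a reward each."""
    next_obs = 0.9 * obs_batch + rng.normal(scale=0.1, size=obs_batch.shape)
    rewards = np.where(actions == (obs_batch[:, 0] > 0), 1.0, 0.0)
    return next_obs, rewards

for _ in range(100):
    actions, probs = select_actions(states)                 # action batch
    next_states, rewards = step_replicas(states, actions)   # transition tuple batch
    # Train on the batch: simple policy-gradient step (assumed update rule).
    onehot = np.eye(N_ACTIONS)[actions]
    grad = states.T @ ((onehot - probs) * rewards[:, None]) / B
    params += LR * grad
    states = next_states

print("mean reward on last batch:", rewards.mean())
```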

    SYSTEM AND METHODS FOR PIXEL BASED MODEL PREDICTIVE CONTROL

    Publication (Announcement) Number: US20240173854A1

    Publication (Announcement) Date: 2024-05-30

    Application Number: US18436684

    Application Date: 2024-02-08

    Applicant: GOOGLE LLC

    Inventor: Danijar Hafner

    CPC classification number: B25J9/161 B25J9/163 B25J9/1661 G06N7/01

    Abstract: Techniques are disclosed that enable model predictive control of a robot based on a latent dynamics model and a reward function. In many implementations, the latent space can be divided into a deterministic portion and stochastic portion, allowing the model to be utilized in generating more likely robot trajectories. Additional or alternative implementations include many reward functions, where each reward function corresponds to a different robot task.
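
    A minimal sketch of planning with a latent dynamics model whose state is split into a deterministic portion and a stochastic portion, as this abstract describes. The linear dynamics, the Gaussian stochastic update, and the random-shooting planner are illustrative assumptions; swapping a different task-specific reward vector in for w_reward corresponds to targeting a different robot task.

```python
import numpy as np

rng = np.random.default_rng(0)
DET, STO, ACT, HORIZON, CANDIDATES = 4, 2, 1, 10, 64

W_det = rng.normal(scale=0.3, size=(DET, DET + STO + ACT))  # deterministic path
W_sto = rng.normal(scale=0.3, size=(2 * STO, DET))          # predicts mean, log-std
w_reward = rng.normal(size=DET + STO)                       # latent reward model

def step_latent(h, z, a):
    """Advance the split latent state one step."""
    h_next = np.tanh(W_det @ np.concatenate([h, z, a]))     # deterministic portion
    stats = W_sto @ h_next
    mean, log_std = stats[:STO], stats[STO:]
    z_next = mean + np.exp(log_std) * rng.normal(size=STO)  # stochastic portion
    return h_next, z_next

def plan(h, z):
    """Random-shooting MPC: imagine candidate action sequences in latent space."""
    best_return, best_first_action = -np.inf, None
    for _ in range(CANDIDATES):
        actions = rng.uniform(-1, 1, size=(HORIZON, ACT))
        hh, zz, total = h, z, 0.0
        for a in actions:
            hh, zz = step_latent(hh, zz, a)
            total += float(w_reward @ np.concatenate([hh, zz]))
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

h0, z0 = np.zeros(DET), np.zeros(STO)   # e.g. inferred from the current image
print("chosen action:", plan(h0, z0))
```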

    SYSTEM AND METHODS FOR PIXEL BASED MODEL PREDICTIVE CONTROL

    Publication (Announcement) Number: US20210205984A1

    Publication (Announcement) Date: 2021-07-08

    Application Number: US17056104

    Application Date: 2019-05-17

    Applicant: Google LLC

    Inventor: Danijar Hafner

    Abstract: Techniques are disclosed that enable model predictive control of a robot based on a latent dynamics model and a reward function. In many implementations, the latent space can be divided into a deterministic portion and stochastic portion, allowing the model to be utilized in generating more likely robot trajectories. Additional or alternative implementations include many reward functions, where each reward function corresponds to a different robot task.

    System and methods for pixel based model predictive control

    Publication (Announcement) Number: US11904467B2

    Publication (Announcement) Date: 2024-02-20

    Application Number: US17056104

    Application Date: 2019-05-17

    Applicant: Google LLC

    Inventor: Danijar Hafner

    CPC classification number: B25J9/161 B25J9/163 B25J9/1661 G06N7/01

    Abstract: Techniques are disclosed that enable model predictive control of a robot based on a latent dynamics model and a reward function. In many implementations, the latent space can be divided into a deterministic portion and stochastic portion, allowing the model to be utilized in generating more likely robot trajectories. Additional or alternative implementations include many reward functions, where each reward function corresponds to a different robot task.

    TRAINING REINFORCEMENT LEARNING AGENTS TO LEARN FARSIGHTED BEHAVIORS BY PREDICTING IN LATENT SPACE

    Publication (Announcement) Number: US20210158162A1

    Publication (Announcement) Date: 2021-05-27

    Application Number: US17103827

    Application Date: 2020-11-24

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network used to select an action to be performed by an agent interacting with an environment. In one aspect, a method includes: receiving a latent representation characterizing a current state of the environment; generating a trajectory of latent representations that starts with the received latent representation; for each latent representation in the trajectory: determining a predicted reward; and processing the state latent representation using a value neural network to generate a predicted state value; determining a corresponding target state value for each latent representation in the trajectory; determining, based on the target state values, an update to the current values of the policy neural network parameters; and determining an update to the current values of the value neural network parameters.
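
    A minimal sketch of the latent-trajectory value computation this abstract outlines: generate a trajectory of latent representations, predict a reward and a state value at each step, and form target state values by a backward bootstrapping recursion. A TD(lambda)-style return is an assumed choice here, and the linear models are toy stand-ins for the networks.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, H, GAMMA, LAMBDA = 4, 5, 0.99, 0.95

W_dyn = rng.normal(scale=0.4, size=(LATENT, LATENT))  # latent dynamics
w_reward = rng.normal(size=LATENT)                    # predicted reward
w_value = rng.normal(size=LATENT)                     # value network

def imagine(z0):
    """Trajectory of latent representations starting at z0."""
    zs = [z0]
    for _ in range(H):
        zs.append(np.tanh(W_dyn @ zs[-1]))
    return zs

def lambda_targets(zs):
    """Target state value for each latent representation in the trajectory."""
    rewards = [float(w_reward @ z) for z in zs[1:]]
    values = [float(w_value @ z) for z in zs]
    target = values[-1]
    targets = []
    for t in reversed(range(H)):   # backward recursion over the horizon
        target = rewards[t] + GAMMA * ((1 - LAMBDA) * values[t + 1] + LAMBDA * target)
        targets.append(target)
    return targets[::-1]

zs = imagine(rng.normal(size=LATENT))
print("target state values:", np.round(lambda_targets(zs), 3))
```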
