Patent search ap:("DeepMind Technologies Limited") AND inv:"Nan Ke"

1.

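Invention publication
RETRIEVAL AUGMENTED REINFORCEMENT LEARNING (Under examination, published)

Publication (announcement) number: US20240320506A1

Publication (announcement) date: 2024-09-26

Application number: US18698890

Application date: 2022-10-05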

Applicant: DeepMind Technologies Limited

Inventor: Anirudh Goyal, Andrea Banino, Abram Luke Friesen, Theophane Guillaume Weber, Adrià Puigdomènech Badia, Nan Ke, Simon Osindero, Timothy Paul Lillicrap, Charles Blundell

IPC: G06N3/092, G06N3/044, G06N3/0455, G06N3/084

CPC classification number: G06N3/092, G06N3/044, G06N3/0455, G06N3/084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling a reinforcement learning agent in an environment to perform a task using a retrieval-augmented action selection process. One of the methods includes receiving a current observation characterizing a current state of the environment; processing an encoder network input comprising the current observation to determine a policy neural network hidden state that corresponds to the current observation; maintaining a plurality of trajectories generated as a result of the reinforcement learning agent interacting with the environment; selecting one or more trajectories from the plurality of trajectories; updating the policy neural network hidden state using update data determined from the one or more selected trajectories; and processing the updated hidden state using a policy neural network to generate a policy output that specifies an action to be performed by the agent in response to the current observation.
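The sequence of steps in this abstract (encode the observation, select stored trajectories, update the hidden state, run the policy) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the application: the linear-plus-tanh encoder, the dot-product scoring used to select trajectories, the additive hidden-state update, the softmax policy head, and all function and parameter names are hypothetical.

```python
import math

def encode(observation, weights):
    """Toy encoder (assumed): linear map followed by tanh, giving a hidden state."""
    return [math.tanh(sum(w * x for w, x in zip(row, observation))) for row in weights]

def summarize(trajectory, weights):
    """Embed a trajectory as the mean of its encoded observations (assumed)."""
    encoded = [encode(obs, weights) for obs, _action in trajectory]
    return [sum(col) / len(encoded) for col in zip(*encoded)]

def select_trajectories(hidden, trajectories, weights, k):
    """Pick the k stored trajectories whose summaries best match the hidden state."""
    def score(trajectory):
        return sum(h * s for h, s in zip(hidden, summarize(trajectory, weights)))
    return sorted(trajectories, key=score, reverse=True)[:k]

def update_hidden(hidden, selected, weights):
    """Add the mean summary of the selected trajectories to the hidden state."""
    summaries = [summarize(t, weights) for t in selected]
    update = [sum(col) / len(summaries) for col in zip(*summaries)]
    return [h + u for h, u in zip(hidden, update)]

def policy(hidden, policy_weights):
    """Toy policy head (assumed): softmax over linear logits."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in policy_weights]
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

def act(observation, trajectories, enc_w, pol_w, k=1):
    """Run one retrieval-augmented action selection step."""
    hidden = encode(observation, enc_w)                              # encoder step
    selected = select_trajectories(hidden, trajectories, enc_w, k)   # retrieval step
    hidden = update_hidden(hidden, selected, enc_w)                  # hidden-state update
    probs = policy(hidden, pol_w)                                    # policy output
    action = max(range(len(probs)), key=probs.__getitem__)
    return action, probs
```

Here each trajectory is a list of (observation, action) pairs, and `act` returns the index of the most probable action together with the full distribution. A learned system would replace the fixed weights and the dot-product scoring with trained networks.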

2.

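Invention publication
GENERATING ENVIRONMENT MODELS USING IN-CONTEXT ADAPTATION AND EXPLORATION (Under examination, published)

Publication (announcement) number: US20240256884A1

Publication (announcement) date: 2024-08-01

Application number: US18424687

Application date: 2024-01-26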

Applicant: DeepMind Technologies Limited

Inventor: Hado Philip van Hasselt, Nan Ke, Chentian Jiang

IPC: G06N3/092, G06N3/042

CPC classification number: G06N3/092, G06N3/042

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent interacting with an environment to perform a task. In one aspect, one of the methods includes: maintaining context data; receiving a current observation characterizing a current state of the environment; generating a current graph model that represents the environment; selecting, from a possible set of actions and using the current graph model, a current action to be performed by the agent in response to the current observation; controlling the agent to perform the selected current action to cause the environment to transition from the current state into a new state; and updating the context data to include (i) data identifying the selected current action and (ii) a new observation characterizing the new state of the environment.
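The control loop described in this abstract can be sketched as a minimal toy example. The assumptions here are not from the application: the graph model is taken to be a plain dictionary of observed transitions rebuilt from the context data, the action rule simply prefers actions whose outcome is not yet in the graph, and `toy_step` is a hypothetical four-state environment. All names are illustrative.

```python
def build_graph(context):
    """Build a transition graph {(state, action): next_state} from context data.

    The context is a list of (action, observation) pairs; the first entry
    holds the initial observation with action None.
    """
    graph = {}
    for (_, prev_obs), (action, next_obs) in zip(context, context[1:]):
        graph[(prev_obs, action)] = next_obs
    return graph

def select_action(graph, observation, actions):
    """Prefer an action whose outcome from this state is not yet in the graph."""
    for action in actions:
        if (observation, action) not in graph:
            return action
    return actions[0]  # everything known here: fall back to the first action

def run_episode(step, initial_observation, actions, num_steps):
    """Alternate between modelling the environment and acting in it."""
    context = [(None, initial_observation)]          # maintained context data
    observation = initial_observation
    for _ in range(num_steps):
        graph = build_graph(context)                 # current graph model
        action = select_action(graph, observation, actions)
        observation = step(observation, action)      # environment transitions
        context.append((action, observation))        # (i) action, (ii) new observation
    return context, build_graph(context)

def toy_step(state, action):
    """Hypothetical environment: integer states 0..3, actions move left or right."""
    return max(0, min(3, state + (1 if action == "right" else -1)))

context, graph = run_episode(toy_step, 0, ["left", "right"], 4)
```

After four steps the context holds the initial observation plus four (action, observation) entries, and the graph records each transition the agent has tried so far.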
