Patent search ap:("DeepMind Technologies Limited") AND inv:"Ioannis Antonoglou"

1.

Invention application
PLANNING FOR AGENT CONTROL USING LEARNED HIDDEN STATES (Granted)

Publication (announcement) number: US20230073326A1

Publication (announcement) date: 2023-03-09

Application number: US17794797

Application date: 2021-01-28

Applicant: DeepMind Technologies Limited

Inventor: Julian Schrittwieser, Ioannis Antonoglou, Thomas Keisuke Hubert

IPC: G06N7/00, G06N5/00, G06K9/62

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions to be performed by an agent interacting with an environment to cause the agent to perform a task. One of the methods includes: receiving a current observation characterizing a current environment state of the environment; performing a plurality of planning iterations to generate plan data that indicates a respective value to performing the task of the agent performing each of the set of actions in the environment and starting from the current environment state, wherein performing each planning iteration comprises selecting a sequence of actions to be performed by the agent starting from the current environment state based on outputs generated by a dynamics model and a prediction model; and selecting, from the set of actions, an action to be performed by the agent in response to the current observation based on the plan data.
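
The planning loop described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the three-action set, the toy `representation`, `dynamics`, and `prediction` functions standing in for learned models, the UCB-style root selection rule, and the greedy rollout used in place of a full tree search.

```python
import math

ACTIONS = [0, 1, 2]  # hypothetical discrete action set (also used as list indices)

def representation(observation):
    # Hypothetical stand-in for a learned encoder: observation -> hidden state.
    return float(observation)

def dynamics(hidden, action):
    # Hypothetical stand-in for a learned dynamics model:
    # returns (next hidden state, predicted immediate reward).
    next_hidden = hidden + (action - 1)
    return next_hidden, -abs(next_hidden)  # toy task: drive the state toward 0

def prediction(hidden):
    # Hypothetical stand-in for a learned prediction model:
    # returns (policy prior over ACTIONS, value estimate).
    prior = [1.0 / len(ACTIONS)] * len(ACTIONS)
    return prior, -abs(hidden)

def rollout(hidden, first_action, depth, discount):
    # Evaluate one action sequence entirely in hidden-state space.
    hidden, reward = dynamics(hidden, first_action)
    total, scale = reward, discount
    for _ in range(depth - 1):
        def one_step(a):
            nxt, r = dynamics(hidden, a)
            return r + discount * prediction(nxt)[1]
        action = max(ACTIONS, key=one_step)  # greedy continuation (assumption)
        hidden, reward = dynamics(hidden, action)
        total += scale * reward
        scale *= discount
    return total + scale * prediction(hidden)[1]  # bootstrap from the value estimate

def plan(observation, num_iterations=30, depth=3, discount=0.9, c=1.0):
    # Run planning iterations and return plan data: mean value per root action.
    root = representation(observation)
    prior, _ = prediction(root)
    visits = [0] * len(ACTIONS)
    totals = [0.0] * len(ACTIONS)
    for i in range(num_iterations):
        def score(a):
            mean = totals[a] / visits[a] if visits[a] else 0.0
            return mean + c * prior[a] * math.sqrt(i + 1) / (1 + visits[a])
        action = max(ACTIONS, key=score)
        totals[action] += rollout(root, action, depth, discount)
        visits[action] += 1
    return {a: totals[a] / visits[a] for a in ACTIONS if visits[a]}

def select_action(observation):
    # Choose the action with the highest planned value.
    plan_data = plan(observation)
    return max(plan_data, key=plan_data.get)
```

With these toy models, `select_action(2)` picks action 0, which moves the hidden state toward zero. The point of the sketch is only that planning queries the dynamics and prediction models rather than the environment itself.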

2.

Invention publication
SEQUENCE-TO-SEQUENCE NEURAL NETWORK SYSTEMS USING LOOK AHEAD TREE SEARCH (Under examination, published)

Publication (announcement) number: US20240104353A1

Publication (announcement) date: 2024-03-28

Application number: US18274748

Application date: 2022-02-08

Applicant: DeepMind Technologies Limited

Inventor: Rémi Bertrand Francis Leblond, Jean-Baptiste Alayrac, Laurent Sifre, Miruna Pîslar, Jean-Baptiste Lespiau, Ioannis Antonoglou, Karen Simonyan, David Silver, Oriol Vinyals

IPC: G06N3/0455

CPC classification number: G06N3/0455

Abstract: A computer-implemented method for generating an output token sequence from an input token sequence. The method combines a look ahead tree search, such as a Monte Carlo tree search, with a sequence-to-sequence neural network system. The sequence-to-sequence neural network system has a policy output defining a next token probability distribution, and may include a value neural network providing a value output to evaluate a sequence. An initial partial output sequence is extended using the look ahead tree search guided by the policy output and, in implementations, the value output, of the sequence-to-sequence neural network system until a complete output sequence is obtained.
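
The decoding scheme in this abstract can be sketched in a few lines of Python. The sketch substitutes an exhaustive depth-limited look-ahead for the Monte Carlo tree search named in the abstract. The vocabulary, the toy `policy` and `value` functions, and the "reverse the input" task are all hypothetical stand-ins for a trained sequence-to-sequence system.

```python
import math

EOS = "<eos>"
VOCAB = ["a", "b", "c", EOS]  # hypothetical toy vocabulary

def policy(source, partial):
    # Hypothetical stand-in for the policy output: a next-token distribution.
    # It only knows roughly when to stop, not which token is correct.
    if len(partial) >= len(source):
        return {t: (0.97 if t == EOS else 0.01) for t in VOCAB}
    return {t: (0.01 if t == EOS else 0.33) for t in VOCAB}

def value(source, partial):
    # Hypothetical stand-in for the value network: fraction of target
    # positions (here, the reversed source) that the sequence gets right.
    target = list(reversed(source))
    tokens = [t for t in partial if t != EOS]
    hits = sum(1 for i, t in enumerate(tokens) if i < len(target) and t == target[i])
    return hits / len(target)

def lookahead(source, partial, depth):
    # Best score reachable within `depth` further tokens:
    # accumulated log-probability plus the value estimate at the leaf.
    if depth == 0 or (partial and partial[-1] == EOS):
        return value(source, partial)
    probs = policy(source, partial)
    return max(
        math.log(probs[t]) + lookahead(source, partial + [t], depth - 1)
        for t in VOCAB
    )

def decode(source, depth=2, max_len=10):
    # Extend the partial output one token at a time until EOS is emitted.
    partial = []
    while len(partial) < max_len:
        probs = policy(source, partial)
        best = max(
            VOCAB,
            key=lambda t: math.log(probs[t]) + lookahead(source, partial + [t], depth - 1),
        )
        partial.append(best)
        if best == EOS:
            break
    return partial
```

Here the policy alone cannot distinguish between the content tokens, so the value output decides which branch the search extends. This mirrors the abstract's description of a search guided by both outputs.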
