-
Publication number: US20230101930A1
Publication date: 2023-03-30
Application number: US17794780
Application date: 2021-02-08
Applicant: DeepMind Technologies Limited
Inventor: Samuel Ritter, Ryan Faulkner, David Nunes Raposo
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment to accomplish a goal. In one aspect, a method comprises: generating a respective planning embedding corresponding to each of multiple experience tuples in an external memory, wherein each experience tuple characterizes interaction of the agent with the environment at a respective previous time step; processing the planning embeddings using a planning neural network to generate an implicit plan for accomplishing the goal; and selecting the action to be performed by the agent at the time step using the implicit plan.
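
The three steps named in this abstract (embed each stored experience tuple, process the embeddings into an implicit plan, select an action from the plan) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the tuple layout `(observation, action, reward)`, the use of unparameterized dot-product self-attention as a stand-in for the planning neural network, max pooling to form the plan, and the hypothetical `policy_weights` matrix.

```python
import math


def planning_embedding(experience, goal):
    """Embed one experience tuple together with the goal (assumed layout)."""
    observation, action, reward = experience
    return list(observation) + [float(action), float(reward)] + list(goal)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Stand-in for the planning network: each embedding attends to all others."""
    dim = len(embeddings[0])
    attended = []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, embeddings)) for j in range(dim)]
        )
    return attended


def implicit_plan(embeddings):
    """Pool the attended embeddings into one vector (max pooling is an assumption)."""
    attended = self_attention(embeddings)
    dim = len(attended[0])
    return [max(row[j] for row in attended) for j in range(dim)]


def select_action(plan, policy_weights):
    """Score each action with a linear readout of the plan and pick the best."""
    scores = [sum(w * p for w, p in zip(row, plan)) for row in policy_weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical external memory holding two past experience tuples.
memory = [((0.0, 1.0), 0, 0.0), ((1.0, 0.0), 1, 1.0)]
goal = (1.0,)

embeddings = [planning_embedding(e, goal) for e in memory]
plan = implicit_plan(embeddings)

# Hypothetical readout: one row of weights per candidate action.
policy_weights = [[0.0] * 5, [0.0, 0.0, 0.0, 1.0, 0.0]]
action = select_action(plan, policy_weights)
```

In a trained system the attention and readout would carry learned parameters; the sketch only shows how data flows from memory to an action.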
-
Publication number: US20240086703A1
Publication date: 2024-03-14
Application number: US18275542
Application date: 2022-02-04
Applicant: DeepMind Technologies Limited
Inventor: Samuel Ritter, David Nunes Raposo
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: A computer-implemented reinforcement learning neural network system that learns a model of rewards in order to relate actions by an agent in an environment to their long-term consequences. The model learns to decompose the rewards into components explainable by different past states. That is, the model learns when being in a particular state of the environment is predictive of a reward in a later state, even when the later state and its reward are reached only after a very long time delay.
-
Publication number: US11423300B1
Publication date: 2022-08-23
Application number: US16271533
Application date: 2019-02-08
Applicant: DeepMind Technologies Limited
Inventor: Samuel Ritter, Xiao Jing Wang, Siddhant Jayakumar, Razvan Pascanu, Charles Blundell, Matthew Botvinick
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a system output using a remembered value of a neural network hidden state. In one aspect, a system comprises an external memory that maintains context experience tuples respectively comprising: (i) a key embedding of context data, and (ii) a value of a hidden state of a neural network at the respective previous time step. The neural network is configured to receive a system input and a remembered value of the hidden state of the neural network and to generate a system output. The system comprises a memory interface subsystem that is configured to determine a key embedding for current context data, determine a remembered value of the hidden state of the neural network based on the key embedding, and provide the remembered value of the hidden state as an input to the neural network.
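
The memory interface described in this abstract can be sketched as a small key-value store: each entry pairs a key embedding of context data with a stored hidden state, and a lookup with the current context's key embedding returns a remembered hidden state. The abstract does not say how the remembered value is determined from the key embedding, so the nearest-neighbor lookup by cosine similarity below is an assumption, as are the class and method names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ExternalMemory:
    """Hypothetical store of (key embedding, hidden state) context experience tuples."""

    def __init__(self):
        self.entries = []

    def write(self, key_embedding, hidden_state):
        # Store copies so later changes by the caller do not alter memory.
        self.entries.append((list(key_embedding), list(hidden_state)))

    def read(self, query_key):
        # Assumed retrieval rule: return the hidden state whose key is most
        # similar to the query; None if the memory is empty.
        if not self.entries:
            return None
        _, hidden_state = max(
            self.entries, key=lambda entry: cosine_similarity(entry[0], query_key)
        )
        return hidden_state


memory = ExternalMemory()
memory.write([1.0, 0.0], [0.5, -0.5])  # context A and its hidden state
memory.write([0.0, 1.0], [0.9, 0.1])   # context B and its hidden state

# The current context resembles context A, so its hidden state is recalled.
remembered = memory.read([0.9, 0.1])
```

The remembered hidden state would then be passed to the neural network alongside the system input, as the abstract describes; how the network combines the two is outside this sketch.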
-