-
公开(公告)号:US11334792B2
公开(公告)日:2022-05-17
申请号:US16403388
申请日:2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adria Puigdomenech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
-
公开(公告)号:US10664753B2
公开(公告)日:2020-05-26
申请号:US16445523
申请日:2019-06-19
Applicant: DeepMind Technologies Limited
Abstract: A method includes maintaining respective episodic memory data for each of multiple actions; receiving a current observation characterizing a current state of an environment being interacted with by an agent; processing the current observation using an embedding neural network in accordance with current values of parameters of the embedding neural network to generate a current key embedding for the current observation; for each action of the plurality of actions: determining the p nearest key embeddings in the episodic memory data for the action to the current key embedding according to a distance measure, and determining a Q value for the action from the return estimates mapped to by the p nearest key embeddings in the episodic memory data for the action; and selecting, using the Q values for the actions, an action from the multiple actions as the action to be performed by the agent.
-
公开(公告)号:US20190303764A1
公开(公告)日:2019-10-03
申请号:US16445523
申请日:2019-06-19
Applicant: DeepMind Technologies Limited
IPC: G06N3/08
Abstract: A method includes maintaining respective episodic memory data for each of multiple actions; receiving a current observation characterizing a current state of an environment being interacted with by an agent; processing the current observation using an embedding neural network in accordance with current values of parameters of the embedding neural network to generate a current key embedding for the current observation; for each action of the plurality of actions: determining the p nearest key embeddings in the episodic memory data for the action to the current key embedding according to a distance measure, and determining a Q value for the action from the return estimates mapped to by the p nearest key embeddings in the episodic memory data for the action; and selecting, using the Q values for the actions, an action from the multiple actions as the action to be performed by the agent.
-
公开(公告)号:US20190258929A1
公开(公告)日:2019-08-22
申请号:US16403388
申请日:2019-05-03
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adria Puigdomenech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
-
-
-