-
Publication No.: US11783182B2
Publication Date: 2023-10-10
Application No.: US17170316
Filing Date: 2021-02-08
Applicant: DeepMind Technologies Limited
Inventor: Volodymyr Mnih , Adrià Puigdomènech Badia , Alexander Benjamin Graves , Timothy James Alexander Harley , David Silver , Koray Kavukcuoglu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for asynchronous deep reinforcement learning. One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network.
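The worker arrangement above can be sketched as a minimal Hogwild-style loop: several threads, each with its own environment replica, apply lock-free updates to a shared parameter vector. The `EnvReplica` class and the random "gradient" are illustrative stand-ins, not the patent's training procedure.

```python
import threading
import random

# Toy shared "network": a parameter vector updated asynchronously by workers.
params = [0.0, 0.0]

class EnvReplica:
    """Each worker interacts with its own replica of the environment."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
    def step(self):
        # Stand-in for an actor's interaction yielding a gradient estimate.
        return [self.rng.uniform(-1, 1) for _ in params]

def worker(worker_id, steps, lr=0.01):
    env = EnvReplica(seed=worker_id)
    for _ in range(steps):
        grad = env.step()
        for i, g in enumerate(grad):
            params[i] += lr * g  # asynchronous, lock-free in-place update

threads = [threading.Thread(target=worker, args=(w, 100)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker operates independently; the shared parameters accumulate all workers' updates without synchronization, which is the structural point of the asynchronous scheme.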
-
Publication No.: US20230316729A1
Publication Date: 2023-10-05
Application No.: US17711951
Filing Date: 2022-04-01
Applicant: DeepMind Technologies Limited
Inventor: Dan-Andrei Calian , Sven Adrian Gowal , Timothy Arthur Mann , András György
IPC: G06V10/774 , G06V10/82 , G06V10/776
CPC classification number: G06V10/7747 , G06V10/82 , G06V10/776
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing a network input using a trained neural network with network parameters to generate an output for a machine learning task. The training includes: receiving a set of training examples each including a training network input and a reference output; for each training iteration, generating a corrupted network input for each training network input using a corruption neural network; updating perturbation parameters of the corruption neural network using a first objective function based on the corrupted network inputs; generating an updated corrupted network input for each training network input based on the updated perturbation parameters; and generating a network output for each updated corrupted network input using the neural network; for each training example, updating the network parameters using a second objective function based on the network output and the reference output.
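The alternating two-objective structure can be sketched with a linear toy model: an inner gradient-ascent step updates a shared perturbation (the "corruption network") to increase the task loss, then an outer gradient-descent step updates the network parameters on the re-corrupted inputs. The linear model, analytic gradients, and step sizes are illustrative assumptions, not the patent's networks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                 # training network inputs
y = X @ np.array([1.0, -2.0, 0.5])          # reference outputs
w = np.zeros(3)                             # network parameters
delta = np.zeros(3)                         # perturbation parameters
losses = []

for _ in range(200):
    # First objective: gradient *ascent* on the perturbation parameters.
    err = (X + delta) @ w - y
    delta += 0.05 * 2.0 * err.mean() * w    # d(loss)/d(delta) = 2*mean(err)*w

    # Updated corrupted inputs, then the network's outputs on them.
    Xc = X + delta
    err = Xc @ w - y
    losses.append(float(np.mean(err ** 2)))

    # Second objective: gradient descent on the network parameters.
    w -= 0.05 * 2.0 * (Xc.T @ err) / len(y)
```

Training on adversarially corrupted inputs like this is what lets the task loss fall while the corruption keeps pressure on the network.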
-
Publication No.: US20230256593A1
Publication Date: 2023-08-17
Application No.: US18018421
Filing Date: 2021-07-27
Applicant: DeepMind Technologies Limited
Inventor: Konrad Zolna , Scott Ellison Reed
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for off-line learning using a reward prediction model. One of the methods includes obtaining robot experience data; training, on a first subset of the robot experience data, a reward prediction model that receives a reward input comprising an input observation and generates as output a reward prediction that is a prediction of a task-specific reward for the particular task that should be assigned to the input observation; processing experiences in the robot experience data using the trained reward prediction model to generate a respective reward prediction for each of the processed experiences; and training a policy neural network on (i) the processed experiences and (ii) the respective reward predictions for the processed experiences.
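The relabeling pipeline can be sketched with a linear reward model in place of the reward prediction network: fit it on the labeled subset, then use it to assign predicted rewards to the remaining experiences before policy training. The observation dimensions, subset split, and least-squares fit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
obs = rng.normal(size=(100, 4))             # observations from robot experience data
true_w = np.array([0.5, -1.0, 0.0, 2.0])    # hidden task-specific reward structure
labeled = slice(0, 20)                      # first subset with known rewards
rewards_labeled = obs[labeled] @ true_w

# Train the reward prediction model on the first subset (least squares here).
w_hat, *_ = np.linalg.lstsq(obs[labeled], rewards_labeled, rcond=None)

# Process the remaining experiences to generate a reward prediction for each;
# a policy network would then be trained on (experience, predicted reward) pairs.
predicted = obs[20:] @ w_hat
```

Because the toy rewards are noiseless and the labeled subset is larger than the input dimension, the fitted model recovers the reward structure exactly; a learned model on real data would only approximate it.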
-
Publication No.: US11720796B2
Publication Date: 2023-08-08
Application No.: US16856527
Filing Date: 2020-04-23
Applicant: DeepMind Technologies Limited
Abstract: A method includes maintaining respective episodic memory data for each of multiple actions; receiving a current observation characterizing a current state of an environment being interacted with by an agent; processing the current observation using an embedding neural network in accordance with current values of parameters of the embedding neural network to generate a current key embedding for the current observation; for each action of the plurality of actions: determining the p nearest key embeddings in the episodic memory data for the action to the current key embedding according to a distance measure, and determining a Q value for the action from the return estimates mapped to by the p nearest key embeddings in the episodic memory data for the action; and selecting, using the Q values for the actions, an action from the multiple actions as the action to be performed by the agent.
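The per-action Q-value computation can be sketched directly: given an action's episodic memory of (key embedding, return estimate) pairs and a current key embedding, find the p nearest keys and combine their returns. Inverse-distance weighting is one illustrative choice of combination; the patent only requires that the Q value be determined from the p nearest keys' returns.

```python
import numpy as np

def q_from_memory(memory_keys, memory_returns, query, p=3, eps=1e-3):
    """Q estimate from the p nearest keys' stored return estimates
    (Euclidean distance and inverse-distance weights are assumptions)."""
    d = np.linalg.norm(memory_keys - query, axis=1)
    nearest = np.argsort(d)[:p]
    weights = 1.0 / (d[nearest] + eps)
    return float(np.sum(weights * memory_returns[nearest]) / np.sum(weights))

# Episodic memory for one action: key embeddings mapped to return estimates.
keys = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
rets = np.array([1.0, 2.0, 10.0])
q = q_from_memory(keys, rets, query=np.array([0.1, 0.0]), p=2)
```

Repeating this for every action and taking the argmax over the resulting Q values implements the action-selection step of the method.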
-
Publication No.: US11720781B2
Publication Date: 2023-08-08
Application No.: US16756363
Filing Date: 2017-10-20
Applicant: DeepMind Technologies Limited
Inventor: Erich Konrad Elsen
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for interleaving matrix operations of a gated activation unit. One of the methods includes receiving a plurality of weight matrices of a gated activation unit of the neural network, the gated activation unit having two or more layers, each layer defining operations comprising: (i) a matrix operation between a weight matrix for the layer and concatenated input vectors and (ii) a nonlinear activation operation using a result of the matrix operation. Rows of the plurality of weight matrices are interleaved by assigning groups of corresponding rows to respective thread blocks, each thread block being a computation unit for execution by an independent processing unit of a plurality of independent processing units of a parallel processing device.
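The row-interleaving layout can be sketched on the CPU: corresponding rows from each of the gated unit's weight matrices are grouped together and assigned round-robin to "thread blocks" (plain Python lists here standing in for the GPU computation units). The round-robin assignment and two-matrix example are illustrative; real kernels would also fuse the matrix and activation operations.

```python
import numpy as np

def interleave_rows(matrices, n_blocks):
    """Assign corresponding groups of rows from each weight matrix to
    thread blocks (a CPU sketch of the data layout, not GPU code)."""
    n_rows = matrices[0].shape[0]
    blocks = [[] for _ in range(n_blocks)]
    for r in range(n_rows):
        for m in matrices:
            blocks[r % n_blocks].append(m[r])
    return [np.stack(b) for b in blocks]

# Two weight matrices of a gated activation unit (e.g. update and gate paths).
W_update = np.arange(12).reshape(4, 3)
W_gate = np.arange(12, 24).reshape(4, 3)
blocks = interleave_rows([W_update, W_gate], n_blocks=2)
```

Grouping corresponding rows means each block can compute its slice of both matrix products over the same concatenated input vector without waiting on other blocks.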
-
Publication No.: US20230244936A1
Publication Date: 2023-08-03
Application No.: US18131567
Filing Date: 2023-04-06
Applicant: DeepMind Technologies Limited
Inventor: David Silver , Oriol Vinyals , Maxwell Elliot Jaderberg
IPC: G06N3/08 , H04L9/40 , G06F18/214
CPC classification number: G06N3/08 , H04L63/205 , G06F18/214
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment. In one aspect, the method includes: maintaining data specifying a pool of candidate action selection policies; maintaining data specifying a respective matchmaking policy; and training the policy neural network using a reinforcement learning technique to update the policy parameters. The policy parameters define policies to be used in controlling the agent to perform the particular task.
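The pool-plus-matchmaking arrangement can be sketched as weighted opponent sampling: a pool of candidate policies, and a matchmaking policy that draws training opponents from it. Sampling in proportion to each candidate's win rate against the learner is one plausible choice; the abstract does not fix the matchmaking distribution, and the names below are illustrative.

```python
import random

rng = random.Random(0)

# Pool of candidate action selection policies with an assumed score
# (e.g. win rate against the learner) used by the matchmaking policy.
pool = {"policy_a": 0.2, "policy_b": 0.5, "policy_c": 0.3}

def sample_opponent():
    """Matchmaking policy: sample an opponent in proportion to its score."""
    names = list(pool)
    weights = [pool[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Each training episode pairs the learner against a sampled opponent.
opponents = [sample_opponent() for _ in range(1000)]
```

Biasing matchmaking toward stronger opponents keeps the reinforcement learning updates focused on the candidates the current policy still loses to.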
-
Publication No.: US20230196108A1
Publication Date: 2023-06-22
Application No.: US18169803
Filing Date: 2023-02-15
Applicant: DeepMind Technologies Limited
Inventor: Georg Ostrovski , William Clinton Dabney
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.
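The sampling-and-averaging step can be sketched with a hand-rolled stand-in for the trained quantile function network: for each action, sample probability values, map each to an estimated quantile of the return distribution, and use their mean as the measure of central tendency. The toy network below (a function monotone in tau with a per-action offset) is an assumption, not the patent's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantile_network(action, observation, tau):
    """Stand-in for the quantile function network: maps a probability
    value tau to an estimated quantile of the return distribution."""
    base = {0: 1.0, 1: 2.5}[action]        # hypothetical per-action return level
    return base + 0.5 * (tau - 0.5)        # monotone in tau, as a quantile must be

observation = None                         # unused by this toy network
q_values = {}
for action in (0, 1):
    taus = rng.uniform(0.0, 1.0, size=32)  # randomly sampled probability values
    quantiles = np.array([quantile_network(action, observation, t) for t in taus])
    q_values[action] = quantiles.mean()    # measure of central tendency

best_action = max(q_values, key=q_values.get)
```

Averaging sampled quantiles approximates the mean of the return distribution, so the selection step reduces to a greedy choice over these estimates.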
-
Publication No.: US11662210B2
Publication Date: 2023-05-30
Application No.: US17747144
Filing Date: 2022-05-18
Applicant: DeepMind Technologies Limited
Inventor: Andrea Banino , Sudarshan Kumaran , Raia Thais Hadsell , Benigno Uria-Martínez
CPC classification number: G01C21/20 , G06N3/04 , G06N3/0445 , G06N3/0454 , G06N3/08 , G06N3/082 , G06T7/73 , G06N3/006 , G06T2207/20081 , G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a grid cell neural network and an action selection neural network. The grid cell network is configured to: receive an input comprising data characterizing a velocity of the agent; process the input to generate a grid cell representation; and process the grid cell representation to generate an estimate of a position of the agent in the environment; the action selection neural network is configured to: receive an input comprising a grid cell representation and an observation characterizing a state of the environment; and process the input to generate an action selection network output.
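The grid cell network's velocity-to-position mapping amounts to path integration, which can be sketched explicitly: accumulate a stream of velocity inputs into a position estimate. A real grid cell network learns an intermediate representation rather than integrating directly; this closed-form version is only an illustration of the computation it approximates.

```python
import numpy as np

def integrate_position(velocities, dt=0.1, start=(0.0, 0.0)):
    """Path integration: turn a stream of agent velocities into an
    estimate of the agent's position in the environment."""
    position = np.array(start, dtype=float)
    for v in velocities:
        position += dt * np.asarray(v, dtype=float)
    return position

# 10 steps moving east at speed 1, then 5 steps moving north at speed 2.
velocities = [(1.0, 0.0)] * 10 + [(0.0, 2.0)] * 5
pos = integrate_position(velocities)
```

In the patented system the grid cell representation feeding this estimate is also passed, alongside the observation, to the action selection network.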
-
Publication No.: US11651208B2
Publication Date: 2023-05-16
Application No.: US16615042
Filing Date: 2018-05-22
Applicant: DeepMind Technologies Limited
Inventor: Zhongwen Xu , Hado Phillip van Hasselt , Joseph Varughese Modayil , Andre da Motta Salles Barreto , David Silver
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. A reinforcement learning neural network selects actions to be performed by an agent interacting with an environment to perform a task in an attempt to achieve a specified result. The reinforcement learning neural network has at least one input to receive an input observation characterizing a state of the environment and at least one output for determining an action to be performed by the agent in response to the input observation. The system includes a reward function network coupled to the reinforcement learning neural network. The reward function network has an input to receive reward data characterizing a reward provided by one or more states of the environment and is configured to determine a reward function to provide one or more target values for training the reinforcement learning neural network.
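The coupling can be sketched as a reward function module that transforms observed reward data into target values for the RL network's update, here a one-step TD target. The identity transform and tabular-style target are stand-ins; the patent's reward function network is learned.

```python
def reward_function(reward_data):
    """Stand-in for the reward function network; a trained network would
    map reward data to a learned reward signal (identity here)."""
    return reward_data

def training_target(reward_data, next_state_value, discount=0.9):
    """One target value for training the reinforcement learning network."""
    return reward_function(reward_data) + discount * next_state_value

target = training_target(reward_data=1.0, next_state_value=2.0)
```

The point of the arrangement is that the targets fed to the RL network depend on the reward function network's output, so the reward signal itself can be shaped or learned rather than taken raw from the environment.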
-
Publication No.: US20230145129A1
Publication Date: 2023-05-11
Application No.: US18095925
Filing Date: 2023-01-11
Applicant: DeepMind Technologies Limited
Inventor: Andrew Coulter Jaegle , Joao Carreira
IPC: G06N3/092
CPC classification number: G06N3/092
Abstract: This specification describes a method for using a neural network to generate a network output that characterizes an entity. The method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output characterizing the entity. The neural network includes: (i) one or more cross-attention blocks, (ii) one or more self-attention blocks, and (iii) an output block. Each cross-attention block updates each latent embedding using attention over some or all of the data element embeddings. Each self-attention block updates each latent embedding using attention over the set of latent embeddings. The output block processes one or more latent embeddings to generate the network output that characterizes the entity.
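The block structure can be sketched with plain scaled dot-product attention: cross-attention lets a small set of latent embeddings read from the full set of data element embeddings, self-attention mixes the latents among themselves, and an output block pools them. Single-head attention without learned projections, and mean-pooling as the output block, are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def attend(queries, keys, values):
    """Scaled dot-product attention (single head, no learned projections,
    a simplification of the blocks described above)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

data = rng.normal(size=(100, 16))     # data element embeddings (large set)
latents = rng.normal(size=(8, 16))    # latent embeddings (small, fixed set)

latents = attend(latents, data, data)        # cross-attention over the data
latents = attend(latents, latents, latents)  # self-attention over the latents
output = latents.mean(axis=0)                # output block: pool the latents
```

Because self-attention only ever runs over the fixed-size latent set, cost in the data size is confined to the cross-attention reads, which is the architectural payoff of this design.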