Publication No.: US20240412072A1
Publication Date: 2024-12-12
Application No.: US18422620
Filing Date: 2024-01-25
Applicant: DeepMind Technologies Limited
Inventor: Siqi Liu, Luke Christopher Marris, Nicolas Manfred Otto Heess, Marc Lanctot
IPC: G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment using a population of action selection policies that are jointly represented by a population action selection neural network. In one aspect, a method comprises, at each of a plurality of time steps: obtaining an observation characterizing a current state of the environment at the time step; selecting a target action selection policy from the population of action selection policies; processing a network input comprising: (i) the observation, and (ii) a strategy embedding representing the target action selection policy, using the population action selection neural network to generate an action selection output; and selecting an action to be performed by the agent at the time step using the action selection output.
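The per-time-step procedure in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the class `PopulationPolicyNetwork`, the helper `select_action`, the embedding lookup table, the tiny two-layer network, the random weights, and the uniform choice of target policy are all hypothetical stand-ins chosen for illustration. The only point carried over from the abstract is that a single shared network takes (i) the observation and (ii) a strategy embedding for the selected policy, and that its output is used to select the action.

```python
import math
import random


class PopulationPolicyNetwork:
    """Toy stand-in for one network that jointly represents a population of policies."""

    def __init__(self, obs_dim, embed_dim, num_policies, num_actions, hidden=16, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Assumption: one strategy embedding per policy, stored in a lookup table.
        self.embeddings = matrix(num_policies, embed_dim)
        # Shared weights used by every policy in the population.
        self.w1 = matrix(hidden, obs_dim + embed_dim)
        self.w2 = matrix(num_actions, hidden)

    def action_probs(self, observation, policy_index):
        # Network input: (i) the observation, (ii) the target policy's strategy embedding.
        x = list(observation) + self.embeddings[policy_index]
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.w2]
        # Numerically stable softmax over the action logits.
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


def select_action(network, observation, policy_index, rng):
    """Sample an action from the action selection output for the target policy."""
    probs = network.action_probs(observation, policy_index)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run_steps(network, observations, num_policies, seed=0):
    """For each time step: pick a target policy, then pick an action."""
    rng = random.Random(seed)
    actions = []
    for obs in observations:
        # Assumption: the target policy is chosen uniformly at random.
        policy_index = rng.randrange(num_policies)
        actions.append(select_action(network, obs, policy_index, rng))
    return actions
```

Calling `run_steps(PopulationPolicyNetwork(3, 2, 4, 5), [[0.1, -0.2, 0.3]] * 10, num_policies=4)` returns ten action indices in the range 0 to 4. Because the weights are shared and only the embedding changes, switching the target policy changes the action distribution without switching networks.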