-
Publication number: US20240185082A1
Publication date: 2024-06-06
Application number: US18275722
Filing date: 2022-02-04
Applicant: DeepMind Technologies Limited
Inventor: Andrew Coulter Jaegle, Yury Sulsky, Gregory Duncan Wayne, Robert David Fergus
IPC: G06N3/092
CPC classification number: G06N3/092
Abstract: A method is proposed of training a policy model to generate action data for controlling an agent to perform a task in an environment. The method comprises: obtaining, for each of a plurality of performances of the task, a corresponding demonstrator trajectory comprising a plurality of sets of state data characterizing the environment at each of a plurality of corresponding successive time steps during the performance of the task; using the demonstrator trajectories to generate a demonstrator model, the demonstrator model being operative to generate, for any said demonstrator trajectory, a value indicative of the probability of the demonstrator trajectory occurring; and jointly training an imitator model and a policy model. The joint training is performed by: generating a plurality of imitation trajectories, each imitation trajectory being generated by repeatedly receiving state data indicating a state of the environment, using the policy model to generate action data indicative of an action, and causing the action to be performed by the agent; training the imitator model using the imitation trajectories, the imitator model being operative to generate, for any said imitation trajectory, a value indicative of the probability of the imitation trajectory occurring; and training the policy model using a reward function which is a measure of the similarity of the demonstrator model and the imitator model.
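The reward described in this abstract compares two trajectory models, one fitted to demonstrator data and one fitted to the policy's own rollouts. The following is a minimal sketch of that comparison only, not the claimed method. It assumes discrete states, a smoothed categorical model in which a trajectory's probability is the product of its per-state probabilities, and a log-ratio reward as the similarity measure. All function names and the toy data are hypothetical, and the policy update itself is omitted.

```python
import math
from collections import Counter

def fit_state_model(trajectories, num_states, smoothing=1.0):
    """Fit a Laplace-smoothed categorical distribution over visited states."""
    counts = Counter(s for traj in trajectories for s in traj)
    total = sum(counts.values()) + smoothing * num_states
    return [(counts[s] + smoothing) / total for s in range(num_states)]

def trajectory_log_prob(model, trajectory):
    """Log-probability of a trajectory, treating states as independent draws
    (an assumption made to keep the sketch small)."""
    return sum(math.log(model[s]) for s in trajectory)

def reward(demo_model, imit_model, state):
    """Assumed similarity-based reward: log-ratio of demonstrator density to
    imitator density at the given state."""
    return math.log(demo_model[state]) - math.log(imit_model[state])

# Hypothetical toy data: 5 discrete states; demonstrators tend to reach state 4.
NUM_STATES = 5
demo_trajs = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 2, 3, 4, 4]]
imit_trajs = [[0, 0, 1, 0, 1], [0, 1, 1, 2, 1]]

demo_model = fit_state_model(demo_trajs, NUM_STATES)
imit_model = fit_state_model(imit_trajs, NUM_STATES)

# States the demonstrator visits more often than the imitator earn positive reward.
rewards = [reward(demo_model, imit_model, s) for s in range(NUM_STATES)]
```

Under these assumptions, the expected reward over states drawn from the imitator model is the negative KL divergence from the imitator model to the demonstrator model, so it is highest when the two models agree. In a full joint-training loop the imitator model would be refitted as the policy changes.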
-
2.
Publication number: US20250093828A1
Publication date: 2025-03-20
Application number: US18892260
Filing date: 2024-09-20
Applicant: DeepMind Technologies Limited
Inventor: Arun Ahuja, Robert David Fergus, Ishita Dasgupta, Kavya Venkata Kota Sai Kopparapu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a high-level controller neural network for controlling an agent. In particular, the high-level controller neural network generates natural language commands that can be provided as input to a low-level controller neural network, which generates control outputs that can be used to control the agent.
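The two-level control flow in this abstract can be illustrated with a small sketch. It is not the patented system: both controllers are hand-written stand-ins for the neural networks, and the 1-D environment, the command vocabulary, and all function names are hypothetical.

```python
def high_level_controller(observation):
    """Stand-in for the high-level network: returns a natural language command."""
    position, goal = observation
    if position < goal:
        return "move right"
    if position > goal:
        return "move left"
    return "stop"

def low_level_controller(observation, command):
    """Stand-in for the low-level network: maps a command to a control output.
    The observation is accepted but unused in this toy version."""
    return {"move right": 1, "move left": -1, "stop": 0}[command]

def run_episode(start, goal, max_steps=20):
    """Roll out the hierarchy on a 1-D line and record the issued commands."""
    position, commands = start, []
    for _ in range(max_steps):
        observation = (position, goal)
        command = high_level_controller(observation)
        commands.append(command)
        control = low_level_controller(observation, command)
        if control == 0:
            break
        position += control
    return position, commands

final_position, commands = run_episode(start=0, goal=3)
```

The point of the sketch is the interface: the only signal passed from the high-level controller to the low-level controller is a text command, which the low-level controller turns into a control output for the agent.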
-