Invention Publication
- Patent Title: IMITATION LEARNING BASED ON PREDICTION OF OUTCOMES
- Application No.: US18275722
- Application Date: 2022-02-04
- Publication No.: US20240185082A1
- Publication Date: 2024-06-06
- Inventor: Andrew Coulter Jaegle, Yury Sulsky, Gregory Duncan Wayne, Robert David Fergus
- Applicant: DeepMind Technologies Limited
- Applicant Address: GB London
- Assignee: DeepMind Technologies Limited
- Current Assignee: DeepMind Technologies Limited
- Current Assignee Address: GB London
- International Application: PCT/EP2022/052792, 2022-02-04
- National Phase Entry Date: 2023-08-03
- Main IPC: G06N3/092
- IPC: G06N3/092

Abstract:
A method is proposed of training a policy model to generate action data for controlling an agent to perform a task in an environment. The method comprises: obtaining, for each of a plurality of performances of the task, a corresponding demonstrator trajectory comprising a plurality of sets of state data characterizing the environment at each of a plurality of corresponding successive time steps during the performance of the task; using the demonstrator trajectories to generate a demonstrator model, the demonstrator model being operative to generate, for any said demonstrator trajectory, a value indicative of the probability of the demonstrator trajectory occurring; and jointly training an imitator model and a policy model. The joint training is performed by: generating a plurality of imitation trajectories, each imitation trajectory being generated by repeatedly receiving state data indicating a state of the environment, using the policy model to generate action data indicative of an action, and causing the action to be performed by the agent; training the imitator model using the imitation trajectories, the imitator model being operative to generate, for any said imitation trajectory, a value indicative of the probability of the imitation trajectory occurring; and training the policy model using a reward function which is a measure of the similarity of the demonstrator model and the imitator model.
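
The joint training loop described in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the five-state line environment, the Laplace-smoothed state histogram used as a stand-in for both the demonstrator and imitator models, the per-step log-ratio reward as one possible similarity measure, and the REINFORCE update with a per-time-step baseline. The names `fit_density`, `trajectory_log_prob`, `rollout`, and `train` are hypothetical.

```python
import math
import random
from collections import Counter

N_STATES = 5        # toy environment (assumed): states 0..4 on a line
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right
HORIZON = 8         # time steps per trajectory

def fit_density(trajectories, alpha=1.0):
    """Laplace-smoothed state histogram; a stand-in for a learned density model."""
    counts = Counter(s for traj in trajectories for s in traj)
    total = sum(counts.values()) + alpha * N_STATES
    return [(counts[s] + alpha) / total for s in range(N_STATES)]

def trajectory_log_prob(model, traj):
    """Value indicative of a trajectory's probability (states treated as independent)."""
    return sum(math.log(model[s]) for s in traj)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def rollout(logits, rng):
    """Generate one imitation trajectory with the current tabular policy."""
    s = 0
    states, steps = [s], []
    for _ in range(HORIZON):
        a = 0 if rng.random() < softmax(logits[s])[0] else 1
        steps.append((s, a))
        s = min(N_STATES - 1, max(0, s + ACTIONS[a]))  # walls at both ends
        states.append(s)
    return states, steps

def train(demo_trajectories, iterations=300, batch=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    demo_model = fit_density(demo_trajectories)  # fitted once from demonstrations
    logits = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(iterations):
        rollouts = [rollout(logits, rng) for _ in range(batch)]
        # Refit the imitator model on the freshly generated imitation trajectories.
        imit_model = fit_density([states for states, _ in rollouts])
        all_returns = []
        for states, _ in rollouts:
            # Assumed reward: log-ratio of demonstrator to imitator density.
            rewards = [math.log(demo_model[s] / imit_model[s]) for s in states[1:]]
            togo, acc = [], 0.0
            for r in reversed(rewards):
                acc += r
                togo.append(acc)
            all_returns.append(togo[::-1])  # reward-to-go per time step
        baseline = [sum(col) / batch for col in zip(*all_returns)]
        grad = [[0.0, 0.0] for _ in range(N_STATES)]
        for (_, steps), returns in zip(rollouts, all_returns):
            for t, (s, a) in enumerate(steps):
                probs = softmax(logits[s])
                adv = returns[t] - baseline[t]
                for k in range(2):
                    grad[s][k] += adv * ((k == a) - probs[k]) / batch
        for s in range(N_STATES):
            for k in range(2):
                logits[s][k] += lr * grad[s][k]
    return logits, demo_model

# Demonstrations (made up for this sketch): walk right to state 4 and stay there.
demos = [[0, 1, 2, 3, 4, 4, 4, 4, 4]] * 10
logits, demo_model = train(demos)
```

In this sketch the reward shrinks toward zero as the imitator's state distribution approaches the demonstrator's, so the policy is pushed toward states that demonstrations visit more often than its own rollouts do. A practical system would presumably replace the histograms with learned models and the tabular policy with a neural network.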