TRAINING ACTION SELECTION NEURAL NETWORKS USING OFF-POLICY ACTOR CRITIC REINFORCEMENT LEARNING AND STOCHASTIC DUELING NEURAL NETWORKS
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
Information query
Patent Agency Ranking
0/0