REINFORCEMENT LEARNING USING AGENT CURRICULA

    Publication No.: US20190354867A1

    Publication date: 2019-11-21

    Application No.: US16417522

    Filing date: 2019-05-20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using agent curricula. One of the methods includes maintaining data specifying a plurality of candidate agent policy neural networks; initializing mixing data that assigns a respective weight to each of the candidate agent policy neural networks; training the candidate agent policy neural networks using a reinforcement learning technique to generate combined action selection policies that result in improved performance on a reinforcement learning task; and during the training, repeatedly adjusting the weights in the mixing data to favor higher-performing candidate agent policy neural networks.
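The mixing step described in this abstract can be illustrated with a minimal sketch. It assumes each candidate policy outputs a probability distribution over the same discrete actions, and that the combined policy is a weighted sum of those distributions. The function names, the `step_size` parameter, and the exponential reweighting rule are hypothetical; the abstract does not say how the weights are adjusted, only that they move toward higher-performing candidates.

```python
import math


def mix_policies(weights, policy_probs):
    """Combine per-candidate action distributions into one distribution.

    weights: K mixing weights that sum to 1 (the "mixing data").
    policy_probs: K lists, each a distribution over the same actions.
    """
    num_actions = len(policy_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(weights, policy_probs))
        for a in range(num_actions)
    ]


def adjust_weights(weights, scores, step_size=0.5):
    """Shift mixing weight toward candidates with higher scores.

    Assumed rule (not from the abstract): scale each weight by
    exp(step_size * score), then renormalise so the weights sum to 1.
    """
    unnormalised = [w * math.exp(step_size * s) for w, s in zip(weights, scores)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Two hypothetical candidates over two actions, equally weighted at the start.
weights = [0.5, 0.5]
policy_probs = [[0.9, 0.1], [0.2, 0.8]]
combined = mix_policies(weights, policy_probs)

# Suppose the second candidate scored higher during training.
weights = adjust_weights(weights, scores=[0.0, 1.0])
```

In a full training loop, `adjust_weights` would be called repeatedly between reinforcement learning updates, with `scores` taken from whatever performance measure the system tracks for each candidate.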

    Reinforcement learning using agent curricula

    Publication No.: US11113605B2

    Publication date: 2021-09-07

    Application No.: US16417522

    Filing date: 2019-05-20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using agent curricula. One of the methods includes maintaining data specifying a plurality of candidate agent policy neural networks; initializing mixing data that assigns a respective weight to each of the candidate agent policy neural networks; training the candidate agent policy neural networks using a reinforcement learning technique to generate combined action selection policies that result in improved performance on a reinforcement learning task; and during the training, repeatedly adjusting the weights in the mixing data to favor higher-performing candidate agent policy neural networks.

    Selecting actions by reverting to previous learned action selection policies

    Publication No.: US11423300B1

    Publication date: 2022-08-23

    Application No.: US16271533

    Filing date: 2019-02-08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a system output using a remembered value of a neural network hidden state. In one aspect, a system comprises an external memory that maintains context experience tuples respectively comprising: (i) a key embedding of context data, and (ii) a value of a hidden state of a neural network at the respective previous time step. The neural network is configured to receive a system input and a remembered value of the hidden state of the neural network and to generate a system output. The system comprises a memory interface subsystem that is configured to determine a key embedding for current context data, determine a remembered value of the hidden state of the neural network based on the key embedding, and provide the remembered value of the hidden state as an input to the neural network.
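The external memory in this abstract can be sketched as a store of (key embedding, hidden state) tuples with a lookup by key. The abstract does not say how the remembered value is determined from the key embedding, so the nearest-neighbour rule below (smallest squared Euclidean distance) is an assumption, as are the class and method names.

```python
class ExternalMemory:
    """Stores (key_embedding, hidden_state) tuples from previous time steps."""

    def __init__(self):
        self.entries = []

    def write(self, key_embedding, hidden_state):
        # Copy the inputs so later mutation by the caller does not alter memory.
        self.entries.append((list(key_embedding), list(hidden_state)))

    def read(self, query_embedding):
        """Return the hidden state whose key is closest to the query.

        Assumed retrieval rule: nearest neighbour by squared Euclidean
        distance. Returns None when the memory is empty.
        """
        if not self.entries:
            return None

        def squared_distance(key):
            return sum((k - q) ** 2 for k, q in zip(key, query_embedding))

        _, hidden_state = min(self.entries, key=lambda e: squared_distance(e[0]))
        return hidden_state


# Hypothetical usage: the remembered value would be passed to the neural
# network together with the current system input.
memory = ExternalMemory()
memory.write([0.0, 0.0], [1.0, 1.0])
memory.write([1.0, 1.0], [2.0, 2.0])
remembered = memory.read([0.9, 0.8])
```

A memory interface subsystem as described would compute `query_embedding` from the current context data before calling `read`.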
