REINFORCEMENT LEARNING USING AGENT CURRICULA
    11.
    Invention application

    Publication No.: US20190354867A1

    Publication date: 2019-11-21

    Application No.: US16417522

    Filing date: 2019-05-20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using agent curricula. One of the methods includes maintaining data specifying a plurality of candidate agent policy neural networks; initializing mixing data that assigns a respective weight to each of the candidate agent policy neural networks; training the candidate agent policy neural networks using a reinforcement learning technique to generate combined action selection policies that result in improved performance on a reinforcement learning task; and during the training, repeatedly adjusting the weights in the mixing data to favor higher-performing candidate agent policy neural networks.
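
    The mixing step in the abstract above can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how the weights are adjusted, so the multiplicative update, the `step_size` parameter, and the function names `mix_policies` and `adjust_weights` are all assumptions made for illustration. Each candidate policy is stood in for by a fixed action distribution rather than a neural network.

```python
import math


def mix_policies(weights, action_probs):
    """Combine per-candidate action distributions into one mixture distribution."""
    num_actions = len(action_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(weights, action_probs))
        for a in range(num_actions)
    ]


def adjust_weights(weights, performance, step_size=0.5):
    """Shift weight toward higher-performing candidates.

    Assumption: a multiplicative (exponentiated) update followed by
    renormalization. The abstract only says the weights are adjusted to
    favor higher-performing candidates.
    """
    scaled = [w * math.exp(step_size * p) for w, p in zip(weights, performance)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Mixing data: one weight per candidate policy, initialized uniformly.
weights = [0.5, 0.5]

# Hypothetical action distributions from two candidate policies.
action_probs = [[0.9, 0.1], [0.2, 0.8]]

combined = mix_policies(weights, action_probs)

# Hypothetical performance scores; the second candidate did better.
weights = adjust_weights(weights, performance=[1.0, 3.0])
```

    After the update, the second candidate carries more weight than the first, so later combined policies lean toward it. In a real training loop this adjustment would be repeated while the candidate networks themselves are being trained.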

    Reinforcement learning using agent curricula

    Publication No.: US11113605B2

    Publication date: 2021-09-07

    Application No.: US16417522

    Filing date: 2019-05-20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using agent curricula. One of the methods includes maintaining data specifying a plurality of candidate agent policy neural networks; initializing mixing data that assigns a respective weight to each of the candidate agent policy neural networks; training the candidate agent policy neural networks using a reinforcement learning technique to generate combined action selection policies that result in improved performance on a reinforcement learning task; and during the training, repeatedly adjusting the weights in the mixing data to favor higher-performing candidate agent policy neural networks.

    REINFORCEMENT LEARNING WITH AUXILIARY TASKS
    17.
    Invention application

    Publication No.: US20190258938A1

    Publication date: 2019-08-22

    Application No.: US16403385

    Filing date: 2019-05-03

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
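
    The arrangement described in the abstract above, with auxiliary heads reading an intermediate output of the main policy network, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the patented system: the layer sizes, the parameter values, the loss weights `aux_weight` and `reward_weight`, and the use of a negative log-likelihood as a stand-in for the main reinforcement learning loss are all hypothetical.

```python
import math


def linear(weights, inputs):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, observation):
    """Run the shared layer once and feed its output to all three heads."""
    hidden = [math.tanh(h) for h in linear(params["shared"], observation)]
    return {
        "policy": softmax(linear(params["policy_head"], hidden)),
        "aux_policy": softmax(linear(params["aux_head"], hidden)),
        "predicted_reward": linear(params["reward_head"], hidden)[0],
    }


def combined_loss(params, observation, action, aux_action, reward,
                  aux_weight=0.1, reward_weight=0.1):
    """Sum the main loss with weighted auxiliary and reward-prediction losses."""
    out = forward(params, observation)
    main = -math.log(out["policy"][action])         # placeholder for the RL loss
    aux = -math.log(out["aux_policy"][aux_action])  # auxiliary control task
    reward_error = (out["predicted_reward"] - reward) ** 2
    return main + aux_weight * aux + reward_weight * reward_error


# Hypothetical parameters: one shared layer and three small heads.
params = {
    "shared": [[0.1, -0.2], [0.3, 0.4]],
    "policy_head": [[0.5, -0.5], [-0.5, 0.5]],
    "aux_head": [[0.2, 0.1], [-0.1, 0.3]],
    "reward_head": [[0.7, -0.3]],
}

outputs = forward(params, [1.0, -1.0])
loss = combined_loss(params, [1.0, -1.0], action=0, aux_action=1, reward=1.0)
```

    Every term of the combined loss depends on `params["shared"]`, so a gradient step on it would adjust the shared policy-network parameters as well as the head parameters, which matches the last sentence of the abstract.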

    REINFORCEMENT LEARNING WITH AUXILIARY TASKS
    19.
    Invention publication

    Publication No.: US20240144015A1

    Publication date: 2024-05-02

    Application No.: US18386954

    Filing date: 2023-11-03

    CPC classification number: G06N3/084 G06N3/006 G06N3/044 G06N3/045 G06N20/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
