REINFORCEMENT LEARNING WITH AUXILIARY TASKS
    Invention Publication

    Publication Number: US20240144015A1

    Publication Date: 2024-05-02

    Application Number: US18386954

    Filing Date: 2023-11-03

    CPC classification number: G06N3/084 G06N3/006 G06N3/044 G06N3/045 G06N20/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
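
    The abstract describes a shared structure in which auxiliary control heads and a reward-prediction head consume an intermediate output of the action selection policy network, so that training the auxiliary tasks also adjusts the shared policy parameters. Below is a minimal illustrative sketch of that structure in PyTorch; the module names, layer sizes, and placeholder losses are assumptions for illustration and are not taken from the patent.

    # Sketch only: illustrative names/losses, not the patented method itself.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolicyWithAuxiliaryHeads(nn.Module):
        def __init__(self, obs_dim=8, n_actions=4, n_aux_actions=4, hidden=128):
            super().__init__()
            # Torso of the action-selection policy network; its output plays the
            # role of the "intermediate output" consumed by the auxiliary heads.
            self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.policy_head = nn.Linear(hidden, n_actions)           # main policy output
            self.aux_control_head = nn.Linear(hidden, n_aux_actions)  # auxiliary control policy
            self.reward_pred_head = nn.Linear(hidden, 1)              # predicted reward

        def forward(self, obs):
            h = self.torso(obs)  # intermediate output shared with the auxiliary heads
            return self.policy_head(h), self.aux_control_head(h), self.reward_pred_head(h)

    net = PolicyWithAuxiliaryHeads()
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

    # Dummy batch standing in for observations, auxiliary-task targets, and rewards.
    obs = torch.randn(32, 8)
    action_targets = torch.randint(0, 4, (32,))
    aux_targets = torch.randint(0, 4, (32,))
    rewards = torch.randn(32)

    policy_logits, aux_logits, predicted_reward = net(obs)
    # Placeholder losses: a real system would use its reinforcement-learning
    # objective for the main head and the auxiliary control objective here.
    loss = (F.cross_entropy(policy_logits, action_targets)
            + F.cross_entropy(aux_logits, aux_targets)
            + F.mse_loss(predicted_reward.squeeze(-1), rewards))

    optimizer.zero_grad()
    loss.backward()   # gradients flow into the heads and the shared policy parameters
    optimizer.step()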

    REINFORCEMENT LEARNING WITH AUXILIARY TASKS

    Publication Number: US20210182688A1

    Publication Date: 2021-06-17

    Application Number: US17183618

    Filing Date: 2021-02-24

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.

    Reinforcement learning with auxiliary tasks

    Publication Number: US10956820B2

    Publication Date: 2021-03-23

    Application Number: US16403385

    Filing Date: 2019-05-03

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.

    CONTROLLING AGENTS USING AMORTIZED Q LEARNING

    Publication Number: US20240160901A1

    Publication Date: 2024-05-16

    Application Number: US18406995

    Filing Date: 2024-01-08

    CPC classification number: G06N3/047 G06N3/006 G06N3/084

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment. One of the methods includes receiving a current observation; processing the current observation using a proposal neural network to generate a proposal output that defines a proposal probability distribution over a set of possible actions that can be performed by the agent to interact with the environment; sampling (i) one or more actions from the set of possible actions in accordance with the proposal probability distribution and (ii) one or more actions randomly from the set of possible actions; processing the current observation and each sampled action using a Q neural network to generate a Q value; and selecting an action using the Q values generated by the Q neural network.
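
    The abstract describes selecting an action by sampling candidate actions both from a learned proposal distribution and uniformly at random, scoring each candidate with a Q network, and choosing among them by Q value. A rough illustrative sketch of that selection step follows, assuming PyTorch; the network definitions, sizes, and sample counts are assumptions, not taken from the patent.

    # Sketch only: illustrative networks and hyperparameters.
    import torch
    import torch.nn as nn

    class ProposalNet(nn.Module):
        def __init__(self, obs_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_actions))
        def forward(self, obs):
            # Proposal probability distribution over the set of possible actions.
            return torch.distributions.Categorical(logits=self.net(obs))

    class QNet(nn.Module):
        def __init__(self, obs_dim, n_actions, hidden=64):
            super().__init__()
            self.n_actions = n_actions
            self.net = nn.Sequential(nn.Linear(obs_dim + n_actions, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))
        def forward(self, obs, action):
            # Score an (observation, action) pair with a single Q value.
            one_hot = nn.functional.one_hot(action, self.n_actions).float()
            return self.net(torch.cat([obs, one_hot], dim=-1)).squeeze(-1)

    def select_action(obs, proposal_net, q_net, n_proposal=8, n_random=4):
        dist = proposal_net(obs)                                       # proposal distribution
        proposed = dist.sample((n_proposal,))                          # actions from the proposal
        random_acts = torch.randint(0, q_net.n_actions, (n_random,))   # uniform random actions
        candidates = torch.cat([proposed, random_acts])
        q_values = q_net(obs.expand(len(candidates), -1), candidates)  # Q value per candidate
        return candidates[q_values.argmax()]                           # act greedily over candidates

    proposal_net, q_net = ProposalNet(8, 10), QNet(8, 10)
    action = select_action(torch.randn(8), proposal_net, q_net)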

    Noisy neural network layers with noise parameters

    Publication Number: US11977983B2

    Publication Date: 2024-05-07

    Application Number: US17020248

    Filing Date: 2020-09-14

    CPC classification number: G06N3/084 G06N3/044

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer parameter of each noisy layer, a noisy current value for the layer parameter is determined from a current value of the layer parameter, a current value of a corresponding noise parameter, and the noise value. A network input including the observation is processed using the neural network in accordance with the noisy current values to generate a network output for the network input. An action is selected from a set of possible actions to be performed by the agent in response to the observation using the network output.
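
    The abstract describes noisy layers in which each layer parameter has a corresponding noise parameter, and the value actually used is a noisy current value formed from the base parameter, the noise parameter, and a freshly sampled noise value. A rough sketch of such a noisy linear layer is given below, assuming PyTorch; the initialisation scheme and names are illustrative assumptions, not taken from the patent.

    # Sketch only: illustrative initialisation and naming.
    import math
    import torch
    import torch.nn as nn

    class NoisyLinear(nn.Module):
        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            bound = 1.0 / math.sqrt(in_features)
            # Base ("current") parameter values.
            self.weight = nn.Parameter(torch.empty(out_features, in_features).uniform_(-bound, bound))
            self.bias = nn.Parameter(torch.zeros(out_features))
            # Learned noise parameters, one per layer parameter.
            self.weight_noise_param = nn.Parameter(torch.full((out_features, in_features), 0.5 * bound))
            self.bias_noise_param = nn.Parameter(torch.full((out_features,), 0.5 * bound))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Sample a noise value for each parameter, then form the noisy current
            # values used to compute this layer's output.
            weight_noise = torch.randn_like(self.weight)
            bias_noise = torch.randn_like(self.bias)
            noisy_weight = self.weight + self.weight_noise_param * weight_noise
            noisy_bias = self.bias + self.bias_noise_param * bias_noise
            return nn.functional.linear(x, noisy_weight, noisy_bias)

    layer = NoisyLinear(8, 4)
    out = layer(torch.randn(32, 8))   # each forward pass samples fresh noise values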

    Reinforcement learning with auxiliary tasks

    Publication Number: US11842281B2

    Publication Date: 2023-12-12

    Application Number: US17183618

    Filing Date: 2021-02-24

    CPC classification number: G06N3/084 G06N3/006 G06N3/044 G06N3/045 G06N20/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
