Training neural networks using a prioritized experience memory

    Publication No.: US12086714B2

    Publication Date: 2024-09-10

    Application No.: US18103416

    Filing Date: 2023-01-30

    CPC classification number: G06N3/08 G06N3/088 Y04S10/50

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.
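
    A minimal sketch of the sampling scheme described above, assuming (purely for illustration) that each experience's expected learning progress measure is a scalar priority, such as the magnitude of its TD error, and that selection probability is proportional to that priority raised to an exponent alpha:

        import random

        class PrioritizedReplayMemory:
            """Replay memory that favors experiences with higher expected
            learning progress (sketch; priorities are assumed scalars)."""

            def __init__(self, capacity, alpha=0.6):
                self.capacity = capacity  # maximum number of stored experiences
                self.alpha = alpha        # how strongly priorities skew sampling
                self.data = []            # list of (experience, priority) pairs

            def add(self, experience, priority):
                if len(self.data) >= self.capacity:
                    self.data.pop(0)      # evict the oldest experience
                self.data.append((experience, priority))

            def sample(self):
                # Selection probability is proportional to priority**alpha,
                # so high-progress experiences are replayed more often.
                weights = [p ** self.alpha for _, p in self.data]
                experience, _priority = random.choices(
                    self.data, weights=weights, k=1)[0]
                return experience

    In use, the agent would add each transition with an initial priority and, after training on a sampled experience, refresh that experience's priority from the newly observed learning progress (e.g., the updated TD error).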

    Reinforcement learning with auxiliary tasks

    Publication No.: US11842281B2

    Publication Date: 2023-12-12

    Application No.: US17183618

    Filing Date: 2021-02-24

    CPC classification number: G06N3/084 G06N3/006 G06N3/044 G06N3/045 G06N20/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
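
    The joint objective described above can be sketched as follows. The networks, targets, and losses here are toy stand-ins (a small MLP whose hidden layer plays the role of the intermediate output, with assumed auxiliary-control and reward targets), but the structure shows how the auxiliary and reward-prediction losses backpropagate into the shared action-selection parameters:

        import torch
        import torch.nn as nn

        class PolicyNet(nn.Module):
            """Action selection policy network; its hidden layer serves as
            the intermediate output consumed by the auxiliary heads."""
            def __init__(self, obs_dim=8, n_actions=4, hidden=32):
                super().__init__()
                self.body = nn.Linear(obs_dim, hidden)
                self.policy_head = nn.Linear(hidden, n_actions)

            def forward(self, obs):
                hidden = torch.relu(self.body(obs))  # intermediate output
                return self.policy_head(hidden), hidden

        policy_net = PolicyNet()
        aux_head = nn.Linear(32, 4)     # auxiliary control head (illustrative)
        reward_head = nn.Linear(32, 1)  # reward prediction head
        optimizer = torch.optim.Adam(
            list(policy_net.parameters()) + list(aux_head.parameters())
            + list(reward_head.parameters()), lr=1e-3)

        obs = torch.randn(16, 8)                     # dummy observation batch
        action_targets = torch.randint(0, 4, (16,))  # dummy policy targets
        aux_targets = torch.randint(0, 4, (16,))     # dummy auxiliary targets
        rewards = torch.randn(16, 1)                 # dummy observed rewards

        logits, hidden = policy_net(obs)
        # One combined step: the auxiliary and reward-prediction gradients
        # also flow into the shared policy parameters through `hidden`.
        loss = nn.functional.cross_entropy(logits, action_targets)
        loss = loss + nn.functional.cross_entropy(aux_head(hidden), aux_targets)
        loss = loss + nn.functional.mse_loss(reward_head(hidden), rewards)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()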

    Learning non-differentiable weights of neural networks using evolutionary strategies

    Publication No.: US20200234142A1

    Publication Date: 2020-07-23

    Application No.: US16751169

    Filing Date: 2020-01-23

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. The neural network has a plurality of differentiable weights and a plurality of non-differentiable weights. One of the methods includes determining trained values of the plurality of differentiable weights and the non-differentiable weights by repeatedly performing operations that include determining an update to the current values of the plurality of differentiable weights using a machine learning gradient-based training technique and determining, using an evolution strategies (ES) technique, an update to the current values of a plurality of distribution parameters.
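
    A minimal sketch of the hybrid update under toy assumptions: the differentiable weights take an ordinary gradient step, while the non-differentiable weights are represented by a Gaussian search distribution whose mean is moved with a standard ES gradient estimate (the objective and parameterization here are illustrative only, not the patent's):

        import numpy as np

        rng = np.random.default_rng(0)

        def objective(w, nd):
            # Toy objective; `nd` is treated as a black box (e.g., discrete
            # or stochastic in practice), so only evaluations are available.
            return -np.sum((w - 1.0) ** 2) - np.sum((nd - 2.0) ** 2)

        w = np.zeros(4)    # differentiable weights
        mu = np.zeros(4)   # distribution mean for non-differentiable weights
        sigma, lr, pop = 0.1, 0.05, 64

        for step in range(200):
            # Gradient-based update for the differentiable weights (analytic
            # here; in practice this would be backpropagation).
            w += lr * (-2.0 * (w - 1.0))

            # ES update for the distribution parameters: perturb mu, score
            # each perturbation, move mu along score-weighted perturbations.
            eps = rng.standard_normal((pop, mu.size))
            scores = np.array([objective(w, mu + sigma * e) for e in eps])
            scores = (scores - scores.mean()) / (scores.std() + 1e-8)
            mu += lr / (pop * sigma) * (scores[:, None] * eps).sum(axis=0)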

    Training neural networks using a prioritized experience memory

    Publication No.: US10650310B2

    Publication Date: 2020-05-12

    Application No.: US15349894

    Filing Date: 2016-11-11

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.

    Meta-learned evolutionary strategies optimizer

    Publication No.: US20240127071A1

    Publication Date: 2024-04-18

    Application No.: US18475859

    Filing Date: 2023-09-27

    CPC classification number: G06N3/086

    Abstract: There is provided a computer-implemented method for updating a search distribution of an evolutionary strategies optimizer using an optimizer neural network comprising one or more attention blocks. The method comprises receiving a plurality of candidate solutions, one or more parameters defining the search distribution that the plurality of candidate solutions are sampled from, and fitness score data indicating a fitness of each respective candidate solution of the plurality of candidate solutions. The method further comprises processing, by the one or more attention neural network blocks, the fitness score data using an attention mechanism to generate respective recombination weights corresponding to each respective candidate solution. The method further comprises updating the one or more parameters defining the search distribution based upon the recombination weights applied to the plurality of candidate solutions.
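
    A minimal sketch of the attention-derived recombination described above, with assumed shapes; a meta-learned optimizer would supply trained projection matrices, whereas here they are drawn at random purely to show the data flow from fitness scores to recombination weights to a search-distribution update:

        import numpy as np

        rng = np.random.default_rng(0)
        pop, dim, d_model = 16, 8, 4

        mean, sigma = np.zeros(dim), 0.5   # search-distribution parameters
        candidates = mean + sigma * rng.standard_normal((pop, dim))
        fitness = -np.sum((candidates - 1.0) ** 2, axis=1)  # toy fitness

        # Embed each candidate's z-scored fitness as a token, then self-attend.
        z = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        tokens = z[:, None] * np.ones((1, d_model))
        W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.1
                         for _ in range(3))
        q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
        logits = q @ k.T / np.sqrt(d_model)
        attn = np.exp(logits - logits.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)
        scores = (attn @ v).mean(axis=1)   # one scalar per candidate

        # Recombination weights: softmax over the attention-derived scores.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()

        # Move the distribution mean toward the weighted recombination of
        # the sampled candidates.
        mean = mean + 0.5 * (weights @ candidates - mean)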

    Training neural networks using a prioritized experience memory

    Publication No.: US20200265312A1

    Publication Date: 2020-08-20

    Application No.: US16866365

    Filing Date: 2020-05-04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.

    Training neural networks using a prioritized experience memory

    Publication No.: US20250045583A1

    Publication Date: 2025-02-06

    Application No.: US18805367

    Filing Date: 2024-08-14

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.
