Continuous control with deep reinforcement learning

    Publication No.: US10776692B2

    Publication Date: 2020-09-15

    Application No.: US15217758

    Filing Date: 2016-07-22

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
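
    The abstract above describes a deterministic actor-critic update with target networks (a DDPG-style method). Below is a minimal sketch of such an update, not taken from the patent; the network sizes, learning rates, and helper names are illustrative assumptions.

```python
# Hedged sketch of the actor-critic update described above (DDPG-style).
# All dimensions, learning rates, and names are illustrative assumptions.
import copy

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                      nn.Linear(32, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(),
                       nn.Linear(32, 1))
# Target networks: lagged copies used to compute the target network output.
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

def update(obs, act, rew, next_obs):
    # Critic: move Q(s, a) toward the target r + gamma * Q'(s', pi'(s')).
    with torch.no_grad():
        next_q = target_critic(torch.cat([next_obs, target_actor(next_obs)], -1))
        target = rew + gamma * next_q
    q = critic(torch.cat([obs, act], -1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Actor: update its parameters using the critic, ascending Q(s, pi(s)).
    actor_loss = -critic(torch.cat([obs, actor(obs)], -1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

# One update on a random minibatch of experience tuples.
update(torch.randn(64, obs_dim), torch.randn(64, act_dim),
       torch.randn(64, 1), torch.randn(64, obs_dim))
```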

    DATA EFFICIENT IMITATION OF DIVERSE BEHAVIORS

    Publication No.: US20200090042A1

    Publication Date: 2020-03-19

    Application No.: US16688934

    Filing Date: 2019-11-19

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes: obtaining data identifying a set of trajectories, each trajectory comprising a set of observations characterizing a set of states of the environment and corresponding actions performed by another agent in response to the states; obtaining data identifying an encoder that maps the observations onto embeddings for use in determining a set of imitation trajectories; determining, for each trajectory, a corresponding embedding by applying the encoder to the trajectory; determining a set of imitation trajectories by applying a policy defined by the neural network to the embedding for each trajectory; and adjusting parameters of the neural network based on the set of trajectories, the set of imitation trajectories and the embeddings.
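
    A minimal sketch of the embedding-conditioned imitation step described above, not taken from the patent: an encoder maps each demonstration trajectory to an embedding, and a policy conditioned on that embedding is adjusted toward the demonstrated actions. The GRU encoder, the dimensions, and the squared-error loss are illustrative assumptions.

```python
# Hedged sketch of embedding-conditioned imitation; the GRU encoder,
# dimensions, and squared-error loss are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, emb_dim = 8, 2, 16
encoder = nn.GRU(obs_dim, emb_dim, batch_first=True)
policy = nn.Sequential(nn.Linear(obs_dim + emb_dim, 64), nn.ReLU(),
                       nn.Linear(64, act_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(policy.parameters()),
                       lr=1e-3)

def imitation_update(demo_obs, demo_act):
    """demo_obs: [B, T, obs_dim] observations; demo_act: [B, T, act_dim]."""
    # Map each demonstration trajectory onto a single embedding.
    _, h = encoder(demo_obs)                        # h: [1, B, emb_dim]
    emb = h[-1].unsqueeze(1).expand(-1, demo_obs.shape[1], -1)
    # The embedding-conditioned policy produces the imitation actions.
    imitation_act = policy(torch.cat([demo_obs, emb], -1))
    # Adjust parameters based on demonstrations, imitations, and embeddings.
    loss = nn.functional.mse_loss(imitation_act, demo_act)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# One update on a batch of 5 demonstration trajectories of length 20.
imitation_update(torch.randn(5, 20, obs_dim), torch.randn(5, 20, act_dim))
```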

    DATA-EFFICIENT REINFORCEMENT LEARNING FOR CONTINUOUS CONTROL TASKS

    Publication No.: US20190354813A1

    Publication Date: 2019-11-21

    Application No.: US16528260

    Filing Date: 2019-07-31

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
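
    A minimal sketch of the worker layout described above, not taken from the patent: each worker runs independently with its own environment replica and local actor computation, applying updates to shared actor parameters. The toy environment, the placeholder gradient, and the lock-based update are illustrative assumptions.

```python
# Hedged sketch of independent workers with per-worker environment
# replicas; the toy environment, placeholder gradient, and lock-based
# parameter update are illustrative assumptions.
import threading

import numpy as np

class EnvReplica:
    """Toy stand-in for one worker's replica of the environment."""
    def reset(self):
        return np.zeros(4)
    def step(self, action):
        return np.random.randn(4), -float(np.sum(action ** 2))

shared_params = np.zeros(4)     # shared actor parameters
lock = threading.Lock()

def worker(steps=100):
    global shared_params
    env = EnvReplica()          # each worker gets its own replica
    obs = env.reset()
    for _ in range(steps):
        action = np.tanh(shared_params * obs)   # local actor computation
        obs, reward = env.step(action)
        grad = np.random.randn(4) * 0.01        # placeholder for a real gradient
        with lock:                              # workers update independently
            shared_params += grad

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```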

    NEURAL POPULATION LEARNING
    Invention Application

    Publication No.: US20240412072A1

    Publication Date: 2024-12-12

    Application No.: US18422620

    Filing Date: 2024-01-25

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment using a population of action selection policies that are jointly represented by a population action selection neural network. In one aspect, a method comprises, at each of a plurality of time steps: obtaining an observation characterizing a current state of the environment at the time step; selecting a target action selection policy from the population of action selection policies; processing a network input comprising: (i) the observation, and (ii) a strategy embedding representing the target action selection policy, using the population action selection neural network to generate an action selection output; and selecting an action to be performed by the agent at the time step using the action selection output.
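
    A minimal sketch of the population network described above, not taken from the patent: a single network jointly represents a population of policies, each identified by a strategy embedding that is processed together with the observation. The dimensions, the embedding table, and the categorical action head are illustrative assumptions.

```python
# Hedged sketch of a population action selection neural network; the
# embedding table, dimensions, and action head are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, n_policies, emb_dim = 8, 4, 10, 16
strategy_embeddings = nn.Embedding(n_policies, emb_dim)
population_net = nn.Sequential(nn.Linear(obs_dim + emb_dim, 64), nn.ReLU(),
                               nn.Linear(64, act_dim))

def act(obs, policy_id):
    # Select the target policy from the population via its strategy embedding.
    emb = strategy_embeddings(torch.tensor([policy_id]))
    # Process (observation, embedding) to generate the action selection output.
    logits = population_net(torch.cat([obs, emb], -1))
    # Select the action to be performed at this time step.
    return torch.distributions.Categorical(logits=logits).sample()

action = act(torch.randn(1, obs_dim), policy_id=3)
```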

    Selecting reinforcement learning actions using a low-level controller

    Publication No.: US11875258B1

    Publication Date: 2024-01-16

    Application No.: US17541186

    Filing Date: 2021-12-02

    CPC classification number: G06N3/08 G06N3/006 G06N3/044 G06N3/045

    Abstract: Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One system includes a high-level controller neural network, a low-level controller neural network, and a subsystem. The high-level controller neural network receives an input observation and processes it to generate a high-level output defining a control signal for the low-level controller neural network. The low-level controller neural network receives a designated component of an input observation and processes that component together with an input control signal to generate a low-level output that defines an action to be performed by the agent in response to the input observation. The subsystem receives a current observation characterizing a current state of the environment, determines whether criteria are satisfied for generating a new control signal, and, based on that determination, provides the appropriate inputs to the high-level and low-level controllers for selecting an action to be performed by the agent.
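
    A minimal sketch of the two-level control loop described above, not taken from the patent: a subsystem decides when the high-level controller should emit a new control signal, and between those times the low-level controller keeps selecting actions from the last signal and a designated component of the observation. The fixed-interval criterion and all dimensions are illustrative assumptions.

```python
# Hedged sketch of the high-level/low-level control scheme; the
# fixed-interval criterion and all dimensions are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, sig_dim, comp_dim, act_dim = 16, 8, 4, 2
high_level = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                           nn.Linear(32, sig_dim))
low_level = nn.Sequential(nn.Linear(comp_dim + sig_dim, 32), nn.ReLU(),
                          nn.Linear(32, act_dim))

signal, step = None, 0

def select_action(obs, every=10):
    global signal, step
    # Subsystem: check whether the criteria for a new control signal hold.
    if signal is None or step % every == 0:
        signal = high_level(obs)              # new high-level control signal
    step += 1
    # Low-level controller: designated component + current signal -> action.
    component = obs[..., :comp_dim]
    return low_level(torch.cat([component, signal], -1))

action = select_action(torch.randn(obs_dim))
```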

    DATA-EFFICIENT REINFORCEMENT LEARNING FOR CONTINUOUS CONTROL TASKS

    Publication No.: US20200285909A1

    Publication Date: 2020-09-10

    Application No.: US16882373

    Filing Date: 2020-05-22

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data-efficient reinforcement learning. One of the systems is a system for training an actor neural network used to select actions to be performed by an agent that interacts with an environment by receiving observations characterizing states of the environment and, in response to each observation, performing an action selected from a continuous space of possible actions, wherein the actor neural network maps observations to next actions in accordance with values of parameters of the actor neural network, and wherein the system comprises: a plurality of workers, wherein each worker is configured to operate independently of each other worker, wherein each worker is associated with a respective agent replica that interacts with a respective replica of the environment during the training of the actor neural network.
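
    This abstract matches the earlier publication US20190354813A1 in the same family; the worker layout itself is sketched after that entry. As a complementary, equally hypothetical sketch, the snippet below shows one way the workers' experience could be consumed: workers append experience tuples to a shared replay buffer from which a single learner samples minibatches. The buffer size, batch size, and all names are illustrative assumptions.

```python
# Equally hypothetical sketch: workers feed a shared replay buffer and
# a single learner samples minibatches from it; buffer size, batch
# size, and all names are illustrative assumptions.
import collections
import random

import numpy as np

replay = collections.deque(maxlen=10_000)   # shared experience buffer

def worker_collect(n=500):
    """One worker's loop: interact with its environment replica and
    append (obs, action, reward, next_obs) experience tuples."""
    obs = np.zeros(4)
    for _ in range(n):
        action = np.tanh(np.random.randn(2))
        next_obs = np.random.randn(4)
        reward = -float(np.sum(action ** 2))
        replay.append((obs, action, reward, next_obs))
        obs = next_obs

def learner_step(batch_size=64):
    """The learner samples a minibatch for an actor-critic update."""
    if len(replay) < batch_size:
        return None
    minibatch = random.sample(replay, batch_size)
    # ...run an actor/critic update like the first sketch on `minibatch`...
    return minibatch

worker_collect()
learner_step()
```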
