ENVIRONMENT PREDICTION USING REINFORCEMENT LEARNING

    Publication Number: US20200327399A1

    Publication Date: 2020-10-15

    Application Number: US16911992

    Application Date: 2020-06-25

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.
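    Below is a minimal sketch of the three-network arrangement the abstract describes: a state representation network that turns an observation into an internal state representation, a prediction network that maps the current internal state to a predicted subsequent state representation and a predicted reward, and a value prediction network that maps the internal state to a value prediction. The layer sizes, dimensions, PyTorch framing, and the short imagined rollout at the end are illustrative assumptions, not the patented implementation.

    import torch
    import torch.nn as nn

    class StateRepresentationNet(nn.Module):
        """Observation -> internal state representation."""
        def __init__(self, obs_dim: int, state_dim: int):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, state_dim))

        def forward(self, observation: torch.Tensor) -> torch.Tensor:
            return self.net(observation)

    class PredictionNet(nn.Module):
        """Current internal state -> (predicted next internal state, predicted reward)."""
        def __init__(self, state_dim: int):
            super().__init__()
            self.next_state = nn.Linear(state_dim, state_dim)
            self.reward = nn.Linear(state_dim, 1)

        def forward(self, state: torch.Tensor):
            return self.next_state(state), self.reward(state)

    class ValuePredictionNet(nn.Module):
        """Current internal state -> scalar value prediction."""
        def __init__(self, state_dim: int):
            super().__init__()
            self.value = nn.Linear(state_dim, 1)

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.value(state)

    # Example: encode one observation, then roll the model forward for a few
    # imagined steps, collecting value and reward predictions along the way.
    obs_dim, state_dim, num_steps = 16, 32, 3
    repr_net = StateRepresentationNet(obs_dim, state_dim)
    pred_net = PredictionNet(state_dim)
    value_net = ValuePredictionNet(state_dim)

    state = repr_net(torch.randn(1, obs_dim))
    for _ in range(num_steps):
        value = value_net(state)         # value prediction for the current state
        state, reward = pred_net(state)  # predicted subsequent state and reward

    In a full system the per-step values and rewards would be combined into an overall outcome prediction and trained against observed returns; that training loop is omitted from this sketch.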

    Environment prediction using reinforcement learning

    Publication Number: US10733501B2

    Publication Date: 2020-08-04

    Application Number: US16403314

    Application Date: 2019-05-03

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.

    CONTINUAL REINFORCEMENT LEARNING WITH A MULTI-TASK AGENT

    Publication Number: US20190244099A1

    Publication Date: 2019-08-08

    Application Number: US16268414

    Application Date: 2019-02-05

    CPC classification number: G06N3/08 G05D1/0088 G06N3/04

    Abstract: A method of training an action selection neural network for controlling an agent interacting with an environment to perform different tasks is described. The method includes obtaining a first trajectory of transitions generated while the agent was performing an episode of the first task from multiple tasks; and training the action selection neural network on the first trajectory to adjust the control policies for the multiple tasks. The training includes, for each transition in the first trajectory: generating respective policy outputs for the initial observation in the transition for each task in a subset of tasks that includes the first task and one other task; generating respective target policy outputs for each task using the reward in the transition, and determining an update to the current parameter values based on, for each task, a gradient of a loss between the policy output and the target policy output for the task.
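    A minimal sketch of the per-transition update the abstract describes: for each transition in a trajectory from the first task, policy outputs are generated for a subset of tasks containing that task plus one other, a target policy output is formed for each task using the transition's reward, and the parameters are updated with the gradient of a per-task loss between the policy output and its target. The network shape, the way the target is built from the reward, and the choice of loss are illustrative assumptions; the abstract does not pin them down.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    NUM_TASKS, OBS_DIM, NUM_ACTIONS = 4, 16, 5

    class ActionSelectionNet(nn.Module):
        """Shared trunk with one policy head per task."""
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU())
            self.heads = nn.ModuleList([nn.Linear(64, NUM_ACTIONS)
                                        for _ in range(NUM_TASKS)])

        def forward(self, obs: torch.Tensor, task: int) -> torch.Tensor:
            return self.heads[task](self.trunk(obs))

    net = ActionSelectionNet()
    opt = torch.optim.SGD(net.parameters(), lr=1e-2)

    def train_on_trajectory(trajectory, first_task: int):
        """trajectory: list of (initial_observation, action, reward) transitions
        generated while the agent performed an episode of first_task."""
        # Subset of tasks: the episode's own task plus one other task.
        other_task = random.choice([t for t in range(NUM_TASKS) if t != first_task])
        for obs, action, reward in trajectory:
            loss = torch.zeros(())
            for task in (first_task, other_task):
                logits = net(obs, task)  # policy output for this task
                with torch.no_grad():
                    # Hypothetical target policy: shift probability toward the
                    # taken action in proportion to the observed reward.
                    target = F.softmax(logits, dim=-1)
                    target[action] = target[action] + reward
                    target = target.clamp(min=1e-6)
                    target = target / target.sum()
                # Loss between the policy output and the target policy output.
                loss = loss + F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Example: one trajectory of three transitions from task 0.
    trajectory = [(torch.randn(OBS_DIM), random.randrange(NUM_ACTIONS), 1.0)
                  for _ in range(3)]
    train_on_trajectory(trajectory, first_task=0)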

    Continual reinforcement learning with a multi-task agent

    Publication Number: US12154029B2

    Publication Date: 2024-11-26

    Application Number: US16268414

    Application Date: 2019-02-05

    Abstract: A method of training an action selection neural network for controlling an agent interacting with an environment to perform different tasks is described. The method includes obtaining a first trajectory of transitions generated while the agent was performing an episode of the first task from multiple tasks; and training the action selection neural network on the first trajectory to adjust the control policies for the multiple tasks. The training includes, for each transition in the first trajectory: generating respective policy outputs for the initial observation in the transition for each task in a subset of tasks that includes the first task and one other task; generating respective target policy outputs for each task using the reward in the transition, and determining an update to the current parameter values based on, for each task, a gradient of a loss between the policy output and the target policy output for the task.

    Environment prediction using reinforcement learning

    Publication Number: US12141677B2

    Publication Date: 2024-11-12

    Application Number: US16911992

    Application Date: 2020-06-25

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.

    ENVIRONMENT PREDICTION USING REINFORCEMENT LEARNING

    Publication Number: US20190259051A1

    Publication Date: 2019-08-22

    Application Number: US16403314

    Application Date: 2019-05-03

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.
