Patent search ap:("Google Inc.") AND inv:"Shixiang Gu" Page 1

1.

Invention application
REINFORCEMENT LEARNING USING ADVANTAGE ESTIMATES (under examination, published)

Publication number: US20170228662A1

Publication date: 2017-08-10

Application number: US15429088

Filing date: 2017-02-09

Applicant: Google Inc.

Inventor: Shixiang Gu, Timothy Paul Lillicrap, Ilya Sutskever, Sergey Vladimir Levine

IPC: G06N99/00, G06N7/00

CPC classification number: G06N3/0427, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
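The data flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the advantage estimate is computed or how it is combined with the value estimate, so the quadratic advantage and the sum Q = V + A used here are assumptions. The function names, the `scale` parameter, and the toy stand-ins for the two subnetworks are likewise hypothetical and are not taken from the application.

```python
# Illustrative sketch only. The stand-in "subnetworks" are fixed formulas,
# not learned models, and the advantage form is an assumption.

def value_subnetwork(observation):
    # Hypothetical stand-in: maps an observation to a scalar value estimate.
    return sum(observation) / len(observation)

def policy_subnetwork(observation):
    # Hypothetical stand-in: maps an observation to an "ideal point"
    # in a 2-D continuous action space.
    return [0.5 * observation[0], -0.5 * observation[1]]

def advantage_estimate(action, ideal_point, scale=1.0):
    # Assumed form: negative scaled squared distance from the ideal point,
    # so the advantage is zero at the ideal point and negative elsewhere.
    return -0.5 * scale * sum((a - m) ** 2 for a, m in zip(action, ideal_point))

def q_value(observation, action):
    # Assumed combination: Q value = value estimate + advantage estimate.
    v = value_subnetwork(observation)
    mu = policy_subnetwork(observation)
    return v + advantage_estimate(action, mu)

observation = [2.0, 4.0]
ideal = policy_subnetwork(observation)
q_at_ideal = q_value(observation, ideal)
q_off_ideal = q_value(observation, [2.0, -2.0])
```

Under these assumptions the Q value is largest at the ideal point produced by the policy subnetwork, which means the best action in a continuous space can be read off directly rather than found by searching over actions.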
