Patent search ap:("Google LLC") AND inv:"Tian Lu" Page 1

1.

Patent application
DETERMINING CONTROL POLICIES BY MINIMIZING THE IMPACT OF DELUSION (granted)

Publication No.: US20210383218A1

Publication date: 2021-12-09

Application No.: US17289514

Filing date: 2019-10-29

Applicant: Google LLC

Inventor: Tian Lu, Dale Eric Schuurmans, Craig Edgar Boutilier

IPC: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a control policy for an agent interacting with an environment. One of the methods includes updating the control policy through Q-learning with policy-consistent backups. To determine a policy-consistent backup for the control policy at the current observation-current action pair, the system: for each of a plurality of actions in a set of possible actions that can be performed by the agent, identifies Q values assigned by the control policy to next observation-action pairs and justified by at least one of the information sets; prunes, from the identified Q values, any Q values that are justified only by information sets that are not policy-class consistent; and determines, from the reward and only the identified Q values that were not pruned, the policy-consistent backup.
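The identify, prune, and back-up steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The representation of an information set as a set of (feature, action) commitments, the `toy_is_consistent` check, the discount factor `gamma`, and the use of a max over surviving Q values (as in standard Q-learning) are all assumptions introduced here for illustration; the abstract specifies none of them.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Optional, Tuple

# Hypothetical representation: an information set is a frozenset of
# (state_feature, action) commitments that a Q value relies on.
InfoSet = FrozenSet[Tuple[Hashable, Hashable]]
# Each candidate Q value carries every information set that justifies it.
Candidate = Tuple[float, List[InfoSet]]


def toy_is_consistent(info_set: InfoSet) -> bool:
    """Toy stand-in for a policy-class consistency check (an assumption):
    an information set is inconsistent if it commits the same feature
    to two different actions."""
    chosen: Dict[Hashable, Hashable] = {}
    for feature, action in info_set:
        if chosen.setdefault(feature, action) != action:
            return False
    return True


def policy_consistent_backup(
    reward: float,
    next_candidates: Dict[Hashable, List[Candidate]],
    is_consistent: Callable[[InfoSet], bool],
    gamma: float = 0.5,  # assumed discount factor, not from the abstract
) -> Optional[float]:
    """Back up from the reward and only the unpruned next-step Q values."""
    surviving = [
        q
        for entries in next_candidates.values()  # one entry list per action
        for q, info_sets in entries
        # Prune only if *every* justifying information set is inconsistent.
        if any(is_consistent(s) for s in info_sets)
    ]
    if not surviving:
        return None  # nothing consistent to back up from
    # Assumed aggregation: standard Q-learning max over surviving values.
    return reward + gamma * max(surviving)


# Usage: the larger Q value for "right" rests only on an inconsistent
# information set, so it is pruned and the backup uses "left" instead.
candidates = {
    "left": [(2.0, [frozenset({("x", "left")})])],
    "right": [(9.0, [frozenset({("x", "left"), ("x", "right")})])],
}
backup = policy_consistent_backup(1.0, candidates, toy_is_consistent)
print(backup)  # 2.0
```

An ordinary Q-learning backup on the same inputs would use the pruned value 9.0, which is the kind of inconsistent estimate the abstract's pruning step is meant to exclude.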
