-
Publication number: US20220391687A1
Publication date: 2022-12-08
Application number: US17338093
Filing date: 2021-06-03
Applicant: Google LLC
Inventor: John Dalton Co-Reyes , Yingjie Miao , Daiyi Peng , Sergey Vladimir Levine , Quoc V. Le , Honglak Lee , Aleksandra Faust
IPC: G06N3/08 , G06F11/34 , G06F16/901
Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for generating and searching reinforcement learning algorithms. In some implementations, a computer-implemented system generates a sequence of candidate reinforcement learning algorithms. Each candidate reinforcement learning algorithm in the sequence is configured to receive an input environment state characterizing a state of an environment and to generate an output that specifies an action to be performed by an agent interacting with the environment. For each candidate reinforcement learning algorithm in the sequence, the system performs a performance evaluation across a set of training environments. For each training environment, the system adjusts a set of environment-specific parameters of the candidate reinforcement learning algorithm by training the candidate reinforcement learning algorithm to control a corresponding agent in the training environment. The system generates an environment-specific performance metric for the candidate reinforcement learning algorithm that measures how well the candidate reinforcement learning algorithm controls the corresponding agent in the training environment as a result of the training. After performing training in the set of training environments, the system generates a summary performance metric for the candidate reinforcement learning algorithm by combining the environment-specific performance metrics generated for the set of training environments. After evaluating each of the candidate reinforcement learning algorithms in the sequence, the system selects one or more output reinforcement learning algorithms from the sequence based on the summary performance metrics of the candidate reinforcement learning algorithms.
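The evaluate-then-summarize loop described in this abstract can be sketched as follows. This is a minimal illustration only; the candidate and environment interfaces (`train_candidate_on_env`, `evaluate_candidate`) and the use of a mean as the summary metric are assumptions, not the patented implementation.

```python
# Minimal sketch of the candidate-evaluation loop described in the abstract.
# train_candidate_on_env and evaluate_candidate are hypothetical placeholders.
from statistics import mean

def evaluate_candidates(candidates, training_environments,
                        train_candidate_on_env, evaluate_candidate):
    """Return each candidate paired with its summary performance metric."""
    summaries = []
    for candidate in candidates:
        env_metrics = []
        for env in training_environments:
            # Adjust environment-specific parameters by training the candidate
            # to control the corresponding agent in this environment.
            env_params = train_candidate_on_env(candidate, env)
            # Environment-specific performance metric after training.
            env_metrics.append(evaluate_candidate(candidate, env_params, env))
        # Summary metric combines the per-environment metrics (here, a mean).
        summaries.append((candidate, mean(env_metrics)))
    # Rank so that output algorithms can be selected from the best candidates.
    summaries.sort(key=lambda pair: pair[1], reverse=True)
    return summaries
```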
-
Publication number: US20210334320A1
Publication date: 2021-10-28
Application number: US17280027
Filing date: 2019-09-27
Applicant: Google LLC
Inventor: Aleksandra Faust , Dilek Hakkani-Tur , Izzeddin Gur , Ulrich Rueckert
IPC: G06F16/954 , G06N3/04 , G06F16/953
Abstract: The present disclosure is generally directed to methods, apparatus, and computer-readable media (transitory and non-transitory) for learning to automatically navigate interactive web documents and/or websites. More particularly, various approaches are presented for training various deep Q network (DQN) agents to perform various tasks associated with reinforcement learning, including hierarchical reinforcement learning, in challenging web navigation environments with sparse rewards and large state and action spaces. These agents include a web navigation agent that can use learned value function(s) to automatically navigate through interactive web documents, as well as a training agent, referred to herein as a “meta-trainer,” that can be trained to generate synthetic training examples. Some approaches described herein may be implemented when expert demonstrations are available. Other approaches described herein may be implemented when expert demonstrations are not available. In either case, dense, potential-based rewards may be used to augment the training.
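The "dense, potential-based rewards" mentioned at the end of the abstract can be illustrated with a short sketch. The potential function `phi` below is a hypothetical stand-in (for example, a similarity score between the current page state and the goal), not the trained potential used in the disclosure.

```python
# Sketch of dense, potential-based reward shaping of the kind referenced in the
# abstract. The potential function `phi` is a hypothetical placeholder.
def shaped_reward(sparse_reward, state, next_state, phi, gamma=0.99):
    """Augment a sparse environment reward with a potential-based term.

    Shaping of the form gamma * phi(s') - phi(s) leaves the optimal policy
    unchanged while giving the DQN agent a dense learning signal.
    """
    return sparse_reward + gamma * phi(next_state) - phi(state)
```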
-
Publication number: US20210182620A1
Publication date: 2021-06-17
Application number: US16717471
Filing date: 2019-12-17
Applicant: Google LLC
Inventor: Jie Tan , Sehoon Ha , Tingnan Zhang , Xinlei Pan , Brian Andrew Ichter , Aleksandra Faust
Abstract: A computer-implemented method is disclosed for training one or more machine-learned models. The method can include inputting a first image frame and a second image frame into a feature disentanglement model and receiving, as an output of the machine-learned feature disentanglement model, a state feature and a perspective feature. The method can include inputting the state feature and the perspective feature into a machine-learned decoder model and receiving, as an output of the machine-learned decoder model, the reconstructed image frame. The method can include comparing the reconstructed image frame with a third image frame corresponding with the location and the perspective orientation. The method can include adjusting one or more parameters of the machine-learned feature disentanglement model based on the comparison of the reconstructed image frame and the third image frame.
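The training step described in this abstract can be sketched in PyTorch as below. The module interfaces, the MSE reconstruction loss, and the optimizer usage are assumptions for illustration and do not reflect the patented architecture.

```python
# Minimal PyTorch sketch of one training step of the feature disentanglement
# model and decoder, as described in the abstract. Interfaces are assumed.
import torch

def training_step(disentangler, decoder, optimizer, frame_1, frame_2, frame_3):
    """One parameter update of the feature disentanglement model and decoder."""
    # Disentangle a state feature (what the scene contains) from a
    # perspective feature (where it is viewed from).
    state_feature, perspective_feature = disentangler(frame_1, frame_2)
    # Reconstruct the frame implied by the state and perspective features.
    reconstructed = decoder(state_feature, perspective_feature)
    # Compare against the third frame corresponding to that location and
    # perspective orientation.
    loss = torch.nn.functional.mse_loss(reconstructed, frame_3)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```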
-
Publication number: US20250013881A1
Publication date: 2025-01-09
Application number: US18766415
Filing date: 2024-07-08
Applicant: Google LLC
Inventor: Yingjie Miao , John Dalton Co-Reyes , Esteban Alberto Real , George Jay Tucker , Aleksandra Faust
Abstract: Methods and systems for receiving training data for a machine learning (ML) task and searching, using the training data, for an optimized component of an ML algorithm for performing the ML task are described.
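The abstract is terse, but the loop it names (searching for an optimized component of an ML algorithm, scored on the training data) can be sketched generically. The `mutate` and `score` interfaces below are hypothetical placeholders; the actual search strategy is not specified in the abstract.

```python
# Generic sketch of searching for an optimized ML-algorithm component, using
# performance on the training data as the search signal. Interfaces are assumed.
def search_component(initial_component, mutate, score, training_data, steps=100):
    """Simple hill-climbing search over candidate components."""
    best = initial_component
    best_score = score(best, training_data)
    for _ in range(steps):
        candidate = mutate(best)            # propose a modified component
        candidate_score = score(candidate, training_data)
        if candidate_score > best_score:    # keep the better-performing component
            best, best_score = candidate, candidate_score
    return best, best_score
```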
-
Publication number: US20210325894A1
Publication date: 2021-10-21
Application number: US17275459
Filing date: 2019-09-13
Applicant: Google LLC
Inventor: Aleksandra Faust , Hao-tien Chiang , Anthony Francis , Marek Fiser
Abstract: Using reinforcement learning to train a policy network that can be utilized, for example, by a robot in performing robot navigation and/or other robotic tasks. Various implementations relate to techniques for automatically learning a reward function for training of a policy network through reinforcement learning, and automatically learning a neural network architecture for the policy network.
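The two-level structure in this abstract (an outer search over reward functions and network architectures, an inner RL training of the policy) can be sketched as follows. All interfaces (`sample_reward_params`, `sample_architecture`, `train_policy`, `evaluate_navigation`) are assumptions for illustration, not the patented method.

```python
# Sketch of an outer loop that searches over reward-function parameters and
# policy-network architectures, with an inner loop that trains a policy via RL
# and reports task performance. All callables are hypothetical placeholders.
def search_reward_and_architecture(sample_reward_params, sample_architecture,
                                   train_policy, evaluate_navigation, trials=50):
    best, best_score = None, float("-inf")
    for _ in range(trials):
        reward_params = sample_reward_params()   # candidate learned reward
        architecture = sample_architecture()     # candidate policy architecture
        policy = train_policy(architecture, reward_params)  # inner RL training
        score = evaluate_navigation(policy)      # e.g. navigation success rate
        if score > best_score:
            best, best_score = (reward_params, architecture), score
    return best, best_score
```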
-
Publication number: US12118052B2
Publication date: 2024-10-15
Application number: US18234766
Filing date: 2023-08-16
Applicant: GOOGLE LLC
Inventor: Aleksandra Faust , Dilek Hakkani-Tur , Izzeddin Gur , Ulrich Rueckert
IPC: G06F16/954 , G06F16/953 , G06N3/04
CPC classification number: G06F16/954 , G06F16/953 , G06N3/04
Abstract: The present disclosure is generally directed to methods, apparatus, and computer-readable media (transitory and non-transitory) for learning to automatically navigate interactive web documents and/or websites. More particularly, various approaches are presented for training various deep Q network (DQN) agents to perform various tasks associated with reinforcement learning, including hierarchical reinforcement learning, in challenging web navigation environments with sparse rewards and large state and action spaces. These agents include a web navigation agent that can use learned value function(s) to automatically navigate through interactive web documents, as well as a training agent, referred to herein as a “meta-trainer,” that can be trained to generate synthetic training examples. Some approaches described herein may be implemented when expert demonstrations are available. Other approaches described herein may be implemented when expert demonstrations are not available. In either case, dense, potential-based rewards may be used to augment the training.
-
Publication number: US11941504B2
Publication date: 2024-03-26
Application number: US17040299
Filing date: 2019-03-22
Applicant: Google LLC
Inventor: Pararth Shah , Dilek Hakkani-Tur , Juliana Kew , Marek Fiser , Aleksandra Faust
IPC: G06N3/008 , B25J9/16 , B25J13/08 , G05B13/02 , G05D1/00 , G05D1/02 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/262 , G10L15/16 , G10L15/18 , G10L15/22 , G10L25/78
CPC classification number: G06N3/008 , B25J9/161 , B25J9/162 , B25J9/163 , B25J9/1697 , B25J13/08 , G05B13/027 , G05D1/0221 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/274 , G10L15/16 , G10L15/1815 , G10L15/22 , G10L25/78 , G10L2015/223
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
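A policy of the kind described here, conditioning each robotic action on current sensor data and a free-form natural language instruction, can be sketched in PyTorch as below. The GRU text encoder, layer sizes, and fusion by concatenation are illustrative assumptions, not the patented model.

```python
# PyTorch sketch of a policy that scores robotic actions from current sensor
# data plus a free-form natural language instruction. Design choices are assumed.
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, sensor_dim, vocab_size, embed_dim=64, hidden_dim=128,
                 num_actions=6):
        super().__init__()
        self.text_embedding = nn.Embedding(vocab_size, embed_dim)
        self.text_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.sensor_encoder = nn.Linear(sensor_dim, hidden_dim)
        self.action_head = nn.Linear(2 * hidden_dim, num_actions)

    def forward(self, sensor_obs, instruction_tokens):
        # Encode the free-form instruction (e.g. "go past the kitchen to the door").
        _, text_state = self.text_encoder(self.text_embedding(instruction_tokens))
        # Encode the current sensor observation for this time step.
        sensor_state = torch.relu(self.sensor_encoder(sensor_obs))
        # Fuse both representations and score the candidate robotic actions.
        fused = torch.cat([sensor_state, text_state[-1]], dim=-1)
        return self.action_head(fused)
```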
-
Publication number: US20230394102A1
Publication date: 2023-12-07
Application number: US18234766
Filing date: 2023-08-16
Applicant: GOOGLE LLC
Inventor: Aleksandra Faust , Dilek Hakkani-Tur , Izzeddin Gur , Ulrich Rueckert
IPC: G06F16/954 , G06F16/953 , G06N3/04
CPC classification number: G06F16/954 , G06F16/953 , G06N3/04
Abstract: The present disclosure is generally directed to methods, apparatus, and computer-readable media (transitory and non-transitory) for learning to automatically navigate interactive web documents and/or websites. More particularly, various approaches are presented for training various deep Q network (DQN) agents to perform various tasks associated with reinforcement learning, including hierarchical reinforcement learning, in challenging web navigation environments with sparse rewards and large state and action spaces. These agents include a web navigation agent that can use learned value function(s) to automatically navigate through interactive web documents, as well as a training agent, referred to herein as a “meta-trainer,” that can be trained to generate synthetic training examples. Some approaches described herein may be implemented when expert demonstrations are available. Other approaches described herein may be implemented when expert demonstrations are not available. In either case, dense, potential-based rewards may be used to augment the training.
-
Publication number: US20210086353A1
Publication date: 2021-03-25
Application number: US17040299
Filing date: 2019-03-22
Applicant: Google LLC
Inventor: Pararth Shah , Dilek Hakkani-Tur , Juliana Kew , Marek Fiser , Aleksandra Faust
IPC: B25J9/16 , G10L25/78 , G10L15/22 , G10L15/18 , G06K9/00 , G06K9/62 , G10L15/16 , G06T7/593 , G06K9/72 , B25J13/08 , G05D1/02 , G05B13/02 , G06N3/04
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
-
Publication number: US20250077603A1
Publication date: 2025-03-06
Application number: US18952242
Filing date: 2024-11-19
Applicant: GOOGLE LLC
Inventor: Aleksandra Faust , Dilek Hakkani-Tur , Izzeddin Gur , Ulrich Rueckert
IPC: G06F16/954 , G06F16/953 , G06N3/04
Abstract: The present disclosure is generally directed to methods, apparatus, and computer-readable media (transitory and non-transitory) for learning to automatically navigate interactive web documents and/or websites. More particularly, various approaches are presented for training various deep Q network (DQN) agents to perform various tasks associated with reinforcement learning, including hierarchical reinforcement learning, in challenging web navigation environments with sparse rewards and large state and action spaces. These agents include a web navigation agent that can use learned value function(s) to automatically navigate through interactive web documents, as well as a training agent, referred to herein as a “meta-trainer,” that can be trained to generate synthetic training examples. Some approaches described herein may be implemented when expert demonstrations are available. Other approaches described herein may be implemented when expert demonstrations are not available. In either case, dense, potential-based rewards may be used to augment the training.