-
Publication Number: US11941504B2
Publication Date: 2024-03-26
Application Number: US17040299
Filing Date: 2019-03-22
Applicant: Google LLC
Inventor: Pararth Shah , Dilek Hakkani-Tur , Juliana Kew , Marek Fiser , Aleksandra Faust
IPC: G06N3/008 , B25J9/16 , B25J13/08 , G05B13/02 , G05D1/00 , G05D1/02 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/262 , G10L15/16 , G10L15/18 , G10L15/22 , G10L25/78
CPC classification number: G06N3/008 , B25J9/161 , B25J9/162 , B25J9/163 , B25J9/1697 , B25J13/08 , G05B13/027 , G05D1/0221 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/274 , G10L15/16 , G10L15/1815 , G10L15/22 , G10L25/78 , G10L2015/223
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
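The following is a minimal illustrative sketch (not the claimed implementation) of the kind of model this abstract describes: a policy network that, at each time step, fuses current sensor features with an encoding of a free-form natural language instruction to score candidate robotic actions. The PyTorch framing, module names, and dimensions are all assumptions.

# Hypothetical sketch of an instruction-conditioned robotic policy.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class InstructionConditionedPolicy(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, sensor_dim=32,
                 hidden_dim=128, num_actions=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Recurrent encoder for the free-form instruction tokens.
        self.lang_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Fuse the language encoding with the current sensor observation.
        self.policy = nn.Sequential(
            nn.Linear(hidden_dim + sensor_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, instruction_tokens, sensor_features):
        embedded = self.embed(instruction_tokens)           # (B, T, E)
        _, (h_n, _) = self.lang_encoder(embedded)           # h_n: (1, B, H)
        fused = torch.cat([h_n[-1], sensor_features], -1)   # (B, H + S)
        return self.policy(fused)                           # action logits

# At each time step, the agent re-evaluates the policy on fresh sensor data:
policy = InstructionConditionedPolicy()
tokens = torch.randint(0, 1000, (1, 8))      # e.g. "go past the couch to the door"
sensors = torch.randn(1, 32)                 # current sensor snapshot
action = policy(tokens, sensors).argmax(-1)  # greedy action for this step

Under deep reinforcement learning as the abstract describes, these logits would feed a reinforcement-learning objective (e.g., a policy-gradient loss) rather than supervised targets.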
-
Publication Number: US20210086353A1
Publication Date: 2021-03-25
Application Number: US17040299
Filing Date: 2019-03-22
Applicant: Google LLC
Inventor: Pararth Shah , Dilek Hakkani-Tur , Juliana Kew , Marek Fiser , Aleksandra Faust
IPC: B25J9/16 , G10L25/78 , G10L15/22 , G10L15/18 , G06K9/00 , G06K9/62 , G10L15/16 , G06T7/593 , G06K9/72 , B25J13/08 , G05D1/02 , G05B13/02 , G06N3/04
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
-
Publication Number: US10424302B2
Publication Date: 2019-09-24
Application Number: US15782333
Filing Date: 2017-10-12
Applicant: Google LLC
Inventor: Pararth Shah , Larry Paul Heck , Dilek Hakkani-Tur
Abstract: Techniques are described related to turn-based reinforcement learning for dialog management. In various implementations, dialog states and corresponding responsive actions generated during a multi-turn human-to-computer dialog session may be obtained. A plurality of turn-level training instances may be generated, each including: a given dialog state of the plurality of dialog states at an outset of a given turn of the human-to-computer dialog session; and a given responsive action that was selected based on the given dialog state. One or more of the turn-level training instances may further include a turn-level feedback value that reflects on the given responsive action selected during the given turn. A reward value may be generated based on an outcome of the human-to-computer dialog session. The dialog management policy model may be trained based on turn-level feedback values of the turn-level training instance(s) and the reward value.
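A minimal sketch (not the claimed method) of the turn-level data this abstract describes: training instances pairing a dialog state with the responsive action selected from it, optionally carrying per-turn feedback, combined with a session-level reward. The field names and the blending rule are illustrative assumptions.

# Hypothetical sketch of turn-level training instances for dialog management.
# Field names and the feedback/reward blending rule are assumptions.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TurnInstance:
    dialog_state: dict                      # state at the outset of the turn
    responsive_action: str                  # action selected given that state
    turn_feedback: Optional[float] = None   # optional per-turn feedback value

def turn_returns(instances: List[TurnInstance], session_reward: float,
                 feedback_weight: float = 0.5) -> List[float]:
    """Combine each turn's feedback (when present) with the session outcome.

    Turns without explicit feedback fall back to the session reward alone.
    """
    returns = []
    for inst in instances:
        if inst.turn_feedback is not None:
            r = (feedback_weight * inst.turn_feedback
                 + (1 - feedback_weight) * session_reward)
        else:
            r = session_reward
        returns.append(r)
    return returns

session = [
    TurnInstance({"intent": "book_flight"}, "ask_destination", 1.0),
    TurnInstance({"intent": "book_flight", "dest": "SFO"}, "confirm_booking"),
]
print(turn_returns(session, session_reward=1.0))  # [1.0, 1.0]

A policy-gradient style update over these per-turn returns would then credit each responsive action both for its immediate feedback and for the session's outcome.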
-
Publication Number: US20240249109A1
Publication Date: 2024-07-25
Application Number: US18601159
Filing Date: 2024-03-11
Applicant: Google LLC
Inventor: Pararth Shah , Dilek Hakkani-Tur , Juliana Kew , Marek Fiser , Aleksandra Faust
IPC: G06N3/008 , B25J9/16 , B25J13/08 , G05B13/02 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/262 , G10L15/16 , G10L15/18 , G10L15/22 , G10L25/78
CPC classification number: G06N3/008 , B25J9/161 , B25J9/162 , B25J9/163 , B25J9/1697 , B25J13/08 , G05B13/027 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/274 , G10L15/16 , G10L15/1815 , G10L15/22 , G10L25/78 , G10L2015/223
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
-
Publication Number: US11972339B2
Publication Date: 2024-04-30
Application Number: US17040299
Filing Date: 2019-03-22
Applicant: Google LLC
Inventor: Pararth Shah , Dilek Hakkani-Tur , Juliana Kew , Marek Fiser , Aleksandra Faust
IPC: G06N3/008 , B25J9/16 , B25J13/08 , G05B13/02 , G05D1/00 , G05D1/02 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/262 , G10L15/16 , G10L15/18 , G10L15/22 , G10L25/78
CPC classification number: G06N3/008 , B25J9/161 , B25J9/162 , B25J9/163 , B25J9/1697 , B25J13/08 , G05B13/027 , G05D1/0221 , G06F18/21 , G06N3/044 , G06T7/593 , G06V20/10 , G06V30/274 , G10L15/16 , G10L15/1815 , G10L15/22 , G10L25/78 , G10L2015/223
Abstract: Implementations relate to using deep reinforcement learning to train a model that can be utilized, at each of a plurality of time steps, to determine a corresponding robotic action for completing a robotic task. Implementations additionally or alternatively relate to utilization of such a model in controlling a robot. The robotic action determined at a given time step utilizing such a model can be based on: current sensor data associated with the robot for the given time step, and free-form natural language input provided by a user. The free-form natural language input can direct the robot to accomplish a particular task, optionally with reference to one or more intermediary steps for accomplishing the particular task. For example, the free-form natural language input can direct the robot to navigate to a particular landmark, with reference to one or more intermediary landmarks to be encountered in navigating to the particular landmark.
-
Publication Number: US20190115027A1
Publication Date: 2019-04-18
Application Number: US15782333
Filing Date: 2017-10-12
Applicant: Google LLC
Inventor: Pararth Shah , Larry Paul Heck , Dilek Hakkani-Tur
CPC classification number: G10L15/30 , G06F16/90332 , G06F17/27 , G10L15/005 , G10L15/16 , G10L15/22 , G10L17/22 , H04L51/02
Abstract: Techniques are described related to turn-based reinforcement learning for dialog management. In various implementations, dialog states and corresponding responsive actions generated during a multi-turn human-to-computer dialog session may be obtained. A plurality of turn-level training instances may be generated, each including: a given dialog state of the plurality of dialog states at an outset of a given turn of the human-to-computer dialog session; and a given responsive action that was selected based on the given dialog state. One or more of the turn-level training instances may further include a turn-level feedback value that reflects on the given responsive action selected during the given turn. A reward value may be generated based on an outcome of the human-to-computer dialog session. The dialog management policy model may be trained based on turn-level feedback values of the turn-level training instance(s) and the reward value.
-