-
Publication number: US20250018562A1
Publication date: 2025-01-16
Application number: US18359550
Application date: 2023-07-26
Applicant: GOOGLE LLC
Inventor: Fei Xia , Harris Chan , Brian Ichter , Wenlong Huang , Ted Xiao , Karol Hausman
Abstract: Some implementations relate to using a large language model (LLM) in generating (and potentially refining) a plan for the execution of a long-horizon robotic task. Various implementations include processing, using the LLM, a free-form natural language instruction and textual feedback to generate LLM output. In many implementations, the free-form natural language instruction describes the robotic task. In additional or alternative implementations, the textual feedback can include task-specific feedback, passive scene description feedback, active scene description feedback, one or more additional or alternative types of environmental feedback, and/or combinations thereof. In some implementations, the system can select one or more robotic skills to perform based on the LLM output.
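Illustrative sketch (not taken from the patent text): the loop described in this abstract, where the instruction and accumulated textual feedback are re-processed by the LLM before each skill is selected, might look roughly like the following. The names `plan_with_feedback`, `toy_llm`, and `make_toy_executor` are hypothetical, the prompt layout is an assumption, and the LLM is replaced by a hand-written stub so the example runs on its own.

```python
from typing import Callable, List, Optional


def plan_with_feedback(
    instruction: str,
    llm: Callable[[str], str],
    execute: Callable[[str], str],
    skills: List[str],
    max_steps: int = 10,
) -> List[str]:
    """Re-prompt the LLM with the instruction plus all textual feedback so far."""
    history: List[str] = []   # alternating "Robot:" and "Feedback:" lines
    executed: List[str] = []
    for _ in range(max_steps):
        # Assumed prompt layout: task line, then history, then a cue for the next skill.
        prompt = "\n".join([f"Task: {instruction}", *history, "Next skill:"])
        output = llm(prompt)
        # Select the first known skill mentioned in the LLM output; stop if none is.
        skill: Optional[str] = next((s for s in skills if s in output), None)
        if skill is None:
            break
        feedback = execute(skill)  # e.g. success detection or a scene description
        executed.append(skill)
        history.append(f"Robot: {skill}")
        history.append(f"Feedback: {feedback}")
    return executed


def toy_llm(prompt: str) -> str:
    """Stand-in for a real LLM: a hand-written rule table over the prompt text."""
    if "Feedback: success: place cup" in prompt:
        return "done"
    if "Feedback: success: pick up cup" in prompt:
        return "place cup"
    return "pick up cup"


def make_toy_executor() -> Callable[[str], str]:
    """Stand-in for a robot: the first 'pick up cup' attempt fails, later ones succeed."""
    attempts: dict = {}

    def execute(skill: str) -> str:
        attempts[skill] = attempts.get(skill, 0) + 1
        if skill == "pick up cup" and attempts[skill] == 1:
            return f"failure: {skill}"
        return f"success: {skill}"

    return execute


plan = plan_with_feedback(
    "put the cup on the shelf",
    toy_llm,
    make_toy_executor(),
    skills=["pick up cup", "place cup"],
)
```

In this toy run the failed first grasp is reported back as text, so the stub retries the same skill before moving on; that retry is the kind of plan refinement the abstract refers to.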
-
Publication number: US20240189994A1
Publication date: 2024-06-13
Application number: US18539171
Application date: 2023-12-13
Applicant: Google LLC
Inventor: Keerthana P G , Karol Hausman , Julian Ibarz , Brian Ichter , Alexander Irpan , Dmitry Kalashnikov , Yao Lu , Kanury Kanishka Rao , Michael Sahngwon Ryoo , Austin Charles Stone , Teddey Ming Xiao , Quan Ho Vuong , Sumedh Anand Sontakke
IPC: B25J9/16
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent interacting with an environment. In one aspect, a method comprises: receiving a natural language text sequence that characterizes a task to be performed by the agent in the environment; generating an encoded representation of the natural language text sequence; and at each of a plurality of time steps: obtaining an observation image characterizing a state of the environment at the time step; processing the observation image to generate an encoded representation of the observation image; generating a sequence of input tokens; processing the sequence of input tokens to generate a policy output that defines an action to be performed by the agent in response to the observation image; selecting an action to be performed by the agent using the policy output; and causing the agent to perform the selected action.
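Illustrative sketch (not taken from the patent text): the per-time-step method in this abstract can be read as a control loop. The version below is a minimal outline under stated assumptions: the input token sequence is formed by concatenating text and image tokens, and the action is chosen greedily from the policy output. The abstract specifies neither. All function names and the toy encoders are hypothetical stand-ins for learned models.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]


def run_episode(
    instruction: str,
    observe: Callable[[int], Image],
    act: Callable[[int], None],
    encode_text: Callable[[str], Sequence[int]],
    encode_image: Callable[[Image], Sequence[int]],
    policy: Callable[[Sequence[int]], Sequence[float]],
    num_steps: int,
) -> List[int]:
    """Encode the instruction once, then observe, encode, and act at each time step."""
    text_tokens = list(encode_text(instruction))
    actions: List[int] = []
    for t in range(num_steps):
        image = observe(t)                       # observation image for this step
        image_tokens = list(encode_image(image))
        tokens = text_tokens + image_tokens      # assumption: simple concatenation
        scores = policy(tokens)                  # policy output: one score per action
        action = max(range(len(scores)), key=lambda k: scores[k])  # assumption: greedy
        act(action)                              # cause the agent to perform the action
        actions.append(action)
    return actions


# Toy stand-ins so the loop can be exercised without any learned model.
performed: List[int] = []
actions = run_episode(
    instruction="pick up the cup",
    observe=lambda t: [[t, 0], [0, 1]],
    act=performed.append,
    encode_text=lambda s: [len(w) for w in s.split()],
    encode_image=lambda img: [sum(row) for row in img],
    policy=lambda toks: [1.0 if sum(toks) % 3 == k else 0.0 for k in range(3)],
    num_steps=3,
)
```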
-
Publication number: US20230311335A1
Publication date: 2023-10-05
Application number: US18128953
Application date: 2023-03-30
Applicant: GOOGLE LLC
Inventor: Karol Hausman , Brian Ichter , Sergey Levine , Alexander Toshev , Fei Xia , Carolina Parada
CPC classification number: B25J13/003 , B25J11/0005 , B25J9/163 , B25J9/161 , G06F40/40
Abstract: Implementations process, using a large language model (LLM), a free-form natural language (NL) instruction to generate LLM output. Those implementations generate, based on the LLM output and a NL skill description of a robotic skill, a task-grounding measure that reflects a probability of the skill description in the probability distribution of the LLM output. Those implementations further generate, based on the robotic skill and current environmental state data, a world-grounding measure that reflects a probability of the robotic skill being successful based on the current environmental state data. Those implementations further determine, based on both the task-grounding measure and the world-grounding measure, whether to implement the robotic skill.
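Illustrative sketch (not taken from the patent text): one simple way to decide "based on both" measures is to multiply them and pick the highest-scoring skill. The product rule, the threshold, the function name `select_skill`, and all numeric values below are assumptions for illustration; the abstract does not state how the two measures are combined.

```python
from typing import Dict, Optional


def select_skill(
    task_grounding: Dict[str, float],
    world_grounding: Dict[str, float],
    threshold: float = 0.0,
) -> Optional[str]:
    """Combine both measures per skill and return the best one, or None.

    task_grounding:  probability of each skill description under the LLM output.
    world_grounding: probability that each skill succeeds in the current state.
    """
    combined = {
        skill: task_grounding[skill] * world_grounding.get(skill, 0.0)
        for skill in task_grounding
    }
    best = max(combined, key=lambda s: combined[s])
    return best if combined[best] > threshold else None


# Made-up numbers: the LLM prefers grasping the sponge, but no sponge is reachable,
# so the combined score favors navigating to the table first.
task = {"pick up sponge": 0.6, "go to table": 0.3, "pick up apple": 0.1}
world = {"pick up sponge": 0.1, "go to table": 0.9, "pick up apple": 0.8}
chosen = select_skill(task, world)
```

The example shows why both measures matter: the skill the language model rates highest is rejected because it is unlikely to succeed in the current environment.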
-