Publication No.: US20250005293A1
Publication Date: 2025-01-02
Application No.: US18217313
Filing Date: 2023-06-30
Applicant: GOOGLE LLC
Inventors: Tuan Nguyen, Sergei Volnov, William A. Truong, Yunfan Ye, Sana Mithani, Neel Joshi, Alexey Galata, Tzu-Chan Chuang, Liang-yu Chen, Qiong Huang, Krunal Shah, Sai Aditya Chitturu
Abstract: Implementations relate to leveraging large language model(s) (LLMs) and vision language model(s) (VLMs) to facilitate human-to-computer dialogs. In various implementations, one or more digital images may be processed using one or more VLMs to generate VLM output indicative of a state of an environment. An LLM prompt may be assembled based on the VLM output and a natural language input. The LLM prompt may be processed using one or more LLMs to generate content that is responsive to the natural language input. The content that is responsive to the natural language input may subsequently be rendered at one or more output devices.
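
The flow described in the abstract (images to VLM output, VLM output plus natural language input to an LLM prompt, LLM prompt to rendered content) can be illustrated with a minimal sketch. This is not the implementation claimed in the application: the function names `assemble_prompt` and `respond`, the prompt layout, and the `VLM`/`LLM` callable types are all hypothetical, and the models are passed in as plain callables so that no particular model API is assumed.

```python
from typing import Callable, List, Sequence

# Hypothetical model interfaces (assumptions, not from the application):
# any callable with these shapes can stand in for a real model.
VLM = Callable[[bytes], str]  # image bytes -> text indicative of environment state
LLM = Callable[[str], str]    # prompt text -> generated content


def assemble_prompt(vlm_outputs: Sequence[str], user_input: str) -> str:
    """Build an LLM prompt from VLM output and the natural language input.

    The layout used here is an illustrative assumption.
    """
    state = "\n".join(f"- {description}" for description in vlm_outputs)
    return (
        "Environment state (from vision model):\n"
        f"{state}\n\n"
        f"User: {user_input}\n"
        "Assistant:"
    )


def respond(
    images: Sequence[bytes],
    user_input: str,
    vlm: VLM,
    llm: LLM,
    render: Callable[[str], None] = print,
) -> str:
    """Run the pipeline once: VLM -> prompt assembly -> LLM -> render."""
    # 1. Process each digital image with the VLM.
    vlm_outputs: List[str] = [vlm(image) for image in images]
    # 2. Assemble the LLM prompt from the VLM output and the user's input.
    prompt = assemble_prompt(vlm_outputs, user_input)
    # 3. Generate content responsive to the natural language input.
    content = llm(prompt)
    # 4. Render the content at an output device (here, a callback).
    render(content)
    return content
```

In this sketch, `render` defaults to `print`, but a caller could pass a text-to-speech or display callback instead. Stub callables such as `lambda image: "a kettle on a lit stove"` are enough to exercise the control flow without any real model.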