-
Publication Number: US20250005293A1
Publication Date: 2025-01-02
Application Number: US18217313
Application Date: 2023-06-30
Applicant: GOOGLE LLC
Inventor: Tuan Nguyen , Sergei Volnov , William A. Truong , Yunfan Ye , Sana Mithani , Neel Joshi , Alexey Galata , Tzu-Chan Chuang , Liang-yu Chen , Qiong Huang , Krunal Shah , Sai Aditya Chitturu
Abstract: Implementations relate to leveraging large language model(s) (LLMs) and vision language model(s) (VLMs) to facilitate human-to-computer dialogs. In various implementations, one or more digital images may be processed using one or more VLMs to generate VLM output indicative of a state of an environment. An LLM prompt may be assembled based on the VLM output and a natural language input. The LLM prompt may be processed using one or more LLMs to generate content that is responsive to the natural language input. The content that is responsive to the natural language input may subsequently be rendered at one or more output devices.
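The flow described in the abstract — images processed by a VLM into a state description, that description combined with the user's natural language input into an LLM prompt, and the LLM's response rendered — can be sketched roughly as below. This is a minimal illustration only; the `vlm` and `llm` callables and all function names are hypothetical stand-ins, not the models or APIs used in the patent.

```python
# Hypothetical sketch of the abstract's pipeline. The vlm and llm
# callables are placeholder stand-ins for real models.

def describe_environment(images, vlm):
    """Process digital images with a VLM into a textual state description."""
    return " ".join(vlm(img) for img in images)

def assemble_llm_prompt(vlm_output, user_input):
    """Assemble an LLM prompt from the VLM output and the natural language input."""
    return (
        f"Environment state: {vlm_output}\n"
        f"User request: {user_input}\n"
        "Respond based on the observed environment."
    )

def respond(images, user_input, vlm, llm):
    """Generate content responsive to the input, for rendering at an output device."""
    vlm_output = describe_environment(images, vlm)
    prompt = assemble_llm_prompt(vlm_output, user_input)
    return llm(prompt)

# Toy stand-ins to exercise the control flow:
toy_vlm = lambda img: f"a {img} on a table"
toy_llm = lambda prompt: "Observed: " + prompt.splitlines()[0]

print(respond(["mug"], "What do you see?", toy_vlm, toy_llm))
```

The key structural point is that the VLM output is folded into the prompt as context, so the LLM answers grounded in the environment state rather than the text alone.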
-
Publication Number: US20250078484A1
Publication Date: 2025-03-06
Application Number: US18242213
Application Date: 2023-09-05
Applicant: GOOGLE LLC
Inventor: Tuan Nguyen , Sergei Volnov , Yunfan Ye , Alexey Galata , William A. Truong , Tzu-Chan Chuang , Liang-yu Chen , Qiong Huang , Krunal Shah , Sai Aditya Chitturu , Sana Mithani
IPC: G06V10/80 , G06V40/16 , G10L15/183 , G10L15/30
Abstract: Implementations relate to generating and using multimodal embeddings. In various implementations, first modality data may be obtained and encoded into first modality embedding(s) using a trained first modality encoder that is stored in memory of edge-based client device(s). Second modality data may be obtained and encoded into second modality embedding(s) using a trained second modality encoder that is also stored in the memory of the edge-based client device(s). The first and second modality embeddings may be processed using an edge-based multimodal LLM that is also stored locally in memory of the edge-based client device(s) to generate a multimodal contextual embedding, which may be provided to a remote server that hosts a central LLM, e.g., in conjunction with a natural language input provided by the user. Information generated using the central LLM, responsive to the natural language input, may be received from the remote server.
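The split described here — per-modality encoders and a small multimodal model running on the edge device, with only the fused contextual embedding and the user's text sent to the remote server hosting the central LLM — can be sketched as follows. All encoders and the fusion step are hypothetical stand-ins operating on plain lists, not the actual on-device models.

```python
# Hypothetical sketch of the edge-side flow from the abstract.
# Embeddings are plain lists; encoders are placeholder callables.

def encode(data, encoder):
    """Encode raw modality data into an embedding with an on-device encoder."""
    return [encoder(x) for x in data]

def fuse(first_emb, second_emb):
    """Stand-in for the edge multimodal LLM: fuse the two per-modality
    embeddings into one multimodal contextual embedding."""
    return [a + b for a, b in zip(first_emb, second_emb)]

def build_server_request(first_data, second_data, user_text,
                         first_encoder, second_encoder):
    """Prepare the payload sent to the remote server hosting the central LLM."""
    first_emb = encode(first_data, first_encoder)
    second_emb = encode(second_data, second_encoder)
    context = fuse(first_emb, second_emb)
    # Only the fused embedding and the natural language input leave the device.
    return {"context_embedding": context, "text": user_text}

req = build_server_request([1.0, 2.0], [0.5, 0.5], "What's happening?",
                           lambda x: x * 2, lambda x: x + 1)
print(req)
```

The design rationale implied by the abstract is that raw modality data stays on the client; the server receives only a compact contextual embedding alongside the text, which reduces bandwidth and limits what raw sensor data is transmitted.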
-