-
Publication Number: US20250005293A1
Publication Date: 2025-01-02
Application Number: US18217313
Application Date: 2023-06-30
Applicant: GOOGLE LLC
Inventor: Tuan Nguyen , Sergei Volnov , William A. Truong , Yunfan Ye , Sana Mithani , Neel Joshi , Alexey Galata , Tzu-Chan Chuang , Liang-yu Chen , Qiong Huang , Krunal Shah , Sai Aditya Chitturu
Abstract: Implementations relate to leveraging large language model(s) (LLMs) and vision language model(s) (VLMs) to facilitate human-to-computer dialogs. In various implementations, one or more digital images may be processed using one or more VLMs to generate VLM output indicative of a state of an environment. An LLM prompt may be assembled based on the VLM output and a natural language input. The LLM prompt may be processed using one or more LLMs to generate content that is responsive to the natural language input. The content that is responsive to the natural language input may subsequently be rendered at one or more output devices.
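The flow described in the abstract — images processed by a VLM into a state description, that description combined with the user's natural language input into an LLM prompt, and the LLM's response rendered — can be sketched roughly as below. This is a minimal illustration only; the `vlm` and `llm` callables and all function names are hypothetical stand-ins, not the models or APIs used in the patent.

```python
# Hypothetical sketch of the abstract's pipeline. The vlm and llm
# callables are placeholder stand-ins for real models.

def describe_environment(images, vlm):
    """Process digital images with a VLM into a textual state description."""
    return " ".join(vlm(img) for img in images)

def assemble_llm_prompt(vlm_output, user_input):
    """Assemble an LLM prompt from the VLM output and the natural language input."""
    return (
        f"Environment state: {vlm_output}\n"
        f"User request: {user_input}\n"
        "Respond based on the observed environment."
    )

def respond(images, user_input, vlm, llm):
    """Generate content responsive to the input, for rendering at an output device."""
    vlm_output = describe_environment(images, vlm)
    prompt = assemble_llm_prompt(vlm_output, user_input)
    return llm(prompt)

# Toy stand-ins to exercise the control flow:
toy_vlm = lambda img: f"a {img} on a table"
toy_llm = lambda prompt: "Observed: " + prompt.splitlines()[0]

print(respond(["mug"], "What do you see?", toy_vlm, toy_llm))
```

The key structural point is that the VLM output is folded into the prompt as context, so the LLM answers grounded in the environment state rather than the text alone.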
-
Publication Number: US20250078484A1
Publication Date: 2025-03-06
Application Number: US18242213
Application Date: 2023-09-05
Applicant: GOOGLE LLC
Inventor: Tuan Nguyen , Sergei Volnov , Yunfan Ye , Alexey Galata , William A. Truong , Tzu-Chan Chuang , Liang-yu Chen , Qiong Huang , Krunal Shah , Sai Aditya Chitturu , Sana Mithani
IPC: G06V10/80 , G06V40/16 , G10L15/183 , G10L15/30
Abstract: Implementations relate to generating and using multimodal embeddings. In various implementations, first modality data may be obtained and encoded into first modality embedding(s) using a trained first modality encoder that is stored in memory of edge-based client device(s). Second modality data may be obtained and encoded into second modality embedding(s) using a trained second modality encoder that is also stored in the memory of the edge-based client device(s). The first and second modality embeddings may be processed using an edge-based multimodal LLM that is also stored locally in memory of the edge-based client device(s) to generate a multimodal contextual embedding, which may be provided to a remote server that hosts a central LLM, e.g., in conjunction with a natural language input provided by the user. Information generated using the central LLM, responsive to the natural language input, may be received from the remote server.
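The split described here — per-modality encoders and a small multimodal model running on the edge device, with only the fused contextual embedding and the user's text sent to the remote server hosting the central LLM — can be sketched as follows. All encoders and the fusion step are hypothetical stand-ins operating on plain lists, not the actual on-device models.

```python
# Hypothetical sketch of the edge-side flow from the abstract.
# Embeddings are plain lists; encoders are placeholder callables.

def encode(data, encoder):
    """Encode raw modality data into an embedding with an on-device encoder."""
    return [encoder(x) for x in data]

def fuse(first_emb, second_emb):
    """Stand-in for the edge multimodal LLM: fuse the two per-modality
    embeddings into one multimodal contextual embedding."""
    return [a + b for a, b in zip(first_emb, second_emb)]

def build_server_request(first_data, second_data, user_text,
                         first_encoder, second_encoder):
    """Prepare the payload sent to the remote server hosting the central LLM."""
    first_emb = encode(first_data, first_encoder)
    second_emb = encode(second_data, second_encoder)
    context = fuse(first_emb, second_emb)
    # Only the fused embedding and the natural language input leave the device.
    return {"context_embedding": context, "text": user_text}

req = build_server_request([1.0, 2.0], [0.5, 0.5], "What's happening?",
                           lambda x: x * 2, lambda x: x + 1)
print(req)
```

The design rationale implied by the abstract is that raw modality data stays on the client; the server receives only a compact contextual embedding alongside the text, which reduces bandwidth and limits what raw sensor data is transmitted.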
-