-
Publication No.: US11907674B1
Publication date: 2024-02-20
Application No.: US18370683
Filing date: 2023-09-20
Applicant: GOOGLE LLC
Inventor: Oscar Akerlund, Evgeny Sluzhaev, Golnaz Ghiasi, Thang Luong, Yifeng Lu, Igor Petrovski, Ágoston Weisz, Wei Yu, Rakesh Shivanna, Michael Andrew Goodman, Apoorv Kulshreshtha, Yu Du, Amin Ghafouri, Sanil Jain, Dustin Tran, Vikas Peswani, YaGuang Li
CPC classification number: G06F40/40
Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using an LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
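The tag-based interleaving described in the abstract can be illustrated with a minimal sketch. The abstract does not specify a tag syntax or a retrieval mechanism, so the `[IMAGE: ...]` format, the `build_multimodal_response` function, and the `fake_fetch` lookup below are all hypothetical stand-ins, not the claimed implementation.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag syntax; the abstract does not define one.
TAG_PATTERN = re.compile(r"\[IMAGE:\s*(.*?)\]")


def build_multimodal_response(
    llm_output: str, fetch_media: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Split LLM output into text segments with media interleaved at tag positions."""
    parts: List[Tuple[str, str]] = []
    cursor = 0
    for match in TAG_PATTERN.finditer(llm_output):
        # Text between the previous tag (or the start) and this tag.
        text = llm_output[cursor:match.start()].strip()
        if text:
            parts.append(("text", text))
        # The tag content is treated as a query that identifies the media.
        parts.append(("media", fetch_media(match.group(1))))
        cursor = match.end()
    # Any text remaining after the last tag.
    tail = llm_output[cursor:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


def fake_fetch(query: str) -> str:
    """Assumed stand-in for a media lookup; returns a placeholder URL."""
    return "https://example.invalid/search?q=" + query.replace(" ", "+")


response = build_multimodal_response(
    "Pandas eat bamboo. [IMAGE: giant panda] They live in China.", fake_fetch
)
```

Here `response` is an ordered list of text and media parts, so a renderer could display them in sequence to place the media between the two text segments.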
-
Publication No.: US20250053751A1
Publication date: 2025-02-13
Application No.: US18413495
Filing date: 2024-01-16
Applicant: GOOGLE LLC
Inventor: Oscar Akerlund, Evgeny Sluzhaev, Golnaz Ghiasi, Thang Luong, Yifeng Lu, Igor Petrovski, Agoston Weisz, Wei Yu, Rakesh Shivanna, Michael Andrew Goodman, Apoorv Kulshreshtha, Yu Du, Amin Ghafouri, Sanil Jain, Dustin Tran, Vikas Peswani, YaGuang Li
IPC: G06F40/40
Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using an LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
-
Publication No.: US11947923B1
Publication date: 2024-04-02
Application No.: US18520218
Filing date: 2023-11-27
Applicant: GOOGLE LLC
Inventor: Sanil Jain, Wei Yu, Ágoston Weisz, Michael Andrew Goodman, Diana Avram, Amin Ghafouri, Golnaz Ghiasi, Igor Petrovski, Khyatti Gupta, Oscar Akerlund, Evgeny Sluzhaev, Rakesh Shivanna, Thang Luong, Komal Singh, Yifeng Lu, Vikas Peswani
Abstract: Implementations relate to managing multimedia content that is obtained by large language model(s) (LLM(s)) and/or generated by other generative model(s). Processor(s) of a system can: receive natural language (NL) based input that requests multimedia content, generate a response that is responsive to the NL based input, and cause the response to be rendered. In some implementations, and in generating the response, the processor(s) can process, using an LLM, LLM input to generate LLM output, and determine, based on the LLM output, at least multimedia content to be included in the response. Further, the processor(s) can evaluate the multimedia content to determine whether it should be included in the response. In response to determining that the multimedia content should not be included in the response, the processor(s) can cause the response, including alternative multimedia content or other textual content, to be rendered.
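The evaluate-then-fall-back flow in this abstract can be sketched as follows. The abstract does not say how multimedia content is evaluated, so `select_media`, the `is_acceptable` predicate, and the candidate file names are hypothetical and serve only to show the control flow: use the first candidate that passes evaluation, otherwise return textual content instead.

```python
from typing import Callable, Dict, Iterable, Optional


def select_media(
    candidates: Iterable[str],
    is_acceptable: Callable[[str], bool],
    fallback_text: str,
) -> Dict[str, Optional[str]]:
    """Return the first candidate that passes evaluation, else fallback text."""
    for item in candidates:
        # The evaluation criterion is an assumption supplied by the caller.
        if is_acceptable(item):
            return {"media": item, "text": None}
    # No candidate passed, so respond with textual content only.
    return {"media": None, "text": fallback_text}


def not_flagged(name: str) -> bool:
    """Assumed stand-in evaluator: rejects names that start with 'flagged'."""
    return not name.startswith("flagged")


with_alternative = select_media(
    ["flagged_01.png", "ok_02.png"], not_flagged, "No suitable image was found."
)
text_only = select_media(["flagged_01.png"], not_flagged, "No suitable image was found.")
```

In the first call the rejected primary candidate is replaced by an alternative; in the second no candidate survives, so only text is returned.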
-