Publication No.: US20250139379A1
Publication Date: 2025-05-01
Application No.: US18385270
Filing Date: 2023-10-30
Applicant: GOOGLE LLC
Inventor: Sanil Jain, Wei Yu, Alessandro Agostini, Agoston Weisz, Michael Andrew Goodman, Attila Dankovics, Elle Chae, Evgeny Sluzhaev, Amin Ghafouri, Golnaz Ghiasi, Igor Petrovski, Konstantin Shagin, Marcelo Menegali, Oscar Akerlund, Rakesh Shivanna, Thang Luong, Tiffany Chen, Vikas Peswani, Yifeng Lu
IPC: G06F40/40, G06F16/483
Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)) and other generative model(s). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, in generating the multi-modal response, the processor(s) can process LLM input, using an LLM, to generate LLM output, and determine, based on the LLM output, textual content and generative multimedia content for inclusion in the multi-modal response. In some implementations, the generative multimedia content can be generated by another generative model (e.g., an image generator, a video generator, an audio generator, etc.) based on generative multimedia content prompt(s) that are included in the LLM output and that are indicative of the generative multimedia content. In various implementations, the generative multimedia content can be interleaved between segments of the textual content.
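
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes, purely for illustration, that the LLM output embeds generative multimedia content prompts as inline markers of the form `[[image: ...]]`, `[[video: ...]]`, or `[[audio: ...]]`; the record does not specify any such format. The names `PROMPT_PATTERN`, `TextSegment`, `MediaSegment`, `build_multimodal_response`, and `fake_generator` are hypothetical, and `fake_generator` stands in for a separate generative model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Union

# ASSUMPTION: hypothetical inline marker format for multimedia prompts.
# The source does not say how prompts are encoded in the LLM output.
PROMPT_PATTERN = re.compile(r"\[\[(image|video|audio):\s*(.*?)\]\]")


@dataclass
class TextSegment:
    text: str


@dataclass
class MediaSegment:
    kind: str      # "image", "video", or "audio"
    prompt: str    # prompt taken from the LLM output
    content: str   # placeholder for the generated media (e.g., a handle or URI)


Segment = Union[TextSegment, MediaSegment]


def build_multimodal_response(
    llm_output: str,
    generate: Callable[[str, str], str],
) -> List[Segment]:
    """Split LLM output into text segments and interleave generated media."""
    segments: List[Segment] = []
    cursor = 0
    for match in PROMPT_PATTERN.finditer(llm_output):
        # Text between the previous marker (or the start) and this marker.
        text = llm_output[cursor:match.start()].strip()
        if text:
            segments.append(TextSegment(text))
        kind, prompt = match.group(1), match.group(2).strip()
        # Delegate generation to the other generative model (stubbed here).
        segments.append(MediaSegment(kind, prompt, generate(kind, prompt)))
        cursor = match.end()
    # Any text remaining after the last marker.
    tail = llm_output[cursor:].strip()
    if tail:
        segments.append(TextSegment(tail))
    return segments


def fake_generator(kind: str, prompt: str) -> str:
    """Hypothetical stand-in for an image, video, or audio generator."""
    return f"<{kind} generated from: {prompt}>"


if __name__ == "__main__":
    output = "Here is a cat. [[image: a tabby cat on a sofa]] Cats sleep a lot."
    for segment in build_multimodal_response(output, fake_generator):
        print(segment)
```

In this sketch the ordering of segments in the returned list mirrors the position of each prompt in the LLM output, so a renderer can display the list in order to obtain media interleaved between segments of text.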