Publication No.: US20250045316A1
Publication Date: 2025-02-06
Application No.: US18788178
Filing Date: 2024-07-30
Applicant: Google LLC
Inventor: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Iftekhar Naim, Yi Luan, Blair Yuxin Chen, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Daniel Matthew Cer, Gustavo Adolfo Hernandez Abrego, Jeremy Robert Cole, Colin Hearne Evans, Yuzhe Zhao, Pranay Bhatia, Rajvi Kapadia, Riham Hassan Abdel-Moneim Mansour, Raphael Dominik Hoffman, Simon Kunio Tokumine, Scott Bradley Huffman, Stephen Zachary Karukas, Michael Yiupun Kwong, Shu Zheng, Yan Qiao, Lukas Rutishauser, Anand Rajan Iyer
Abstract: An example method includes providing, to a sequence model, (i) a plurality of few-shot prompts, wherein each prompt comprises a demonstration passage, a demonstration task, and a demonstration query, wherein the demonstration task describes a type of retrieval, and wherein the demonstration query is relevant to the demonstration task, and (ii) a plurality of passages sampled from a corpus of passages. The method also includes receiving, from the sequence model and for the plurality of passages and based on the plurality of few-shot prompts, a respective plurality of predicted task-query pairs, the sequence model having been prompted to predict a task based on an input passage and to predict an output query relevant to the predicted task. The method further includes generating a synthetic training dataset comprising the plurality of passages and the respective plurality of predicted task-query pairs. The method also includes providing the synthetic training dataset.
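
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the application: the prompt layout ("Passage:", "Task:", "Query:"), the names `Demo`, `build_prompt`, `parse_completion`, and `generate_synthetic_dataset`, and the `stub_model` stand-in for the sequence model are all assumptions made for illustration. Any callable that maps a prompt string to a completion string could take the place of the stub.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Demo:
    """One few-shot demonstration: passage, retrieval task, and query."""
    passage: str
    task: str
    query: str


def build_prompt(demos: List[Demo], passage: str) -> str:
    """Concatenate the demonstrations, then leave the task open for the model."""
    parts = [
        f"Passage: {d.passage}\nTask: {d.task}\nQuery: {d.query}\n" for d in demos
    ]
    parts.append(f"Passage: {passage}\nTask:")
    return "\n".join(parts)


def parse_completion(text: str) -> Optional[Tuple[str, str]]:
    """Split a completion of the assumed form '<task>\\nQuery: <query>'."""
    task_part, sep, query_part = text.partition("\nQuery:")
    query_lines = query_part.strip().splitlines()
    if not sep or not task_part.strip() or not query_lines:
        return None  # malformed completion; the caller skips it
    return task_part.strip(), query_lines[0].strip()


def generate_synthetic_dataset(
    model: Callable[[str], str],
    demos: List[Demo],
    corpus: List[str],
    num_samples: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Sample passages, prompt the model, and collect (passage, task, query) rows."""
    rng = random.Random(seed)
    passages = rng.sample(corpus, min(num_samples, len(corpus)))
    dataset = []
    for passage in passages:
        parsed = parse_completion(model(build_prompt(demos, passage)))
        if parsed is None:
            continue
        task, query = parsed
        dataset.append({"passage": passage, "task": task, "query": query})
    return dataset


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a sequence model; returns a fixed completion."""
    return " Find a passage that answers the question.\nQuery: what is discussed here?"


if __name__ == "__main__":
    demos = [Demo("Paris is the capital of France.",
                  "Find a passage that answers the question.",
                  "what is the capital of France?")]
    corpus = ["Passage A.", "Passage B.", "Passage C."]
    for row in generate_synthetic_dataset(stub_model, demos, corpus, num_samples=2):
        print(row)
```

In this sketch the model predicts the task first and the query second, which mirrors the ordering in the abstract; completions that do not match the assumed layout are dropped rather than repaired.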