-
Publication No.: US20250045316A1
Publication Date: 2025-02-06
Application No.: US18788178
Filing Date: 2024-07-30
Applicant: Google LLC
Inventor: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Iftekhar Naim, Yi Luan, Blair Yuxin Chen, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Daniel Matthew Cer, Gustavo Adolfo Hernandez Abrego, Jeremy Robert Cole, Colin Hearne Evans, Yuzhe Zhao, Pranay Bhatia, Rajvi Kapadia, Riham Hassan Abdel-Moneim Mansour, Raphael Dominik Hoffman, Simon Kunio Tokumine, Scott Bradley Huffman, Stephen Zachary Karukas, Michael Yiupun Kwong, Shu Zheng, Yan Qiao, Lukas Rutishauser, Anand Rajan Iyer
Abstract: An example method includes providing, to a sequence model (i) a plurality of few-shot prompts, wherein each prompt comprises a demonstration passage, a demonstration task, and a demonstration query, wherein the demonstration task describes a type of retrieval, and wherein the demonstration query is relevant to the demonstration task, and (ii) a plurality of passages sampled from a corpus of passages. The method also includes receiving, from the sequence model and for the plurality of passages and based on the plurality of few-shot prompts, a respective plurality of predicted task-query pairs, the sequence model having been prompted to predict a task based on an input passage, and predict an output query relevant to the predicted task. The method further includes generating a synthetic training dataset comprising the plurality of passages and the respective plurality of predicted task-query pairs. The method also includes providing the synthetic training dataset.
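The flow in this abstract (few-shot prompts in, predicted task-query pairs out, one dataset row per passage) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and does not come from the patent: the `Demo` and `Example` types, the "Passage / Task / Query" prompt layout, the parsing rule, and `model`, which stands in for any callable that maps a prompt string to a completion string.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Demo:
    """One few-shot demonstration: passage, retrieval task, and a relevant query."""
    passage: str
    task: str
    query: str


@dataclass
class Example:
    """One synthetic training row: a corpus passage plus the predicted task-query pair."""
    passage: str
    task: str
    query: str


def build_prompt(demos: List[Demo], passage: str) -> str:
    # Hypothetical layout: each demonstration is shown in full, then the new
    # passage is left open at "Task:" so the model predicts the task first.
    parts = [f"Passage: {d.passage}\nTask: {d.task}\nQuery: {d.query}\n" for d in demos]
    parts.append(f"Passage: {passage}\nTask:")
    return "\n".join(parts)


def parse_task_query(completion: str) -> Tuple[str, str]:
    # Assumes the completion has the form "<task>\nQuery: <query>".
    task, sep, query = completion.partition("\nQuery:")
    if not sep:
        raise ValueError("completion does not contain a 'Query:' line")
    return task.strip(), query.strip()


def generate_dataset(
    model: Callable[[str], str], demos: List[Demo], passages: List[str]
) -> List[Example]:
    dataset = []
    for passage in passages:
        completion = model(build_prompt(demos, passage))
        task, query = parse_task_query(completion)
        dataset.append(Example(passage=passage, task=task, query=query))
    return dataset
```

Having the model predict the task before the query mirrors the order described in the abstract. A real pipeline would likely also need filtering of malformed or low-quality completions, which this sketch omits.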
-
Publication No.: US10685012B2
Publication Date: 2020-06-16
Application No.: US15424671
Filing Date: 2017-02-03
Applicant: Google LLC
Inventor: Noam M. Shazeer, Colin Hearne Evans, Christopher Robert Waterson, Ryan P. Doherty
Abstract: Methods and systems, including computer programs encoded on computer storage media, for generating compressed representations from a co-occurrence matrix. A method includes obtaining a set of submatrices of a co-occurrence matrix, where each row of the co-occurrence matrix corresponds to a feature from a first feature vocabulary and each column of the co-occurrence matrix corresponds to a feature from a second feature vocabulary; selecting a submatrix, wherein the submatrix is associated with a particular row block and column block of the co-occurrence matrix; assigning respective d-dimensional initial row and column embedding vectors to each row and column from the particular row and column blocks, respectively; and determining a final row embedding vector and a final column embedding vector by iteratively adjusting the initial row embedding vectors and the initial column embedding vectors using the co-occurrence matrix.
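The block-wise procedure in this abstract can be sketched as follows. The abstract does not state the objective or update rule, so this sketch assumes a squared-error loss between each row-column dot product and log(1 + count), minimized by plain stochastic gradient descent. The names `block_loss` and `train_block` and all hyperparameters are hypothetical.

```python
import math
import random


def block_loss(cooc, row_block, col_block, rows, cols):
    """Sum of squared errors between dot products and log(1 + count) over one block."""
    total = 0.0
    for i in row_block:
        for j in col_block:
            pred = sum(a * b for a, b in zip(rows[i], cols[j]))
            total += (pred - math.log1p(cooc[i][j])) ** 2
    return total


def train_block(cooc, row_block, col_block, dim=4, steps=200, lr=0.05, seed=0):
    """Assign d-dimensional initial vectors to one block, then adjust them iteratively."""
    rng = random.Random(seed)
    # Initial row and column embeddings: small random values (an assumption).
    rows = {i: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for i in row_block}
    cols = {j: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for j in col_block}
    for _ in range(steps):
        for i in row_block:
            for j in col_block:
                r, c = rows[i], cols[j]
                # Error of the current prediction against the assumed target.
                err = sum(a * b for a, b in zip(r, c)) - math.log1p(cooc[i][j])
                for k in range(dim):
                    r_k = r[k]  # keep the old value so both updates use it
                    r[k] -= lr * err * c[k]
                    c[k] -= lr * err * r_k
    return rows, cols
```

Because each call touches only the rows and columns of one block, different submatrices could in principle be processed separately and their embeddings merged afterwards. How blocks are scheduled and combined is not described in the abstract and is left out here.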
-
Publication No.: US10460229B1
Publication Date: 2019-10-29
Application No.: US15464053
Filing Date: 2017-03-20
Applicant: Google LLC
Inventor: Dayu Yuan, Ryan P. Doherty, Colin Hearne Evans, Julian David Christian Richardson, Eric E. Altendorf
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for disambiguating word sense. One of the methods includes maintaining a respective word sense numeric representation of each of a plurality of word senses of a particular word; receiving a request to determine the word sense of the particular word when included in a particular text sequence, the particular text sequence comprising one or more context words and the particular word; determining a context numeric representation of the context words in the particular text sequence; and selecting a word sense of the plurality of word senses having a word sense numeric representation that is closest to the context numeric representation as the word sense of the particular word when included in the particular text sequence.
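The selection step in this abstract (pick the word sense whose numeric representation is closest to the context representation) can be sketched as below. The abstract does not say how the context representation is computed or which distance is used, so this sketch assumes the context vector is the mean of the context word vectors and that "closest" means highest cosine similarity. The function names and the toy vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(word, tokens, word_vectors, sense_vectors):
    """Return the sense label of `word` that best matches the other tokens."""
    # Context words: every token other than the target word that has a vector.
    context = [word_vectors[t] for t in tokens if t != word and t in word_vectors]
    if not context:
        raise ValueError("no context words with known vectors")
    ctx = mean_vector(context)
    senses = sense_vectors[word]  # maintained mapping: sense label -> vector
    return max(senses, key=lambda label: cosine(senses[label], ctx))


# Toy two-dimensional vectors, for illustration only.
word_vectors = {
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "money": [0.0, 1.0],
    "loan": [0.1, 0.9],
}
sense_vectors = {"bank": {"bank/river": [1.0, 0.0], "bank/finance": [0.0, 1.0]}}
```

With these toy vectors, `disambiguate("bank", ["the", "river", "bank", "water"], word_vectors, sense_vectors)` selects the river sense, because the averaged context vector points almost entirely along the first axis.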
-