-
Publication Number: US20240169184A1
Publication Date: 2024-05-23
Application Number: US18426212
Filing Date: 2024-01-29
Applicant: Google LLC
Inventor: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
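Concretely, adaptive early exiting means the decoder can emit each token from an intermediate layer as soon as an exit criterion is satisfied, instead of always running the full stack. Below is a minimal sketch of that control flow using a toy NumPy decoder; ToyDecoder, the shared output head, and the max-probability exit_threshold are illustrative assumptions, not the claimed implementation.

```python
# Sketch: adaptive early exiting in a toy auto-regressive decoder step.
import numpy as np

rng = np.random.default_rng(0)

class ToyDecoder:
    """A stack of tanh 'layers' sharing one output head, for illustration."""
    def __init__(self, num_layers=6, d_model=16, vocab=32):
        self.layers = [rng.normal(size=(d_model, d_model)) * 0.1
                       for _ in range(num_layers)]
        self.head = rng.normal(size=(d_model, vocab)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(model, h, exit_threshold=0.9):
    """Run layers one at a time; exit as soon as the head is confident."""
    for depth, w in enumerate(model.layers, start=1):
        h = np.tanh(h @ w)                 # one toy decoder layer
        p = softmax(h @ model.head)        # intermediate next-token prediction
        if p.max() >= exit_threshold:      # confident enough: skip the rest
            return int(p.argmax()), depth
    return int(p.argmax()), depth          # fell through: used full depth

model = ToyDecoder()
token, depth_used = decode_step(model, rng.normal(size=16))
print(f"emitted token {token} after {depth_used}/{len(model.layers)} layers")
```

Tokens that are easy to predict exit after a few layers, so average per-token latency drops while hard tokens still receive the full computation.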
-
Publication Number: US20240020516A1
Publication Date: 2024-01-18
Application Number: US18222395
Filing Date: 2023-07-14
Applicant: Google LLC
Inventor: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
-
Publication Number: US20240289552A1
Publication Date: 2024-08-29
Application Number: US18564859
Filing Date: 2022-05-27
Applicant: Google LLC
Inventor: Yi Tay, Dara Bahri, Donald Arthur Metzler, Jr., Hyung Won Chung, Jai Prakash Gupta, Sebastian Nikolas Ruder, Simon Baumgartner, Vinh Quoc Tran, Zhen Qin
IPC: G06F40/284
CPC classification number: G06F40/284
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a machine learning task on an input sequence of characters that has a respective character at each of a plurality of character positions to generate a network output. One of the systems includes a neural network configured to perform the machine learning task, the neural network comprising a gradient-based sub-word tokenizer and an output neural network. The gradient-based sub-word tokenizer is configured to apply a learned, i.e., flexible, sub-word tokenization strategy to the input sequence of characters to generate a sequence of latent sub-word representations. The output neural network is configured to process the latent sub-word representations to generate the network output for the task.
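As a rough illustration of what a gradient-based sub-word tokenizer can look like, the sketch below pools character embeddings at several candidate block sizes and mixes the candidates with a softmax over block sizes, keeping the segmentation decision differentiable. The block sizes, shapes, and scoring vector are assumptions for illustration, not the patented architecture.

```python
# Sketch: soft (differentiable) sub-word tokenization over character embeddings.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 12, 8
chars = rng.normal(size=(seq_len, d))   # character embeddings
block_sizes = (1, 2, 4)                 # candidate sub-word block sizes
score_w = rng.normal(size=d)            # learned scoring vector (toy)

def pool_blocks(x, b):
    """Mean-pool non-overlapping blocks of size b, upsampled back to seq_len."""
    pooled = x.reshape(-1, b, x.shape[-1]).mean(axis=1)
    return np.repeat(pooled, b, axis=0)

candidates = np.stack([pool_blocks(chars, b) for b in block_sizes])  # (B, L, d)
scores = candidates @ score_w                                        # (B, L)
weights = np.exp(scores) / np.exp(scores).sum(axis=0)                # softmax over B
latent = (weights[..., None] * candidates).sum(axis=0)               # (L, d)

# 'latent' is the sequence of latent sub-word representations handed to the
# output network; in an autodiff framework, gradients flow through 'weights',
# so the tokenization strategy itself is learned.
print(latent.shape)  # (12, 8)
```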
-
Publication Number: US20220383120A1
Publication Date: 2022-12-01
Application Number: US17827448
Filing Date: 2022-05-27
Applicant: Google LLC
Inventor: Dara Bahri, Donald Arthur Metzler, Jr., Hanxi Heinrich Jiang, Yi Tay
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network having a plurality of network parameters. One of the methods includes obtaining an unlabeled training input from a set of unlabeled training data; processing the unlabeled training input to generate a first embedding; generating a corrupted version of the unlabeled training input, comprising determining a proper subset of the feature dimensions and, for each feature dimension that is in the proper subset of feature dimensions, applying a corruption to the respective feature in the feature dimension using one or more feature values sampled from a marginal distribution of the feature dimension as specified in the set of unlabeled training data; processing the corrupted version of the unlabeled training input to generate a second embedding; and determining an update to the current values of the plurality of network parameters.
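As a concrete illustration of the corruption scheme described above, the sketch below picks a proper subset of feature dimensions and resamples each chosen feature from that dimension's empirical marginal over the unlabeled set, then compares embeddings of the clean and corrupted views. The toy linear encoder and the squared-distance agreement term are illustrative stand-ins, not the claimed training objective.

```python
# Sketch: feature corruption via per-dimension marginal resampling.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 10))   # unlabeled training set (toy)
W = rng.normal(size=(10, 4)) * 0.1  # network parameters (toy encoder)

def corrupt(x, data, frac=0.3):
    """Corrupt a proper subset of feature dimensions, sampling replacement
    values from each dimension's marginal distribution in the dataset."""
    idx = rng.choice(x.shape[0], size=max(1, int(frac * x.shape[0])),
                     replace=False)
    x_c = x.copy()
    for j in idx:
        x_c[j] = data[rng.integers(len(data)), j]  # marginal of dimension j
    return x_c

x = data[0]
z1 = np.tanh(x @ W)                  # first embedding (clean view)
z2 = np.tanh(corrupt(x, data) @ W)   # second embedding (corrupted view)
loss = np.sum((z1 - z2) ** 2)        # simple agreement objective (stand-in)
print(f"agreement loss: {loss:.4f}")
```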
-
Publication Number: US20250156756A1
Publication Date: 2025-05-15
Application Number: US18835666
Filing Date: 2022-12-30
Applicant: Google LLC
Inventor: Yi Tay, Mostafa Dehghani
IPC: G06N20/00
Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples. The example method includes updating one or more parameters of the machine-learned model based on an evaluation of the plurality of outputs.
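To make the idea concrete, the sketch below draws corrupted training examples under several combinations of objective-configuration parameters (here, mean span length and corruption rate), each producing sentinel-marked inputs and the uncorrupted subportions the model must generate. The token format and the specific parameter values are illustrative assumptions.

```python
# Sketch: one corruption routine driven by multiple configuration combinations.
import numpy as np

rng = np.random.default_rng(0)
configs = [                     # a plurality of different combinations
    {"span": 3, "rate": 0.15},  # short spans, light corruption
    {"span": 8, "rate": 0.15},  # long spans, light corruption
    {"span": 3, "rate": 0.50},  # short spans, heavy corruption
]

def corrupt_example(tokens, span, rate):
    """Mask contiguous spans; return (corrupted input, reconstruction targets)."""
    n = len(tokens)
    budget = max(1, int(rate * n))           # how many tokens may be masked
    corrupted, targets, i, sid = [], [], 0, 0
    while i < n:
        if budget > 0 and rng.random() < rate:
            w = min(span, n - i, budget)
            corrupted.append(f"<X{sid}>")    # sentinel marking the gap
            targets.append((f"<X{sid}>", tokens[i:i + w]))
            i, budget, sid = i + w, budget - w, sid + 1
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
for cfg in configs:
    inp, tgt = corrupt_example(tokens, **cfg)
    print(cfg, "->", inp, "| targets:", tgt)
```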
-
Publication Number: US20230244938A1
Publication Date: 2023-08-03
Application Number: US18160776
Filing Date: 2023-01-27
Applicant: Google LLC
Inventor: Jason Weng Wei, Dengyong Zhou, Xuezhi Wang, Dale Eric Schuurmans, Quoc V. Le, Maarten Paul Bosma, Ed Huai-Hsin Chi, Olivier Jean André Bousquet, Le Hou, Charles Aloysius Sutton, Nathanael Martin Schärli, Nathan Kemp Sekiguchi Scales, Augustus Quadrozzi Odena, Sharan Ajit Narang, Guy Gur-Ari Krakover, Aakanksha Chowdhery, David Martin Dohan, Aitor Lewkowycz, Henryk Michalewski, Jiageng Luan, David J. Bieber, Jacob Austin, Anders Johan Andreassen, Maxwell Isaac Nye, Yi Tay, Mostafa Dehghani
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples. The example method includes updating one or more parameters of the machine-learned model based on an evaluation of the plurality of outputs.
-
Publication Number: US20220245428A1
Publication Date: 2022-08-04
Application Number: US17592796
Filing Date: 2022-02-04
Applicant: Google LLC
Inventor: Yi Tay, Da-Cheng Juan, Dara Bahri, Donald Arthur Metzler, Jr., Jai Prakash Gupta, Mostafa Dehghani, Phillip Pham, Vamsi Krishna Aribandi, Zhen Qin
Abstract: Provided are machine-learned attention models that feature omnidirectional processing, example implementations of which can be referred to as Omnidirectional Representations from Transformers (OMNINET). In example models described in the present disclosure, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to tokens in some or all of the other layers across the entire network.
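As a rough sketch of the omnidirectional idea, the toy example below keeps the token representations from every layer of a small stack and lets the final-layer tokens attend over that entire set, rather than only over their own layer's outputs. The dimensions and single-head dot-product attention are illustrative assumptions.

```python
# Sketch: attention over token representations from all layers of the network.
import numpy as np

rng = np.random.default_rng(0)
L, n, d = 3, 5, 8                     # layers, tokens, model dim
x = rng.normal(size=(n, d))
layer_ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(L)]

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

reps, h = [x], x
for w in layer_ws:                    # run the stack, keeping every layer
    h = np.tanh(h @ w)
    reps.append(h)
memory = np.concatenate(reps, axis=0)   # all (L+1)*n token reps in the network

# Omnidirectional step: each final token attends to tokens at every layer,
# not just within a horizontal (same-layer) receptive field.
attn = softmax(h @ memory.T / np.sqrt(d))   # (n, (L+1)*n)
omni = attn @ memory                        # omnidirectional representations
print(omni.shape)  # (5, 8)
```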
-
Publication Number: US20250165469A1
Publication Date: 2025-05-22
Application Number: US18837122
Filing Date: 2023-02-09
Applicant: Google LLC
Inventor: Yi Tay, Vinh Quoc Tran, William Weston Cohen, Donald Arthur Metzler, Jr.
IPC: G06F16/2453
Abstract: Provided are systems and methods for training and/or use of a machine learning model that can directly predict one or more resources that are responsive to a query as an output of the model. In particular, the present disclosure demonstrates that information retrieval can be accomplished with a single machine learning model (e.g., that has a neural network architecture such as, for example, a Transformer architecture) in which all information about the corpus is encoded in the parameters of the model. To this end, the present disclosure introduces the Differentiable Search Index (DSI), a new paradigm that learns a query-to-result (e.g., in text-to-text format) model that maps queries (e.g., text strings) directly to relevant resource identifiers (“docids”) (e.g., text and/or number strings that identify relevant resources); in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying retrieval.
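The retrieval-as-generation interface can be sketched in a few lines: a single model is first trained ("indexing") so that its parameters associate document content with docids, and is then queried directly for a docid. In the toy below, a bag-of-words scorer stands in for the Transformer; only the query-to-docid paradigm, not the model class, is faithful to the abstract.

```python
# Sketch: the DSI interface, with a trivial stand-in for the learned model.
from collections import Counter

corpus = {                      # docid -> document text (toy corpus)
    "doc-17": "neural networks for sequence generation",
    "doc-42": "differentiable indexing of text corpora",
    "doc-99": "contrastive pretraining for tabular data",
}

class ToyDSI:
    """All corpus knowledge lives inside the model's parameters."""
    def __init__(self):
        self.params = {}

    def index(self, corpus):
        # "Indexing" phase: absorb doc text -> docid into the parameters.
        for docid, text in corpus.items():
            self.params[docid] = Counter(text.split())

    def retrieve(self, query):
        # "Retrieval" phase: map the query string directly to a docid.
        q = Counter(query.split())
        return max(self.params,
                   key=lambda docid: sum((self.params[docid] & q).values()))

model = ToyDSI()
model.index(corpus)
print(model.retrieve("differentiable text indexing"))  # -> doc-42
```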
-
Publication Number: US11886976B1
Publication Date: 2024-01-30
Application Number: US18222395
Filing Date: 2023-07-14
Applicant: Google LLC
Inventor: Tal Schuster, Adam Joshua Fisch, Jai Prakash Gupta, Mostafa Dehghani, Dara Bahri, Vinh Quoc Tran, Yi Tay, Donald Arthur Metzler, Jr.
IPC: G06N3/0455
CPC classification number: G06N3/0455
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences using auto-regressive decoder neural networks. In particular, during generation, adaptive early exiting is used to reduce the time required to generate the output sequence.
-
Publication Number: US20210248450A1
Publication Date: 2021-08-12
Application Number: US17169718
Filing Date: 2021-02-08
Applicant: Google LLC
Inventor: Yi Tay, Liu Yang, Donald Arthur Metzler, Jr., Dara Bahri, Da-Cheng Juan
Abstract: A system for performing a machine learning task on a network input is described. The system includes one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement (i) multiple sorting networks in which each sorting network is configured to sort vector blocks in a sequence of vector blocks to generate a sorted sequence of vector blocks; and (ii) a sorting attention neural network configured to perform the machine learning task on the input sequence by executing multiple sorting attention mechanisms using the sorting networks.
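As a loose sketch of the sorted-block idea: a scoring function (standing in for a learned sorting network) permutes the sequence's vector blocks, and each block then attends to its counterpart in sorted order, so related-but-distant blocks can become local. The hard argsort below replaces whatever differentiable sorting the system uses, purely for illustration; the block sizes and scoring vector are assumptions.

```python
# Sketch: block sorting followed by blockwise attention.
import numpy as np

rng = np.random.default_rng(0)
n_blocks, block_len, d = 4, 3, 8
x = rng.normal(size=(n_blocks * block_len, d))
sort_w = rng.normal(size=d)            # sorting network's scoring vector (toy)

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

blocks = x.reshape(n_blocks, block_len, d)       # 1) split into vector blocks

scores = blocks.mean(axis=1) @ sort_w            # 2) "sorting network": rank
perm = np.argsort(scores)                        #    and permute the blocks
sorted_blocks = blocks[perm]

out = np.empty_like(blocks)                      # 3) sorting attention: each
for i in range(n_blocks):                        #    block attends to its
    q, kv = blocks[i], sorted_blocks[i]          #    sorted-order counterpart
    attn = softmax(q @ kv.T / np.sqrt(d))
    out[i] = attn @ kv
print(out.reshape(-1, d).shape)  # (12, 8)
```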
-