-
Publication No.: US12260340B2
Publication date: 2025-03-25
Application No.: US18471866
Filing date: 2023-09-21
Applicant: Google LLC
Inventor: Yang Song , Raghav Gupta , Dengyong Zhou , Sanqiang Zhao
IPC: G06N3/088 , G06F40/284 , G06N3/045
Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
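The layer-wise transfer through a shared projection matrix described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names, the use of a mean squared error, and the direction of projection (teacher states projected down to the student dimension) are not taken from the patent text.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def projection_loss(teacher_hidden, student_hidden, projection):
    """Mean squared error between projected teacher states and student states.

    teacher_hidden: one hidden-state vector per layer, teacher dimension.
    student_hidden: one hidden-state vector per layer, student dimension.
    projection: a single matrix (student_dim x teacher_dim) shared by all layers.
    The squared-error form is an assumption, not the patented loss.
    """
    total = 0.0
    count = 0
    for teacher_state, student_state in zip(teacher_hidden, student_hidden):
        projected = matvec(projection, teacher_state)
        total += sum((p - s) ** 2 for p, s in zip(projected, student_state))
        count += len(student_state)
    return total / count


# Hypothetical toy sizes: teacher dimension 4, student dimension 2, one layer.
projection = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]]
teacher_hidden = [[1.0, 2.0, 3.0, 4.0]]
loss_matched = projection_loss(teacher_hidden, [[1.0, 2.0]], projection)
loss_zero_student = projection_loss(teacher_hidden, [[0.0, 0.0]], projection)
```

Because the same `projection` is reused for every layer, the number of extra parameters does not grow with model depth.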
-
Publication No.: US20250094838A1
Publication date: 2025-03-20
Application No.: US18967327
Filing date: 2024-12-03
Applicant: Google LLC
Inventor: Jason Weng Wei , Dengyong Zhou , Xuezhi Wang , Dale Eric Schuurmans , Quoc V. Le , Maarten Paul Bosma , Ed Huai-Hsin Chi , Olivier Jean Andrè Bousquet , Le Hou , Charles Aloysius Sutton , Nathanael Martin Schärli , Nathan Kemp Sekiguchi Scales , Augustus Quadrozzi Odena , Sharan Ajit Narang , Guy Gur-Ari Krakover , Aakanksha Chowdhery , David Martin Dohan , Aitor Lewkowycz , Jacob Austin , Henryk Michalewski , David Luan , David J. Bieber , Anders Johan Andreassen , Maxwell Isaac Nye
IPC: G06N5/022
Abstract: An example technique for image analysis is provided. An example image analysis method includes obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response. The example image analysis method includes inputting, to a machine-learned model, the instructive sequence and an operative image processing query that comprises image data, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence. The example method can include generating, using the machine-learned model and responsive to the operative query, an operative image processing response that comprises an analysis of the image data.
-
Publication No.: US20240256965A1
Publication date: 2024-08-01
Application No.: US18424624
Filing date: 2024-01-26
Applicant: Google LLC
Inventor: Hyung Won Chung , Barret Zoph , Dengyong Zhou , Liam Fedus , Shayne Longpre , Le Hou , Yi Tay , Jason Weng Wei , Siddhartha Brahma , Quoc V. Le
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: An example method for training a machine-learned sequence processing model includes obtaining a plurality of training examples for training the machine-learned sequence processing model. For each respective training example of the plurality of training examples, the example method includes: obtaining a respective query associated with the respective training example; inputting the respective query to the machine-learned sequence processing model; obtaining, from the machine-learned sequence processing model a response to the respective query and a trace of intermediate states from the respective query to the response; evaluating the response using a ground truth response associated with the respective training example; evaluating the trace using a ground truth trace associated with the respective training example; and updating one or more parameters of the machine-learned sequence processing model based on the evaluation of the response and based on the evaluation of the trace.
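The two evaluations named in this abstract (response against a ground truth response, trace against a ground truth trace) can be combined into a single training signal. The sketch below is one possible reading: it assumes an average negative log-likelihood for each part and a hypothetical `trace_weight` that balances them. Neither the loss form nor the weighting comes from the patent text.

```python
import math


def sequence_nll(predicted_probs, target_ids):
    """Average negative log-likelihood of the target tokens.

    predicted_probs: one probability distribution (list of floats) per step.
    target_ids: the ground-truth token index for each step.
    """
    log_likelihood = sum(math.log(dist[t]) for dist, t in zip(predicted_probs, target_ids))
    return -log_likelihood / len(target_ids)


def combined_loss(response_probs, response_targets,
                  trace_probs, trace_targets, trace_weight=1.0):
    """Sum of the response loss and the weighted trace loss (assumed form)."""
    response_loss = sequence_nll(response_probs, response_targets)
    trace_loss = sequence_nll(trace_probs, trace_targets)
    return response_loss + trace_weight * trace_loss


# Hypothetical two-token vocabulary; the model is maximally unsure at every step.
uniform = [0.5, 0.5]
loss = combined_loss([uniform], [0], [uniform, uniform], [1, 0])
```

A parameter update would then follow the gradient of `combined_loss`, so that errors in the intermediate trace are penalized even when the final response is correct.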
-
Publication No.: US20210224660A1
Publication date: 2021-07-22
Application No.: US16749570
Filing date: 2020-01-22
Applicant: Google LLC
Inventor: Yang Song , Raghav Gupta , Dengyong Zhou , Sanqiang Zhao
IPC: G06N3/08 , G06N3/04 , G06F40/284
Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
-
Publication No.: US11954442B2
Publication date: 2024-04-09
Application No.: US16986534
Filing date: 2020-08-06
Applicant: Google LLC
Inventor: Chen Liang , Wei Yu , Quoc V. Le , Xinyun Chen , Dengyong Zhou
IPC: G06F40/30 , G06F16/33 , G06F40/20 , G06N3/045 , G06N3/08 , G06N20/00 , G06F40/216 , G06F40/284
CPC classification number: G06F40/30 , G06F16/3347 , G06F40/20 , G06N3/045 , G06N3/08 , G06N20/00 , G06F40/216 , G06F40/284
Abstract: The present disclosure is directed to systems and methods for performing reading comprehension with machine learning. More specifically, the present disclosure is directed to a Neural Symbolic Reader (example implementations of which may be referred to as NeRd), which includes a reader to encode the passage and question, and a programmer to generate a program for multi-step reasoning. By using operators like span selection, the program can be executed over a natural language text passage to generate an answer to a natural language text question. NeRd is domain-agnostic such that the same neural architecture works for different domains. Further, NeRd is compositional such that complex programs can be generated by compositionally applying the symbolic operators.
-
Publication No.: US20210065066A1
Publication date: 2021-03-04
Application No.: US17008338
Filing date: 2020-08-31
Applicant: Google LLC
Inventor: Yuan Xue , Dengyong Zhou , Nan Du , Andrew Mingbo Dai , Zhen Xu , Kun Zhang , Yingwei Cui
Abstract: A deep state space generative model is augmented with intervention prediction. The state space model provides a principled way to capture the interactions among observations, interventions, critical event occurrences, true states, and associated uncertainty. The state space model can include a discrete-time hazard rate model that provides flexible fitting of general survival time distributions. The state space model can output a joint prediction of event risk, observation and intervention trajectories based on patterns in temporal progressions, and correlations between past measurements and interventions.
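As background for the discrete-time hazard rate model mentioned in this abstract, the standard relationship between per-step hazards and survival can be sketched as follows. This is the textbook identity only: the hazard at step t is the probability of the event at t given survival up to t. The deep state space model that would produce the hazards is not shown, and the function names are hypothetical.

```python
def survival_curve(hazards):
    """Probability of surviving through each step: S(t) = prod_{k<=t} (1 - h_k)."""
    survival = []
    alive = 1.0
    for hazard in hazards:
        alive *= 1.0 - hazard
        survival.append(alive)
    return survival


def event_probabilities(hazards):
    """Probability that the event first occurs at each step: h_t * S(t-1)."""
    probs = []
    alive = 1.0
    for hazard in hazards:
        probs.append(alive * hazard)
        alive *= 1.0 - hazard
    return probs


# Hypothetical hazards for two time steps.
hazards = [0.5, 0.5]
survival = survival_curve(hazards)
events = event_probabilities(hazards)
```

Because each step has its own hazard value, the implied survival time distribution is not restricted to a parametric family, which is consistent with the "flexible fitting" the abstract refers to.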
-
Publication No.: US12217144B2
Publication date: 2025-02-04
Application No.: US17008338
Filing date: 2020-08-31
Applicant: Google LLC
Inventor: Yuan Xue , Dengyong Zhou , Nan Du , Andrew Mingbo Dai , Zhen Xu , Kun Zhang , Yingwei Cui
Abstract: A deep state space generative model is augmented with intervention prediction. The state space model provides a principled way to capture the interactions among observations, interventions, critical event occurrences, true states, and associated uncertainty. The state space model can include a discrete-time hazard rate model that provides flexible fitting of general survival time distributions. The state space model can output a joint prediction of event risk, observation and intervention trajectories based on patterns in temporal progressions, and correlations between past measurements and interventions.
-
Publication No.: US20240013059A1
Publication date: 2024-01-11
Application No.: US18471866
Filing date: 2023-09-21
Applicant: Google LLC
Inventor: Yang Song , Raghav Gupta , Dengyong Zhou , Sanqiang Zhao
IPC: G06N3/0455 , G06F40/40 , G06N3/08
CPC classification number: G06N3/0455 , G06F40/40 , G06N3/08 , G06F40/284
Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
-
Publication No.: US20230289626A1
Publication date: 2023-09-14
Application No.: US18183410
Filing date: 2023-03-14
Applicant: Google LLC
Inventor: Hanjun Dai , Dale Eric Schuurmans , Xinyun Chen , Dengyong Zhou , Bo Dai , Hongyu Ren
IPC: G06N5/022 , G06F16/2453
CPC classification number: G06N5/022 , G06F16/2453
Abstract: Provided are computing systems, methods, and platforms for negative sampling in knowledge graphs with improved efficiency. A knowledge graph comprising entities and links between the entities can be obtained. A query computation graph comprising nodes and edges can be generated based on the knowledge graph. The nodes of the query computation graph can include anchor nodes, a root node, and intermediate nodes positioned in paths between the anchor nodes and the root node. A node cut of a query of the query computation graph can be determined and can include at least one node that cuts at least one path between each anchor node and the root node of the query computation graph. Negative samples can be identified by bidirectionally traversing the query computation graph in a first direction from the anchor nodes to the node cut and in a second direction from the root node to the node cut.
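The node cut property stated in this abstract (a set of nodes that intersects every path from each anchor node to the root node) can be checked with a small graph search. The sketch below covers only that check, not the bidirectional negative sampling itself. The graph representation (a dictionary mapping each node to its successors in the direction of the root) and all names are assumptions made for illustration.

```python
def is_node_cut(edges, anchors, root, cut):
    """Return True if every anchor-to-root path passes through a node in `cut`.

    edges: dict mapping a node to the list of its successors (toward the root).
    """
    cut = set(cut)
    for anchor in anchors:
        stack = [anchor]
        seen = set()
        while stack:
            node = stack.pop()
            if node in cut or node in seen:
                continue  # paths through a cut node are blocked
            if node == root:
                return False  # reached the root without crossing the cut
            seen.add(node)
            stack.extend(edges.get(node, []))
    return True


# Hypothetical query computation graph: two anchors joined at an intermediate node.
edges = {"a1": ["m"], "a2": ["m"], "m": ["root"]}
cut_at_intermediate = is_node_cut(edges, ["a1", "a2"], "root", ["m"])
cut_at_one_anchor = is_node_cut(edges, ["a1", "a2"], "root", ["a1"])
```

Once a valid cut is known, a forward traversal from the anchors and a backward traversal from the root can each stop at the cut, which is the bidirectional scheme the abstract describes.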
-
Publication No.: US20220108221A1
Publication date: 2022-04-07
Application No.: US17493442
Filing date: 2021-10-04
Applicant: Google LLC
Inventor: Dengyong Zhou , Xiaodan Song , Shuo Yang , Qiang Liu , Le Hou
IPC: G06N20/00
Abstract: Systems and methods of the present disclosure are directed to a computer-implemented method. The method can include obtaining a machine-learned model comprising a plurality of model units, wherein each model unit comprises a plurality of parameters that are tied to a shared plurality of parameters. The method can include performing a first plurality of training iterations with the machine-learned model to adjust parameters of the shared plurality of parameters. The method can include detecting, based on the first plurality of training iterations, an occurrence of an untying condition. The method can include untying the parameters of one or more model units from the shared plurality of parameters. The method can include performing a second plurality of training iterations with the machine-learned model to adjust parameters of the one or more model units independent of the shared plurality of parameters.
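The tie, detect, untie sequence in this abstract can be sketched in a few lines. The abstract does not say what the untying condition is, so the plateau test below (loss improvement over a short window falling under a threshold) is purely an assumed example. The class and function names are likewise hypothetical.

```python
class TiedModel:
    """Model units that start out referencing one shared parameter list."""

    def __init__(self, shared, num_units):
        self.shared = shared
        self.units = [shared] * num_units  # every unit is the same object

    def is_tied(self):
        return all(unit is self.shared for unit in self.units)

    def untie(self):
        # Give each unit its own copy so later updates are independent.
        self.units = [list(self.shared) for _ in self.units]


def should_untie(loss_history, min_improvement=1e-3, window=2):
    """Assumed untying condition: the loss has plateaued over `window` steps."""
    if len(loss_history) <= window:
        return False
    return loss_history[-window - 1] - loss_history[-1] < min_improvement


model = TiedModel([0.0, 0.0], num_units=3)
tied_before = model.is_tied()
if should_untie([1.0, 0.5, 0.4999, 0.4998]):
    model.untie()
model.units[0][0] = 1.0  # after untying, this changes only the first unit
```

In the first training phase an update to `shared` reaches every unit. After `untie()` each unit holds a separate copy, which corresponds to the second plurality of training iterations in the abstract.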