Extreme language model compression with optimal sub-words and shared projections

    Publication Number: US12260340B2

    Publication Date: 2025-03-25

    Application Number: US18471866

    Application Date: 2023-09-21

    Applicant: Google LLC

    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.
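
    Illustrative sketch (not the claimed implementation): the shared-projection idea can be read as a single matrix, reused across layers, that maps the student's lower-dimensional hidden states into the teacher's space so a layer-wise distillation loss can be computed. All dimensions and names below are assumptions, and the dual training of the teacher and student vocabularies described above is omitted.

        # Minimal, hedged sketch of layer-wise distillation through one shared projection.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        teacher_dim, student_dim = 768, 128                    # assumed dimensions

        # One projection shared by all layers maps student states into teacher space.
        shared_proj = nn.Linear(student_dim, teacher_dim, bias=False)

        def layerwise_distillation_loss(teacher_hidden, student_hidden):
            """Both arguments are lists of [batch, seq, dim] hidden states, one per layer."""
            loss = 0.0
            for t_h, s_h in zip(teacher_hidden, student_hidden):
                loss = loss + F.mse_loss(shared_proj(s_h), t_h)
            return loss / len(teacher_hidden)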

    Instruction Fine-Tuning Machine-Learned Models Using Intermediate Reasoning Steps

    Publication Number: US20240256965A1

    Publication Date: 2024-08-01

    Application Number: US18424624

    Application Date: 2024-01-26

    Applicant: Google LLC

    CPC classification number: G06N20/00

    Abstract: An example method for training a machine-learned sequence processing model includes obtaining a plurality of training examples for training the machine-learned sequence processing model. For each respective training example of the plurality of training examples, the example method includes: obtaining a respective query associated with the respective training example; inputting the respective query to the machine-learned sequence processing model; obtaining, from the machine-learned sequence processing model, a response to the respective query and a trace of intermediate states from the respective query to the response; evaluating the response using a ground truth response associated with the respective training example; evaluating the trace using a ground truth trace associated with the respective training example; and updating one or more parameters of the machine-learned sequence processing model based on the evaluation of the response and based on the evaluation of the trace.
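
    Illustrative sketch (assumed interfaces, not the claimed method): the abstract describes two evaluations per example, one against a ground truth response and one against a ground truth trace, both of which drive the parameter update. The model call below is a hypothetical API returning per-position vocabulary logits for the trace tokens followed by the response tokens.

        # Minimal sketch of a training step supervised on both the trace and the response.
        import torch
        import torch.nn.functional as F

        def training_step(model, optimizer, query_ids, trace_ids, response_ids):
            logits = model(query_ids)                          # [batch, seq, vocab], assumed API
            trace_len = trace_ids.size(1)
            trace_logits = logits[:, :trace_len, :]
            response_logits = logits[:, trace_len:trace_len + response_ids.size(1), :]

            # Evaluate the trace and the response against their ground truths.
            trace_loss = F.cross_entropy(
                trace_logits.reshape(-1, trace_logits.size(-1)), trace_ids.reshape(-1))
            response_loss = F.cross_entropy(
                response_logits.reshape(-1, response_logits.size(-1)), response_ids.reshape(-1))

            # Update parameters based on both evaluations.
            loss = trace_loss + response_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()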

    Extreme Language Model Compression with Optimal Sub-Words and Shared Projections

    Publication Number: US20210224660A1

    Publication Date: 2021-07-22

    Application Number: US16749570

    Application Date: 2020-01-22

    Applicant: Google LLC

    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.

    Machine-Learned State Space Model for Joint Forecasting

    Publication Number: US20210065066A1

    Publication Date: 2021-03-04

    Application Number: US17008338

    Application Date: 2020-08-31

    Applicant: Google LLC

    Abstract: A deep state space generative model is augmented with intervention prediction. The state space model provides a principled way to capture the interactions among observations, interventions, critical event occurrences, true states, and associated uncertainty. The state space model can include a discrete-time hazard rate model that provides flexible fitting of general survival time distributions. The state space model can output a joint prediction of event risk, observation and intervention trajectories based on patterns in temporal progressions, and correlations between past measurements and interventions.
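
    Illustrative sketch (illustrative only, not the claimed model): one way to read the joint prediction is a recurrent state update that, at each time step, emits the next observation, the next intervention, and a discrete-time hazard giving the event risk. The GRU-based transition and layer sizes are assumptions.

        # Minimal sketch of a state step emitting observations, interventions, and event risk.
        import torch
        import torch.nn as nn

        class JointStateSpaceCell(nn.Module):
            def __init__(self, state_dim, obs_dim, intervention_dim):
                super().__init__()
                self.transition = nn.GRUCell(obs_dim + intervention_dim, state_dim)
                self.obs_head = nn.Linear(state_dim, obs_dim)                 # next observation
                self.intervention_head = nn.Linear(state_dim, intervention_dim)
                self.hazard_head = nn.Linear(state_dim, 1)                    # event-risk logit

            def forward(self, state, obs, intervention):
                state = self.transition(torch.cat([obs, intervention], dim=-1), state)
                hazard = torch.sigmoid(self.hazard_head(state))               # P(event at t | survival so far)
                return state, self.obs_head(state), self.intervention_head(state), hazard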

    Machine-learned state space model for joint forecasting

    Publication Number: US12217144B2

    Publication Date: 2025-02-04

    Application Number: US17008338

    Application Date: 2020-08-31

    Applicant: Google LLC

    Abstract: A deep state space generative model is augmented with intervention prediction. The state space model provides a principled way to capture the interactions among observations, interventions, critical event occurrences, true states, and associated uncertainty. The state space model can include a discrete-time hazard rate model that provides flexible fitting of general survival time distributions. The state space model can output a joint prediction of event risk, observation and intervention trajectories based on patterns in temporal progressions, and correlations between past measurements and interventions.

    Extreme Language Model Compression with Optimal Sub-Words and Shared Projections

    Publication Number: US20240013059A1

    Publication Date: 2024-01-11

    Application Number: US18471866

    Application Date: 2023-09-21

    Applicant: Google LLC

    CPC classification number: G06N3/0455 G06F40/40 G06N3/08 G06F40/284

    Abstract: Provided is a knowledge distillation technique for training a student language model that, relative to a larger teacher language model, has a significantly smaller vocabulary, lower embedding dimensions, and/or hidden state dimensions. Specifically, aspects of the present disclosure are directed to a dual-training mechanism that trains the teacher and student language models simultaneously to obtain optimal word embeddings for the student vocabulary. In some implementations, this approach can be combined with learning shared projection matrices that transfer layer-wise knowledge from the teacher language model to the student language model. Example experimental results have also demonstrated higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques, including the ability to compress the BERTBASE model by more than 60×, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7 MB.

    Knowledge Graph Completion and Multi-Hop Reasoning in Knowledge Graphs at Scale

    Publication Number: US20230289626A1

    Publication Date: 2023-09-14

    Application Number: US18183410

    Application Date: 2023-03-14

    Applicant: Google LLC

    CPC classification number: G06N5/022 G06F16/2453

    Abstract: Provided are computing systems, methods, and platforms for negative sampling in knowledge graphs with improved efficiency. A knowledge graph comprising entities and links between the entities can be obtained. A query computation graph comprising nodes and edges can be generated based on the knowledge graph. The nodes of the query computation graph can include anchor nodes, a root node, and intermediate nodes positioned in paths between the anchor nodes and the root node. A node cut of a query of the query computation graph can be determined and can include at least one node that cuts at least one path between each anchor node and the root node of the query computation graph. Negative samples can be identified by bidirectionally traversing the query computation graph in a first direction from the anchor nodes to the node cut and in a second direction from the root node to the node cut.
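
    Illustrative sketch (hypothetical helpers, not the claimed system): negative candidates can be collected by traversing forward from the anchor nodes and backward from the root node, meeting at the node cut; entities reachable from the anchor side but not from the root side cannot answer the query and serve as negatives.

        # Minimal sketch of bidirectional traversal around a node cut.
        def bidirectional_negative_samples(query_graph, anchors, root, node_cut,
                                           forward_reachable, backward_reachable):
            """forward_reachable / backward_reachable are assumed traversal callables that
            return the set of entities reached at a given cut node from each direction."""
            negatives = set()
            for cut_node in node_cut:
                from_anchors = forward_reachable(query_graph, anchors, cut_node)
                from_root = backward_reachable(query_graph, root, cut_node)
                negatives |= from_anchors - from_root
            return negatives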

    Systems And Methods For Parameter Sharing To Reduce Computational Costs Of Training Machine-Learned Models

    Publication Number: US20220108221A1

    Publication Date: 2022-04-07

    Application Number: US17493442

    Application Date: 2021-10-04

    Applicant: Google LLC

    Abstract: Systems and methods of the present disclosure are directed to a computer-implemented method. The method can include obtaining a machine-learned model comprising a plurality of model units, wherein each model unit comprises a plurality of parameters that are tied to a shared plurality of parameters. The method can include performing a first plurality of training iterations with the machine-learned model to adjust parameters of the shared plurality of parameters. The method can include detecting, based on the first plurality of training iterations, an occurrence of an untying condition. The method can include untying the parameters of one or more model units from the shared plurality of parameters. The method can include performing a second plurality of training iterations with the machine-learned model to adjust parameters of the one or more model units independent of the shared plurality of parameters.
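
    Illustrative sketch (assumed training loop, not the claimed system): several model units alias one shared parameter set during a first training phase; when an untying condition is met (here, a simple step threshold stands in for it), each unit receives its own copy of the parameters and is trained independently thereafter.

        # Minimal sketch of tied-then-untied model units.
        import copy
        import torch
        import torch.nn as nn

        shared_unit = nn.Linear(64, 64)                        # shared plurality of parameters
        model_units = [shared_unit] * 4                        # all units alias the shared weights
        untie_step = 100                                       # assumed untying condition
        optimizer = torch.optim.Adam(shared_unit.parameters())

        for step in range(200):
            if step == untie_step:                             # untying condition detected
                model_units = [copy.deepcopy(shared_unit) for _ in model_units]
                optimizer = torch.optim.Adam(
                    [p for unit in model_units for p in unit.parameters()])
            x = torch.randn(8, 64)
            for unit in model_units:                           # apply each unit in sequence
                x = torch.relu(unit(x))
            loss = x.pow(2).mean()                             # placeholder objective
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()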
