-
Publication Number: US12001509B2
Publication Date: 2024-06-04
Application Number: US16821509
Filing Date: 2020-03-17
Applicant: Google LLC
Inventor: Seungyeon Kim , Jingzhao Zhang , Andreas Veit , Sanjiv Kumar , Sashank Reddi , Praneeth Karimireddy
CPC classification number: G06F17/18 , G06F18/217 , G06N20/00 , G06N3/084
Abstract: Generally, the present disclosure is directed to systems and methods that perform adaptive optimization with improved convergence properties. The adaptive optimization techniques described herein are useful in various optimization scenarios, including, for example, training a machine-learned model such as a neural network. In particular, according to one aspect of the present disclosure, a system implementing the adaptive optimization technique can, over a plurality of iterations, employ an adaptive per-coordinate clipping threshold to clip a current first moment of the coordinate to obtain a current update value, which enables faster convergence for the machine-learned model when the noise in the stochastic gradients is heavy-tailed.
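To make the clipping step concrete, the following is a minimal NumPy sketch of an optimizer in this style, assuming a toy gradient oracle; the moving-average estimators, the threshold rule, and every hyperparameter name are illustrative assumptions rather than the patented implementation.

    import numpy as np

    def clipped_adaptive_step(params, grad, state, lr=1e-3,
                              beta1=0.9, beta2=0.99, eps=1e-8):
        # Illustrative sketch, not the patent's exact method: keep an EMA of
        # the gradient (the first moment) and an EMA of |gradient| as a
        # per-coordinate clipping threshold, then clip each coordinate of
        # the first moment to that threshold to form the update.
        m, tau = state
        m = beta1 * m + (1 - beta1) * grad               # current first moment
        tau = beta2 * tau + (1 - beta2) * np.abs(grad)   # adaptive threshold
        update = m * np.minimum(1.0, tau / (np.abs(m) + eps))
        return params - lr * update, (m, tau)

    # Usage on a toy quadratic 0.5 * ||x||^2, whose gradient at x is x itself.
    x = np.array([5.0, -3.0, 2.0])
    state = (np.zeros_like(x), np.zeros_like(x))
    for _ in range(1000):
        x, state = clipped_adaptive_step(x, x, state, lr=0.05)
    print(x)  # approaches the minimizer at the origin

Because the threshold tracks the typical per-coordinate gradient magnitude, a rare heavy-tailed gradient spike moves the first moment but is clipped before it reaches the parameters.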
-
Publication Number: US20230112862A1
Publication Date: 2023-04-13
Application Number: US17960380
Filing Date: 2022-10-05
Applicant: Google LLC
Inventor: Venkata S. Bhojanapalli , Andreas Veit , Ayan Chakrabarti , Frederick Liu , Himanshu Jain , Michal Lukasik , Sanjiv Kumar , Yin-Wen Chang
IPC: G06N3/04
Abstract: Provided are systems and methods that improve the computational efficiency of Transformers or other attention-based neural networks or machine learning models by reusing a number of attention scores between layers and/or heads of the model. To reduce the computational cost of self-attention-based models while achieving comparable or even superior results, example aspects of the present disclosure propose a novel architecture that reuses attention scores computed in one layer in one or multiple subsequent layers.
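As a concrete picture of score reuse, here is a minimal single-head NumPy sketch in which a later layer skips its own query/key projections and softmax, applying the attention scores computed by an earlier layer to its freshly projected values; the shapes, weight names, and single-head simplification are assumptions for illustration, not the patented architecture.

    import numpy as np

    def softmax(x, axis=-1):
        z = x - x.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention_scores(x, w_q, w_k):
        # standard scaled dot-product attention scores, computed once
        q, k = x @ w_q, x @ w_k
        return softmax(q @ k.T / np.sqrt(q.shape[-1]))

    def attention_with_reused_scores(x, w_v, scores):
        # only the value projection is fresh; scores come from an earlier layer
        return scores @ (x @ w_v)

    rng = np.random.default_rng(0)
    seq_len, dim = 8, 16
    x = rng.normal(size=(seq_len, dim))
    w_q, w_k, w_v1, w_v2 = (0.1 * rng.normal(size=(dim, dim)) for _ in range(4))

    scores = attention_scores(x, w_q, w_k)               # layer 1 computes scores
    h1 = attention_with_reused_scores(x, w_v1, scores)   # layer 1 output
    h2 = attention_with_reused_scores(h1, w_v2, scores)  # layer 2 reuses them

Reusing the scores removes the quadratic query-key matmul and the softmax from the later layer, which is where the claimed efficiency gain comes from.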
-
Publication Number: US20230111978A1
Publication Date: 2023-04-13
Application Number: US17910756
Filing Date: 2020-03-18
Applicant: Google LLC
Inventor: Andreas Veit , Kimberly Wilber
IPC: G06N3/08 , G06F16/903
Abstract: Techniques are disclosed that enable learning an embedding space using cross-examples, where the distance between a query and an electronic resource in the embedding space provides an indication of the relevance of the electronic resource to the query. Various implementations include learning the embedding space using cross-example Softmax techniques. Various implementations include learning the embedding space using cross-example negative mining. Additional or alternative techniques are disclosed that enable determining an electronic resource for a query by comparing a query vector (e.g., an embedding space representation of the query) with a set of pre-stored candidate electronic resource vectors (e.g., embedding space representations of a set of candidate electronic resources).
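As one way to picture cross-example negatives, the sketch below implements a cross-example Softmax-style loss in NumPy, in which each positive query-document pair is normalized against negative pairs drawn from the entire batch (every off-diagonal entry of the similarity matrix) rather than only from its own row; the exact formulation and any temperature or margin terms in the patent may differ.

    import numpy as np

    def cross_example_softmax_loss(query_vecs, doc_vecs):
        # similarity of every query to every document in the batch
        s = query_vecs @ doc_vecs.T                  # shape (B, B)
        pos = np.diag(s)                             # matching pairs on the diagonal
        neg = s[~np.eye(s.shape[0], dtype=bool)]     # all cross-example negatives
        losses = []
        for p in pos:
            # each positive competes against every negative pair in the batch
            z = np.concatenate(([p], neg))
            m = z.max()                              # stable log-sum-exp
            losses.append(m + np.log(np.exp(z - m).sum()) - p)
        return float(np.mean(losses))

    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
    d = rng.normal(size=(4, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
    print(cross_example_softmax_loss(q, d))

Because the normalizer is shared across the whole batch rather than computed per row, pair scores become comparable across different queries, which is the kind of global consistency cross-example normalization aims at.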
-
Publication Number: US20210295201A1
Publication Date: 2021-09-23
Application Number: US16821509
Filing Date: 2020-03-17
Applicant: Google LLC
Inventor: Seungyeon Kim , Jingzhao Zhang , Andreas Veit , Sanjiv Kumar , Sashank Reddi , Praneeth Karimireddy
Abstract: Generally, the present disclosure is directed to systems and methods that perform adaptive optimization with improved convergence properties. The adaptive optimization techniques described herein are useful in various optimization scenarios, including, for example, training a machine-learned model such as a neural network. In particular, according to one aspect of the present disclosure, a system implementing the adaptive optimization technique can, over a plurality of iterations, employ an adaptive per-coordinate clipping threshold to clip a current first moment of the coordinate to obtain a current update value, which enables faster convergence for the machine-learned model when the noise in the stochastic gradients is heavy-tailed.
-
Publication Number: US20230017505A1
Publication Date: 2023-01-19
Application Number: US17375960
Filing Date: 2021-07-14
Applicant: Google LLC
Inventor: Aditya Krishna Menon , Sanjiv Kumar , Himanshu Jain , Andreas Veit , Ankit Singh Rawat , Gayan Sadeep Jayasumana Hirimbura Matara Kankanamge
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for accounting for long-tail training data.
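The abstract is terse, so the following is a hedged illustration only: one common way to account for long-tail training data is to adjust class logits by the log of the class priors, which counteracts the bias toward frequent classes. The short NumPy sketch below shows that idea; the function names, the scaling parameter tau, and the technique itself are assumptions here, not the patent's stated method.

    import numpy as np

    def prior_adjusted_logits(logits, class_counts, tau=1.0):
        # subtract tau * log(prior) so rare classes are not drowned out by
        # frequent ones at training or prediction time (illustrative only)
        priors = class_counts / class_counts.sum()
        return logits - tau * np.log(priors)

    # toy example: a head-heavy label distribution over 4 classes
    counts = np.array([900.0, 80.0, 15.0, 5.0])
    logits = np.array([2.0, 1.9, 1.8, 1.7])
    print(prior_adjusted_logits(logits, counts).argmax())  # favors a rarer class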