Wide and deep machine learning models

    Publication No.: US10762422B2

    Publication date: 2020-09-01

    Application No.: US15394668

    Filing date: 2016-12-29

    Applicant: Google LLC

    Abstract: A system includes one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the computers to implement a combined machine learning model for processing an input including multiple features to generate a predicted output for the machine learning input. The combined model includes: a deep machine learning model configured to process the features to generate a deep model output; a wide machine learning model configured to process the features to generate a wide model output; and a combining layer configured to process the deep model output generated by the deep machine learning model and the wide model output generated by the wide machine learning model to generate the predicted output, in which the deep model and the wide model have been trained jointly on training data to generate the deep model output and the wide model output.
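To make the combined architecture in this abstract concrete, here is a minimal sketch of a forward pass, assuming the wide model is a linear model, the deep model has a single ReLU hidden layer, and the combining layer sums the two outputs before a sigmoid. The abstract fixes none of these choices, and every function and parameter name below is hypothetical.

```python
import math


def wide_output(features, wide_weights):
    # Wide part: a linear model over the input features (assumed form).
    return sum(w * x for w, x in zip(wide_weights, features))


def deep_output(features, hidden_weights, output_weights):
    # Deep part: one ReLU hidden layer followed by a linear read-out (assumed form).
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
              for row in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))


def combined_predict(features, wide_weights, hidden_weights, output_weights, bias=0.0):
    # Combining layer (assumed): add both model outputs, then apply a sigmoid.
    logit = (wide_output(features, wide_weights)
             + deep_output(features, hidden_weights, output_weights)
             + bias)
    return 1.0 / (1.0 + math.exp(-logit))
```

Because both outputs feed a single prediction, a loss computed on that prediction sends gradients to the wide and deep parameters in the same step, which is one way to read "trained jointly" in the abstract.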

    Knowledge Distillation Via Learning to Predict Principal Components Coefficients

    Publication No.: US20250005453A1

    Publication date: 2025-01-02

    Application No.: US18710814

    Filing date: 2022-12-12

    Applicant: Google LLC

    Abstract: Provided is an approach for knowledge distillation based on exporting Principal Components approximations (e.g., Bregman representations) of one or more layer-wise representations of the teacher model. In particular, the present disclosure provides an extension to the original Bregman PCA formulation by incorporating a mean vector and orthonormalizing the principal directions with respect to the geometry of the local convex function around the mean. This extended formulation allows viewing the learned representation as a dense layer, thus casting the problem as learning the linear coefficients of the compressed examples, as the input to this layer, by the student network. Example empirical data indicates that example implementations of the approach improve performance when compared to typical teacher-student training using soft labels.
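The "dense layer" view in this abstract can be illustrated with a small sketch. It assumes the squared-Euclidean special case, in which orthonormalizing with respect to the local geometry reduces to ordinary orthonormality; the general Bregman formulation is not reproduced here. The names `project_to_coefficients` and `reconstruct` are hypothetical.

```python
def project_to_coefficients(x, mean, directions):
    # Coefficients of x in the principal directions, after removing the mean.
    # Assumes the directions are orthonormal in the ordinary (Euclidean) sense.
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(d * c for d, c in zip(direction, centered))
            for direction in directions]


def reconstruct(coefficients, mean, directions):
    # Dense-layer view: the directions act as weights and the mean as the bias.
    out = list(mean)
    for coeff, direction in zip(coefficients, directions):
        for i, d in enumerate(direction):
            out[i] += coeff * d
    return out
```

Under this reading, the coefficients computed from a teacher representation would serve as regression targets for the student, and `reconstruct` plays the role of the fixed dense layer that maps them back to the teacher's representation space.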

    Heterogeneous Federated Learning Via Multi-Directional Knowledge Distillation

    Publication No.: US20240249193A1

    Publication date: 2024-07-25

    Application No.: US18417947

    Filing date: 2024-01-19

    Applicant: Google LLC

    CPC classification number: G06N20/00

    Abstract: Generally, the present disclosure is directed to enhanced federated learning (FL) that employs a set of clients with varying amounts of computational resources (e.g., system memory, storage, and processing bandwidth). To overcome limitations of conventional FL methods that employ a set of clients with varying amounts of computational resources, the embodiments run multi-directional knowledge distillation between the server models produced by each federated averaging (FedAvg) pool, using unlabeled server data as the distillation dataset. By co-distilling the two (or more) models frequently over the course of FedAvg rounds, information is shared between the pools without sharing model parameters. This leads to increased performance and faster convergence (in fewer federated rounds).
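The co-distillation step described in this abstract can be sketched as follows, assuming two linear softmax classifiers stand in for the server models of two FedAvg pools. Each model takes a gradient step toward the other's predictions on unlabeled server data, so only predictions cross between pools, never parameters. The model form, learning rate, and function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    # Linear softmax classifier: one weight row per class.
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def distill_step(student, teacher_probs, x, lr):
    # Gradient of cross-entropy(teacher_probs, student_probs) with respect to
    # the student's weights is (student_prob - teacher_prob) * x for each class.
    student_probs = predict(student, x)
    return [[w - lr * (ps - pt) * xi for w, xi in zip(row, x)]
            for row, ps, pt in zip(student, student_probs, teacher_probs)]


def co_distill(model_a, model_b, unlabeled_batch, lr=0.5):
    for x in unlabeled_batch:
        # Compute both sets of predictions before updating either model.
        probs_a = predict(model_a, x)
        probs_b = predict(model_b, x)
        model_a = distill_step(model_a, probs_b, x, lr)
        model_b = distill_step(model_b, probs_a, x, lr)
    return model_a, model_b
```

In a full system this step would be interleaved with FedAvg rounds inside each pool, and the two models could differ in size since they only need to agree on the output space.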

    DECENTRALIZED LEARNING OF MACHINE LEARNING MODEL(S) THROUGH UTILIZATION OF STALE UPDATES(S) RECEIVED FROM STRAGGLER COMPUTING DEVICE(S)

    Publication No.: US20240095582A1

    Publication date: 2024-03-21

    Application No.: US18075757

    Filing date: 2022-12-06

    Applicant: GOOGLE LLC

    CPC classification number: G06N20/00

    Abstract: During a round of decentralized learning for updating of a global machine learning (ML) model, remote processor(s) of a remote system may transmit, to a population of computing devices, primary weights for a primary version of the global ML model, and cause each of the computing devices to generate a corresponding update for the primary version of the global ML model. Further, the remote processor(s) may cause the primary version of the global ML model to be updated based on the corresponding updates that are received during the round of decentralized learning. However, the remote processor(s) may receive other corresponding updates subsequent to the round of decentralized learning. Accordingly, various techniques described herein (e.g., FARe-DUST, FeAST on MSG, and/or other techniques) enable the other corresponding updates to be utilized in achieving a final version of the global ML model.
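As a rough illustration of using late-arriving updates instead of discarding them, the sketch below applies the on-time updates at full weight and then folds in straggler updates at a reduced weight. This is a generic staleness-discounted average chosen for illustration; it is not FARe-DUST or FeAST on MSG, whose details the abstract does not give. The `stale_scale` parameter and all function names are hypothetical.

```python
def average(updates):
    # Element-wise mean of a non-empty list of equal-length update vectors.
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]


def apply_update(weights, update, scale=1.0):
    return [w + scale * u for w, u in zip(weights, update)]


def run_round(primary_weights, on_time_updates, stale_updates, stale_scale=0.5):
    # Updates received during the round are applied at full weight.
    new_weights = apply_update(primary_weights, average(on_time_updates))
    # Updates received after the round closed are still used, but discounted
    # (assumed policy) because they were computed against older weights.
    if stale_updates:
        new_weights = apply_update(new_weights, average(stale_updates), stale_scale)
    return new_weights
```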

    ATTENTION NEURAL NETWORKS WITH N-GRAMMER LAYERS

    Publication No.: US20240078379A1

    Publication date: 2024-03-07

    Application No.: US17903805

    Filing date: 2022-09-06

    Applicant: Google LLC

    CPC classification number: G06F40/20 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes a neural network configured to perform the machine learning task, the neural network comprising an N-grammer layer and an output neural network, the N-grammer layer configured to: at each of one or more heads: receive a sequence of input embeddings; generate a discrete latent representation of the sequence of input embeddings by using a learned product quantization codebook; generate a plurality of n-gram indices from the discrete latent representation; and generate a latent n-gram representation of the sequence of input embeddings; and generate a sequence of output embeddings, and the output neural network configured to: receive the sequence of output embeddings; and process the sequence of output embeddings to generate the network output.
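The per-head steps listed in this abstract can be sketched for a single head with bigrams. The sketch assumes nearest-centroid assignment against one codebook, a bigram index formed from the previous and current code and reduced modulo the table size, and concatenation of the looked-up n-gram embedding onto the input embedding. These are illustrative simplifications, and the names are hypothetical.

```python
def nearest_code(vector, codebook):
    # Index of the codebook entry closest to the vector (squared distance).
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def ngrammer_head(embeddings, codebook, bigram_table):
    k = len(codebook)
    # Discrete latent representation: one code per sequence position.
    codes = [nearest_code(e, codebook) for e in embeddings]
    outputs = []
    for pos, (emb, code) in enumerate(zip(embeddings, codes)):
        # Assumption: the first position is paired with code 0 as padding.
        prev = codes[pos - 1] if pos > 0 else 0
        # Bigram index, reduced modulo the table size (assumed hashing scheme).
        bigram_index = (prev * k + code) % len(bigram_table)
        # Output embedding: input embedding concatenated with the n-gram embedding.
        outputs.append(list(emb) + list(bigram_table[bigram_index]))
    return codes, outputs
```

The resulting sequence of output embeddings is what the output neural network would then consume.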

    HYBRID FEDERATED LEARNING OF MACHINE LEARNING MODEL(S)

    Publication No.: US20240070530A1

    Publication date: 2024-02-29

    Application No.: US18074729

    Filing date: 2022-12-05

    Applicant: GOOGLE LLC

    CPC classification number: G06N20/00

    Abstract: Implementations disclosed herein are directed to a hybrid federated learning (FL) technique that utilizes both federated averaging (FA) and federated distillation (FD) during a given round of FL of a given global machine learning (ML) model. Implementations may identify a population of client devices to participate in the given round of FL, determine a corresponding quantity of instances of client data available at each of the client devices that may be utilized during the given round of FL, and select different subsets of the client devices based on the corresponding quantity of instances of client data. Further, implementations may cause a first subset of the client devices to generate a corresponding FA update and a second subset of client devices to generate a corresponding FD update. Moreover, implementations may subsequently update the given global ML model based on the corresponding FA updates and the corresponding FD updates.
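The client-selection step in this abstract can be sketched as a simple partition by data quantity. The abstract does not say which subset receives the clients with more data, so the rule below (clients at or above a threshold produce federated averaging updates, the rest produce federated distillation updates) is an assumption, as are the function and parameter names.

```python
def split_clients(client_data_counts, min_examples_for_fa):
    # client_data_counts maps a client id to its number of usable data instances.
    fa_clients, fd_clients = [], []
    for client_id, count in sorted(client_data_counts.items()):
        if count >= min_examples_for_fa:
            # Assumed rule: enough local data, so the client generates an FA update.
            fa_clients.append(client_id)
        else:
            # Otherwise the client generates an FD update.
            fd_clients.append(client_id)
    return fa_clients, fd_clients
```

For example, `split_clients({"a": 500, "b": 20, "c": 120}, 100)` assigns clients `a` and `c` to the FA subset and client `b` to the FD subset; the server would then update the global model from both kinds of update.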
